Does More Water Data Mean Better Water Outcomes?

Written by Stuart Hamilton

Water has always been important. The perception of its importance is closely linked to episodes of too much, too little, or the wrong quality. Climate change, urban growth, and agricultural intensification are just three of the pressures contributing to an unprecedented global awareness of the importance of water. Everything we know about the effects of these pressures on water is the result of water data.

Is water neglect a direct result of inadequate water data?

Data connect water overuse, misuse, and abuse to undesirable outcomes. In the absence of water data, water uses and abuses have historically been largely indistinguishable. In the pre-data era, little thought was given to using the same river both for waste disposal and for drinking water.

The data era has been good for water sustainability. It has changed awareness, understanding, willingness to act, policies, legislation, and accountability. However, we are now transitioning from the data era to the “big data” era. There are more water-monitoring technologies, faster data communication, fewer barriers to data access, more stakeholders collecting data, higher-frequency data collection, and more locations where data are collected.

With more data comes more variance in data quality.

Does this shift in the distribution of data quality matter? Maybe. If the best-quality data are getting better at the same rate as the worst-quality data are getting worse, then maybe not. But, and it’s a big but, more data means less scrutiny per data point. Budgets for data review and quality control are not keeping pace with the growth in data volume.

With less scrutiny, there is less understanding of the technology-specific problems that can occur. With less understanding of the nature and source of data problems, there is less investment both in preventing such problems and in mitigating them when they do occur. Many parameters of interest in water monitoring are ones we have no innate ability to estimate independently. We have to trust the number out of the box. But we no longer understand the box.

It may be the case that the best data are getting better in some places, at some times, and under some circumstances. But how can we distinguish those data from the data for every other place, every other time, and every other circumstance?

The inability to clearly establish the credibility of water data is dangerous.

The credibility of data is the only thing that connects the perception of overuse, misuse, and abuse to a willingness to act. Without that willingness, not only will better policies, legislation, and accountability be unachievable, but existing beneficial policies may also be rolled back.

Such a retreat from beneficial policies is possible because data denialists promote a social (e.g. economic, political, or religious) agenda at the expense of the truth. Data denialists portray themselves as scientists and sceptics. They are not. All scientists are sceptics. Sceptics are wary of intellectual hubris and are swayed only by credible, defensible evidence supporting or rejecting any claim of special knowledge. A scientist creates and defends knowledge with a combination of compelling data, rigorous analysis, and personal humility. A data denialist is the opposite of a scientist. A denialist starts from a position of great hubris and seeks to discredit any, and therefore all, evidence that runs counter to their agenda.

Data that aren’t produced to the highest standards are easy targets for data denialists. Any fault discovered in the data can be used to discredit the results of years of hard field work. It is difficult to explain that 99% of the data are highly defensible when a momentary data fault can be exploited in a single tweet to discredit an entire monitoring program.

The transition from the data era to the “big data” era is not a time to be complacent about data quality.

It is a difficult problem. It is an important problem. It is an urgent problem. More data will mean better outcomes when we solve this problem.