I have written previously about the measurement of measurement uncertainty (Dec 9, 2011).
The inverse of this problem is the measurement of the information contained in the data.
One way of thinking about this is to imagine that our sensors are robotic students assigned the task of learning everything they can about some environmental condition. We then ask them to tell us what they have learned, and we evaluate what they tell us against some test that can be scored.
In parallel to the way that human students are assigned the task of learning everything on a course curriculum, the extent to which the information has been ‘learned’ can be tested against reference data. Human students are graded on the basis of the percentage of correct answers. The problem is that not all tests are commensurate. To mitigate this, we may ‘grade on the curve’ and assign letter grades so that excellent results can be distinguished from good, fair, poor, and failing results. We can also apply ‘weightings’ to different tests so that a quiz carries a different weight than a final exam.
The concept of assigning a grade (a.k.a. quality code, symbol, or qualifier) to hydrometric data has been around for some time. The ultimate test comes with each field visit, when we can ask the sensor: what is the water level at this instant in time? In contrast, a professor would ask a large number of questions of the students to evaluate how fully they have absorbed the information since the last test. There is no point in asking the question: ‘what was the exact time and magnitude of the peak water level?’ because we do not know the true answer. We can instead ask the question: ‘since the last test, are the data free from anomalies or discontinuities?’ In other words, in the absence of evidence that the data are all true, do we at least have evidence to support our ‘belief’ that the data are likely true?
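To make that single question concrete, here is a minimal sketch in Python of grading a sensor against one independent field measurement. The tolerance thresholds are entirely hypothetical, chosen for illustration rather than drawn from any standard:

```python
# A minimal sketch of the single-question field-visit 'test': compare the
# sensor's reading at the visit time against an independent reference
# measurement and assign a grade. The thresholds below are hypothetical.

def grade_field_visit(sensor_level_m: float, reference_level_m: float) -> str:
    """Grade one field-visit check of a water-level sensor."""
    error_m = abs(sensor_level_m - reference_level_m)
    if error_m <= 0.003:    # within 3 mm of the reference: excellent
        return "A"
    if error_m <= 0.010:    # within 10 mm: good
        return "B"
    if error_m <= 0.030:    # within 30 mm: fair
        return "C"
    return "F"              # anything worse: failing

print(grade_field_visit(sensor_level_m=12.342, reference_level_m=12.339))  # A
```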
The metaphor of grading students breaks down when it comes to the much more subjective business of grading data. There are well-established protocols and traditions for measuring the information content of students.
Not so much for hydrometric data.
There are several problems with measuring the information content of hydrometric data:
- the result is closely tied to fitness-for-purpose and any given dataset can be used for multiple purposes;
- the timing and frequency of field visits are generally inadequate for a robust measure of information content; and
- evaluation of information content by circumstantial evidence (e.g. inspection for serial auto-correlation in the time series) will often identify obvious faults, but evidence that there are no obvious faults is not the same as evidence that the data are true (see the sketch after this list).
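A minimal sketch of the circumstantial screen described in the last bullet, assuming hypothetical thresholds for persistence and step size. A clean result here is evidence only that there is no obvious fault, not that the data are true:

```python
# Screen a water-level series by circumstantial evidence: natural water
# levels are strongly persistent, so check lag-1 serial correlation, and
# flag step discontinuities. Both thresholds are hypothetical.

from statistics import mean

def lag1_autocorrelation(series: list[float]) -> float:
    """Sample lag-1 autocorrelation of a time series."""
    m = mean(series)
    num = sum((a - m) * (b - m) for a, b in zip(series, series[1:]))
    den = sum((x - m) ** 2 for x in series)
    return num / den

def screen_for_anomalies(levels_m: list[float],
                         min_r1: float = 0.8,
                         max_step_m: float = 0.5) -> list[str]:
    """Return obvious faults found; an empty list means no evidence of fault."""
    faults = []
    if lag1_autocorrelation(levels_m) < min_r1:
        faults.append("series less persistent than expected (possible noise)")
    for i, (a, b) in enumerate(zip(levels_m, levels_m[1:]), start=1):
        if abs(b - a) > max_step_m:
            faults.append(f"discontinuity of {abs(b - a):.2f} m at index {i}")
    return faults

print(screen_for_anomalies([12.31, 12.32, 12.33, 12.90, 12.34]))
```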
Similarly, there are well-established protocols and traditions for aggregating grades for students (i.e. the Grade Point Average), both within a course and within a collection of courses (e.g. a degree program).
I can find no internationally recognized standard method for aggregating grades on hydrometric data.
The widely practiced protocol for aggregating data grades holds that grades can neither be summed nor averaged; hence a GPA is not possible. Instead, it is commonly accepted that a ‘least grade wins’ rule is invoked within any collection of data. This would mean that if I failed a 15-minute pop quiz in a course, I would get a failing grade for the whole course even if I aced the 3-hour final exam. The failing grade in that course would then win over all the other courses I took for the duration of the program, and hence I would fail the program.
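The contrast is easy to see in code. A minimal sketch, assuming a hypothetical letter-to-points mapping and weights that stand in for assessment durations:

```python
# Contrast the 'least grade wins' rule with a GPA-style weighted average.
# The letter-to-points mapping and the weights are hypothetical.

GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}

def least_grade_wins(grades: list[str]) -> str:
    """The common hydrometric rule: the worst grade governs the collection."""
    return min(grades, key=lambda g: GRADE_POINTS[g])

def weighted_gpa(grades: list[str], weights: list[float]) -> float:
    """A GPA-style aggregate, weighting each grade (e.g. by duration)."""
    return sum(GRADE_POINTS[g] * w for g, w in zip(grades, weights)) / sum(weights)

grades  = ["A", "A", "F", "A"]      # the "F" is the failed 15-minute pop quiz
weights = [1.0, 1.0, 0.25, 3.0]     # hours of assessment: two tests, quiz, final

print(least_grade_wins(grades))                  # F    -> the whole record fails
print(round(weighted_gpa(grades, weights), 2))   # 3.81 -> mostly excellent work
```

Under ‘least grade wins’, a single brief failure governs the aggregate; a weighted average would instead let the bulk of the evidence speak.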
Data grades may represent only a lower boundary on the information content of the data, not to be confused with the full potential for information in the data. Once we agree on methods for objective quantification of hydrometric uncertainty, that uncertainty estimate may represent only an upper boundary on the information content of the data. The true information in the data may not be well represented by either grades or uncertainty, especially in the context of epistemic errors. The truth probably lies somewhere between subjective data grading and objective quantification of uncertainty.