About the Data
Judging the Quality of Government Data
As consumers of government data, how can we tell if we can trust the numbers? Do the numbers accurately reflect what they are supposed to be measuring?
These days, most federal government data are derived from large computerized databases. These databases vary in how well they are designed and in the care with which they are implemented and maintained. Doing this properly takes time and resources. It also requires proper "validation" systems: procedures that check that data are entered when they are supposed to be, that they are coded properly, and that they accurately reflect the people, places, dates, dollars, actions and so forth that they are designed to measure.
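As a rough illustration of what such a validation procedure might look like, the sketch below checks a single record for missing entries, unrecognized codes and an implausible date. The field names (`case_id`, `filing_date`, `disposition`) and the code table are hypothetical, invented for this example; they do not describe any particular agency's system.

```python
from datetime import date

# Hypothetical code table and required fields, for illustration only.
VALID_DISPOSITIONS = {"CONVICTED", "DISMISSED", "ACQUITTED", "PENDING"}
REQUIRED_FIELDS = ("case_id", "filing_date", "disposition")


def validate_record(record):
    """Return a list of problems found in one record (empty if none)."""
    problems = []

    # Were the data entered when they were supposed to be?
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            problems.append(f"missing {field}")

    # Are the data coded properly?
    disposition = record.get("disposition")
    if disposition not in (None, "") and disposition not in VALID_DISPOSITIONS:
        problems.append(f"unknown disposition code: {disposition}")

    # Do the data plausibly reflect what they are meant to measure?
    filing_date = record.get("filing_date")
    if isinstance(filing_date, date) and filing_date > date.today():
        problems.append("filing_date is in the future")

    return problems
```

A record that passes returns an empty list; anything else is a candidate for correction at the source.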
No data are ever perfect, but some data sources are more reliable than others. At one end of the scale, some government data systems are very reliable. Others, however, should carry a warning label, "Use At Your Own Risk!", because they are inadequate and the information they maintain is often inaccurate.
It is important to separate the issue of data quality from the question of how the data are used. A government data system which accurately records each time government agents arrest individuals will produce accurate counts of the number of arrests. However, if someone uses these counts as if they measured the number of violations which occur, the data have been improperly used. This is because arrests and violations are two quite separate things. First, government agents are unlikely to find and arrest everyone who has committed an offense, and second, not everyone who is arrested is in fact guilty. While good data don't guarantee proper use, bad -- that is, inaccurate -- information is of little value for any purpose.
The statistics that government agencies publish about their activities -- for example, how many employees they have, how much they spend, what actions they have initiated, and the results of these actions -- derive from internal administrative data systems they maintain. Using the Freedom of Information Act, TRAC requests that agencies furnish it with copies of these databases, along with all supporting documentation on the procedures used to maintain these systems.
Once this information is received, TRAC examines it to try to ascertain the quality of these data. Whenever possible, three general "tests" are applied:
- Based upon the government's description of its data system, how well is it designed and maintained? What systems are in place within the agency to validate its data, and to correct errors that occur? Is the agency insular (that is, does it tend to ignore developments on data from outside that division), or is it open to professional standards and information from outside the agency or division?
- Are the actual data recorded in its data system internally consistent? For example, if one field indicates that a person was prosecuted, do other related fields also treat this record as a prosecution? If one field indicates the case is on appeal, are the disposition codes used consistent with it being an appeal rather than some other type of action? Are the dates for events internally consistent -- that is, do events which take place later carry dates which are later in time?
- Do the counts generated from the data provided closely match the agency's published statistics and numbers produced by other data systems which track the same or closely related events? If they don't, when do they differ, and why? For example, can the differences be explained by somewhat different coverage of the two data systems, or by how the two systems classify events? How large are these differences? Are they quite small, or are they substantial in nature?
When discrepancies are found, TRAC contacts the agency and attempts to resolve them.
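The second and third tests lend themselves to a short sketch. The code below is only an illustration under stated assumptions, not a description of TRAC's actual procedures: the field names (`prosecuted`, `disposition`, `filed_on`, `disposed_on`) and the `DECLINED` code are hypothetical, and the measure of "how large are these differences" is one simple choice among many.

```python
from datetime import date


def consistency_problems(record):
    """Check one record for fields that contradict each other.

    Field names and the DECLINED code are hypothetical examples.
    """
    problems = []

    # If one field says the person was prosecuted, related fields
    # should also treat the record as a prosecution.
    if record["prosecuted"] and record["disposition"] == "DECLINED":
        problems.append("prosecuted flag conflicts with DECLINED disposition")

    # Events which take place later should carry later dates.
    disposed_on = record["disposed_on"]
    if disposed_on is not None and disposed_on < record["filed_on"]:
        problems.append("disposed before filed")

    return problems


def relative_difference(computed_count, published_count):
    """Gap between a count computed from the data and the published
    figure, expressed as a fraction of the published figure."""
    if published_count == 0:
        raise ValueError("published_count must be nonzero")
    return abs(computed_count - published_count) / published_count
```

For instance, `relative_difference(980, 1000)` expresses a gap of 20 records as a share of the published total. Whether a gap of that size counts as "quite small" or "substantial" remains a judgment that depends on the data system and on how the counts will be used.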
TRAC then incorporates this information in the documentation it makes available along with its data. Where TRAC finds serious deficiencies which it cannot resolve, and which cause the data to give a seriously misleading picture of agency operations, the agency is notified of TRAC's findings and asked to respond. TRAC then makes this information public.