CENSUS BUREAU’S PLAN FOR STATISTICAL ADJUSTMENT
Measuring the Undercount: Dual System Estimation
In 2000 (as in 1990), the Bureau will attempt to measure the undercount by means of dual system estimation (DSE). DSE is a method for deriving an estimate of true population by combining the results of two separate surveys.
In 1990, the Bureau used the census and the 1990 post enumeration survey (PES) in its DSE. In 2000, the Bureau will use the census and the Accuracy and Coverage Evaluation (ACE), a revision of the 1990 PES.
The experts cited in Appendix E provide ample evidence that this manner of measuring undercount is seriously flawed. In the case of the 1990 PES, the Census Bureau’s own senior statisticians and demographers concluded, “About 45% of the revised estimated undercount is actually measured bias and not measured undercount.”9 In other words, in the 1990 PES, many of the people reported as missed or counted in error really just represented problems in the PES itself, such as faulty information, errors in matching between the census and the survey, etc. Although concern over local undercounts is valid, the survey’s measurement of those undercounts is not.
Proponents of the Bureau’s adjustment plan consistently dismiss scientific criticism, and often attempt to personally discredit the critics. Nevertheless, the scientific literature casts serious doubt on the ability of the Bureau’s plan to accurately measure the undercount at any level. A bibliography of relevant scientific criticism is found in Appendix E.
Distributing the Adjustment: Synthetic Estimation
After measuring the 2000 undercount through DSE, the Bureau proposes to distribute adjustment via synthetic estimation. The same fundamental approach was proposed and rejected in the 1990 Census.
Synthetic estimation uses estimates about the undercount of a demographic group in a large region to make estimates about that same demographic group in smaller areas, such as blocks or neighborhoods. Synthetic estimation assumes the undercount rate for this demographic group, in every block, is the same as the undercount rate for the whole region.
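The assumption described above can be illustrated with a minimal sketch. The function name, the block labels, and the sample figures are hypothetical, and the undercount rate is assumed to be expressed as a fraction of the true population; this is not the Bureau's production procedure.

```python
def synthetic_adjust(block_counts, regional_undercount_rate):
    """Apply one regional undercount rate to every block's count for a group.

    Assumption: undercount rate u = (true - counted) / true,
    so the implied true population is counted / (1 - u).
    """
    if not 0.0 <= regional_undercount_rate < 1.0:
        raise ValueError("undercount rate must be in [0, 1)")
    return {
        block: count / (1.0 - regional_undercount_rate)
        for block, count in block_counts.items()
    }


# Hypothetical census counts for one demographic group in two blocks.
counts = {"block_A": 90, "block_B": 45}
adjusted = synthetic_adjust(counts, 0.10)
for block, value in adjusted.items():
    print(block, round(value, 2))
```

Every block receives the same proportional increase, whether or not that block was actually undercounted.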
In the 1990 PES, the Bureau used information from regions containing several states to estimate the number and characteristics of people in all blocks within those regions.
For example, the 1990 synthetic estimation applied a single adjustment factor to all male Asian and Pacific Islanders between the ages of 18 and 29 who owned their home. In other words, the census assumed young, male homeowners of Chinese, Japanese, Philippine and Korean descent, in every neighborhood from Honolulu, Hawaii, to Bangor, Maine, had the same likelihood of being counted in the census.
Although the proposed 1990 statistical adjustment was rejected, the Bureau proposes to adjust Census 2000 in essentially the same way.
Method of Analysis
To determine the local effects of the Census 2000 adjustment, the Congressional members of the Board examined the 5,170 block clusters surveyed in the 1990 PES.10 Using data only recently made available to the public, census counts in the 1990 PES block clusters were compared to their synthetically adjusted counts.11
The ideal analysis would compare the census count and the synthetically adjusted count in each block to the true population in that block. However, the true population is not known. Therefore, we compare the census count and the synthetically adjusted count to a third number: a direct estimate of the population for each block cluster, based on the data from that block cluster. The third number is the “direct DSE.”
The direct DSE is an estimate of the population of each block cluster, calculated by the Bureau from the data collected in that block cluster in both the actual census and the PES. The direct DSE is compared to the synthetic adjustment for each block cluster to determine how well the synthetic adjustments “fix” the apparent undercounts measured by the PES in each block cluster.12
The Bureau calculated the direct DSE for each of the PES block clusters, but has resisted its release to the public. Data necessary to calculate the direct DSE exist only for the block clusters surveyed in the 1990 PES.
Several statisticians and demographers have criticized the accuracy of data from the 1990 PES and the reliability of the resulting DSE, citing lack of independence, errors in matching, etc. For a bibliography of these criticisms, see Appendix E.
Nevertheless, it is pertinent to compare statistical adjustments to the local measurements of undercount upon which they are based. Such a comparison shows whether statistical adjustment succeeds in adding people to the areas where the survey determined they had been missed.
If the adjustments do not correct the undercounts and overcounts in the sample area, they can hardly be counted upon to correct undercounts and overcounts in the rest of the nation. If statistical adjustments fail to add a large number of people to block clusters where large undercounts are supposedly measured, and if they add people even to areas where overcounts are supposedly measured, then they are obviously not the solution to the problem of faulty census counts. Even if the measurements of undercount were accurate, such adjustments would be grossly inaccurate.
U.S. CENSUS MONITORING BOARD STUDY
The Census Bureau provided the Census Monitoring Board with detailed data on the 5,170 block clusters included in the 1990 Post Enumeration Survey (PES). A description of the data and the Board’s analysis follows.13
Each of the 5,170 PES block clusters was located and identified by Federal Information Processing Standard (FIPS) codes denoting its state and county, as well as Bureau codes identifying census tract and cluster number. The data included these figures for each 1990 PES block cluster:
- The E-Sample (E). The number of people counted in the cluster during the 1990 census. This count excludes whole person imputations. It is used by the Bureau for analysis and to generate synthetic estimates.
- The erroneous enumerations (EE). The estimated number of persons in the E-Sample who were erroneously enumerated or for whom there was not sufficient information for matching.
- The P-Sample (P). The number of persons counted in the 1990 Post-Enumeration Survey (PES).
- Matches (M). The estimated number of P-Sample persons who could be matched to census persons.
- The adjusted count (SynDSE). The synthetic estimate generated by the 1990 PES. It is the E-Sample, adjusted by the Bureau’s proposed statistical adjustment using synthetic estimation.14
- The direct dual system estimate (DirDSE). An estimate of the population of each block cluster based on the E-Sample and P-Sample from only that cluster. It has been generated internally by the Bureau, defined as:
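A minimal sketch of a direct dual system estimate follows. It assumes the standard capture-recapture form, in which correct enumerations (E minus EE) are scaled by the ratio of P-Sample persons to matches; the Bureau's exact definition may differ, and the function name and sample figures are hypothetical.

```python
def direct_dse(e, ee, p, m):
    """Dual system estimate for one block cluster from its own data.

    Assumed form (not confirmed by the source): (E - EE) * P / M
    e  -- E-Sample count
    ee -- erroneous enumerations
    p  -- P-Sample count
    m  -- matches between P-Sample and census persons
    """
    if m <= 0:
        raise ValueError("estimate is undefined without matches")
    return (e - ee) * p / m


# Hypothetical cluster: 100 census persons, 5 erroneous,
# 90 survey persons, 80 of whom matched to the census.
print(direct_dse(100, 5, 90, 80))  # 106.875
```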
In addition, the Bureau provided a separate file, the PES Block file, with block-level data about the areas surveyed in the PES. Specifically, the PES Block file provided the number of persons imputed into each cluster. Although the PES Block file overstates the number of imputations,15 the Board computed E2: E, plus the number of imputations in the cluster (according to the PES Block file). E2 approximates the census count reported for each area surveyed in the 1990 PES. The 5,170 block clusters were indexed by i where i = 1 to 5,170.
E2_i = E_i + Imputations_i
The census coverage rate for the ith block cluster was defined as the census count, E2, divided by DirDSE, and then multiplied by 100 for expression as a percentage. This is the percentage of each block cluster’s estimated true population reported in the census.
For comparison, the adjusted coverage rate for the ith block cluster was also calculated. It was defined as the adjusted count, SynDSE, divided by the DirDSE, and multiplied by 100 for expression as a percentage. It is the percentage of each block cluster’s estimated true population reported after adjustment.
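The two coverage rates defined above can be sketched as follows. The class and field names are hypothetical labels for the quantities in the text (E, imputations, SynDSE, DirDSE), and the sample figures are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    e: float            # E-Sample count
    imputations: float  # imputations from the PES Block file
    syn_dse: float      # synthetically adjusted count (SynDSE)
    dir_dse: float      # direct dual system estimate (DirDSE)

    @property
    def e2(self):
        # E2 = E + imputations, approximating the reported census count
        return self.e + self.imputations


def census_coverage_rate(c):
    """Census count as a percentage of the estimated true population."""
    return 100.0 * c.e2 / c.dir_dse


def adjusted_coverage_rate(c):
    """Adjusted count as a percentage of the estimated true population."""
    return 100.0 * c.syn_dse / c.dir_dse


cluster = Cluster(e=85, imputations=5, syn_dse=92, dir_dse=100)
print(census_coverage_rate(cluster))    # 90.0
print(adjusted_coverage_rate(cluster))  # 92.0
```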
Comparison of these two variables in local areas – census coverage rate and adjusted coverage rate – is the basis of this analysis. Specifically, this analysis compares census coverage before and after adjustment in block clusters with varying coverage rates (and, therefore, varying undercount rates). To do so, PES clusters were sorted into nine groups according to coverage rate.16 The groups were indexed by j, where j = 1 to 9.
Groups 1 through 5 include all PES block clusters with less than 98 percent coverage: undercounted clusters. Groups 7 through 9 include all PES block clusters with 102 percent coverage or more: overcounted clusters. Group 6 includes clusters with coverage rates between 98 and 102 percent: defined for the purposes of this analysis as accurately-counted clusters.
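The three broad categories above can be sketched as a simple classification. Only the 98 and 102 percent thresholds come from the text; the function name is hypothetical, and the finer cut points separating the nine groups are not modeled here.

```python
def classify_cluster(coverage_rate):
    """Classify a block cluster by its census coverage rate (in percent)."""
    if coverage_rate < 98.0:
        return "undercounted"        # groups 1 through 5
    if coverage_rate >= 102.0:
        return "overcounted"         # groups 7 through 9
    return "accurately counted"      # group 6


for rate in (85.0, 98.0, 101.9, 102.0):
    print(rate, classify_cluster(rate))
```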
To compare the local undercount to the local adjustment, an average undercount rate, as a percent of the estimated true population, was calculated for all clusters in each group. (All the clusters in the jth group are indexed by I_j.) The average census undercount was defined as the sum of all census counts, E2, in a group, divided by the sum of all DirDSE in the group, and subtracted from one. The value was multiplied by 100 for expression as a percentage. A positive value indicates an undercount. A negative value indicates an overcount.
For example, the aggregate coverage rate of the 463 block clusters with coverage rates between 80 and 89.9 percent was calculated as a proportion, subtracted from 1.0, and multiplied by 100, yielding an average undercount rate of 13.76 percent. The same procedure was repeated for each group (Appendix D).
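The average census undercount for a group, as defined above, can be sketched as follows. The function name and the sample figures are hypothetical; each cluster is represented as a pair of its census count (E2) and its direct DSE.

```python
def average_undercount(clusters):
    """Average undercount for a group, as a percent of estimated true population.

    clusters -- iterable of (e2, dir_dse) pairs for the clusters in one group.
    Positive values indicate an undercount; negative values, an overcount.
    """
    clusters = list(clusters)
    total_e2 = sum(e2 for e2, _ in clusters)
    total_dse = sum(dse for _, dse in clusters)
    if total_dse <= 0:
        raise ValueError("sum of direct DSE must be positive")
    return 100.0 * (1.0 - total_e2 / total_dse)


# Hypothetical group of two undercounted clusters.
group = [(85, 100), (180, 200)]
print(round(average_undercount(group), 2))
```

Because the sums are taken before dividing, larger clusters carry more weight than they would in a simple mean of per-cluster rates.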
Finally, the average local adjustment was calculated as a percent of the estimated true population. That is, the synthetic adjustment for the ith block cluster was defined as SynDSE minus E2, divided by DirDSE. Again, the values were grouped and averaged, and the result was multiplied by 100 for expression as a percentage.
This is the average addition to block clusters in a group, expressed as a proportion of the estimated true population. For example, in the 996 block clusters with coverage rates between 102 percent and 109.9 percent (overcounted areas), the average addition is 0.17 percent.
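A minimal sketch of the average local adjustment for a group follows. It assumes the group average is formed as a ratio of sums, mirroring the definition of the average undercount; the text does not state whether a ratio of sums or a mean of per-cluster ratios was used. The function name and sample figures are hypothetical.

```python
def average_adjustment(clusters):
    """Average addition for a group, as a percent of estimated true population.

    clusters -- iterable of (e2, syn_dse, dir_dse) triples for one group.
    Assumption: ratio of sums, i.e. sum(SynDSE - E2) / sum(DirDSE).
    """
    clusters = list(clusters)
    total_added = sum(syn - e2 for e2, syn, _ in clusters)
    total_dse = sum(dse for _, _, dse in clusters)
    if total_dse <= 0:
        raise ValueError("sum of direct DSE must be positive")
    return 100.0 * total_added / total_dse


# Hypothetical group: census counts 85 and 180, adjusted to 87 and 183,
# against direct estimates of 100 and 200.
group = [(85, 87, 100), (180, 183, 200)]
print(round(average_adjustment(group), 2))
```

Under this assumption, subtracting the average adjustment from the average undercount gives the share of the estimated true population still missing after adjustment.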
Comparing the coverage rate before and after adjustment in the 1990 PES block clusters, as well as the local undercount and the local adjustment, clearly illustrates statistical adjustment’s failure to correct large undercounts in local areas.