STEP-BY-STEP INSTRUCTIONS TO REPLICATE ANALYSIS OF THE 1990 POST ENUMERATION SURVEY

This write-up describes how the Congressional Members of the U.S. Census Monitoring Board computed the variables and graphs in our September 30, 1999 report to Congress, Unkept Promise: Statistical Adjustment Fails to Eliminate Local Undercounts, as Revealed by Evaluation of Severely Undercounted Blocks From the 1990 Census Plan.  The full text of Unkept Promise and the data file needed to replicate our analysis (the XYZ file) are available on our web site at www.cmbc.gov.  See Unkept Promise for a detailed discussion of these variables, the analysis, the results, and our recommendations.

The XYZ file provides data for each of the populated 1990 Post Enumeration Survey (PES) block clusters.  The XYZ file includes 5,180 records, with 13 fields for each record.  Refer to the codebook (following) for a description of the file and the field names.

Our analysis can be replicated using spreadsheet software such as Microsoft Excel.  However, we recommend a statistical software package such as SPSS or SAS.  We used SPSS for Windows 9.0.

STEP 1: COMPUTE COVERAGE RATES FOR EACH PES CLUSTER
The variables listed below in caps are not included in the XYZ file.  They were computed from the data in the XYZ file, according to the directions below.

  • CENCOV: census coverage rate.  (E2/DirDse) x 100

  • ADJCOV: adjusted coverage rate.  (SynDse/DirDse) x 100
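The two formulas above can be sketched in a few lines of Python for readers who are not using SPSS. This is only an illustration: the function name and the example numbers are hypothetical, and treating a missing or zero DirDse as "undefined" is an assumption about how undefined rates arise, not something stated in the XYZ documentation.

```python
def coverage_rates(e2, syn_dse, dir_dse):
    """Return (CENCOV, ADJCOV) in percent for one block cluster.

    Returns None when DirDse is missing or zero, because both rates
    divide by DirDse (assumption: such clusters are left undefined).
    """
    if not dir_dse:
        return None
    cencov = 100.0 * e2 / dir_dse        # census coverage rate
    adjcov = 100.0 * syn_dse / dir_dse   # adjusted coverage rate
    return cencov, adjcov


# Hypothetical cluster: census count 90, synthetic DSE 95, direct DSE 100.
example = coverage_rates(e2=90, syn_dse=95, dir_dse=100)
print(example)
```

A cluster with CENCOV below 100 was counted short relative to its direct dual system estimate; ADJCOV shows where the synthetic adjustment would have put it.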

STEP 2: SORT THE PES CLUSTERS INTO COVERAGE GROUPS
We grouped (recoded) the PES block clusters according to their census coverage rates (CENCOV).   The grouping was done to make the data easier to display, and to mitigate the effect of random variation in individual block clusters.  The coverage groups are defined below.

Coverage Group (J)    Census Coverage Rate (%)    Number of Clusters
1                     <50                         42
2                     50 - 69.9                   46
3                     70 - 79.9                   111
4                     80 - 89.9                   463
5                     90 - 97.9                   1,538
6                     98 - 101.9                  1,591
7                     102 - 109.9                 996
8                     110 - 119.9                 240
9                     120+                        143

Groups 1 through 5 include all block clusters with less than 98 percent coverage: undercounted clusters.  Groups 7 through 9 include all block clusters with 102 percent coverage or more: overcounted clusters.  Group 6 includes clusters with coverage rates between 98 and 102 percent: defined in Unkept Promise as accurately-counted clusters.  (These are the groups published in Unkept Promise.  We conducted additional analyses using various criteria and values to sort clusters.  Analysts may wish to do the same.)
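One way to reproduce this recode outside SPSS is a lookup against the lower bound of each group. The sketch below assumes the ranges are half-open (for example, group 5 covers 90 up to but not including 98), which is our reading of the "97.9" and "101.9" endpoints; the function and constant names are hypothetical.

```python
import bisect

# Lower bounds (percent) of coverage groups 2 through 9, taken from the
# table above. Anything below 50 falls in group 1.
LOWER_BOUNDS = [50, 70, 80, 90, 98, 102, 110, 120]


def coverage_group(cencov):
    """Map a census coverage rate (percent) to coverage group 1..9.

    Returns None for an undefined rate so that such clusters can be
    set aside, as in the aggregation step.
    """
    if cencov is None:
        return None
    # bisect_right counts how many lower bounds are <= cencov.
    return 1 + bisect.bisect_right(LOWER_BOUNDS, cencov)


for rate in (42.0, 97.95, 100.0, 125.0):
    print(rate, coverage_group(rate))
```

Analysts who want to try other cut points, as suggested above, only need to change the list of lower bounds.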

STEP 3: AGGREGATE THE PES CLUSTERS
The final computations were carried out using an aggregated XYZ file.  That is, the values of all block clusters in a given coverage group were summed.  The resulting data set has 10 records: the nine coverage groups defined above and one record of missing or undefined values.  Only 10 clusters are included in this last record.  We removed them from the analysis, leaving 5,170 clusters represented in the aggregated file.

The aggregated file can be easily recreated using the software described above.  (For example, in SPSS the AGGREGATE command provides the needed utility.)

  • Aggregate the XYZ file so that all clusters in each coverage group are combined into a single record;

  • Sum the quantities E2, SynDse, and DirDse for each coverage group.
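For readers without SPSS, the aggregation can be approximated with a short Python routine. This is a sketch, not the procedure we ran: it assumes each cluster is a dictionary carrying a hypothetical "group" key (the coverage group from Step 2, or None when the coverage rate is undefined) alongside the E2, SynDse, and DirDse fields.

```python
from collections import defaultdict


def aggregate_by_group(clusters):
    """Sum E2, SynDse and DirDse over all clusters in each coverage group.

    Clusters whose group is None (undefined coverage rate) are skipped,
    mirroring the removal of the record of missing values.
    """
    totals = defaultdict(lambda: {"E2": 0.0, "SynDse": 0.0, "DirDse": 0.0, "n": 0})
    for cluster in clusters:
        group = cluster["group"]
        if group is None:
            continue
        record = totals[group]
        for field in ("E2", "SynDse", "DirDse"):
            record[field] += cluster[field]
        record["n"] += 1  # number of clusters in the group
    return dict(totals)


# Hypothetical clusters, for illustration only.
sample = [
    {"group": 4, "E2": 85, "SynDse": 90, "DirDse": 100},
    {"group": 4, "E2": 170, "SynDse": 180, "DirDse": 200},
    {"group": 6, "E2": 100, "SynDse": 101, "DirDse": 100},
    {"group": None, "E2": 12, "SynDse": 13, "DirDse": 0},
]
print(aggregate_by_group(sample))
```

Because the sums are taken before any ratio is computed, the group-level rates in the next step are weighted by cluster size rather than being simple averages of cluster rates.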

STEP 4: COMPUTE VARIABLES FOR EACH COVERAGE GROUP
In the aggregate XYZ file, we computed the following variables:

  • CENCOV: average census coverage rate.  (E2/DirDse) x 100

  • ADJCOV: average adjusted coverage rate.  (SynDse/DirDse) x 100

  • CENUND: average census undercount.  (1 - (E2/DirDse)) x 100

  • ADJUND: average adjusted undercount.  (1 - (SynDse/DirDse)) x 100

  • AVGADJ: average adjustment.  ((SynDse - E2)/DirDse) x 100
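The five group-level formulas can be checked with a small Python function. The function name and the example totals are hypothetical; the arithmetic follows the definitions above.

```python
def group_variables(e2, syn_dse, dir_dse):
    """Compute the Step 4 variables (in percent) from one group's summed totals."""
    return {
        "CENCOV": (e2 / dir_dse) * 100,
        "ADJCOV": (syn_dse / dir_dse) * 100,
        "CENUND": (1 - (e2 / dir_dse)) * 100,
        "ADJUND": (1 - (syn_dse / dir_dse)) * 100,
        "AVGADJ": ((syn_dse - e2) / dir_dse) * 100,
    }


# Hypothetical group totals: E2 = 850, SynDse = 900, DirDse = 1,000.
values = group_variables(e2=850, syn_dse=900, dir_dse=1000)
for name, value in values.items():
    print(name, round(value, 1))
```

Two identities follow directly from the definitions and are useful as a check on a replication: CENUND equals 100 minus CENCOV, and AVGADJ equals CENUND minus ADJUND, that is, the part of the census undercount that the adjustment removes.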

STEP 5: GRAPH THE VALUES
These variables were used to create the figures in Unkept Promise (pp. 4, 5, 7, 23-25).  We graphed:
  • CENUND and AVGADJ (bar).

  • CENCOV and ADJCOV (line).

  • CENUND and ADJUND (bar).


CODEBOOK FOR THE XYZ FILE
The XYZ file provides the following data for each of the populated 1990 Post Enumeration Survey (PES) block clusters.  The U.S. Bureau of the Census provided these data to the U.S. Census Monitoring Board.

The XYZ file includes 5,180 records, with 13 fields in each record.  Under the variable label and field name, a description is provided.

See page 16 and Appendix D of the September 30, 1999 Report to Congress, Unkept Promise: Statistical Adjustment Fails to Eliminate Local Undercounts, as Revealed by Evaluation of Severely Undercounted Blocks from the 1990 Census Plan, for more detail and a description of the analysis by the Congressional Members of the U.S. Census Monitoring Board.
Variable Label (Field Name)
State (State)
2-digit FIPS code.
County (County)
3-digit FIPS code.
Census Tract (Tract)
Cluster Number (Pesc)
E-Sample (E)
The number of persons in the E-Sample. This count excludes whole person imputations. This number is used as the "census count" for comparison purposes and to generate the synthetic estimates.
Imputation Total (II)
The number of whole person imputations in each block cluster. This was obtained from the block level file provided to the CMBC by the Census Bureau and merged with the XYZ data.
Erroneous Enumerations (EE)
The estimated number of persons in the E-sample who were erroneously enumerated or for whom there was not sufficient information for matching.

P-Sample (P)
The number of persons in the PES sample.
Matches (M)
The number of P-sample persons who could be matched to census persons (E-Sample).
Census Plus Estimate (DirCP)
The direct census plus type estimate for the block cluster based on data only from the block cluster: DirCP = E - EE + P - M
Direct Dual System Estimate (DirDSE)
The direct dual system estimate for the block cluster based on data from only the block cluster: DirDSE = E x (E-EE)/E x P/M
Synthetic Estimate (SynDSE)
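The synthetic dual system estimate, obtained by multiplying the number of E-sample persons in each poststratum by that poststratum's adjustment factor and summing over the 357 poststrata:
SynDSE = sum over i of (E_i x ADJFAC_i), where E_i is the number of E-sample persons in the block cluster who fall in poststratum i and ADJFAC_i is the adjustment factor for poststratum i.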
E + Imputations (E2)
This is the census count used in the CMBC report. For each block cluster,
E2 = E + II
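
The derived fields in this codebook can be recomputed from the raw counts as a consistency check. The Python sketch below is illustrative only: the function names and example numbers are hypothetical, returning None when M is zero is an assumption, and the synthetic estimate is written as a sum of per-poststratum products, which is our reading of the description above. The XYZ file itself does not carry the per-poststratum counts or adjustment factors, so those inputs are made up here.

```python
def derived_fields(e, ii, ee, p, m):
    """Recompute DirCP, DirDSE and E2 for one block cluster."""
    dir_cp = e - ee + p - m
    # E x (E-EE)/E x P/M reduces to (E-EE) x P/M when E is nonzero.
    # Assumption: the estimate is undefined (None) when there are no matches.
    dir_dse = (e - ee) * p / m if m else None
    e2 = e + ii
    return {"DirCP": dir_cp, "DirDSE": dir_dse, "E2": e2}


def synthetic_dse(e_by_poststratum, adjfac):
    """Sum E_i x ADJFAC_i over the poststrata present in a cluster.

    Both arguments are dicts keyed by a poststratum identifier
    (hypothetical inputs, not fields of the XYZ file).
    """
    return sum(count * adjfac[i] for i, count in e_by_poststratum.items())


# Hypothetical cluster: E = 100, II = 5, EE = 10, P = 120, M = 90.
print(derived_fields(e=100, ii=5, ee=10, p=120, m=90))
# Hypothetical poststratum counts and adjustment factors.
print(synthetic_dse({1: 40, 2: 60}, {1: 1.05, 2: 1.10}))
```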

The data used in the analysis, including the newly released location and population totals of all populated 1990 PES block clusters, are available here in Excel and dBase formats for those who would like a hands-on feel for the work done.

In dBase Format

In Excel Format