Issue Briefs

January 18, 2000

Why Sampling is not Polling

There is a considerable amount of controversy over the potential adjustment of the 2000 Census. It is therefore important to understand what the census is attempting to do, how sampling can help to improve the accuracy of the census and how both the census and sampling are different from polling.

Most polls are conducted when somebody needs information quickly and cheaply. A good poll is based on a scientific sample, i.e., persons are selected randomly and interviewers are told to interview these randomly selected people and no one else. A good poll also has a carefully written questionnaire that the interviewer follows to the letter. One of the best-known uses of polling is to predict the outcome of an election. Polls are also used to obtain opinions on subjects of national interest, such as whether healthcare reform is needed or whether the federal government should do something about global warming. They are also used to test reactions to certain products or marketing campaigns.

Polls usually have rather small sample sizes. A survey with 1,000 respondents will typically have a "sampling error" of "plus or minus 3 percentage points," which is an everyday way of saying that, had the survey included everybody, the likely result would be within 3 percentage points of the outcome obtained from the sample. This is sufficient precision for most purposes, even though the 1,000 respondents are a tiny fraction of the total population.
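The "plus or minus 3 percentage points" figure can be reproduced with the standard textbook approximation for a proportion estimated from a simple random sample. The sketch below is illustrative only: it assumes a 95 percent confidence level (z = 1.96) and the worst-case proportion of 50 percent, neither of which is stated in this brief, and the function name is made up for the example.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate margin of error for a proportion from a simple random sample.

    Assumptions (not from the brief): 95% confidence (z = 1.96) and a
    worst-case proportion of 0.5, which gives the widest margin.
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


if __name__ == "__main__":
    moe = margin_of_error(1000)
    print(f"n = 1,000: plus or minus {moe * 100:.1f} percentage points")
```

Under these assumptions the margin shrinks with the square root of the sample size, so quadrupling the sample only halves the sampling error.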

Errors due to sampling are not the largest source of error in a survey, however. One very important problem that most polls face is the fact that many people simply will not participate. For a good poll, taken by a high quality survey research organization, the interviewers are likely to contact over 1,500 potential respondents in order to get 1,000 to be interviewed. Many others refuse even to answer their telephone. This hypothetical survey is likely to have another 2,000 to 3,000 numbers classified as "non-contacts," meaning that the interviewer did not get past an answering machine, or reached an actual person who told the interviewer to call back later. In sum, it might take 3,000 to 4,000 residential telephone numbers to get 1,000 actual respondents. Well under half of all potential respondents even agree to be interviewed. After the interviews are completed, and the polltaker prepares to publish the survey results, (s)he needs to find a way to account for the differences between those who do and do not agree to be interviewed.
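The arithmetic behind "well under half" can be checked directly from the figures in the paragraph above. This is a minimal sketch; the function name is hypothetical, and the calculation is a simple ratio of completed interviews to telephone numbers used rather than any formal industry definition of a response rate.

```python
def response_rate(completed_interviews, numbers_used):
    """Share of telephone numbers that ended in a completed interview."""
    return completed_interviews / numbers_used


if __name__ == "__main__":
    completed = 1000
    # The brief says it might take 3,000 to 4,000 numbers to get 1,000 respondents.
    for numbers_used in (3000, 4000):
        rate = response_rate(completed, numbers_used)
        print(f"{numbers_used:,} numbers used: {rate:.1%} completed an interview")
```

With 3,000 to 4,000 numbers used, only about a quarter to a third of potential respondents end up in the survey.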

The biggest problem in a poll is respondents providing misinformation. Many respondents never think about the questions the interviewer asks, and the questions are not generally on their minds when the telephone rings. As a result, respondents frequently give answers that are "off the top of their heads" and have not been thought about very much.

Often, the behavior that the survey is concerned with differs from the attitudes expressed in the survey. Democrats turn into Republicans when it actually comes time to vote, and many others who state a candidate preference don't even cast their votes. To cite another example, respondents who state great optimism when asked about the future of the economy may become very conservative and risk-averse when actually developing their own investment strategies. The divide between stated attitudes and actual behavior is perhaps the greatest problem facing survey takers these days.

The 2000 Census

The census is quite different from a poll. Every ten years the Census Bureau attempts to gather information about each household in the United States. The questions asked on the census are not opinion questions, but rather questions that pertain to the number of people in the household and the age and sex of each member. While every household in America should respond to the census, many do not get forms, do not return their forms or are otherwise not correctly counted. Because some people are omitted from the census, while others are erroneously counted, the Census Bureau needs a method to test the accuracy of its data.

The use of sampling is intended as a quality control check on the entire census. The Census Bureau has selected a demographically representative sample of 314,000 households¹, and it hired a team of expert interviewers to determine who should have been counted at each of the households. The Census Bureau is using this sample to calculate two rates: the rate of omission and the rate of counting error. Omissions are people who should have been counted, but in fact were not. The Census Bureau believes that there were about 8 million of these in 1990. Counting errors occur in three main ways. First, some people get counted twice at the same address, and these are called duplicates. Second, other people get counted at a place that is not their main residence, e.g., at a vacation home. Third, census enumerators sometimes get incorrect information at addresses where no one answers the door, because the neighbors who supply the information instead may get it wrong.

The Census Bureau subtracts the number of counting errors from the number of omissions to estimate the net undercount, that is, the difference between the number of people who should have been counted and the number actually counted. It will calculate separate estimates of the net undercount for 64 different groups of people. These groups are defined by variables such as race, Hispanic origin, residential location, and how difficult the census was to take in their local neighborhoods.
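The subtraction described above can be sketched in a few lines. This is a simplified illustration of the arithmetic only, not the Census Bureau's actual estimation procedure. The group names and every number below are hypothetical, chosen to make the calculation easy to follow, and the rate is expressed here relative to the number of people who should have been counted, which is an assumption of this sketch.

```python
def net_undercount(omissions, counting_errors):
    """Net undercount: people missed minus people counted in error."""
    return omissions - counting_errors


def net_undercount_rate(census_count, omissions, counting_errors):
    """Net undercount as a share of the number who should have been counted.

    The census count includes counting errors and excludes omissions, so the
    number who should have been counted is the census count plus the net
    undercount.
    """
    net = net_undercount(omissions, counting_errors)
    should_have_been_counted = census_count + net
    return net / should_have_been_counted


if __name__ == "__main__":
    # Hypothetical groups and figures, for illustration only.
    groups = {
        "hypothetical group A": {"census_count": 950_000, "omissions": 60_000, "counting_errors": 10_000},
        "hypothetical group B": {"census_count": 1_980_000, "omissions": 50_000, "counting_errors": 30_000},
    }
    for name, g in groups.items():
        net = net_undercount(g["omissions"], g["counting_errors"])
        rate = net_undercount_rate(**g)
        print(f"{name}: net undercount {net:,} ({rate:.1%})")
```

Computing the rate separately for each group is what makes it possible to compare how well different groups were counted.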

In this manner, the Census Bureau will be able to assess its level of error when taking the 2000 Census. In particular, it will be able to assess whether or not the differential undercount (the difference between the net undercounts of minorities and of non-Hispanic Whites) was as large as it has been in other recent censuses. Because this quality check has nothing to do with opinions, preferences, or voting patterns, it should not be referred to as "polling." Instead, it is a quality control check on the accuracy of census data collection. As such, it follows the same logic as the quality control check of a manufacturing process intended to calculate the percentage of items that are defective.

Eugene Ericksen is a Professor of Sociology and Statistics at Temple University and a Special Consultant to National Economic Research Associates (NERA). He has taught undergraduate and graduate courses on statistics and survey research. His research has investigated census taking, as well as the general topics of survey and statistical methodologies.

¹ A sample size of 1,000 is typical for many opinion surveys. The Census Bureau has decided to survey 300 times this number to ensure more accurate results.




U.S. Census Monitoring Board
Presidential Members
4700 Silver Hill Road
Suite 1250 – 3
Suitland, MD 20746
Phone: (301) 457-9900
Fax: (301) 457-9901