Steven Ruggles

Regents Professor of History and Population Studies
Director, Institute for Social Research and Data Innovation
University of Minnesota
ruggles@umn.edu
(612) 624-5818

Census Simulation

This page documents a simple simulation I prepared with the assistance of David Van Riper to assess the success of the Census Bureau's "database reconstruction" exercise.

Database reconstruction is a process for inferring individual-level responses from tabular data. The Census Bureau found that their hypothetical reconstructed population matched someone in the "real" population on age, sex, race, and whether Hispanic in 46.48% of cases. Remarkably, the Census Bureau does not appear to have calculated how many matches would have been expected through chance alone.

To investigate the matter, we constructed a simple simulation. We estimate that randomly chosen age-sex combinations would match someone on any given block 52.6% of the time, assuming the age, sex, and block size distributions from the 2010 census. This means that the Census Bureau would have found a match on age and sex 52.6% of the time even if they had never looked at the tabular data from 2010 and had instead just assigned ages and sexes to their hypothetical population at random.

To estimate the percentage of random age-sex combinations that would match someone on a block by chance, we generated 10,000 simulated blocks and populated them with random draws from the 2010 single-year-of-age and sex distribution. The simulated blocks conformed to the population-weighted size distribution of blocks observed in the 2010 census. We then randomly drew 10,000 new age-sex combinations and searched for each of them in each of the 10,000 simulated blocks. In 52.6% of cases we found someone in the simulated block who exactly matched the random age-sex combination.
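The procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the original Fortran program: the function name `simulate_match_rate`, its parameters, and the toy inputs are assumptions, and the sketch pairs each random draw with one simulated block, which may differ from how the original program paired draws and blocks.

```python
import random

def simulate_match_rate(combo_weights, block_sizes, size_weights,
                        n_blocks=10_000, seed=0):
    """Estimate how often a random age-sex combination matches
    someone on a randomly populated block."""
    rng = random.Random(seed)
    combos = range(len(combo_weights))  # each index stands for one age-sex combination
    hits = 0
    for _ in range(n_blocks):
        # Draw a block size from the population-weighted size distribution.
        size = rng.choices(block_sizes, weights=size_weights)[0]
        # Populate the block with independent draws of age-sex combinations.
        residents = set(rng.choices(combos, weights=combo_weights, k=size))
        # Draw one new combination and look for an exact match on the block.
        target = rng.choices(combos, weights=combo_weights)[0]
        hits += target in residents
    return hits / n_blocks

# Toy inputs (hypothetical): 192 equally likely age-sex combinations and
# three block sizes with made-up population weights.
rate = simulate_match_rate([1.0] * 192, [10, 50, 200], [0.2, 0.5, 0.3],
                           n_blocks=2_000)
print(f"{rate:.1%}")
```

With the real 2010 age-sex and block size distributions in place of the toy inputs, a routine of this kind should land near the figure reported above, though the exact value depends on the random seed and on how draws are paired with blocks.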

The Census Bureau found that 44% of the population is unique within blocks with respect to age and sex, a figure entirely consistent with our finding that a randomly chosen age-sex combination would match someone on any given block 52.6% of the time. The simulated population was similar to the real census population with respect to the frequency of unique respondents: we found that 47.7% of the simulated population was unique within the block with respect to age and sex.
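The uniqueness measure can be computed directly from the simulated blocks. The sketch below assumes each block is represented as a list of age-sex combinations; the function name `unique_share` and the example data are hypothetical.

```python
from collections import Counter

def unique_share(blocks):
    """Share of all residents whose age-sex combination appears
    exactly once on their own block."""
    unique = total = 0
    for residents in blocks:
        counts = Counter(residents)
        # A resident is unique if no one else on the block shares the combination.
        unique += sum(1 for c in counts.values() if c == 1)
        total += len(residents)
    return unique / total if total else 0.0

# Hypothetical example: two males aged 30 share a block with one female aged 30;
# a second block holds a single resident.
example_blocks = [[("M", 30), ("F", 30), ("M", 30)], [("F", 5)]]
print(unique_share(example_blocks))  # 0.5
```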

Figure 1 shows the relationship between block size and the percent of randomly selected age-sex combinations found.

Our calculation does not factor in race or ethnicity, but because of high residential segregation most blocks are highly homogeneous with respect to race and ethnicity. If we assign everyone on each block the most frequent race and ethnicity of the block using data from the census, then the race and ethnicity assignment will be correct in 77.8% of cases. Using that method to adjust the random age-sex combinations described above, 40.9% of cases in the hypothetical database would be expected to match a respondent on the same block on all four characteristics. That is not greatly lower than the Census Bureau's reported 46.48% match rate for their reconstructed data. This suggests that despite the Census Bureau's substantial investment of resources and computing power, the database reconstruction technique does not perform substantially better than a crude random number generator combined with a simple assignment rule for race and ethnicity. In particular, if we generate age-sex combinations randomly and assign every case the most frequent race and ethnicity on their block, it yields a match rate on age, sex, race, and ethnicity that is 88% of the rate obtained through the Census Bureau's database reconstruction.
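The arithmetic behind these two figures can be checked in a few lines. The sketch multiplies the two rates, which treats the age-sex match and the race and ethnicity assignment as independent; that reading is an assumption, although it reproduces the numbers quoted above. The variable names are illustrative.

```python
age_sex_match = 0.526     # chance match on age and sex, from the simulation
race_eth_correct = 0.778  # modal race/ethnicity assignment is correct
bureau_match = 0.4648     # Census Bureau's reported reconstruction match rate

# Expected match rate on all four characteristics for the random assignment.
random_match = age_sex_match * race_eth_correct
# Random assignment expressed as a share of the Bureau's match rate.
relative = random_match / bureau_match

print(f"{random_match:.1%}")  # 40.9%
print(f"{relative:.0%}")      # 88%
```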

There are two available versions of the program:

  • censim1 (4/18/2021) was based on 1,000 simulated blocks, and we searched each block for 1,000 random draws of age-sex combinations. We used seven bins for size of block, and calculated the size of each bin as its midpoint. The top bin was set at 1,431, the average block size for blocks of over 1,000 population. These bins are taken from Figure 1 in J. Abowd. 2010 Reconstruction-abetted re-identification simulated attack. Appendix B in Declaration of John Abowd, State of Alabama v. United States Department of Commerce. Case No. 3:21-CV-211-RAH-ECM-KCN (2021). This is the version of the simulation cited by Ruggles in his testimony for the same case, available below.
  • In this version, randomly chosen age-sex combinations appeared on any random block 54.9% of the time, and 45.2% of the population was unique on their block.

  • censim3 (4/23/2021) used a cumulative distribution of exact block size, based on all blocks in the 2010 census, except that the 0.5% of the population with the largest block sizes (blocks larger than 2,362) was grouped into a single bin and assigned the average block size for the bin (3,135). We generated 10,000 random draws and populated 10,000 randomly selected blocks rather than 1,000.
  • In this version, randomly chosen age-sex combinations appeared on any random block 52.6% of the time, and 47.7% of the population was unique on their block. Because the block size distribution is more detailed and the number of simulated blocks is larger, this version is more accurate than the original version.
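Drawing a block size from a cumulative probability distribution, as version 3 does, can be illustrated with an inverse-CDF lookup. This is a sketch only: the function name `draw_block_size` and the table values are hypothetical, except that the last row mimics the top bin holding 0.5% of the population at size 3,135.

```python
import bisect
import random

def draw_block_size(cum_probs, sizes, rng):
    """Inverse-CDF draw: return the size in the first row whose
    cumulative probability is at least a uniform random number."""
    u = rng.random()
    i = bisect.bisect_left(cum_probs, u)
    # Guard against rounding that leaves the last cumulative value below 1.
    return sizes[min(i, len(sizes) - 1)]

# Hypothetical four-row table; a real table would have one row per exact block size.
cum_probs = [0.30, 0.70, 0.995, 1.0]
sizes = [20, 80, 400, 3135]

rng = random.Random(0)
draws = [draw_block_size(cum_probs, sizes, rng) for _ in range(5)]
print(draws)
```

Because the distribution is population-weighted, each draw reflects the block size experienced by a randomly chosen person rather than the size of a randomly chosen block.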

    Source code for version 1: censim1.f
    Source code for version 3: censim3.f
    2010 age/sex distribution, arranged as a cumulative probability distribution (the first 96 rows are for each age of males, and the second 96 rows are for each age of females). Tabulated from the 10% microdata file retrieved from https://usa.ipums.org/usa/. agesex.txt
    Population-weighted distribution of block size, version 1 (seven blocksize bins distributed across 1000 blocks) blocksize.txt
    Population-weighted distribution of block size, version 3 (cumulative probability distribution of blocksize) blocksize3.csv
    Ruggles testimony in AL v. Dept. of Commerce (uses version 1): Ruggles_Report.pdf
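    For readers working with the age/sex file, the row layout described above (96 rows for males followed by 96 rows for females) can be decoded as in the sketch below. It assumes 0-based row indexes and ages running from 0 to 95 in order within each sex, neither of which is confirmed by the description; the name `decode_row` is hypothetical.

```python
AGES_PER_SEX = 96  # from the file description: 96 rows per sex

def decode_row(row):
    """Map a 0-based row index (0..191) to (sex, age),
    assuming ages run 0..95 in order within each sex."""
    if not 0 <= row < 2 * AGES_PER_SEX:
        raise ValueError("row index out of range")
    sex = "male" if row < AGES_PER_SEX else "female"
    return sex, row % AGES_PER_SEX

print(decode_row(0))    # ('male', 0)
print(decode_row(191))  # ('female', 95)
```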

