
U.S. Census Bureau

2020 Census Disclosure Avoidance System

The 2020 Decennial Census used differential privacy to preserve the anonymity of U.S. residents

Spotlight Recipient

Location

Suitland, MD

Overview

For the 2020 Census, the U.S. Census Bureau built differential privacy into a new 2020 Census Disclosure Avoidance System (DAS). The 2020 Census DAS was designed to withstand modern re-identification threats and improve privacy for U.S. Census participants. The initiative directly mitigates the growing privacy threats created by increased computing power and the proliferation of personal data online. It strikes a balance between preserving the privacy of census respondents and maintaining the availability and utility of published census data.

The Census Bureau previously took steps to anonymize data. These included injecting “noise” into the data to thwart re-identification attacks. Historically, this was an effective way to maintain anonymity, but the abundance of publicly available databases now allows malicious actors to defeat earlier anonymization methods.

The 2020 Census DAS was implemented to address security concerns posed by the combined threat of modern computing capabilities and our data-rich environment.¹


The Challenge

The U.S. Constitution mandates that a census take place every ten years. The census affects election redistricting, federal and state funding allocation, research, and other government functions. When conducting the census, the Census Bureau has both an expectation and a legal obligation to protect the confidentiality of census respondents’ data.

Modern computers and today’s data-rich environment have rendered the Census Bureau’s traditional confidentiality protection methods almost obsolete. Over the last few years, Census Bureau researchers simulated a re-identification attack on the published 2010 Census data. They reconstructed individual responses for the entire population, without names or other identifiers, and then matched those reconstructed responses against publicly available commercial data that included names. About 52 million people, or 17 percent of the 2010 Census population, were correctly re-identified. That was the best-case scenario for privacy: with higher-quality data, the number of people correctly re-identified would rise to about 179 million, or 58 percent of the population.
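The matching step of such an attack can be illustrated with a toy sketch. Everything here is hypothetical: the records, the field names, and the `link_records` helper are invented for illustration and do not reproduce the Bureau's simulation. The sketch only shows how nameless records can acquire names when their quasi-identifiers (location, age, sex) match exactly one record in an outside dataset.

```python
from collections import defaultdict

# Hypothetical reconstructed census records: no names, only quasi-identifiers.
reconstructed = [
    {"block": "A1", "age": 34, "sex": "F"},
    {"block": "A1", "age": 61, "sex": "M"},
    {"block": "B2", "age": 34, "sex": "F"},
]

# Hypothetical commercial records that do carry names.
commercial = [
    {"name": "Pat Example", "block": "A1", "age": 34, "sex": "F"},
    {"name": "Sam Sample", "block": "A1", "age": 61, "sex": "M"},
    {"name": "Lee Placeholder", "block": "C3", "age": 50, "sex": "M"},
]

KEYS = ("block", "age", "sex")  # assumed linkage fields, for illustration only


def link_records(reconstructed, commercial, keys=KEYS):
    """Attach a name to each reconstructed record whose quasi-identifiers
    match exactly one commercial record; ambiguous or absent matches are skipped."""
    index = defaultdict(list)
    for row in commercial:
        index[tuple(row[k] for k in keys)].append(row["name"])

    matches = []
    for row in reconstructed:
        names = index.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:
            matches.append({**row, "name": names[0]})
    return matches


matches = link_records(reconstructed, commercial)
rate = len(matches) / len(reconstructed)
print(f"{len(matches)} of {len(reconstructed)} toy records linked ({rate:.0%})")
```

In this toy data, two of the three reconstructed records link to a name. Richer outside data would produce more matches, which is the effect the higher re-identification estimate reflects.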

The 2020 Census DAS tackles the complex problem of balancing census data anonymity against data accuracy.


About the Intervention

The 2020 Census DAS tackles the complex problem of balancing census data anonymity with data accuracy. The Census information must be usable for government actions and research while precluding bad actors from using the data to identify census participants.

The Census Bureau developed a new system that uses cryptographic principles to prevent attackers from identifying the individuals behind published 2020 Census statistics. The system improves upon “legacy” methods of anonymizing the data. Starting with the 1990 Census, the Census Bureau began infusing “statistical noise” (controlled amounts of error) into the data deemed most at risk of exposure. This was a relatively blunt technique designed to strike a balance between preserving overall data accuracy and reducing the risk of re-identification. As with all noise infusion techniques, these “legacy” methods produced data distortions. The nature and extent of those distortions are kept confidential to protect the underlying data. The Census Bureau is going to great lengths to ensure that the data is fit for use while maintaining confidentiality.
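The idea of adding a controlled amount of error to published counts can be sketched with the textbook Laplace mechanism from the differential privacy literature. This is a minimal illustration under stated assumptions, not the Bureau's production algorithm: the source does not say which noise distribution or privacy-loss budget the DAS uses, and the block counts, the `epsilon` values, and the `noisy_count` helper below are all hypothetical.

```python
import random


def noisy_count(true_count, epsilon, rng, sensitivity=1.0):
    """Return true_count plus Laplace noise with scale sensitivity / epsilon.

    Adding or removing one person changes a count by at most 1, so the
    sensitivity of a counting query is 1. A smaller epsilon means more
    noise (stronger privacy, lower accuracy).
    """
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


# Hypothetical block-level population counts, for illustration only.
block_counts = {"block_A": 12, "block_B": 3, "block_C": 250}

rng = random.Random(2020)  # fixed seed so the sketch is reproducible
released = {
    block: noisy_count(count, epsilon=1.0, rng=rng)
    for block, count in block_counts.items()
}

for block, value in released.items():
    print(f"{block}: true={block_counts[block]}, released={value:.2f}")
```

The trade-off described above shows up directly in the `epsilon` parameter: lowering it widens the noise and makes small counts such as `block_B` much less reliable, while raising it improves accuracy at the cost of weaker protection. A real system would also need steps this sketch omits, such as making the released counts non-negative integers that are consistent across tables.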


Impact & Future Plans

This system signals a concerted effort by the Census Bureau to protect the privacy of all 330 million people counted in the census. The first set of 2020 Census data products impacted by the new Disclosure Avoidance System will be the redistricting data released in August-September 2021. The system will be adapted to produce more detailed data products expected in 2022 and beyond.

Through the 2020 Census DAS, the Census Bureau is keeping pace with the challenges and opportunities posed by today’s technology to protect the privacy of the population while serving the nation’s critical information needs.


¹ Antwon McMullen, A notice of Visit from the US Census Bureau: Chicago/USA, July 29, 2020, photograph, https://www.shutterstock.com/image-photo/notice-visit-us-census-bureau-chicagousajuly-1786175240