Integrated Public Use Microdata Series International:
census microdata for social and economic research

Constructing anonymized census microdata samples
Emerging Standards

Within the European Community, standard practices are emerging for the construction of anonymized census microdata samples. Such "public use" microdata samples are widely recognized as statistical products by national statistical offices associated with Eurostat (Secretariat Report, Skopje, 2001) and by many countries in the IMF's General Data Dissemination System (http://dsbb.imf.org/category/popctys.htm). The Minnesota Population Center, through the IPUMS International project, is promoting the development and dissemination of statistically anonymized census microdata samples for a large number of countries and time periods (McCaa and Ruggles, 2002; see also the IPUMS International web-site: http://www.ipums.org, please select "International").

Holvast (Thessaloniki, 1999) identifies three strategies for safeguarding the statistical confidentiality of microdata: legal, organizational and technical. All three must be used in combination to attain the highest possible level of statistical confidentiality while also promoting the highest levels of scientific use of the data. Although technical safeguards are likely to consume the greatest effort, it is important that they be designed within a framework of legal and organizational safeguards.

Legal safeguards

Legal safeguards are already in place for the IPUMS International project, but members of the census microdata consortium are encouraged to identify ways of strengthening them. Specifically, to obtain microdata samples, users must license the data, agree to abide by the rules of usage, and sign a non-disclosure agreement (see license.html; please note that this license is a draft, subject to modification as additional means of strengthening the agreement are identified). As spelled out in the agreement, any violation will result in revocation of the license, recall of all microdata, motions to professional organizations to censure the violators, and possible prosecution for violation of national laws.

Organizational safeguards

Organizational safeguards are being designed both to publicize and to enforce legal and technical safeguards. Foremost, of course, is restricting access exclusively to bona fide users. The IPUMSi webpage will detail these efforts, much as the US Census Bureau has done with its "privacy" page. In addition, the IPUMS International web-site, as is currently the case for IPUMS-USA (www.ipums.org), will deliver custom-tailored datasets and documentation. We hope to make these so attractive to users that they will prefer obtaining microdata directly from the source. As the databank is enhanced by the addition of new samples for more countries, improved variables, and higher sample densities, users will have a growing incentive to acquire data exclusively from an authorized distributor and to abide by the license rules. Finally, extract requests will be recorded on the host server, so that returning users can readily fine-tune a prior request without having to start from scratch.

Technical safeguards

Technical safeguards of statistical confidentiality will require the greatest attention, research and testing. The goal is to provide the highest degree of statistical confidentiality while preserving the maximum scientific utility of the anonymized data. There is no single, automatic solution; instead, the task requires scientific analysis by experts who are knowledgeable about both the contents of the microdata and their likely uses by the research community. To increase international comparability in the anonymization process, the recommendations of Eurostat and the UN Statistics Division for the 2000 round of censuses should be taken into account. In addition, the proposed general design of variables and codes of the IPUMS International project should be considered, along with various international harmonization methods for variables dealing with education (ISCED), occupation (ISCO), etc.

It is expected that such work will be a "collaboratory", with some degree of interaction among the National Statistical Agency, the Minnesota Population Center, the principal national demographic research center, and other interested national institutions and users.

Bibliography

Blien, U., Wirth, H., and Muller, M. 1992. "Disclosure Risk for Microdata Stemming from Official Statistics." Statistica Neerlandica, 46:1, 69-82.

Dale, Angela and Mark Elliott. In press. "Proposals for 2001 SARS: An assessment of disclosure risk," Journal of the Royal Statistical Society, Series A.

Franconi, Luisa and Giovanni Seri. 2001. "Microdata Protection at ISTAT: a User Perspective," Joint ECE/Eurostat Work Session on Statistical Data Confidentiality, Skopje, March.

Holvast, Jan. 1999. "Statistical Confidentiality at the European Level," Joint ECE/Eurostat Work Session on Statistical Data Confidentiality, Thessaloniki, March.

Ruggles, Steven. 2000. "The Public Use Microdata Samples of the U.S. Census: Research Applications and Privacy Issues." Available at http://www.ipums.org/~census2000.

Secretariat. 2001. "Report of the March 2001 Work Session on Statistical Data Confidentiality," Joint ECE/Eurostat Work Session on Statistical Data Confidentiality, Skopje.

Tambay, Jean-Louis and Pamela White. 2001. "Providing greater accessibility to survey data for analysis," Joint ECE/Eurostat Work Session on Statistical Data Confidentiality, Skopje, March.

Thorogood, David. 1999. "Statistical Confidentiality at the European Level," Joint ECE/Eurostat Work Session on Statistical Data Confidentiality, Thessaloniki, March.