Latin American Demographic History
in the Age of the World Wide Web:
National Census Samples as Historical Sources

Robert McCaa
Quito, July 7-11, 1997
published in: Dora Celton (ed.) Fuentes útiles para los estudios de la población americana. Quito: Abya-Yala, 1997, pp. 379-384.


Note: I am indebted to Alberto Palloni of The Center for Demography and Ecology for providing copies of the computerized collection of Latin American censuses archived at the University of Wisconsin. Virgilio Partida Bush of the Comisión Nacional de Población generously shared the collection of Mexican census datasets which he assembled. The 1992 Chilean census was obtained by purchase from the Instituto Nacional de Estadística de Chile.


Demographic historians of Latin America focus much of their research effort on the protostatistical era--the century of conquest and colonization, the era of the Bourbons, and even the nineteenth century. Few population historians of Latin America devote attention to the first decades of the twentieth century, much less the recent past, say, from 1960 onward.

The Bible for our field, Nicolas Sanchez Albornoz's The Population of Latin America (University of California Press, 1974; revised Spanish edition, 1994), not only offers a comprehensive overview of the region's demographic past but also looks into Latin America's future. Much of Sanchez Albornoz's interpretation of the course of population change in the twentieth century is based on the research of demographers, not historians. As a research enterprise, we have left the field to social scientists formally trained in demography. The historian's neglect of the twentieth century is apparent as well in the pages of The Latin American Population History Newsletter. At the founding of the Newsletter in 1977 it was decided that for editorial purposes population history ended in 1930. Over the next 17 years, 18 of its twenty-four issues featured articles exclusively on the colonial era, three featured the nineteenth century, three bridged the colonial/nineteenth century divide, and only two dealt with the twentieth century. Upon becoming editor of the Newsletter a decade ago, I extended its time frame to 1960; nevertheless, the slighting of the twentieth century continued.

Now it seems to me that by the time the next millennium begins, Latin American population historians should focus more attention on the twentieth century and that our students should be trained in the use of sources peculiar to Latin America's recent past. There are at least two reasons why we ought to do this.

First is the intellectual rationale. Over the past hundred years Latin Americans, regardless of nationality, have experienced one of the most transformative demographic processes in the history of the human species: the demographic transition from high mortality and fertility to low, from a high-pressure system to a low-pressure one.

Second is the matter of practicality. An extremely valuable source for demographic history is in danger of destruction. Computerized national census samples created by the United Nations Latin American Demographic Center (CELADE) and the statistical offices of most Latin American countries in the 1960-1990 rounds of censuses may simply disappear if they are not preserved in the near future. The samples were drawn scientifically at the level of the household to permit the investigation of demographic phenomena beyond the conventional published tables. These microdata samples proved their worth to demographers, but few researchers have attempted chronological comparisons using several samples from one or more countries.

Thanks to the extraordinary advances in computer storage and processing, a one percent sample for a population as large as that of Mexico in, say, 1990, when the Mexican National Statistical Institute (INEGI) enumerated more than 80 million inhabitants, may now be stored on a single CD-ROM. Indeed, all the samples currently available for Mexico fit on a single disc the size of a greeting card.

Moreover, these samples are not stripped-down, emasculated sources. Census samples provide information on dozens of key demographic, social, and economic variables for hundreds of thousands of individuals and households. Census samples are the closest we are likely to get to digitized mirrors of the original census enumerators' sheets. The 1990 sample for Mexico consists of 802,810 individuals in 160,356 households with 56 fields of information: 29 reporting the characteristics of individuals, 18 of housing, 3 of households, and 3 identifying geographic location. In aggregate, the data amount to 90 million characters of information (110 bytes per individual). Accompanying the data are numerous files which document the sample, define the variables and specify the labeling of all the codes. For example, the relationship to head of household variable, in addition to head, spouse, and head's sons and daughters, offers more than 70 other codes, ranging from "adoptado" to "vigilante".

Even the 1960 sample, with "only" 58 characters per individual, amounts to more than 30 million characters. Unfortunately this sample is not fully documented as yet, but the data have been rescued, and the technical documentation is likely to be recovered in the near future. In sum, census samples are the most prodigiously detailed historical data on Latin Americans available for the twentieth century. The Mexican samples--except for 1980--have been salvaged by the National Population Commission of Mexico, but for many other Latin American countries these materials are still at risk.
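The storage figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only the record counts and record lengths given in the text; the CD-ROM capacity (a nominal 650 megabytes) is an assumption introduced for illustration and does not come from the sample documentation.

```python
# Figures quoted in the text for the Mexican census samples.
INDIVIDUALS_1990 = 802_810
HOUSEHOLDS_1990 = 160_356
BYTES_PER_PERSON_1990 = 110
BYTES_PER_PERSON_1960 = 58
MIN_TOTAL_BYTES_1960 = 30_000_000  # "more than 30 million characters"

# Assumption (not from the text): nominal capacity of one CD-ROM.
CD_ROM_BYTES = 650 * 1024 * 1024

# Total size of the 1990 sample, which the text rounds to 90 million.
total_bytes_1990 = INDIVIDUALS_1990 * BYTES_PER_PERSON_1990

# Average number of sampled persons per sampled household.
mean_household_size = INDIVIDUALS_1990 / HOUSEHOLDS_1990

# Share of a single disc taken up by the 1990 sample.
share_of_disc = total_bytes_1990 / CD_ROM_BYTES

# Lower bound on the number of person records in the 1960 sample
# implied by "more than 30 million characters" at 58 per person.
min_records_1960 = MIN_TOTAL_BYTES_1960 / BYTES_PER_PERSON_1960

print(f"1990 sample size: {total_bytes_1990:,} bytes")
print(f"Persons per household, 1990: {mean_household_size:.2f}")
print(f"Share of one CD-ROM: {share_of_disc:.1%}")
print(f"1960 sample: more than {min_records_1960:,.0f} person records")
```

Under that assumed capacity the 1990 sample occupies well under a fifth of a disc, which is consistent with the claim that all the Mexican samples fit on one.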

Alberto Palloni of The Center for Demography and Ecology at the University of Wisconsin has collected the 1970, 1982 and 1992 samples for Chile, and the 1973 and 1985 datafiles for Colombia. Hector Perez Brignoli of the Universidad de Costa Rica has located Costa Rican microdata samples for 1963 and 1973 and the entire census for 1984 (Perez Brignoli, e-mail communication February 24, 1997). Prof. Joseph Potter of The Population Research Center of the University of Texas is currently attempting to recover the 1960 sample for Brazil after successfully reconstituting files for the 1970 and 1980 censuses. I am sure that there are others of whom I am unaware who are working with census samples.

Census samples are the most complex and richest materials available to social science researchers. Now they are in danger of vanishing, not as a result of deliberate acts of destruction, but because of inaction. Digitized materials must be upgraded as new computers make old ones obsolete, and as aging media deteriorate.

Fortunately, the cost of salvaging and preserving these materials is negligible. The primary obstacles are three. First, there is no sense of urgency among historians, archivists, or other scholars. Second, in some instances bureaucratic or legal obstacles may thwart initial inquiries: although the censuses contain no names or other information that would permit individuals to be identified, privacy issues may need to be clarified before third parties are allowed to salvage these sources (Conning and Silva, 1993). Third, the older samples are stored on computer media that are disintegrating, and much of the machinery required to read the media is broken or being junked.

Preserving these materials is only a beginning. We must also make them more accessible. The Historical Census Project at the University of Minnesota has worked for more than a decade digitizing nineteenth-century microdata census samples of the United States (Ruggles and Menard, 1995). As the recovery of these materials progressed, the importance of integrating the various censuses into a single database format with uniform coding schemes became apparent. Recently, the project was awarded a major grant to standardize the census files and distribute the data over the Internet. The Integrated Public Use Microdata Series (IPUMS) is distributed by the Minnesota Historical Census Project to users on request free of charge via the IPUMS World Wide Web page.

From the IPUMS home page, the researcher may request a dataset by year, region or state, sub-population, sampling density, or a range of other variables as needed, such as individuals born in a particular foreign country or with a specified occupation. Before the data could be placed on the Internet, much energy was devoted to documenting and standardizing data codes for every census. All original data were preserved, but new standardized variables were constructed so that various censuses could be integrated into a single dataset without the need for individual researchers to fret over the coding of each variable. These data are readily available to any researcher with a few mouse-clicks over the Internet. This approach allows the researcher to extract small specially tailored datasets suitable for analysis on ordinary microcomputers. Thus, the social historian interested in nursing may request all individuals with that occupation or all families with nurses listed in them. The regional historian interested in migration to, say, Arizona, may request data for this subpopulation, and so on.
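The two ideas in the preceding paragraph, keeping the original codes while adding standardized ones and then extracting small tailored subsets, can be illustrated with a minimal sketch. Everything in it is hypothetical: the records, the field names, and the occupation crosswalk are invented for illustration and do not reproduce the actual IPUMS file layout, coding schemes, or extraction software.

```python
# Hypothetical person records, one dict per enumerated individual.
# Household numbers repeat across census years, so a household is
# identified by the pair (year, household).
PERSONS = [
    {"year": 1880, "household": 1, "state": "Arizona", "occ_original": "Farmer"},
    {"year": 1880, "household": 1, "state": "Arizona", "occ_original": "Nurse"},
    {"year": 1880, "household": 2, "state": "Ohio", "occ_original": "Farmer"},
    {"year": 1900, "household": 1, "state": "Arizona", "occ_original": "Trained nurse"},
    {"year": 1900, "household": 1, "state": "Arizona", "occ_original": "Miner"},
    {"year": 1900, "household": 2, "state": "Arizona", "occ_original": "Miner"},
]

# Hypothetical crosswalk: (census year, original entry) -> standardized code.
OCC_CROSSWALK = {
    (1880, "Farmer"): "FARMER",
    (1880, "Nurse"): "NURSE",
    (1900, "Trained nurse"): "NURSE",
    (1900, "Miner"): "MINER",
}


def harmonize(person):
    """Return a copy with a standardized occupation added.

    The original entry is kept untouched alongside the new variable.
    """
    out = dict(person)
    key = (person["year"], person["occ_original"])
    out["occ_std"] = OCC_CROSSWALK.get(key, "UNKNOWN")
    return out


def select_individuals(persons, **criteria):
    """Keep the persons whose fields match every criterion."""
    return [p for p in persons
            if all(p.get(field) == value for field, value in criteria.items())]


def select_households_containing(persons, **criteria):
    """Keep every member of any household with at least one matching person."""
    wanted = {(p["year"], p["household"])
              for p in select_individuals(persons, **criteria)}
    return [p for p in persons if (p["year"], p["household"]) in wanted]


harmonized = [harmonize(p) for p in PERSONS]
nurses = select_individuals(harmonized, occ_std="NURSE")
nurse_households = select_households_containing(harmonized, occ_std="NURSE")
arizona = select_individuals(harmonized, state="Arizona")

print(len(nurses), len(nurse_households), len(arizona))
```

Because the standardized code sits beside the original entry rather than replacing it, a researcher who distrusts the crosswalk can always return to what the enumerator wrote.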

The IPUMS project incorporates all computerized United States manuscript census samples from 1850 to 1990. For most Latin American countries digitized samples become available only from the 1960s. Nevertheless, these constitute substantial collections in their own right. The IPUMS paradigm for documenting, standardizing and distributing sample census data offers valuable lessons for facilitating research on them.

What can historians tease from these materials that demographers have not? Consider the recent history of marriage in Mexico, a subject of interest to me. The average age at first union for Mexican females seems to have remained almost invariant since 1930, or at least that is the story gathered from the published census data. The average has stood at 21 years from 1930 to 1990, fluctuating less than one-half year over more than five decades. Because the data are published by federal entity, much of the demographic research on marriage age emphasizes regional variations: females marry a year or two later on average in the Northern states and a year or so younger in the South (Quilodrán, 1993, 1996).

Notwithstanding the apparent stability in age at first union for females, averages computed from census data using Hajnal's singulate mean age at marriage method (SMAM) show strong differences by levels of educational attainment. When one controls for educational level, much of the regional variation disappears. From 1970 to 1990 the singulate mean age at marriage for Mexican females climbed more than a year, from 21.3 to 22.4. Paradoxically, the average age at marriage by educational level changed relatively little (Table 1--see below). The average for women with no education inched upward from 19.7 to 20.4 years. For women with some education, the change was barely measurable, rising from 21.4 to 21.6 years. For women with post-secondary education, the average rose by one-half year to 25.0. A decomposition analysis reveals that almost ninety percent of the overall increase in female marriage age from 1970 to 1990 is explained by increased educational attainments.
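For readers unfamiliar with Hajnal's method, the following is a minimal sketch of the standard textbook formulation of SMAM, computed from the proportions never married in five-year age groups from 15-19 to 50-54. It is not necessarily the exact procedure used for the figures reported here, and the proportions in the example are hypothetical values chosen for illustration, not figures from the Mexican samples.

```python
def singulate_mean_age_at_marriage(prop_single):
    """Hajnal's SMAM from proportions never married.

    prop_single holds eight proportions for the age groups
    15-19, 20-24, 25-29, 30-34, 35-39, 40-44, 45-49 and 50-54.
    """
    if len(prop_single) != 8:
        raise ValueError("expected eight age groups, 15-19 through 50-54")

    # Years lived single before age 50 per person: everyone is assumed
    # single up to age 15, then five years per age group are weighted
    # by the proportion still single in that group.
    years_single = 15.0 + 5.0 * sum(prop_single[:7])

    # Proportion still single at exact age 50, estimated as the mean
    # of the 45-49 and 50-54 groups.
    single_at_50 = (prop_single[6] + prop_single[7]) / 2.0

    # Remove the years contributed by those who never marry, then
    # average over those who do marry by age 50.
    return (years_single - 50.0 * single_at_50) / (1.0 - single_at_50)


# Hypothetical proportions never married, for illustration only.
example = [0.85, 0.45, 0.20, 0.12, 0.09, 0.08, 0.07, 0.07]
print(round(singulate_mean_age_at_marriage(example), 2))

# Sanity check: if everyone is single at 15-19 and married from 20 on,
# the method returns exactly 20.
print(singulate_mean_age_at_marriage([1, 0, 0, 0, 0, 0, 0, 0]))
```

Because the method needs only the proportions single by age at one census, it can be applied separately to each educational group within a sample, which is what allows comparisons of the kind made above.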

For Mexican males, the effect of education on the timing of marriage is not as strong as for females. Men with no education married on average at 23.6 years in 1970, edging upward to 23.8 years in 1990. For those who did not go beyond primary schooling, marriage occurred around the twenty-fourth birthday, regardless of how many years of primary school they completed. In 1970 the figure was 24.2 years, declining to 23.8 in 1990. With additional years of secondary, vocational, or higher education, average marriage age for males jumped to 26.4 years, but this represented a decline from 1970, when the figure was 26.7 years.

Latin America, as one of the great regions of the world, offers unusually rich research opportunities for studying the population history of the recent past, thanks to the regularity with which censuses were taken and to the widespread practice of preparing individual-level samples for subsequent analysis. Then too, as Latin American population historians, we should attempt to preserve complete census datafiles for those countries where they still exist.


Table 1. Average Age at Marriage: Mexico 1970, 1990
By Educational Levels and Sex

                          Males                Females
Educational Level      1970      1990       1970      1990
No Schooling           23.6      23.8       19.7      20.4
All Levels             24.3      24.6       21.3      22.4
Sample Size (n)     145,121   259,450    149,846   279,202

Source: My computations from Instituto Nacional de Estadística, Geografía e Informática, 1% census samples, 1970, 1990.



