title: Integrated Samples of Colombian Censuses, 1964-2001

Col-IPUMS: Integrated Public Use Microdata Series of Colombian Censuses, 1964-2001
A revised proposal submitted to the National Institutes of Health (abridged)
funded: January 1, 2000 – December 31, 2002
©Robert McCaa and Steven Ruggles

Introduction

This is a revised proposal seeking funds for constructing and integrating public use microdata series for Colombia’s five national censuses, 1964-2001 (Col-IPUMS). The resulting databank and its thoroughly integrated documentation will be disseminated via the World Wide Web and other electronic media, similar to the Historical Census Project system for distributing US-IPUMS (http://www.ipums.umn.edu). Our initial submission was reviewed favorably and narrowly missed being funded; however, the reviewers were able to identify areas where our plan could be improved. While the most frequently cited concern was a lack of detail in the discussion of the censuses (a "small weakness" in the words of one reviewer), the most serious objection, according to another, was our choice of Colombia instead of some other Latin American country, such as Ecuador. We address these issues as well as other reviewer concerns in this revision. In addition, we have made other alterations based on knowledge gained in the interim thanks to continuing, extensive collaboration with Colombian researchers and statistical personnel as well as the Centro Latinoamericano y Caribeño de Demografía (CELADE).

The most serious criticism of our earlier proposal was the choice of Colombia as a research site instead of some other Latin American nation. Site-selection is of utmost importance to the success of what we hope will be a prototype for the region, and indeed other regions around the globe. Although we presented only three reasons for choosing Colombia, as a matter of fact, we arrived at our decision only after consultation with demographers and census authorities in all but one of the larger countries of Latin America (those with more than twenty million inhabitants in the mid-1990s--Brazil, Mexico, Colombia, Argentina, and Peru, but not Venezuela) and four smaller ones (Chile, Ecuador, Paraguay, and Costa Rica). Brazil was quickly dropped from consideration because its scale of difficulty is substantially greater than for any other Latin American country, while our Brazilian expertise is substantially less. A Spanish-speaking country serves as a more useful prototype.

Issues of data quality, scientific salience, and chronological continuity were so fundamental to our decision that we did not discuss these in our initial proposal, but we take advantage of this re-submission to do so now.

Data quality. We sought a country where post-enumeration surveys are common and coverage rates are 85% or higher. There is widespread agreement that on the whole Colombian censuses are of good quality, approaching the highest standards in Latin America, such as those of Argentina, Costa Rica, and Chile (Dirk Jaspers, Susan De Vos and Joe Potter, private communications). Colombia is one of the few countries in Latin America where both pre- and post-enumeration surveys are regularly carried out. Beginning with the census of 1973, the Colombian statistical agency, DANE, has conducted, and published, post-enumeration surveys. Under-enumeration rates for Colombia fall in a range of 11.5-15% (Potter and Ordoñez, 1976; DANE 1990, 1996). While political violence, as noted by one reviewer, certainly affects enumeration coverage and quality, this problem is limited primarily to peripheral regions. Although it is true that Colombia does not rank among the best countries in the world in terms of census coverage, its record is superior to that of most Latin American countries, including Ecuador. Initially, Ecuador was a favored site for this project, and, indeed, the PI’s interview with officials of the Instituto Nacional de Estadística y Censos in July 1997 revealed unbridled enthusiasm for our project. However, INEC’s spare archives, disorganized library, scarce resources, and paucity of studies evaluating census procedures or undercount persuaded us that Ecuador was not the most auspicious site for developing our prototype. A comparison of Ecuadorian and Colombian census publications reveals further qualitative differences, to the advantage of Colombia. Where INEC publishes Ecuadorian census figures with little discussion, explanation or qualification, DANE publications demonstrate much technical sophistication and great concern with census quality, interpretation and inference. The DANE archives are impressive in terms of their organization, accessibility, and completeness. For example, a query in the computerized catalogue of the central DANE archive for documentation pertaining to the census of 1973 yields 298 reports totaling more than 5000 pages.

Scientific salience. Both qualitative and quantitative indicators point to Colombia as a top-ranked site for this project. The ethnic and ecological diversity of Colombia makes it an exceedingly interesting laboratory for studying compelling scientific issues of late twentieth-century Latin America (see below), and quantitative indicators, measured by citations in Population Index, place Colombia near the top as well. In scholarship, Colombia ranks fourth in Latin America, behind Mexico, Brazil and Peru, in a cluster with Argentina, Costa Rica and Cuba, but well ahead of Chile, Ecuador and other Latin American candidates. We resisted the allures of Chile and Costa Rica as countries that are too small, and Argentina as too well-studied and relatively less interesting due to its more homogeneous, wealthier, Europeanized population.

Chronological continuity of available microdata (at least one census every decade since the 1960s). Mexico, our long-held preference, ultimately failed the continuity test. In vain we searched computer data libraries throughout the Western Hemisphere to locate microdata for the 1980 census of Mexico. Finally, we were forced to accept the often-repeated dictum of our advisers that no copies of the 1980 tapes survived the 1985 earthquake. Then too, for 1960, INEGI officials, in a misguided attempt to anonymize the microdata, deliberately obliterated household and family structure. We were forced to conclude that, as a prototype, Mexico is not as well suited as Colombia. Nevertheless, because of the large Mexican-born population in the United States, we plan to undertake an integration of Mexican censuses in the near future.

Colombia satisfies the criteria of quality, salience, and continuity as well as other less tangible considerations. As noted below, the eagerness of the Colombian statistical authorities and academics to collaborate in the undertaking was an important factor (see the attached notarized agreement, dated Dec. 4, 1998, between the University of Minnesota, DANE, Universidad Externado and Universidad Nacional) as well as the notion that this project might assist a people determined to reconstruct a country troubled by guerrilla warfare and criminality. Finally, it should be noted that we see Colombia as only a beginning. If this project is successful, over the next decade we expect to extend the Col-IPUMS paradigm to Canada and Mexico as well as other countries in Latin America and beyond.

The third set of reviewer criticisms relates to the task--its magnitude, our ability to complete it and a concern with duplicating what CELADE may have already accomplished. To improve our plan, indeed to return to a more ambitious model of international collaboration which our previous proposal sacrificed for the sake of economy, we propose to expand our pool of consultants as well as extend our time in the field.

First, regarding CELADE, we seek to capitalize on its long history of using Latin American censuses, including those of Colombia, by working at CELADE for approximately one month at the beginning of this grant, and by drawing on the expertise of a CELADE consultant for several weeks each year of the grant (see the attached letter from Dirk Jaspers). CELADE has confirmed that its copy of the 1964 Colombian census microdata sample is recoverable. The availability of the 1964 census will permit a significant extension of the Col-IPUMS project, at modest cost. Dr. Jaspers also assures us that Col-IPUMS will not duplicate CELADE’s long abandoned effort at integrating Latin American censuses (OMUECE). That effort failed after the 1970 round of censuses due to the high computing costs in those years and CELADE’s lack of commitment to continuing the project into the 1980s.

Second, as the reviewers helpfully noted, the magnitude of the proposed integration is indeed great and, as with any large-scale, novel enterprise, there are risks that it may not succeed. However, the Col-IPUMS effort is at least an order of magnitude smaller than what we have already accomplished with the U.S.-IPUMS, 1850-1990. The need to revise our proposal provides the opportunity to improve the odds of success. The PI will work for five months on site in the DANE archives and library in Bogotá beginning July 1, 1999, thanks to a recently awarded Fulbright Fellowship. During that time, our Colombian partners have agreed to host a seminar/workshop on the integration of Colombian censuses with the PI, in-country consultants, and other Colombian specialists as participants. This will respond to one reviewer’s advice that the PI allocate additional research time in-country, but without additional cost to the project.

For reasons of economy our initial plan proposed to use consultants sparingly. Now, we propose to form a partnership with two distinguished Colombian social scientists who have long worked with Colombian microdata. They and their collaborators will not only contribute their many years of expertise to the project but will also become Col-IPUMS intellectual co-authors, assuring widespread acceptance by themselves, their students and their colleagues. Although usage is not expected to be a problem (US-IPUMS transmits 120 GB of US data per month), we will promote the use of Col-IPUMS by inviting a select group of national and international scholars to a data-release conference at the University of Minnesota near the conclusion of the project. Finally, CELADE will mirror the Col-IPUMS site for Latin American users or others, should overloading become a problem at Minnesota.

Specific Aims

This project seeks funding to create Public Use Microdata Samples (PUMS) for the Colombian censuses of 1964, 1973, 1985, 1993 and 2001. Microdata from the Colombian censuses have never been made publicly available, but the original data used to create the aggregate tabulations for these census years do survive in machine-readable form. Thus, it will be possible to create useable PUMS files and associated documentation for Colombia at modest cost. We plan to model the Colombian samples on the Integrated Public Use Microdata Series (IPUMS) created at the University of Minnesota, and to the extent feasible to make the coding schemes entirely compatible with the existing U.S. census microdata (Ruggles and Menard 1995). These scientifically drawn samples, once integrated into a single databank with uniform codes and documentation and released into the public domain, will constitute the richest source of quantitative information on long-term change for any Latin American population. The integrated Colombian PUMS will be distributed over the Internet using the system developed at the University of Minnesota for United States census samples (Ruggles, Sobek and Gardner 1996; Sobek and Ruggles 1999).

This project is a collaboration between the University of Minnesota and the Departamento Administrativo Nacional de Estadística (DANE), the Colombian census authority. DANE, with assistance from CELADE in the case of the 1964 census, is providing the machine-readable raw data files for each census year. We are contracting with DANE for several technical aspects of the project to tap the expertise of DANE cartographers, systems analysts, and statisticians. The project will also gain priority access to the 2001 Colombian census. We will be able to plan our integration using the 2001 census schedule, technical definitions, and administrative boundaries as the standard.

The work to be carried out at the University of Minnesota will involve the following steps:

1. Designing record layouts, coding schemes and constructed variables that maximize comparability and minimize information loss;

2. Adapting US-IPUMS software for reformatting, recoding, constructing new variables, checking for consistency, and allocating missing and inconsistent data;

3. Processing more than one hundred million records (nearly a half million for the census of 1964, more than one million for 1973, 25 million for 1985, 40 million for 1993 and approximately 50 million for 2001);

4. Preparing an integrated set of documentation for the entire series of datasets, including a general user's guide, procedural histories, technical information and error estimation. While English will be the language used for documentation and web-based user-interface with the database, images of the original Spanish documentation will be linked by hyper-text into the materials to facilitate detailed scrutiny by researchers fluent in the original language of the Colombian censuses.

5. Adapting the Historical Census Projects' web-based user-interface, used to distribute United States census samples, for the Colombian databank.

In the long run, the investigators envision the creation of a single integrated series composed of public use samples of recent censuses for the larger Latin American countries for which the microdata still exist. For example, Mexico, the obvious first choice for the IPUMS prototype, was rejected for the initial trial because the 1980 census tapes were destroyed in an earthquake and the surviving 1960 tapes, while readable, suffer a serious technical defect: the destruction of household ordering. The principal investigator possesses samples for 1960, 1970 and 1990. Mexican officials are now eager to develop an integration agreement, thanks in part to the knowledge that the Col-IPUMS project will soon be underway. The pilot project for Colombia will establish standards and procedures for a wide range of the most common variables found in Latin American censuses. Solutions to inconsistencies in the Colombian series will be designed to anticipate future difficulties with censuses for other Latin American countries.

Background and Significance

Social scientists have increasingly recognized the need to study society as a process in time. If we confine our analyses to the state of society at a single moment, we cannot hope to understand the sources of social change. Sociologists, demographers, economists and public health analysts have developed a variety of quantitative data sources to study social change, including retrospective and longitudinal surveys. Although such data sources are essential, their analytical usefulness is limited.

Censuses are the most consistent general source of information about populations throughout the Americas, and Colombia is no exception. Quantitative studies of change in health, mortality and demographic processes in the broader society have always relied on published tabulations of statistical data, but these have substantial limitations. Census officials attempted only the most rudimentary cross-classifications of their data, and much of the information collected was never tabulated at all. Moreover, aggregate tabulations in published census documents are poorly suited to multivariate analysis.

In the United States, the Census Bureau has addressed the limitations of tabular data by producing individual-level public use microdata samples for each census year since 1960. In Latin America, the United Nations-sponsored Centro Latinoamericano de Demografía (CELADE) encouraged Latin American countries to construct similar samples or provide original census tapes to CELADE for that purpose. Many samples were so drawn, but their use was restricted to CELADE researchers, who made only limited efforts to integrate the various censuses either chronologically (OMUECE, for censuses carried out between 1960 and 1976) or spatially (13 countries represented by two censuses, five by only one, and seventeen by none). Nevertheless, these census samples proved to be a valuable resource, since they allowed CELADE researchers to design sophisticated multivariate models tailored to their specific research questions.

In the United States, public use samples have proven their worth in studies of social change, because they substantially reduce the problems of incompatibility in the published data for different census years. In addition, the public use samples have allowed researchers to move beyond simple tabular analysis and apply increasingly complex, nuanced multivariate techniques. The existence of these data has significantly expanded the power of quantitative social science research.

In Latin America, the potential impact of public microdata samples is even greater because the CELADE census samples were never released to the public. In place of CELADE’s partial, cross-national integration of selected variables, with scant documentation and jealously guarded dissemination, we propose a total integration of all available microdata for a single country with full documentation and open public access. It must be noted that CELADE’s recently developed whole-census system REDATAM (implemented by Dr. Jaspers) is no substitute or competitor for the IPUMS paradigm. REDATAM offers a powerful analytical engine but, for reasons of confidentiality, cannot allow researchers to gain access to entire census data files for individuals with detailed geographic encoding. Planners will appreciate the finely grained statistics for small places provided by REDATAM, just as scientists will favor the analytical flexibility of stratified samples disseminated by Col-IPUMS.

The original census data tapes included a wide range of information—much of it concerned with public health, such as the availability of sanitation services, source of water supplies, type of cooking fuel, or housing construction materials—that was rarely studied (De Vos and Arias 1996). Coupled with responses to questions on fertility and mortality, these data offer exceptional opportunities for pinpointing the correlates of mortality change (or the lack of them) at the local, regional and national levels.

In Colombia, the availability of original census tapes means that it is still possible to construct public use microdata samples, using the most advanced specifications currently devised. This is a major undertaking, made more difficult by the fact that each census has a different format, different coding schemes, and different documentation. Although all Colombian censuses are remarkably similar in terms of questions asked and responses permitted, standardizing them on a uniform set of codes requires substantial work. In fact, the only variable that is readily comparable across census years is age, and even there the censuses differ in treatment of missing, illegible, and inconsistent data. Even sex is not uniformly coded in the Colombian case. As the number of valid options increases, so does the confusion caused by inconsistent coding schemes. Published documentation for the censuses is minimal by international standards, and much essential information exists solely in the memories of DANE technical staff or remains unpublished in DANE archives. The limited published documentation is for the most part organized differently from one census to another, and the treatment of comparability issues is often cursory.
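The recoding problem described above can be illustrated with a minimal sketch. It is not the project's software, and the census-specific source codes and harmonized target codes below are hypothetical, invented only to show how the same raw symbol can carry opposite meanings in different years; the real values must be taken from the DANE documentation for each census.

```python
# Hypothetical source codes for sex in three censuses. These are
# illustrative assumptions, not the codes actually used by DANE.
SEX_CROSSWALK = {
    1973: {"1": 1, "2": 2},   # numeric codes
    1985: {"M": 1, "F": 2},   # masculino / femenino
    1993: {"H": 1, "M": 2},   # hombre / mujer: "M" now means female
}

# Harmonized codes assumed for this sketch: 1 = male, 2 = female, 9 = missing.
MISSING = 9


def harmonize_sex(census_year, raw_code):
    """Map a census-specific raw code for sex onto the harmonized scheme."""
    table = SEX_CROSSWALK[census_year]
    # Blank, illegible, or undocumented codes all fall through to MISSING.
    return table.get(raw_code.strip().upper(), MISSING)
```

Keeping one crosswalk table per census year, rather than a single shared mapping, is what prevents a symbol such as "M" from being read with the wrong meaning.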

Even if the Colombian census files were readily available to researchers, their incompatibility in their present form would mean that multi-sample studies would require a large initial investment to prepare the data for use. In contrast, in the United States, thanks to the ready availability of the Census Bureau PUMS and their subsequent integration by the Historical Census Projects, there is an outpouring of valuable research. In Latin America, only a few researchers have been able to obtain census microdata through individual arrangements with national statistical offices. Such raw data are difficult to use because of inadequate documentation. In the rare instances in which researchers have attempted to use more than one census year simultaneously, they face daunting comparability problems. Given the complexity of the datafiles and the often-subtle differences among them, the potential for error is large.

This proposal seeks funding to transform Colombian census data into public use microdata samples for 1964, 1973, 1985, 1993, and 2001; integrate the resulting samples into a single consistent format; and prepare documentation oriented to the use of the samples as a series. In the long run, we anticipate expanding this prototype to other larger Latin American countries, such as Mexico (McCaa 1997). In the meantime, the Colombian integrated microdata series will provide compatible individual-level data for a large stratified sample of the Colombian population in five census years spanning four decades. The series will constitute a resource of great utility for the study of long-term social change in Colombia, and a model for other Latin American countries. Even should the Minnesota paradigm for integrating census microdata not catch on elsewhere, the Colombian databank will still be of great value for understanding the dramatic transformations occurring in Latin America at the end of the twentieth century.

In the case of the American microdata files, confidentiality has always been an issue of great concern. We will adopt procedures for the Colombian data similar to those used for the American files. To respect Colombian confidentiality regulations (articles 74 and 75, decree law 1633 of 1960), we will eliminate all references to names, addresses, and geographic location below the district level. In addition, we will apply appropriate topcodes to variables such as income to ensure that individuals cannot be identified. Our challenge is to attain the highest possible sampling density, with the greatest possible geographical detail, without compromising confidentiality. Our proposed ten percent sample density is conservative. DANE will approve the highest density compatible with confidentiality safeguards, but actual testing will be required to arrive at that figure.
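As a rough illustration of these safeguards, the sketch below removes identifying fields and topcodes income. The field names and the topcode value used in the example are hypothetical; the actual thresholds and the list of suppressed fields would be set in consultation with DANE.

```python
# Hypothetical names for fields that identify a person or a place
# below the district level.
IDENTIFYING_FIELDS = ("name", "address", "locality")


def anonymize(record, income_topcode):
    """Return a copy of a person record without identifiers, income topcoded."""
    # Build a new dict so the confidential source record is left untouched.
    cleaned = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
    income = cleaned.get("income")
    if income is not None and income > income_topcode:
        # Values above the ceiling are replaced by the ceiling itself.
        cleaned["income"] = income_topcode
    return cleaned


example = {"name": "X", "address": "Y", "locality": "Z",
           "district": "001", "income": 900}
public_record = anonymize(example, income_topcode=500)
```

In this example only the district code and the capped income remain in the public record.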

Colombia was chosen as the site for the initial trial for three reasons. First, the past quarter-century's machine-readable census tapes with a wide range of demographic variables still exist. Second, Colombian researchers and the Colombian National Statistical Bureau readily agreed to collaborate in the PUMS project. Third, the Colombian data offer extraordinary research opportunities for a number of contemporary policy issues—topics such as the demography of political violence (the most important public health issue in Colombia in recent decades), the social correlates of violence-related physical disabilities, mortality differentials by region and race, immigration to and from the United States, movement of women into the workforce, the spread of public education, urbanization, fertility transition, and the rise of the nuclear family.

The range of potential topics that can be addressed with these data is far too great to describe within the page limitations of a grant proposal. The following paragraphs are intended only to suggest some of the most obvious topics of investigation.

1. Women in the Workforce. The place of women in the workforce is currently one of the most controversial subjects in the study of Latin American women and gender issues more broadly. Some critics would deny any validity at all to Latin American censuses regarding women's labor. They argue that questions on work were designed with males in mind, based on an advanced economy model where jobs were stable, hours standardized, tasks routinized, and work calendars unvarying (Wainerman and Recchini de Lattes 1981; Gomez 1981; León 1985; Aguilar 1985; Bustos and Palacios 1994; Safa 1994). While such criticisms are justified to some extent, solutions to many of these objections may be found in the multiple questions on work presently available in microdata census samples, not only in the case of the United States (Sobek 1997), but also for many Latin American countries, including Colombia (McCaa 1998).

The principal investigator's analysis of the Colombian census microdata for 1973 and 1985 based on an ad hoc "integration" reveals a twenty percentage-point increase in formal labor force participation for women in only twelve years. The pattern is particularly striking for married women, for whom one factor alone—higher levels of education—accounts for over half the increase (results derived from a rough integration of a small set of key variables for two censuses with no regional or geographical detail). Comparing these results with IPUMS-derived data for the United States from 1880 to 1990 shows that by 1985 Colombian married women had attained participation patterns by age that closely paralleled those for married women in the United States in 1970. In both cases some 40% of married women aged twenty-five through forty worked in the formal labor force.

In the United States, over the three decades from 1940 to 1970, a great transformation occurred in the rate of married women in the workforce. In Colombia, only twelve years were required for a similar change to occur. Even more surprising in the Colombian case is that after education the second most important factor for explaining change for married women was not declining fertility, poverty (using an index based on public services available to the household), or even spouse's position in the workforce, but rather the husband's own educational attainments. A logistic regression model of these variables reveals that the greater a husband's education, the more likely that his wife worked in the formal labor force—even after taking into account her own level of schooling (McCaa 1998).

Microdata on formal labor force participation according to classic definitions offer valuable insights on women's entry into capitalist wage-labor markets. Then too, microdata analysis permits the researcher to take into account the entire household economy, such as the presence and work situation of a spouse, children, or other individuals, whether related or not. The integrated public use samples that we develop will enhance the original data by constructing composite household indicators of labor force participation. The hierarchical organization of the proposed census series, with individuals identified within household contexts, is well suited to the study of the household economy.

Analysis of the determinants of female labor force participation and child labor is a particularly compelling issue in Colombia, yet their study through time and across space is impossible with aggregate data. Integrated public use microdata series allow researchers to take control of the data, to move beyond the frustrations of changing technical definitions to focus on substantive issues of model development and hypothesis testing. Microdata are particularly salient where intellectual orientation and ideology ride roughshod over empirical evidence.
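The composite household indicators mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the project's design: the field names `household_id` and `in_labor_force` and the derived variable `hh_workers` are hypothetical.

```python
from collections import defaultdict


def add_household_workers(persons):
    """Attach to each person the number of labor force participants in the household."""
    workers = defaultdict(int)
    for person in persons:
        if person["in_labor_force"]:
            workers[person["household_id"]] += 1
    # Return new records so the input list is not modified.
    return [
        dict(person, hh_workers=workers[person["household_id"]])
        for person in persons
    ]


sample = [
    {"household_id": 1, "in_labor_force": True},
    {"household_id": 1, "in_labor_force": True},
    {"household_id": 1, "in_labor_force": False},
    {"household_id": 2, "in_labor_force": False},
]
enriched = add_household_workers(sample)
```

Because each person record carries its household identifier, a household-level count of this kind can be attached to every member and then used as a covariate in individual-level models.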

2. Demography of Violence. The Colombian death rate from violence—at some 80 per 100,000 population in the 1980s—ranks as one of the highest in the world for a country not at war (Pecaut 1997). One of the goals of the project is to standardize place-codes at the district level so that the demographic effects of violence may be studied at local as well as regional levels. While sampling error will be too great to study individual localities, with a uniform coding scheme it will be possible to develop appropriate aggregates. Integrated census microdata can be used to measure the effects of violence on types of communities, families, households, and individuals through time (Murillo-Castaño 1991; Ruíz and Rincón 1996). We have contracted with the Colombian statistical bureau (DANE) to ensure consistency in the coding of small places.

The integrated database includes variables on orphanhood, widowhood, and child mortality. The 1993 census was designed to measure a wide range of physical disabilities. Whether these questions will be continued in the 2001 census has not yet been decided. The Colombian census question on employment has included a special response for the disabled since 1993, and a decision has already been made to retain this option in the 2001 enumeration. The allegation that one in forty Colombians is a refugee may be tested with census microdata at both the national and local levels, but sustained research on this subject awaits resolution of inconsistent geographical codes for minor civil divisions.

3. Emigration and Immigration. In terms of international population movements, Colombia is primarily a country of emigration, with the United States constituting the principal destination country. Since 1985, Colombian censuses have requested information on a mother's number of children resident abroad. Questions on retrospective migration also tap into movements to and from the United States. Coupling these data with the US-IPUMS databank will make it possible to compare Colombians resident in the United States with those resident in Colombia. The hierarchical structure of both datasets facilitates the study of individuals in their family and household contexts, so that it will be possible, for example, to study the correlates of unmarried Colombian fathers or mothers, whether they reside in the United States or Colombia.

4. Fertility. From the early 1960s to 1997, the Colombian total fertility rate declined from an average of 6.8 children to 3.0. This astonishingly rapid transition has elicited a great deal of scholarly attention (Puyana 1985; Flórez Nieto 1996), so much so that one might conclude that little work remains to be done—but this would be wrong. The Colombian PUMS, once suitably integrated, will permit the study of differential fertility patterns during a period of great fertility decline, and the relative impact of occupational class, region, education, size of locality, family structure, and a host of other variables at the individual, family or community level. The richness of these data will greatly enhance our ability to analyze the determinants of fertility decline in a developing country, and this may in turn lend insight into fertility control in late-developing regions. For women of child-bearing ages, Colombian censuses consistently report children ever-born, children currently alive, and the last born child's date of birth and survival status. In addition, the integrated microdata series will incorporate a set of fully compatible links between mothers and their children, and this will eliminate the most onerous aspect of one of the most widely used methods of fertility research, the own-child method.
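To suggest how mother-child links simplify the own-child method, the sketch below tabulates linked children by their own age and by the mother's age at the birth. It is a simplified illustration only: the field names (`household_id`, `pernum`, `mother_pernum`, `age`) are hypothetical, and a full application of the method would also adjust for mortality and for children not living with their mothers.

```python
from collections import Counter


def own_children_table(persons, max_child_age=14):
    """Count linked children by (child's age, mother's age at the birth)."""
    # Index every person by household and person number for fast lookup.
    index = {(p["household_id"], p["pernum"]): p for p in persons}
    table = Counter()
    for child in persons:
        mother = index.get((child["household_id"], child.get("mother_pernum")))
        if mother is None or child["age"] > max_child_age:
            continue  # no mother link in this household, or too old
        table[(child["age"], mother["age"] - child["age"])] += 1
    return table


household = [
    {"household_id": 1, "pernum": 1, "age": 30, "mother_pernum": None},
    {"household_id": 1, "pernum": 2, "age": 4, "mother_pernum": 1},
    {"household_id": 1, "pernum": 3, "age": 1, "mother_pernum": 1},
    {"household_id": 2, "pernum": 1, "age": 20, "mother_pernum": None},
]
births = own_children_table(household)
```

With the links precomputed, the researcher is spared the matching of children to probable mothers, which is the onerous step referred to above.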

5. Life Course Analysis. Changes in the timing of major life-course transitions—such as leaving school, leaving home, starting work, marrying, and establishing a separate household—have been studied in Colombia using retrospective survey data (Flórez Nieto 1989; Flórez Nieto and Hogan 1990; Flórez Nieto, Echeverri Perico, and Bonilla Castro 1990). While these studies have yielded valuable insights on the dramatic changes taking place in Colombia, our understanding of these processes would be vastly enriched by the analysis of all Colombian regions. Gutierrez de Pineda's path-breaking study of the Colombian family with her four typologies of the Andean, Black, Santanderan, and Antioqueñan cultural forms has never been put to the empirical test of nation-wide probability samples (Gutierrez de Pineda 1968). The integrated microdata series will provide the opportunity to test her models through a cohort analysis of the timing of change and differences by regions and among sub-populations, such as migrants and non-migrants, and more educated or less.

6. Aging and population projection. Demand for more sophisticated population projections increases as populations age (Banguero and Castellar 1993). A new multi-dimensional method for projecting populations and households requires input data that are readily derived from integrated public use samples (Vaupel, Yi and Zhenglian 1997). This summer, the principal investigator will attend the Max Planck Institute for Demographic Research workshop on family/household modeling and applications to evaluate this new method of projection and to ensure that the Colombian PUMS are properly designed to provide the necessary data.

These topics are intended only as representative examples of the sort of research that can be carried out with the Colombian integrated microdata series. Other key areas of investigation include the transformation of class structure, urbanization, internal migration, nuptiality, education, and the availability of public services such as water, electricity, sewage, and the like.

The large size of public use samples increases their versatility by permitting analysis of small population subgroups, metropolitan areas and other civil divisions. Policy analysts have traditionally focussed on short-run change, but there is increasing recognition of the need to distinguish long-term secular trends from temporary fluctuations. Public use samples will also allow Colombian policy analysts to set their investigations of department and local conditions in a comparative national context.

In summary, the Colombian census enumerations include a great deal of information on demography and socioeconomic structure that can only be exploited through analysis of microdata. At present only the broad outlines of the social transformations underway in Colombia are understood. Published sources provide limited information on topics such as fertility behavior, urbanization, household composition, and occupational structure. The proposed integrated microdata series for Colombia would allow the construction of comparable cross-tabulations on a wide range of topics that were not covered by census publications or were incompletely tabulated. Perhaps even more important is the potential for pooled multivariate analyses opened up by the availability of compatible microdata. Used in combination, the five data sets spanning nearly four decades of cataclysmic social and economic change will comprise our most important resource for the study of the evolution of Colombian society. Violence has isolated Colombia from the international scholarly community almost as effectively as a blockade. This project will help circumvent that blockade.

Preliminary Studies

Robert McCaa, the principal investigator, has been working on Latin American demography for almost thirty years. His first publication was an edited volume of census tables for the 1940 Chilean population census which had been published sporadically in the Chilean Statistical Bulletin over a period of years (McCaa 1972). Over the past fifteen years, he has recovered several large historical censuses, primarily for Mexico (McCaa 1984, 1988, 1989, 1991, 1997a), including a study of one of the most sophisticated censuses from ancient times, a sixteenth-century listing from Mexico written in Nahuatl (McCaa 1996).

Motivation for creating and integrating Colombian censuses comes from his experience—and frustration—over the past seven years as a user and a producer of large census microdata sets. An attempted comparison of ethnic intermarriage in New York City and Buenos Aires was blocked by an inability to gain access to the Argentine microdata. The data for New York City convinced him of the power of these data for addressing important historical issues and policy matters (McCaa 1993). Saving Latin American census tapes from destruction became an urgent concern; developing integrated public use samples, a goal. McCaa has already undertaken exploratory integration of portions of two Colombian censuses (McCaa 1998). He and his students have work in progress using Mexican and Chilean census samples (McCaa and Mills, in press).

Steven Ruggles also has extensive experience with census data. He has directed projects to create new public-use microdata samples for the United States census in 1850, 1860, 1870, 1880, 1900 and 1920, as well as an oversample of the Hispanic population in 1910 (NSF SBR-9210903; NIH R01 HD34572; R01 HD25839; R01 HD36451; R01 HD29015; R01 HD32325). In addition, he has directed a series of projects to create the Integrated Public Use Microdata Series (IPUMS), which incorporates census microdata from the U.S. Censuses for the period 1850 through 1990 (NSF SES-9118299; SBR-9422805; SBR-9617820; NIH R01 HD34714). He has published a variety of papers related to this work (e.g. Ruggles 1991a, 1991b, 1991c, 1993, 1995a, 1995b; Ruggles, Hacker and Sobek 1995; Ruggles and Menard 1990, 1995; Ruggles and Sobek 1997; Ruggles, Sobek and Gardner 1996). He has also carried out extensive research using census microdata (e.g. Ruggles 1988, 1994a, 1994b, 1997).

As a result of these projects, we have become intimately familiar with the intricacies of public use samples, for both the United States and Latin America. In the course of constructing the US-IPUMS, our staff has invested thousands of hours in the reconciliation of variables such as relationship to head of household, occupation and birthplace.

Prof. Susan De Vos is one of the most accomplished researchers in the United States in the use of Latin American census microdata. She brings to the project years of experience struggling with incompatible census definitions, both in Colombian census data in particular and in Latin American censuses in general. As her letter of acceptance indicates, she has a profound knowledge of the intricacies of changing Colombian census definitions and of the careful work required to produce a well-integrated databank of Colombian censuses.

Prof. Joseph Potter conducted the first technically sophisticated evaluation of enumeration error of a Colombian census, that of 1973 (Potter 1976). His expertise will be particularly valuable to this project, not only because of his work with Colombian data, but also for his wide ranging interests in fertility, mortality and health issues.

Prof. Carmen Elisa Florez is one of Colombia’s most distinguished living demographers, and certainly the most active researcher in the use of Colombian census microdata. In addition to consulting on the census of 1985 and working in various technical capacities at DANE, she has published a large number of technical, methodological and analytical studies using Colombian microdata. Her current project on occupational change in Colombia will provide valuable expertise in the difficult area of standardizing occupational codes.

Dirk Jaspers Faijer has almost twenty years' experience at CELADE, having risen through the ranks from "associate expert" (San Jose, Costa Rica) to "chief of division for training on population" (Santiago, Chile). Lately, he has been responsible for the development of CELADE’s REDATAM initiative, and the CELADE tape library is under his direct supervision. He enthusiastically welcomes our project and the opportunity to develop a mirrored web-site at CELADE.

Research Plan

The principal tasks required for creation of the Colombian Public Use Samples and integrating them into microdata series are given in Table 1, together with an estimate of the hours required for each task. This section describes each of these tasks in turn, with particular attention to design strategies to minimize problems of comparability.

1.0 Preparation and Design. Detailed planning of the integrated public use microdata series is a major undertaking, and design issues will be the focus of our work during the first year of the project. The following sections are intended to raise the most important design problems and our general approach to them.

To set the context for this discussion, Table 2 shows the availability of the most important variables in each of the microdata samples. Thirty-two questions are common to the four enumerations. Eighteen additional questions are available in two, three or four census years, and fourteen others are present in only a single census.

1.1 Survey and consultation of potential users. The success of this project will depend on the usefulness of the data series to a broad range of social scientists. To ensure that the design of the series meets the needs of potential users, we plan an extensive program of consultation before it is finalized. We have formed an advisory group of prominent users of Colombian census data. In Fall 1999, before NICHD funding begins, DANE, the Universidad de los Andes, Universidad Nacional and the Universidad Externado will host a workshop on integrating Colombian censuses, directed by the PI with our Colombian consultants as participants (see Dr. Bodnar’s letter). From the workshop we hope to produce a preliminary design of the integrated microdata series. To obtain feedback from other potential users of the data, we have already presented papers at the International Congress of Americanists (Quito, July 1997), Studies on Women, Gender and Development (Bogotá, May 1998), and at the Max Planck Workshop.

Although such meetings are effective for the discussion of broad strategy issues, they are less useful for addressing the hundreds of detailed design problems that we must resolve. Moreover, we want to seek advice from the broadest possible range of potential users. We therefore plan to identify researchers most likely to use the data series, and send each of them a detailed prospectus and set of questions on the design of the database. Our discussions to date with interested social scientists have shown that there is great enthusiasm for creation of the data series, so we anticipate that many academic investigators will be willing to provide extensive written advice without financial compensation. In addition, throughout the grant period we will continue to seek advice from the Colombian Census Bureau, Centro Latinoamericano y Caribeño de Demografía, Inter-University Consortium for Political and Social Research, and users of the US-IPUMS samples.

During this early stage, our team of consultants will author comparability essays and craft code-specific comparability tables for complex census variables, such as occupation (see attached letter by Carmen Elisa Florez). These essays will provide the intellectual underpinnings for our integration decisions, specifically for the following topics: administrative geography (DANE), occupation and other socio-economic variables (Carmen Elisa Florez), housing and services (Susan De Vos), family and nuptiality (Lucero Zamudio; see attached letter), fertility and education (TBA), Indian populations and migration (TBA), and census quality and coverage (Carmen Elisa Florez and Susan De Vos). Our consultants’ knowledge of Colombian censuses in general and of the evolution of specific census concepts will be particularly valuable in ensuring the highest-quality integration of individual and household variables across time, with minimal distortion of original concepts and definitions.

Our expert consultants also constitute an important group of potential users. Their comparability essays will be written from the perspective of an experienced user’s future research plans. Our consultants will not only provide valuable insight on integration design, they will also help resolve tediously technical integration problems. At the same time, they will increase the efficiency of the Minnesota team. This will make possible the integration of five censuses, instead of only four as originally planned, without increasing expenses in Minnesota.

1.2 Record layout and general coding design. Following conventional practice for US-IPUMS and the Census Bureau PUMS before them, the Colombian integrated public use microdata series will consist of numeric codes arranged in a column-format hierarchical structure. Variables common to the household—such as geographic indicators and housing questions—will appear on a household record. Each household record will be followed by a series of person records specifying individual-level characteristics.

The design of the record layouts will stress column-compatibility rather than compactness. In general, all variables available in multiple census years will appear in the same columns in every year. When a variable is not available for a given year, the columns will be filled with a missing data value. This means that the integrated versions of the public use samples will be substantially larger than the originals. We anticipate a record length of approximately 335 bytes, which is more than twice the average of the existing data sets.

The great advantage of column compatibility is that it simplifies the construction of multi-year data files and minimizes the potential for user errors. In view of the rapid decline in the cost of mass data storage, it makes sense to focus on efficiency of use rather than efficiency of computing resources.
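The column-compatible hierarchical layout can be illustrated with a minimal Python sketch. The variable names, column widths, and the missing-data filler below are hypothetical and are not taken from the Col-IPUMS design; the sketch only shows that a variable absent from a given census year still occupies its columns, so that every year can be read with the same column positions.

```python
# Hypothetical layouts: (variable name, column width). Illustrative only.
HOUSEHOLD_LAYOUT = [("RECTYPE", 1), ("YEAR", 4), ("DEPT", 2), ("WATER", 1)]
PERSON_LAYOUT = [("RECTYPE", 1), ("AGE", 3), ("SEX", 1), ("CHBORN", 2)]

MISSING_FILL = "9"  # assumed missing-data filler, not a documented code


def format_record(values, layout):
    """Render one fixed-width record; unavailable variables are filled."""
    fields = []
    for name, width in layout:
        value = values.get(name)
        if value is None:
            # Variable not available in this census year: keep the columns.
            text = MISSING_FILL * width
        else:
            text = str(value).zfill(width)
            if len(text) > width:
                raise ValueError(f"{name}={value!r} exceeds {width} columns")
        fields.append(text)
    return "".join(fields)


def write_household(household, persons):
    """Return a household record followed by its person records."""
    lines = [format_record({"RECTYPE": "H", **household}, HOUSEHOLD_LAYOUT)]
    for person in persons:
        lines.append(format_record({"RECTYPE": "P", **person}, PERSON_LAYOUT))
    return lines
```

In this sketch a household from a year without a water-supply question still receives a filled WATER column, at the cost of a longer record, which is the trade-off described above.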

The Colombian censuses employ differing numeric classification systems in every census year, and reconciliation of these classifications is a major part of this project. For many variables, it is impossible to construct a single uniform classification without an unacceptable loss of information. Some census years provide more detail than others, and if we reduced all census years to their lowest common denominator we would sharply reduce the power of the data series.

To avoid such problems and still maximize compatibility, we will design composite coding systems for most variables. The first one or two columns of each variable will be entirely compatible. An additional one or two columns will provide added detail for particular census years or groups of years. This approach maximizes ease of use and minimizes information loss at modest cost in space.
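A composite code of this kind might be sketched as follows. The recode tables, source codes, and the choice of a two-column general part plus a two-column detail part are assumptions made for illustration (a hypothetical relationship-to-head variable in which one year distinguishes married spouses from consensual partners and another does not); they are not the actual Colombian classifications.

```python
# Hypothetical recode tables: source code -> (general part, detail part).
# The general part is comparable across all years; the detail part carries
# extra information available only in some years.
RECODE = {
    1973: {
        "1": ("01", "00"),  # head
        "2": ("02", "00"),  # spouse or partner, not distinguished
        "3": ("03", "00"),  # child
    },
    1993: {
        "01": ("01", "00"),  # head
        "02": ("02", "10"),  # spouse, married
        "03": ("02", "20"),  # consensual partner
        "04": ("03", "00"),  # child
    },
}

UNKNOWN = "9999"  # assumed code for undocumented source values


def composite_code(year, source_code):
    """Return the four-column composite code for one source value."""
    general, detail = RECODE[year].get(source_code, (UNKNOWN[:2], UNKNOWN[2:]))
    return general + detail


def general_code(composite):
    """Return the fully comparable part of a composite code."""
    return composite[:2]
```

A user interested only in comparability reads the first two columns; a user working with a single detailed year reads all four.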

To minimize the potential for recoding errors, the principal investigator and his research assistant will independently construct two parallel recode dictionaries for each variable in each census year. Any discrepancies between the two will be resolved in conference.
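The comparison of the two independently built dictionaries is easy to automate. The following is a minimal sketch, assuming each dictionary maps a source code to its integrated code; the function name and the sample codes are hypothetical.

```python
def find_discrepancies(dictionary_a, dictionary_b):
    """List source codes on which two recode dictionaries disagree.

    Returns (source_code, value_in_a, value_in_b) tuples, with None where a
    source code is missing from one of the dictionaries.
    """
    discrepancies = []
    for code in sorted(set(dictionary_a) | set(dictionary_b)):
        value_a = dictionary_a.get(code)
        value_b = dictionary_b.get(code)
        if value_a != value_b:
            discrepancies.append((code, value_a, value_b))
    return discrepancies


# Hypothetical example: the two coders disagree on source code "12",
# and only the first coder handled source code "15".
coder_one = {"10": "0100", "12": "0210", "15": "0300"}
coder_two = {"10": "0100", "12": "0220"}
to_resolve = find_discrepancies(coder_one, coder_two)
```

Each tuple in `to_resolve` is an item for the conference between the two coders.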

1.3 Design of housing and public services indices. The Colombian censuses, like those for most Latin American countries, provide particularly rich detail with respect to type of housing (5 variables) and availability of services, such as electricity, water, sewage, and so on. Researchers are only just beginning to exploit this information (De Vos and Arias 1996). This project will enhance the use of these variables, not only by standardizing their coding schemes, but also by constructing housing and service composite indices.
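As one illustration of what a composite service index could look like, the sketch below computes the share of observed services that a household has. The equal weighting, the variable names, and the handling of missing items are all assumptions made for the example; the actual indices will be designed in consultation with our advisors.

```python
# Hypothetical service variables, coded 1 = present, 0 = absent, None = missing.
SERVICES = ("ELECTRIC", "WATER", "SEWAGE")


def service_index(household):
    """Share of observed services present in the household (0.0 to 1.0).

    Returns None when no service item was recorded for the household.
    """
    observed = [household[s] for s in SERVICES if household.get(s) is not None]
    if not observed:
        return None
    return sum(1 for value in observed if value) / len(observed)
```

Dividing by the number of observed items rather than by a fixed total keeps the index comparable for census years in which one of the service questions was not asked.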

1.4 Design of geographic codes. Colombian census microdata, unlike those for the United States, specify the location of each household down to the enumeration area. To preserve confidentiality this code will not be reproduced in the Col-IPUMS databank. Nonetheless, finely detailed location codes on the machine-readable records make it possible to guarantee conformity of minor civil boundaries of departments, municipalities and even districts within municipalities. DANE cartographers are the most knowledgeable experts on changes in Colombian administrative boundaries over the past four decades. It is cost-effective to sub-contract DANE specialists to construct a standardized dictionary of place codes for departments, municipalities and districts. Two cartographers will be assigned to this task full-time during the first year of the grant. The result will be a finely-grained, cartographically uniform set of geographic codes for the entire Republic of Colombia consistent with boundaries for the 2001 census.

In response to one reviewer’s concern, the PI took advantage of a recent research trip to Bogota to discuss the problem of standardizing geographical codes with the current Technical Director of Censuses, Dr. Yolanda Bodnar. Dr. Bodnar demonstrated DANE’s computerized cartographic system developed for the 1993 census and continuously updated since. In a recent email communication [12 Jan 99], Dr. Bodnar assured me once again that making the files compatible with earlier censuses is an important, timely, and practical task, and that the cartographic unit of DANE is eager and fully equipped to do the job. Further testimony is provided in Dr. Bodnar’s letter included with this proposal.

From our study of how Colombian census concepts have evolved over the past several decades, we are persuaded that neither geographical codes nor occupation (as noted above) constitutes an insurmountable obstacle when compared with the difficulties that we have already overcome in integrating similar variables in the historical censuses of the United States.

1.5 Design of constructed variables on family relationships and household composition. Colombian census authorities collected data on households and relationships within households, but this information has never been reconstructed according to any of the various traditions of scholarship. We will construct household composition variables on the household record, using both European and North American scholarly standards, as is the case with US-IPUMS (Table 3). We will also construct the 19-category Laslett/Hammel household classification for all census years.

Despite its limitations, the Laslett/Hammel scheme remains the most widely used classification for the analysis of households.

Individual-level variables describing interrelationships among family members are even more important for most users than household-level classifications. Such variables make it possible for researchers to create specialized measures of living arrangements tailored to their specific needs, such as living arrangements of the elderly or of single parents. These measures also facilitate the construction of specialized own-child fertility measures and measures of marriage characteristics, including, in the Colombian case, consensual unions. Although the IPUMS system of family relationships was designed solely with the culture and data of the United States in mind, its demonstrated flexibility in accommodating a wide variety of family systems from North America to Europe and Asia is rapidly establishing the IPUMS method as the standard. We anticipate few difficulties in applying the IPUMS system of family variables to Latin America in general or Colombia in particular. Nevertheless, we will listen carefully to what our panel of advisors recommends in this regard.

At the moment we plan to construct six variables to indicate the membership of each individual in a specific primary family, secondary family, and/or subfamily (FAMUNIT and SUBFUNIT); the size of each of these units (FAMSIZE and SUBFSIZE); and the type of the unit (FAMREL and SUBFREL). We will construct these variables for all census years. In addition, we will create three pointer variables that give the location within the household of each individual's spouse (or consensual partner as determined from the marital status variable), mother, and father (SPLOC, MOMLOC, and POPLOC). The pointer variables allow users to easily attach characteristics of these kin, and sophisticated users find them to be convenient tools for the construction of measures of fertility and co-residence. Finally, we will include several of the most commonly requested variables on own-children: number of own children, number of own children under five years old, and age of eldest and youngest own child.
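The use of pointer variables can be sketched in a few lines of Python. The sketch assumes that each person record carries a within-household person number (here called PERNUM, a hypothetical name) and that a pointer value of 0 means the relative is not present in the household; both are assumptions for illustration rather than settled design decisions.

```python
def attach_mother_attribute(household, attribute):
    """Copy one characteristic of each person's co-resident mother.

    `household` is a list of person dicts with PERNUM and MOMLOC keys.
    The new key is named '<attribute>_MOM' and is None when no mother
    is present in the household.
    """
    by_pernum = {person["PERNUM"]: person for person in household}
    enriched = []
    for person in household:
        mother = by_pernum.get(person["MOMLOC"])  # pointer 0 finds nothing
        new_person = dict(person)
        new_person[f"{attribute}_MOM"] = mother[attribute] if mother else None
        enriched.append(new_person)
    return enriched


def count_own_children(household):
    """Count co-resident own children of each person via MOMLOC and POPLOC."""
    counts = {person["PERNUM"]: 0 for person in household}
    for person in household:
        for pointer in ("MOMLOC", "POPLOC"):
            parent = person[pointer]
            if parent in counts:
                counts[parent] += 1
    return counts
```

A count of this kind is the starting point for own-child fertility measures, since the links between mothers and children are already resolved in the data.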

The Colombian censuses rarely provide for more than ten kinds of family relationships, so some relationships among family members are ambiguous. The information available for sorting out these ambiguities varies slightly from census to census, and a small percentage of ambiguous relationships is the inevitable result. For the sake of consistency, many investigators will want to use family interrelationship variables based entirely on information available in all census years. There are certain applications, however, for which the greater precision available in some years is required. The Colombian sample series, following the guidelines developed by the US-IPUMS project, will accommodate both needs through the use of flags. The variables MOMLOC, POPLOC, SPLOC, FAMUNIT, and SUBFUNIT will be accompanied by flags indicating: (1) if the link or unit membership would be the same even if minimal information were used; (2) if the link was only made because of extra information available in the particular census year; or (3) if the link is contradicted by extra information available in that census year.

1.6 Design of all other coding systems and constructed variables. The classification issues discussed above are the most problematic ones, but a wide variety of other variables will also require significant work. For example, the development of a consistent scheme for the classification of occupations is a non-trivial undertaking. The project’s consultant on occupations, Dr. Carmen Elisa Florez, has many years' experience working with this variable and will be of great assistance in efficiently resolving this problem (see enclosed letter).

The data series will include three types of constructed variables not previously mentioned. First, we will provide tools to help users manipulate the data. These variables will include census year, record type, number of person records in sample unit, household sequence number, and person sequence number within households. Second, we will construct several household-level variables summarizing characteristics of the unit as a whole, including economic status, life-cycle classification, and the like. Third, we will construct a year of birth variable, to facilitate the tracing of cohorts through the unevenly timed intervals between Colombian censuses.

2. Software development. We anticipate that this project will require approximately 2,000 hours of computer programming. All the programs created for this project will be fully documented and the source code will be released to interested researchers.

2.1 Recoding and reformatting. Once we have designed the record layout and the classification systems for each variable in each census year, we will create software to carry out recoding and reformatting operations. Although these programs are conceptually simple, the large number of variables and samples make this a substantial task.

2.2 Constructed variables. The programs to create new variables on household structure, family interrelationships, and other characteristics will be fairly complicated. Fortunately, we have a strong base of experience with such software, and will be able to adapt many of our existing programs to meet the needs of the Colombian integrated data series.

2.3 Uniformity and consistency checking software. These programs will check for a variety of data-quality problems, undocumented codes, and statistical anomalies that may indicate recoding flaws, or problems of internal consistency. For example, we will look for married couples in which both partners have the same sex, for children with an impossibly high number of years of education, and so forth. Inconsistent codes will be replaced by logical editing or missing data codes.
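The two checks named above might be sketched as follows. The variable names (PERNUM, SPLOC, SEX, AGE, EDUCYRS), the convention that a pointer of 0 means no spouse present, and the minimum school-entry age are assumptions for the example; the production software will implement many more checks.

```python
MIN_SCHOOL_AGE = 5  # assumed earliest age at which schooling can begin


def check_household(household):
    """Return (PERNUM, problem) pairs for inconsistent person records."""
    by_pernum = {person["PERNUM"]: person for person in household}
    problems = []
    for person in household:
        # Check 1: linked spouses should not report the same sex.
        spouse = by_pernum.get(person["SPLOC"])
        if spouse is not None and spouse["SEX"] == person["SEX"]:
            problems.append((person["PERNUM"], "spouse has same sex"))
        # Check 2: years of education cannot exceed years since school entry.
        possible_years = max(person["AGE"] - MIN_SCHOOL_AGE, 0)
        if person["EDUCYRS"] > possible_years:
            problems.append((person["PERNUM"], "education exceeds possible years"))
    return problems
```

Records flagged in this way would then be passed to the logical editing step or assigned missing data codes.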

2.4 Miscellaneous data handling programs. The project will offer data in several formats (SPSS, SAS, and plain text), which are described in the final section below.

2.5 Web-site extraction system. The Minnesota Historical Census Projects has developed a powerful web-engine for distributing integrated microdata samples via the Internet. This system will be adapted for the distribution of Colombian datasets.

3. Data processing. The combined data files of the integrated public use microdata series for Colombia will exceed three gigabytes. Once the software is fully debugged and tested, production of the data series will become an almost mechanical process. Because of the large scale of the public use samples, however, the production runs to create the integrated public use microdata series constitute a significant task.

3.1 The 1964 Census. A 2.0% sample is all that survives from the 1964 microdata (n=349,563).

3.2 The 1973 Census. The 1973 sample is the least problematic because, as far as we are aware, all that survives is a 3.5% sample, or some 777,000 person records. In other words, for this year, as for 1964, there is no alternative to a sample already created by DANE.

3.3 The 1985 Census. The entire dataset of the 1985 census is in the possession of the PI, but the long form was administered to only some 9.4% of the population (almost 400,000 households). The short form, used for some 3.5 million households, retains almost all the household variables on the long form, but reports only the basic demographic variables for individuals. To enhance the power of our integrated sample, particularly in the case of households in small localities, we will encode aggregate place characteristics for the locality on the household records derived from the entire 1985 census file. This multilevel approach will eliminate sampling error for the encoded variables and strengthen the analytical power of the integrated dataset, particularly with respect to small localities.
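The multilevel encoding of place characteristics can be sketched as a two-step procedure: compute a locality-level summary from the complete census file, then copy it onto each sampled household record. The variable names (LOCALITY, ELECTRIC) and the choice of a simple share as the summary are hypothetical.

```python
from collections import defaultdict


def locality_shares(full_count_households, variable):
    """Share of households in each locality for which `variable` is true.

    Computed from the complete census file, so the result carries no
    sampling error.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for household in full_count_households:
        locality = household["LOCALITY"]
        totals[locality] += 1
        if household[variable]:
            positives[locality] += 1
    return {locality: positives[locality] / totals[locality] for locality in totals}


def attach_place_characteristic(sample_households, shares, name):
    """Return copies of the sampled household records with the share added."""
    return [
        dict(household, **{name: shares.get(household["LOCALITY"])})
        for household in sample_households
    ]
```

Because the share is derived from every household in the locality, it is exact even for a small locality that contributes only a handful of records to the sample.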

3.4 The 1993 Census. The entire 1993 census is also in the possession of the PI in digitized form. In drawing our sample, we will use a multistage ratio estimation procedure to select cases within strata defined by geography, household size, household composition, and other key variables. The final sample will have equal weights across cases, except where there is a clear need to oversample a minority population. Our goal will be to maximize sample precision while maintaining ease of use and confidentiality. In general, we will model our procedures on the design used to create the 1980 U.S. PUMS sample, as described in Ruggles and Sobek (1998). Although the sampling fraction is still being negotiated with DANE, we are certain to obtain an agreement similar to that for the 1985 census: a 10% public use sample with encoded place characteristics. A higher figure is a possibility as long as all applicable confidentiality laws are respected.
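The general idea of selecting cases within strata at a uniform rate can be sketched as below. This is a deliberately simplified proportionate stratified random sample, not the multistage ratio estimation procedure we will actually use; the stratum definition, the function name, and the fixed random seed are assumptions for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(households, stratum_key, fraction, seed=0):
    """Draw the same fraction of households from every stratum.

    `stratum_key` maps a household record to its stratum label. Sampling the
    same fraction everywhere gives (approximately) equal weights; stratum
    sizes are rounded, so very small strata may depart from the target rate.
    """
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    strata = defaultdict(list)
    for household in households:
        strata[stratum_key(household)].append(household)
    sample = []
    for label in sorted(strata):
        members = strata[label]
        n_selected = round(len(members) * fraction)
        sample.extend(rng.sample(members, n_selected))
    return sample
```

Oversampling a minority population would correspond to passing a larger fraction for the relevant strata and recording the resulting unequal weights on the selected records.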

3.5 The 2001 Census. When this project was initially proposed to DANE officials, they had no specific plans to publicly release census microdata for the 2001 census. Indeed, if this project is not funded, it is questionable whether such a public-use file will be created. A preliminary version of the 2001 census form appears in Figure 1. Funding from this project in years two and three will, in part, pay DANE personnel for extracting and processing the 2001 census public use sample and providing timely copies to this project. The project requests funding for the principal investigator to consult with DANE officials at roughly six-month intervals to ensure timely progress on these tasks. In year three, DANE will verify the integrated databank using both the institution's own internal procedures as well as those developed in the US-IPUMS project.

4. Documentation. The creation of integrated documentation is the highest priority of this project. By comparison with the usual standards of social science research, the existing documentation for Colombian censuses has significant limitations. Without exception, the documentation for each census year is organized differently, and some materials exist only in archival form or in the memories of DANE technical staff. The combined documentation in print adds up to several hundred pages, yet there are no indexes. Simply learning how to look things up in each census year requires a substantial investment of time. Moreover, the discussions of comparability issues range from inadequate to nonexistent.

We plan documentation consisting of three sections: a general user's guide, a section on comparability issues and procedural histories, and a section on technical characteristics and error estimation. The resulting volume will be several hundred pages long, linked together with hyper-text and placed on the web. Our budget does not include funds for translating documentation into Spanish.

4.1 Preparation of basic user's guide. The user's guide will contain the essential information for routine use of the data series. It will include the following sections:

1. General description of the public use samples

2. Guidelines for use of the data series

3. Record contents

4. Glossary of terms

5. Summary of sample designs and error estimation

Most of the information in the basic user's guide will be drawn from existing DANE codebooks and technical publications. Besides the basic codebook, this volume will focus on technical characteristics and comparability issues that users will need to refer to most frequently. To keep the basic user's guide to a manageable scale, we will relegate the detailed discussions of technical issues to the other two sections.

4.2 Section on procedural histories and comparability issues. This section will provide a comprehensive treatment of changes in the census that affect comparability of the samples. We will include capsule procedural histories for all census years and complete enumerator instructions organized by variable. We will focus especially on problems of comparability that stem from differences in enumeration procedures and on changes in post-enumeration editing and processing. Since this task will require close consultation with DANE personnel and research in the DANE archives, the project provides for a total of six trips over the period of the grant for the principal investigator to conduct the required research on site in Colombia, in addition to five months Fulbright-funded in-country research prior to the initiation of the NICHD grant.

(The 2001 census form is reproduced at one-half actual size on pages 45-47.)

4.3 Preparation of section on technical characteristics, recode descriptions, error estimation, and marginal frequencies. This section will contain additional details on many of the topics covered briefly in the user's guide, including data on verification results, approximate standard errors, and allocation statistics. We will also provide full documentation on the conversion of the original data into their integrated format, with particular attention to sources of imprecision in occupation, geographic, and family relationship codes. Finally, we will include a set of frequency distributions for key variables.

4.4 Hypertext web-based documentation. Experience with the distribution of U.S. IPUMS shows that most researchers favor web-based documentation. The project will convert all the documentation of the databank into a hyper-text document and make it available via the web. We will also create downloadable documentation in Adobe Acrobat PDF format.

Work schedule and release of data.

Most of our effort during the first year will be devoted to design of record layouts, coding schemes, and constructed variables. The design work for all census years except 2001 will be complete early in the second year of the project, and at that point we will shift our attention to documentation. The final version of the 2001 sample will not become available until January 2002, but DANE has already provided us with preliminary documentation and will continue to do so, so we will have sufficient time in the final year of the project to complete the work on 2001. Software development will begin one year after the start of the grant period. Production runs for all census years will be carried out in the final nine months of the grant.

The data series will be released simultaneously at the end of three years by the Historical Census Projects from our web site, by the Inter-University Consortium for Political and Social Research, the Centro Latinoamericano y Caribeño de Demografía, and by DANE. Indeed, CELADE’s enthusiasm for our project is such that they wish to mirror our web-based distribution system, to assist dissemination in Latin America. We intend to release the data in several different formats. As with the US-IPUMS, users will be able to select sub-samples, based on geographical, individual or household criteria, as well as to specify sample densities and variables desired.

We also plan a compact edition of the data series. This version will maximize comparability at the expense of details peculiar to only one or two censuses. It will include only the common-format component of all composite variables, and will eliminate all variables not available in multiple census years. The compact edition will be considerably simpler and smaller than the main version of the data series, and thus will be more efficient for users who do not require fine detail. Finally, we plan to release merged data files containing data from all five census years. These files will be designed primarily for teaching purposes and for exploratory data analysis. They will contain a small representative sample (about 100,000 cases) of records drawn from the compact edition of each census year. We will explore the practicality of releasing the merged files on CD-ROM.
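The two selection rules in this paragraph can be sketched as follows. This is an illustration only: the availability table, the variable names, and the function names are hypothetical. Reading "multiple" as at least two census years and drawing a fixed number of cases per year are both assumptions, exposed as the parameters min_years and cases_per_year so that either reading can be changed.

```python
import random


def compact_variables(availability, min_years=2):
    """Return the variables present in at least `min_years` census years.

    availability -- dict mapping variable name -> iterable of census years
    """
    return sorted(name for name, years in availability.items()
                  if len(set(years)) >= min_years)


def teaching_sample(records_by_year, variables, cases_per_year, seed=0):
    """Draw a simple random sample from each census year and merge the draws.

    records_by_year -- dict mapping census year -> list of record dicts
    variables       -- compact-edition variables to keep in the merged file
    cases_per_year  -- number of records to draw from each year (assumed reading)
    """
    rng = random.Random(seed)  # fixed seed so the teaching file is reproducible
    merged = []
    for year in sorted(records_by_year):
        records = records_by_year[year]
        size = min(cases_per_year, len(records))
        for record in rng.sample(records, size):
            row = {"year": year}
            row.update({name: record.get(name) for name in variables})
            merged.append(row)
    return merged
```

Each merged row carries its census year, so the file can be analyzed as a pooled sample or split back into single years.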
