Now, Where Were We? Recovery, Digitization, and Parsing of Historical 1970 Address Records at the US Census Bureau

John Sullivan, Census Bureau
David Bleckley, University of Michigan

The Decennial Census Digitization and Linkage project (DCDL) will longitudinally link individuals’ responses across the 1960-1990 decennial censuses. Supplementing information like name and age with a residential address is likely to improve linkage rates. However, address information is available in digital format only for the 1990 decennial census. This paper discusses the address information used in the 1970 decennial census and our efforts to produce a digital version of it for use in the DCDL project. For the 1970 decennial census, the Census Bureau created files called Address Coding Guides (ACG) to assign geographic codes to housing units. The Bureau used the ACG internally to assign various geographic variables to residential addresses, allowing for the creation of geography-specific population statistics. The ACG was not scheduled for long-term retention, and shortly after the 1980 census the Bureau’s copies of the ACG computer tapes, along with those held by the National Archives and Records Administration, were destroyed according to the established disposition plan. Fortunately, a version of the ACG was located on computer printouts at the Census Bureau’s National Processing Center (NPC). However, to make the information on the thousands of printed pages usable in the DCDL linkage process, it had to be converted from paper to a digital file: NPC staff first scanned the pages, and the scans were then processed by an optical character recognition (OCR) and parsing pipeline. We build on Lafia et al. (2023) to design and implement an OCR pipeline that produces a tabular data file from the printouts of the ACG records. We iterate through various workflows, measure their accuracy, and execute the best-performing method to create the final tabular data file. This paper describes our efforts to locate the 1970 ACG and presents results from its recovery through our OCR pipeline.
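As an illustration of how candidate OCR workflows might be compared, the sketch below scores each workflow's output against hand-keyed reference lines using character error rate (edit distance divided by reference length) and selects the lowest-scoring workflow. The abstract does not state which accuracy metric the project used, so the metric, the function names, the workflow names, and the sample address lines are all assumptions made for this example.

```python
# Illustrative only: character error rate (CER) is an assumed metric, and
# every name and sample line below is hypothetical, not taken from the ACG.

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def pick_best_workflow(truth_lines, outputs):
    """Return (best workflow name, {name: CER}) over aligned lines.

    Assumes each workflow yields exactly one output line per reference line.
    """
    total_chars = sum(len(t) for t in truth_lines)
    scores = {}
    for name, lines in outputs.items():
        edits = sum(edit_distance(t, o) for t, o in zip(truth_lines, lines))
        scores[name] = edits / total_chars
    return min(scores, key=scores.get), scores


# Made-up reference lines and OCR outputs for two hypothetical workflows.
truth = ["100 MAIN ST", "250 OAK AVE"]
outputs = {
    "workflow_a": ["100 MAIN ST", "25O OAK AVE"],  # one zero read as letter O
    "workflow_b": ["1OO MA1N ST", "250 0AK AVE"],  # several digit/letter swaps
}
best, scores = pick_best_workflow(truth, outputs)
```

A line-level exact-match rate or a field-level score after parsing would be equally reasonable choices; character error rate is used here only because it is simple to compute and penalizes partial misreads proportionally.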

No extended abstract or paper available

 Presented in Session 106. Data Infrastructure Resources Past and Present