Summary File 4 - 2000 Census

MCDC Filetype: sf42000

Note: Work in Progress!

Contents

Status & Holdings

Accessing the data

Technical Documentation


Status and Holdings

The Missouri files were released to the public on May 7, 2003. The MCDC has downloaded and converted the data for Missouri and Vermont (the latter was our test state because it was small and among the first released). These datasets are now available on the MCDC web site via uexplore.

Summary File 4 is now (May 2003) being released on a flow basis and is scheduled to be available for all states by September 2003. Because of the enormous size of these files, the MCDC does not plan to process any other state files. We will probably do at least a partial processing of the SF4 national file when it becomes available in late summer or early fall.
To find out more about this data product, see the Bureau's SF4 web page. It includes a complete set of detailed table descriptions.


Accessing Summary File 4 Data

There are several ways to access this extremely daunting collection of data. Some are much easier than others. Others provide much more power and control but come with a learning curve.

  • Use the American FactFinder application at the Census Bureau. It offers many ways to access these data. A good way to start is by clicking the 2000 Summary File 4 link in the Data Sets section. AFF provides access to detailed tables, "quick tables" extracts [not available anywhere else, at least not in the SF4 files released by the Bureau], data maps, metadata with detailed information about the tables, and more. An online tutorial (under "Help") will help you get started with this software.

  • Use the uexplore utility to access sf4 datasets in the MCDC's sf42000 data directory. For a more detailed discussion of using uexplore to access the full sf42000 datasets, see below.

  • A critically important aspect of using SF4 data is the dramatic degree to which the data are suppressed. The abstract of the technical documentation describes the situation as follows:

    In Summary File 4, the sample data are presented in 213 population tables (matrices) and 110 housing tables, identified with "PCT" and "HCT," respectively. Each table is iterated for 336 population groups: the total population, 132 race groups, 78 American Indian and Alaska Native tribe categories (reflecting 39 individual tribes), 39 Hispanic or Latino groups, and 86 ancestry groups. The presentation of SF 4 tables for any of the 336 population groups is subject to a population threshold. That is, if there are fewer than 100 people (100 percent count) in a specific population group in a specific geographic area, and there are fewer than 50 unweighted cases, their population and housing characteristics data are not available for that geographic area in SF 4. For the ancestry iterations, only the 50 unweighted cases test can be performed. [This last point translates into an effective threshold for ancestry groups that varies with the sampling factor in each geographic area. Because the overall sampling factor was about 1 in 6, the average population threshold for an ancestry group is about 300 persons (50 unweighted cases × 6). The smallest total population we found for any ancestry summary in Missouri was over 200.]

    The MCDC has created a set of index reports that can help you quickly determine whether any data are available for the subpopulation and geographic area(s) you are interested in. For example, if you wanted SF4 data on the Hispanic/Latino population by county for Missouri, you would want to know which counties met the 100-person threshold. If you go to http://mcdc2.missouri.edu/pub/data/sf42000/IndexReports/moChariterIndices/ and click on mohispanic, you will get a report showing which geographic summary levels are available for the various Hispanic subgroups. The report tells you, for example, that 57 Missouri counties have summaries for the total Hispanic population. It also tells you that only two counties in Missouri have data for the detailed Hispanic subcategory Mexican.
    Another index report is http://mcdc2.missouri.edu/pub/data/sf42000/IndexReports/mogeoindex.pdf. This one is sorted by geographic area and tells you which "chariters" (population subcategories) are available for each area. You could use this report, for example, if you were analyzing a specific city or county and wanted to know what kind of detail you would be able to find.
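The bracketed note on ancestry thresholds is simple arithmetic. The following Python sketch makes it concrete. It assumes, as the note does, that each unweighted sample case represents roughly `sampling_factor` persons (N in a "1 in N" sample, about 6 on average in 2000), treated as one average value per area. The function names are hypothetical and are not part of uexplore, SAS, or any Census Bureau software.

```python
"""Sketch of the SF4 ancestry-iteration threshold arithmetic.

All names are hypothetical illustrations, not part of uexplore,
SAS, or any Census Bureau software.
"""

MIN_UNWEIGHTED_CASES = 50  # the only test applied to ancestry iterations


def ancestry_iteration_shown(unweighted_cases: int) -> bool:
    """Return True if an ancestry group meets the 50-unweighted-case test."""
    return unweighted_cases >= MIN_UNWEIGHTED_CASES


def effective_person_threshold(sampling_factor: float) -> float:
    """Approximate persons needed to yield 50 unweighted sample cases.

    sampling_factor is N in a "1 in N" sample, i.e. roughly how many
    persons each unweighted case represents (assumed to be a single
    average value for the area).
    """
    return MIN_UNWEIGHTED_CASES * sampling_factor


def estimated_unweighted_cases(persons: float, sampling_factor: float) -> float:
    """Rough number of sample cases behind a weighted population count."""
    return persons / sampling_factor


if __name__ == "__main__":
    # A statewide average of about 1 in 6 gives the ~300-person figure.
    print(effective_person_threshold(6))  # 300
    # A hypothetical area sampled at about 1 in 4 needs only ~200 persons.
    print(effective_person_threshold(4))  # 200
    # A hypothetical group of 250 persons in a 1-in-6 area falls short.
    cases = estimated_unweighted_cases(250, 6)
    print(ancestry_iteration_shown(int(cases)))  # False
```

Because sampling rates varied from area to area, this only explains why ancestry thresholds differ across geographies. The index reports remain the way to confirm whether a particular ancestry summary actually exists for an area.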



Technical Documentation

The Census Bureau has provided extensive technical documentation in the form of a 775-page PDF file. The complete document can be accessed from the Bureau's web site. Alternatively, you can use the MCDC version, which has been partitioned so that each chapter is a separate PDF file, with an index page for easier access.

For persons using the uexplore application to access the datasets stored in the MCDC's data archive, the best codebook to use (at least for the data cells) is the SAS labels file (see below).


Questions about this page or about Summary File 4 in general should be addressed to John Blodgett at OSEDA.