Summary File 1 — Standard Extract, 2000 Census
This is the standard extract of roughly 225 variables
from the full SF12000 filetype, which has more than 8,000. Where sf12000
is made up of large, complex tables, sf12000x is made up of a collection of key indicator variables. The variables on
sf12000 have names like p18i5, while the names in this collection are mnemonic, like age5_9 and pct_vacant.
Summary File 1 is the first-published and most often used complete count summary product from the 2000 decennial census.
As of now (Sept. 2001) this file is also the most detailed of any data file based on the 2000 census that has
been released. This will change later this year when SF2 (Summary File 2) is released, although the additional detail of that
set of tables will be mostly of interest to those wanting to do in depth studies of racial or ancestral groups. For most data
needs, SF1 will continue to be the most practical source of information.
If you are not familiar with the general concepts behind Summary File 1, we suggest you look at our
Readme file for our sf12000 filetype or at the
technical documentation pointed to from that page.
The good thing about the sf12000 files is that they contain some very detailed tabulations for a lot of different geographic
summary levels, all the way down to the census block. The bad thing about this is that the complete files are very complex
and can be difficult to use. The information is there, but it often needs to be extracted and simplified for common uses.
The purpose of these standard extract versions is to boil down the information to something that we hope most data users
will find adequate for a great many applications.
Alternative Data Sources
For users needing more detail than what is contained in these extracts, the alternative source is the data in the
full sf1 directory.
For a similar file of key indicators, there are the Bureau-defined standard demographic profile datasets, but these
are limited to governmental units (i.e., no summaries for anything other than states, counties and cities). On the other hand,
we have that data for every state, county and city over a certain size in the whole country in a single data set. It should
be obvious that we modeled our standard extract on these standard demographic profile sets.
Data Files: What Goes Where
If you have looked at the contents of the full sf12000 data directory, you will be familiar with the way that data was
organized and broken down into multiple datasets for a geographic area. Different kinds of tables were stored in different
datasets because some tables did not have any data below the level of census tract. The sf12000x sets are much simpler. In
general, there are only a few data sets for each state, and you do not have to look in more than one data set to get all the
extract variables available for a given geographic area.
As with most of the files in our data collection, the first 2 characters of the file/dataset name are the postal
abbreviation of a state (or "us" for a national file). The rest of the name indicates what geographic entities are summarized
within the dataset. We have a basic set of three such datasets per state.
For Missouri we have the following data files:
- moi.sas7bdat : the "i" stands for "inventory". In the census terminology, an inventory summary is one for a complete
geographic area, such as a census tract or a block group. The inventory file for Missouri has summaries for the state (040),
counties (050), county subdivisions (townships, 060), complete places (cities, 160), places within counties (155), census tracts (140), block groups (150),
congressional districts (106th Congress, 500), MSAs (Metropolitan Statistical Areas as defined at the time of the 2000 census, 390), and 5-digit ZCTAs, both complete
and within county (871 and 881). The 3-digit codes in the previous sentence are the summary level codes that you can
use to extract data for just those geographic types. For example, to select census tract data you should specify on the Filter Specifications
and Sort Criteria page of the uexplore/xtract application:
SumLev Equal to(=) 140
or, to select county, tract and block group summaries for Greene County (county FIPS code 29077), your filter would be:
SumLev In List 050:140:150
County Equal to(=) 29077
- moh.sas7bdat: the "h" stands for "hierarchical". While there is generally more interest in data by complete census
tracts than there is for the portion of a tract within a specific community (township and/or place), for certain users
and applications the finer hierarchical geography may be needed or preferred. So we keep these, but on a separate dataset
because there are a great number of them. The summary levels we keep on the moh dataset include 070, 080, 091 and 158. To see what
these are about see the Summary Level Sequence Chart
from the SF1 Technical Documentation.
Typically, you would use one of these hierarchical levels if you wanted data to be restricted to the portion within
a given place. So, for example, to extract data at the block group level (split by township) for the city of Ashland, MO, you
would extract from this moh data set and your filter would be:
SumLev Equal to(=) 091
PlaceFP Equal to(=) 02242
You can look up the FIPS place codes for Missouri in various places, including the geographic codes lookup pages for
Missouri used in our Summary File 1 Profiles.
Since Ashland has fewer than 10,000 people, you need to scroll down to the end of the MO places and click on "Other MO Cities (less than 10K)".
(Change "places" to "counties" in the URL of that page to view the FIPS county codes. These pages have codes for the entire U.S.)
- moblks.sas7bdat: There are over 279,300 geographic areas summarized on the full SF1 file for the state of Missouri.
Of these, 261,992 -- 93.8% -- are census block summaries. So it just makes a lot of sense to put these on a separate
file (just as it probably would have made sense to do this on the full sf12000 datasets, but that would have created more
complexity in exchange for some efficiency in access and we opted for making it easier for us rather than the computer.)
When extracting from the moblks data set a good thing to keep in mind is that 35% of all blocks in Missouri have no
population. Thus, for many applications, you may want to ignore these empty areas. The xtract application makes it easy
to do this filtering by simply specifying that you want the variable TotPop to have a value greater than 0. A query to
select all census blocks in Adair County that have some population would look like:
TotPop Greater Than (>) 0
County Equal to(=) 29001
You need to be careful when extracting from this data set, since it is quite easy to create extracts that are simply too
large to process. Filtering is critical, as well as being specific about what variables you want to keep.
- us.sas7bdat: This is what the Bureau calls the SF1 "Advance National File", as released
in December 2001. ("Advance" as distinguished from the "Final" version, which will have Urbanized
Areas and urban/rural population and housing counts, which this Advance file does not.) Here is where you go
when looking for data for the U.S. (totals, SumLev=010), or for any state (040), county (050), city (160), township (060), metropolitan area (380, 381, 385, 386),
or ZIP code (860) in the entire United States.
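The filters shown in the list above are, in effect, simple row selections. As a rough sketch of their logic outside the xtract application, here is a small Python example run against a few made-up rows. The variable names SumLev, County and TotPop come from the filters above; the rows, their values, the in_list helper, and the choice to store codes as strings (to keep leading zeros) are assumptions made for illustration only.

```python
# Hypothetical rows standing in for an extract data set; all values are invented.
rows = [
    {"SumLev": "050", "County": "29077", "TotPop": 1000},
    {"SumLev": "140", "County": "29077", "TotPop": 400},
    {"SumLev": "150", "County": "29077", "TotPop": 0},
    {"SumLev": "060", "County": "29077", "TotPop": 900},
    {"SumLev": "140", "County": "29001", "TotPop": 250},
    {"SumLev": "140", "County": "29001", "TotPop": 0},
]


def in_list(value, colon_list):
    """Mimic an 'In List' filter given a colon-separated list such as 050:140:150."""
    return value in colon_list.split(":")


# SumLev In List 050:140:150   and   County Equal to(=) 29077
greene = [r for r in rows
          if in_list(r["SumLev"], "050:140:150") and r["County"] == "29077"]

# TotPop Greater Than (>) 0    and   County Equal to(=) 29001
populated_adair = [r for r in rows
                   if r["TotPop"] > 0 and r["County"] == "29001"]
```

Both conditions in a filter must hold for a row to be kept, which is why the township row (060) and the empty Adair row drop out.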
There are two standard data product reports that use these data. The dp1_2k (Demographic Profile 1, 2000 Census) data product
is accessible from a variety of locations on the web.
There is also a series of geographic comparison reports that display fewer of these data items, but for a
collection of geographic units within a geographic universe (state or county).
Currently (Dec. 2001), we have these just for Missouri.
A closely related report is dp1_2kt, where the "t" stands for "trend". To access a trend report for an
area you can follow the menu pages to the dp1_2k report and then the link at the bottom of that report page which will take
you to the trend report (not available for all geographic summary levels or areas).
Displaying Block Level Reports
There are some who think it would be better if block level data could not be viewed online. We understand the reasons for
this, but we also recognize that there are sometimes legitimate needs to take a quick look at data at the block level. You
can do this with our dp1_2k application module, but only if you are willing and able to modify the URLs yourself -- they are
not on any menus. The URL to display the report for block 1002 in Boone county, census tract 0001.00 is
If you follow the menus to the report for that census tract, the URL would be the same, except that the last part,
"&bl=1002", would not be there. So the easy way to do it is to generate a report for a tract or block group, then edit the URL
to get data for a block within the area.
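The URL edit described above can also be done programmatically. The following Python sketch appends a bl parameter to a tract-level report URL. Only the "&bl=1002" suffix comes from the text; the example.org address, the other query parameter names, and the add_block helper are placeholders invented for illustration, not the real report address.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_block(report_url, block):
    """Return report_url with a bl=<block> query parameter at the end."""
    parts = urlsplit(report_url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    # Drop any block already present so the parameter appears only once.
    query = [(key, value) for key, value in query if key != "bl"]
    query.append(("bl", block))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Placeholder URL and parameter names (assumptions), standing in for a
# tract-level report URL copied from the browser.
tract_url = "http://example.org/report?st=29&co=019&tr=000100"
block_url = add_block(tract_url, "1002")
```

Starting from a tract or block group URL taken from the menus keeps the rest of the query string intact, so only the block number has to be supplied by hand.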
The 1990 Population Variables
While these data sets primarily contain data collected in the 2000 Census, we have made one rather important exception to that
rule, at least for Missouri data sets. We have attempted (and, in nearly all cases, succeeded) to link a population count for
each area as reported in the 1990 Census on STF1. The 1990 population count is stored in TotPop90, and we then calculated the
change and percent change variables, Change and PctChange: Change by subtracting the 1990 figure from the 2000 count, and PctChange by expressing that difference as a percentage of the 1990 figure. The 1990 population
figure is unadjusted -- it is as reported in the original 1990 census summary files/reports. Also, in the case
of political jurisdictions, we use the population of the area as it was defined in 1990. Thus, if the city of O'Fallon
annexed areas that contained 15,000 people in 1990 but were not then part of the city, the figure we report does not include
those 15,000 people. (In other data sets, such as the intercensal population estimates files from the Bureau, this is not
handled this way; instead they attempt to adjust the older numbers so that they pertain to the current boundaries.)
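The change measures described above can be sketched as follows. The variable names TotPop, TotPop90, Change and PctChange come from the text. The formula is the conventional definition of percent change, and the handling of a missing or zero 1990 count is an assumption, since the actual SAS code is not reproduced here.

```python
def change_measures(totpop, totpop90):
    """Return (Change, PctChange) for a 2000 count and a 1990 count.

    Assumption: both measures are left missing (None) when there is no
    1990 count, and PctChange is left missing when the 1990 count is zero.
    """
    if totpop90 is None:
        return None, None
    change = totpop - totpop90
    pct_change = None if totpop90 == 0 else 100.0 * change / totpop90
    return change, pct_change


# An area that grew from 1,000 people in 1990 to 1,100 in 2000.
change, pct_change = change_measures(1100, 1000)
```

Because the 1990 figure for a political jurisdiction uses its 1990 boundaries, a large PctChange for a city may reflect annexation as well as growth.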
For smaller geographic units -- especially census tract, block group and block -- we have gone to some trouble to make sure
that the area being summarized has the same spatial definition -- the current, 2000 definition. In many cases, tracts, block
groups and blocks are totally redefined across the decade so it is difficult to get comparable data, to say what the change
was in a given neighborhood. But we have used a geographic equivalency file to help us create data sets where 1990 census data
has been retabulated into 2000 geographic units. These data sets are stored in the stf901 and stf901x2 filetype directories.
Data sets within these directories whose names end with "00" are those that have been allocated to 2000 geography. Thus, for
example, the data set motrs00.sas7bdat in the /pub/data/stf901x2 subdirectory contains data extracted from the original STF1
file from 1990 for Missouri, but those data have been retabulated so that they contain summaries for the 2000 census tract
geographic units. It is from these data sets that we took the 1990 population figures that are now stored as part of these
sf12000x extracts. This has only been done for Missouri (we have no geographic equivalency files for other states, and building
such a file is not trivial). For Illinois and Kansas, we did match against the original 1990 STF1 files and where a tract or block
group code matched, we assumed it was basically the same area and inserted the 1990 population count. However, this is not
as reliable as we had at first thought. We did it this way for Missouri earlier, before we had completed doing the reaggregation
of our 1990 data to 2000 geography. We were somewhat dismayed to discover that there were tracts in Boone county which had
not changed their codes, but which had dramatically changed the land area they represented. This is against Census Bureau
guidelines given to local tract boundary committees, but obviously they were only guidelines. Bottom line is that we cannot
say just how reliable these 1990 population counts may be for other states. Where a new tract or block group was created, there
will be a missing value for the 1990 population. This will always be the case at the census block level, since all census
block codes are new for 2000.
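The simpler code-matching approach described above for Illinois and Kansas can be sketched as a lookup by geographic code. Everything in this Python example is illustrative: the codes, the counts, and the geocode field name are invented, and only TotPop and TotPop90 are names taken from the text.

```python
# Invented 1990 counts keyed by tract code (illustrative values only).
pop1990 = {"29019000100": 3200, "29019000200": 4100}

# Invented 2000 tract records; "geocode" is a hypothetical field name.
tracts2000 = [
    {"geocode": "29019000100", "TotPop": 3000},  # code also existed in 1990
    {"geocode": "29019000300", "TotPop": 2500},  # code newly created for 2000
]

for rec in tracts2000:
    # dict.get returns None when the 2000 code has no 1990 counterpart,
    # which plays the role of the missing value mentioned in the text.
    rec["TotPop90"] = pop1990.get(rec["geocode"])
```

A successful match only shows that the code is unchanged. As the Boone County case shows, it does not guarantee that the code covers the same land area in both censuses.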
Want to see how we did it? You have to read SAS (it's much easier to read than to write), but you can view the source code used
to define these variables. We always keep the code in the Tools directory within the filetype directory. In this
case the file to look at is sf12000x.sas. Here is a
sample of what you will see there:
*--households by type--;
femalehouseholder=p18i14; *-female headed family non-mc hhs;
fem_childunder18=p18i15; *--female headed hh no husband w kids;
From this you should be able to determine that the way we calculated the number of families with own children < 18 was by
summing the 8th, 12th and 15th elements of table p18. On the other hand, most of the other items here were already in
the original sf1 tables and we simply copied their values and gave them more mnemonic names. In order for this code to
make sense, of course, you need to know what in the world "p18i8" is. That is documented reasonably well in a file called
in the sf12000/Tools
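The derivation mentioned above, summing the 8th, 12th and 15th elements of table p18, can be illustrated with a short Python sketch. The cell names p18i8, p18i12 and p18i15 follow the naming pattern in the text. The cell values and the output variable name are invented for illustration; the name actually used on the extract is defined in sf12000x.sas.

```python
# Invented cell values for one geographic area; only the cell names
# follow the p18iN pattern described in the text.
cells = {"p18i8": 120, "p18i12": 15, "p18i15": 40}

# Hypothetical output name for the count of families with own children
# under 18, built by summing the three table cells.
families_with_own_children = cells["p18i8"] + cells["p18i12"] + cells["p18i15"]
```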
This is one of the more important and frequently accessed data directories in the archive. It contains an extract of about 225 variables,
derived from over 8000 table cells on the full SF1 summary record (only a small fraction of those 8000 cells were used in the
extraction, of course.) It is our belief that a majority of data users will be able to answer whatever questions they need
answered from the SF1 data by using these data, without having to deal with the considerably more complex full SF1 collection.
We feel that the addition of a 1990 population count with derived change measures represents an important enhancement to the
utility of the data.