Notes Regarding the Public Law 94 Basic Trend Reports

Notice (April 21, 2001)

We have regenerated all reports in this directory in order to enhance their readability and usefulness. The major changes involved reformatting the "race interval" fields so that they are centered and have no extra blank spaces within them. We also modified the sort order so that the 2000 numbers are shown before the 1990 numbers. We also made some minor cosmetic enhancements to the column titles. Anyone using the Missouri School district csv files will find that they now include the DESE district code as well as the standard LEA code.

What and Where | Data Sources | Race Categories | Race Intervals | Population Groups | Data Columns | Sort Order | 1990 Data | Formats and Naming Conventions | Fields on CSV Files | Block level data, etc | Comparable 1990 data | Access data for entire US | Source Code | Credits


What and Where

The PL94 Basic Trend Reports are compact summary reports based on 1990 and 2000 Public Law 94-171 data. The collection created by the MCDC is at ./. We expect that most people will come to this page via the link on that one, but others may be sent directly here.

Data Sources

The data used to create these reports comes almost entirely from the Public Law 94-171 (aka "Redistricting") files as released by the U.S. Census Bureau. There are two editions of these files, one for 1990 and one for 2000. All data shown in these reports is identified by year and can be linked to one of the two decennial files. Some reports may only show data for a single year - usually when only one year is available. But the intent of the reports (as suggested by their name) is to provide data from the two censuses.

The one exception to the statement that all data comes from pl94 files is with the national level data for 1990. We did not have a complete collection of pl94 files for the U.S. So we used data from 1990 Summary Tape File 1C to generate the 1990 numbers for the U.S. level reports.

Getting the data for the upper limit counts (see the discussion of race intervals below) from the 2000 Public Law files as distributed by the Bureau was not a simple matter of retrieving the count from a cell in one of the tables. These values had to be calculated by summing the subset of the 63 race-category cells that included the given race. Some may claim that the count of persons who were "white alone or in combination with other races, and non-hispanic and over 18" (for example) is not on PL94 for 2000. The fact is that while it was not explicitly reported, it was derivable.

The Race Categories

The thorniest issue that data providers and users have to deal with related to the PL94 data (and this is going to be with us for all 2000 data products) is how to deal with the change in race categories. In earlier decades the census questionnaire asked persons to check a box indicating their race. They were instructed to check only one such box. Everybody was either white or black or Japanese or American Indian, etc. But an OMB directive from 1997 has mandated that all federal surveys should now allow respondents to be able to specify multiple races. You can check as many of the race boxes as apply. For the sake of tabulating the data for PL94 there were 6 basic race categories (up from 5 basic categories in 1990; the "Asian and Pacific Islander" category used in 1990 has been split into two categories for 2000: "Hawaiian and Other Pacific Islander" and "Asian".) You can view the results of this new way of assigning race categories when you look at the size and complexity of the 4 "tables" on PL94.

While the additional race detail gathered by allowing the multiple responses is useful for someone studying racial mixing patterns, there is a real problem for most applications of the data. If it were a matter of a single table with counts of the 63 combinations it would be fine. But these 63 categories are not just a single table in the census tabulations. Many census tables (including all of them on PL94) use race as a table dimension. So we have tables of "Race by Age" and "Race by Sex" and "Median Household Income by Race of Householder", etc. Do we really want all of these tables to give us the full 63-category detail? Is more data better? Not in most cases. The data need to be further collapsed to make them useful for most applications.

Race "Intervals"

One way that has been suggested (initially by the Census Bureau, I believe) to deal with the new complexity was to summarize race data by looking at two basic counts associated with a given basic racial group. If you want to look at data for the Asian racial category (for example) you would get data for all persons who indicated that they were Asian and no other race, i.e. "Asian alone". The companion category would be all persons who checked the "Asian" box on their form, i.e. "Asian alone or in combination with other races". These two counts represent an interval in which the value of "Asian persons" falls. We have an interval instead of a simple count, as before, because being "Asian" is no longer a simple yes/no question. You can be completely Asian ("Asian alone" - the lower limit of the interval) or you can be at least partly Asian ("Asian alone or in combination..." - the upper limit of the interval). The people who are partly but not solely Asian account for the difference between the two limits.

In producing products based on the PL94 data, the MCDC has decided to focus primarily on the use of these racial intervals for reporting race-stratified data. It turns out that this was not exactly a trivial thing to do. If you study the 288 cells of the 4 tables on the PL94 files as distributed by the Census Bureau you will not find any explicit counts of such things. To determine the number of persons who are "Asian alone or in combination.." you have to identify and sum the 32 cells in Table PL1 that indicate persons who are all or partly Asian. Trying to then determine how many of this total are of voting age and/or of Hispanic origin requires even more careful study and aggregation of data in the other PL tables. The process is tedious (even by our standards) and error prone. But we felt it was worth the effort, so we created our own transformed extract files containing explicit counts based on the "racial interval" approach.
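The summing step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the MCDC's conversion code (which is written in SAS): the `cell_counts` mapping is a hypothetical stand-in for the Table PL1 cells, and the counts in the example are made up.

```python
from itertools import combinations

# The six basic race categories tabulated on the 2000 PL94 files.
RACES = ["White", "Black", "AIAN", "Asian", "HawnPI", "Other"]


def race_combinations():
    """Return all non-empty combinations of the six basic races."""
    combos = []
    for size in range(1, len(RACES) + 1):
        combos.extend(frozenset(c) for c in combinations(RACES, size))
    return combos


def race_interval(cell_counts, race):
    """Return (alone, alone_or_in_combination) for one basic race.

    cell_counts is a hypothetical mapping from a frozenset of race names
    to the person count in that cell; it is not the Bureau's file layout.
    """
    lower = cell_counts.get(frozenset([race]), 0)
    upper = sum(n for combo, n in cell_counts.items() if race in combo)
    return lower, upper


combos = race_combinations()                       # 63 combinations in all
asian_cells = [c for c in combos if "Asian" in c]  # 32 of them include Asian

# Made-up counts, for illustration only.
example = {
    frozenset(["Asian"]): 120,
    frozenset(["White"]): 900,
    frozenset(["Asian", "White"]): 15,
    frozenset(["Asian", "Black", "White"]): 2,
}
asian_interval = race_interval(example, "Asian")   # (120, 137)
```

Enumerating the combinations confirms the cell counts quoted in the text: 63 combinations overall, of which 32 include any one given race.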

Population Groups

Each line of data in the reports is identified by a geographic area, a year and a "population group". The first two categories are assumed to be pretty obvious. The population groups are mostly straightforward except for the race-based groups. This is where we run into the problems created by allowing respondents to the 2000 census to check multiple races. This results in there being no simple category such as "race=white" or "race=black". We now have people who indicated they are both white and black; or perhaps they are white, black and American Indian. There are 63 such racial categories, made up of various combinations of up to 6 basic race categories.

Our approach to reporting data here is to show intervals of data for the race-based categories for the 2000 data. The first number in these intervals (which we refer to as the "lower limit") is the count of persons who reported that race and that race only. The second number ("upper limit") is the number of persons who selected that race "alone or in combination with other races"; that is, they checked the box for that race on their form, period. Whether or not they checked any others is irrelevant when calculating the upper limit. The bottom line is that if respondents had been required to check only one racial category in 2000 instead of being able to choose several, the count of persons we would have obtained for each race category would have been somewhere in this interval. The good news for those wanting to look at racial trends is that these intervals are generally very small. In fact, the lower and upper limits are sometimes identical; the software that generates these reports checks for this case and prints just a single value when it detects it. This will usually only happen for relatively small geographic areas.
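The collapsing rule for identical limits can be sketched as follows. The function name and the exact output layout (thousands separators, " - " between the limits) are assumptions made for illustration; the actual report formatting is done by the SAS code.

```python
def format_interval(lower, upper):
    """Format a race interval for display.

    Prints a single value when the lower and upper limits are identical,
    otherwise prints both limits. The layout is an assumption, not the
    format used in the actual reports.
    """
    if lower == upper:
        return f"{lower:,}"
    return f"{lower:,} - {upper:,}"


single = format_interval(1200, 1200)   # limits match: one value
ranged = format_interval(1200, 1234)   # limits differ: both values
```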

Data Columns

There are really just two fundamental data items ("facts", as opposed to "dimensions") for each line of these reports. These are the counts of all persons (in the Population Subgroup and geographic area for the year), and the count of persons over 18 (voting age). Percentages of the total and voting age population are also given. Where the counts are intervals, so are the percentages -- we just calculate the percentage figures based on the interval limits.
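As a rough sketch of how a percentage interval follows from a count interval, each limit is simply divided by the same base population. The function name, the rounding to one decimal place, and the handling of a zero base are assumptions for illustration.

```python
def percent_interval(lower, upper, base):
    """Convert a count interval into a percentage interval.

    base is the denominator: the total population for the all-persons
    counts, or the over-18 population for the voting age counts.
    Returns (0.0, 0.0) for an empty area (an assumption, to avoid
    dividing by zero).
    """
    if base == 0:
        return (0.0, 0.0)
    return (round(100.0 * lower / base, 1), round(100.0 * upper / base, 1))


# Made-up example: 120 "alone" and 137 "alone or in combination"
# out of a population of 1,000.
pct = percent_interval(120, 137, 1000)
```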

Sort Order

The files are sorted by geography, then time (year) and then population subgroup. The code could be easily tweaked to give other orders (some might prefer, for example, to have it sorted by population group before time to facilitate seeing trends). Some reports have geographic subcategories (usually counties) as the major sort field. These will print as subheadings instead of as part of the "Geographic Area" field.
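A composite sort key is one simple way to express this kind of ordering. The sketch below uses hypothetical row keys (`geo`, `year`, `group`), puts the more recent year first, and orders population groups alphabetically; all three choices are assumptions, and reordering the elements of the key gives the other sort orders.

```python
def report_sort_key(row):
    """Sort by geography, then year (most recent first), then group."""
    return (row["geo"], -row["year"], row["group"])


# Made-up rows with hypothetical keys.
rows = [
    {"geo": "Boone County", "year": 1990, "group": "Total"},
    {"geo": "Adair County", "year": 2000, "group": "White"},
    {"geo": "Adair County", "year": 1990, "group": "Total"},
    {"geo": "Adair County", "year": 2000, "group": "Total"},
    {"geo": "Boone County", "year": 2000, "group": "Total"},
]
ordered = sorted(rows, key=report_sort_key)
```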

1990 Data

The reports usually display comparable data from the 1990 census. These data have been extracted from 1990 PL94-171 files (although the same values could be extracted from 1990 Summary Tape File 1 files and we may do this for future products.) For geographic units such as counties and places (cities) the data are reported using the geographic definition of the entity at the time of the 1990 census. Thus data for O'Fallon, Mo for 1990 is as reported at the time of the 1990 census using the then-current boundaries of that city. For census tract levels (and below) we have used geographic correspondence files to allocate the 1990 data to 2000 geographic entities so that the data will be for comparable geographic entities. Note that 1990 data for small area geographies will not always be included. This is because we do not have the geographic equivalency files for all states and have therefore not been able to generate the corresponding 1990 counts.

We have placed the 1990 and 2000 data in separate csv files for convenience. The geocode field can be used as the key to link corresponding rows from the two files. When browsing the index if you do not see a 1990.csv file for a geographic universe/summary level (e.g. "Kansas_Tracts") then you can expect that there will be no 1990 data displayed in the corresponding report files.
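Linking the two files on the geocode might look like the sketch below. It assumes the header row uses the field names GeoCode and TotPop exactly as spelled in the field list; the geocodes and counts are made up, and the csv text is inlined so that the example is self-contained.

```python
import csv
import io


def index_by_geocode(csv_text):
    """Read csv text and return a dict of rows keyed by GeoCode."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["GeoCode"]: row for row in reader}


def link_decades(text_2000, text_1990):
    """Pair each 2000 row with its 1990 row, if there is one.

    Returns tuples of (geocode, pop_1990, pop_2000, change); the 1990
    value and the change are None when no 1990 row matches.
    """
    rows_1990 = index_by_geocode(text_1990)
    linked = []
    for geocode, row_2000 in index_by_geocode(text_2000).items():
        row_1990 = rows_1990.get(geocode)
        pop_2000 = int(row_2000["TotPop"])
        pop_1990 = int(row_1990["TotPop"]) if row_1990 else None
        change = pop_2000 - pop_1990 if pop_1990 is not None else None
        linked.append((geocode, pop_1990, pop_2000, change))
    return linked


# Made-up data for two areas; only one has a 1990 row.
csv_2000 = "GeoCode,AreaName,TotPop\n00001,Area A,25000\n00002,Area B,135000\n"
csv_1990 = "GeoCode,AreaName,TotPop\n00002,Area B,112000\n"
linked = link_decades(csv_2000, csv_1990)
```

With real files you would pass the contents of the two csv files for the same report instead of the inline strings.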

Report Formats and Naming Conventions

The reports are usually generated in 3 display formats: plain text ("txt" file extension), html and pdf (Adobe print document format). In addition, for each report the data used to create the report is included in the directory as comma-delimited files, one for each decade. The file extension for these data files is "csv" (comma separated values, a Windows standard extension.) Many or most browsers today are configured to handle such files with a special desktop application such as Excel.

Note that the html files can be quite large and therefore may take some time to load (the file sizes are provided on the index page and should be considered when selecting files to browse.) The report titles will appear immediately, but there can be a long pause before the table itself displays. For fast access, use either the txt or pdf versions. Some reports are not available in html format, since their size would prevent many browsers from being able to load them. Also, the HTML formats display the column headers only at the beginning of the report, while the txt and pdf versions repeat them at the top of each page.

All files for the same report will have the same basic filename and just different extensions. The basic filenames are intended to be as descriptive as possible. They will generally be made up of the geographic universe name, an underscore, and the geographic summary level. For example, "Kansas_Counties" will be a report displaying data for the counties in the state of Kansas; "Missouri_Tracts" will have tract level data for the state of Missouri, etc.
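A small helper can split a report filename back into its parts. This is a sketch that assumes the universe name itself contains no underscore, which may not hold for every file in the directory; the function name is hypothetical.

```python
import os


def parse_report_filename(filename):
    """Split 'Universe_Level.ext' into (universe, summary level, extension).

    Assumes the first underscore separates the geographic universe from
    the summary level.
    """
    stem, ext = os.path.splitext(filename)
    universe, _, level = stem.partition("_")
    return universe, level, ext.lstrip(".")


parts = parse_report_filename("Kansas_Counties.pdf")
```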

Fields (Variables) on CSV Files

The files with ".csv" extensions are plain ascii "comma delimited" files ("csv" stands for "comma separated value" and is a standard Microsoft file extension). The first line of each file contains the names of the fields (variables). If you open such a file in most spreadsheet programs (including Excel) the first row is imported as-is and serves as a row of column headers labeling the data. For other applications, such as SAS, you may need to tell the program to ignore the first line of data. (Use the "firstobs=2" option on the infile statement to accomplish this in SAS.) The variables are arrayed within the csv records in more or less the same left-to-right order they appear in the report. Field names are intended to be fairly obvious but there are some exceptions.

The key naming convention has to do with variables related to race counts. For 2000 we use "race intervals" (see Race "Intervals" above) rather than a single count for each category. The variables Black1 and Black2 represent the lower and upper limits of the interval for black persons. Thus, Black1 is the count of persons indicating they are black alone, and Black2 is the count of persons reporting black alone or in combination with other races. Similarly, the variables White1 and White2 represent the lower and upper limits of the white race interval, etc. For the variables representing persons over 18 by race, the names are formed from a 2-character race abbreviation, "Ovr18", and the numeric suffix indicating lower or upper limit. Thus we have variables AsOvr181 and AsOvr182 representing the lower and upper limits of the interval for Asian persons over 18.

For the 1990 file we have used a naming convention consistent with the 2000 files, wherein we use the numeric suffix "1" for the single race count fields. So the count of white persons in 1990 is White1. There is no White2 variable for 1990, because there are no intervals prior to 2000 since persons were only allowed to choose a single race then.
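The naming convention can be captured in a small lookup, using the names as they appear in the field list that follows. The function names are hypothetical, the sample row is made up, and the sketch covers only the six basic race stems (the non-Hispanic white fields are left out).

```python
# Two-character abbreviations used in the over-18 field names.
OVER18_ABBREV = {
    "White": "Wh", "Black": "Bl", "Indian": "In",
    "Asian": "As", "HawnPI": "Ha", "Other": "Ot",
}


def interval_fields(race, over18=False):
    """Return the (lower, upper) field names for a race stem."""
    stem = OVER18_ABBREV[race] + "Ovr18" if over18 else race
    return stem + "1", stem + "2"


def read_interval(row, race, over18=False):
    """Pull a race interval out of one csv row (a dict of strings).

    A 1990 row has no upper-limit field, so the lower limit is reused.
    """
    lower_name, upper_name = interval_fields(race, over18)
    lower = int(row[lower_name])
    upper = int(row.get(upper_name, row[lower_name]))
    return lower, upper


# Made-up rows for illustration.
row_2000 = {"Asian1": "120", "Asian2": "137", "AsOvr181": "90", "AsOvr182": "98"}
row_1990 = {"Asian1": "75"}
```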

A complete list of the variables on the csv files and their labels follows:

Field     Meaning
state     State FIPS code.
GeoCode   Varies with geographic level but contains the codes necessary to uniquely identify the geographic entity. This field can be thought of as the key that identifies the row.
desedist  This special code appears only on the two Missouri Schools files. It contains the 6-digit school district code used by the Missouri Dept of Elementary and Secondary Education. The value of geocode on these schools files is the official federal school code ("LEA code"), which is what is stored in the school code field in TIGER and the PL94 files.
SumLev    Geographic Summary Level code. Usually a constant within a file.
AreaName  Name of the geographic area
County    May not always be present but when it is, contains the name of the county.
geo_id    Ignore it. A numeric key linking to a master geographic reference file (2000 geography only) maintained by the MCDC.
TotPop    Total Population
White1    White alone
White2    White alone or in combination
Black1    Black alone
Black2    Black alone or in combination
Indian1   American Indian or Alaska Native alone
Indian2   American Indian or Alaska Native alone or in combination
Asian1    Asian alone
Asian2    Asian alone or in combination
HawnPI1   Hawaiian or Pacific Islander alone
HawnPI2   Hawaiian or PI alone or in combination
Other1    Some other race alone
Other2    Some other race alone or in combination
Whitenh1  White alone, Non Hispanic
Whitenh2  White alone or in combination, Non Hispanic
HispPop   Total Hispanic/Latino Population
Over18    Total population over age 18
WhOvr181  White alone, over 18
WhOvr182  White alone or in combination, over 18
BlOvr181  Black alone, over 18
BlOvr182  Black alone or in combination, over 18
InOvr181  AIAN alone, over 18
InOvr182  AIAN alone or in combination, over 18
AsOvr181  Asian alone, over 18
AsOvr182  Asian alone or in combination, over 18
HaOvr181  Hawaiian or PI alone, over 18
HaOvr182  Hawaiian or PI alone or in combination, over 18
OtOvr181  Other race alone, over 18
OtOvr182  Other race alone or in combination, over 18
WNOvr181  White Non-hisp alone, over 18
WNOvr182  White Non-hisp alone or in combination, over 18
HisOvr18  Hispanic Pop Over 18
MultRace  Multi Racial (Total persons checking more than 1 race)
MROvr18   Multi Racial Over 18
LogRecNo  Logical record number - links to original PL94 file record #
AreaLand  Land Area in Square Meters
AreaWatr  Water Area in Square Meters
Pop100    100 percent Population Count
IntPtLat  Internal point latitude coordinate in decimal degrees.
IntPtLon  Internal point longitude coordinate in decimal degrees.
LandSQMI  Land Area in Square Miles
AreaSQMI  Total Area in Square Miles
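A few relationships among these fields follow from their definitions and can be used as a sanity check on a row from a 2000 file: each lower limit cannot exceed its upper limit, no upper limit can exceed the total population, and the six "alone" counts plus MultRace should add up to TotPop. The sketch below checks these and also shows the square meter to square mile conversion; the function names are hypothetical, the sample row is made up, and values stored in the files may be rounded differently.

```python
# One international mile is 1609.344 meters.
SQ_METERS_PER_SQ_MILE = 1609.344 ** 2

RACE_STEMS = ["White", "Black", "Indian", "Asian", "HawnPI", "Other"]


def square_meters_to_square_miles(square_meters):
    """Convert an area such as AreaLand to square miles."""
    return square_meters / SQ_METERS_PER_SQ_MILE


def check_row(row):
    """Return a list of consistency problems found in one 2000 row."""
    problems = []
    total = int(row["TotPop"])
    for stem in RACE_STEMS:
        lower, upper = int(row[stem + "1"]), int(row[stem + "2"])
        if not lower <= upper <= total:
            problems.append(f"{stem}: expected {lower} <= {upper} <= {total}")
    alone_sum = sum(int(row[stem + "1"]) for stem in RACE_STEMS)
    if alone_sum + int(row["MultRace"]) != total:
        problems.append("alone counts plus MultRace do not equal TotPop")
    return problems


# Made-up row for illustration.
good_row = {
    "TotPop": "1000", "MultRace": "25",
    "White1": "800", "White2": "820", "Black1": "100", "Black2": "112",
    "Indian1": "10", "Indian2": "15", "Asian1": "50", "Asian2": "58",
    "HawnPI1": "0", "HawnPI2": "1", "Other1": "15", "Other2": "20",
}
bad_row = dict(good_row, Asian2="40")  # upper limit below lower limit
```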

Block Level Data, etc.

The trend reports are just formatted presentations of selected data that we keep in the MCDC data archive. We provide a tool that permits users (with web access) to extract data from the data sets in the archive. The name of the application that does this is uexplore. It is modeled after, and works somewhat like, the Windows Explorer application that lets you navigate among the files and directories on your PC's hard drive.

To access the data relevant to these reports you need to know the URL for the cgi-bin application, uexplore, and which directories to look in for which files. The URL for uexplore is http://mcdc.missouri.edu/applications/uexplore1.html, but this really just takes you to an introductory page, not the actual application. To get there you have to make selections. Choose "Population" from the little "Go to" menu at the top, and then click on the link to "pl942000". The next page shows you all available data sets with pl942000 data in them. Most of them have the same variables (see Metadata.html for a list of variables and labels) but differ in terms of geographic universe and units summarized. There are a great many possible combinations.

Readers of this page from Missouri will most likely be interested in the very small-area geographies such as census blocks and block groups. The data sets containing summaries at this level are called mobgs150.ssd01 and moblks.ssv01. To access these (or any other data set you see listed on the "Contents" page) just click on its name. The next page lets you choose among 3 possible applications; take the default (the xtract application). The xtract subapplication will permit you to create extracts from the data set you have chosen. There is an on-line tutorial for uexplore and for xtract that you might want to look at before going any further (there are links to it throughout the applications). You will be able to get your extract in any of 5 common data formats, including comma separated value (csv) and dbf. The csv format is readily imported into most spreadsheet programs, including Excel.

Comparable Data for 1990

Interested in seeing trends for any of these geographic areas? Then you just need to redirect the uexplore application to another, related, directory. If you go back to the page you came in on (uexplore1.html) and select the category "pl9490" you will go to the directory of data sets containing the 1990 Public Law 94 data. The variables and the geography are different for 1990 than for 2000. But if you click on the subdirectory pl9490tx it will take you to a special collection of data sets that are related to these pl94trend reports. (The "tx" stands for trend extract.) These data sets contain 1990 data, but that data has been specially processed to make it compatible with the 2000 pl94 data. For example, you will see in this directory a data set named moblks00.ssd01. This is one of the most valuable data sets in the entire archive. It has data that has been allocated/aggregated to the new 2000 census block geography using a special 1990 to 2000 census block equivalency file. Most blocks in 1990 went entirely to one block in 2000; but for those that did not we were able to estimate the portion of the 1990 block going to each intersecting 2000 block and to allocate the '90 data to the 2000 blocks accordingly. If you retrieve data from this data set, you should be able to match it up with data from the pl942000 directory, data set moblks.ssv01. Similarly, we provide the data sets mobgs00.ssd01 and motrs00.ssd01, containing complete block group and tract level summaries of 1990 data but summarized for 2000 census geographic units. These should match up 1-to-1 with the pl942000 data sets mobgs150.ssd01 and motracts.ssv01 to permit running trend analysis at the block group and tract levels.
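The allocation step can be illustrated with a short sketch. It assumes an equivalency list of (1990 block, 2000 block, portion) entries in which the portions for each 1990 block sum to 1; how the MCDC actually estimated those portions is not described here, and the block identifiers and counts below are made up.

```python
from collections import defaultdict


def allocate_to_2000_blocks(counts_1990, equivalency):
    """Allocate 1990 block counts to 2000 blocks.

    counts_1990: dict mapping a 1990 block id to a count.
    equivalency: iterable of (block_1990, block_2000, portion) tuples,
                 where portion is the estimated share of the 1990 block
                 that falls in the 2000 block (a hypothetical layout).
    """
    totals = defaultdict(float)
    for block_1990, block_2000, portion in equivalency:
        totals[block_2000] += counts_1990.get(block_1990, 0) * portion
    return dict(totals)


# Block A goes entirely to X; block B is split between X and Y.
counts_1990 = {"A": 100, "B": 40}
equivalency = [("A", "X", 1.0), ("B", "X", 0.25), ("B", "Y", 0.75)]
allocated = allocate_to_2000_blocks(counts_1990, equivalency)
```

Because the portions for each 1990 block sum to 1, the allocated counts add up to the original 1990 total, although individual values may be fractional.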

Access Data for Entire U.S.

You can now access the pl94 data for any geographic level for any state in the country. It requires running an experimental version of the MCDC's uexplore/xtract software. You need to start the application using the URL http://mcdc.missouri.edu/cgi-bin/uexplore?/pub/data/@secure. This will present you with an index page showing all the datasets that the MCDC has created related to the 2000 PL94-171 files. The ones you will be interested in are those that begin with "us" or that are of the form "xxsums.sas7bvew". The former are national collections of data (e.g. usstates, uscntys, usplaces), while the latter are the detailed data for individual states. For example, the data set casums.sas7bvew has a complete collection of data for the state of California. You will typically want to create an extract by filtering on the geographic summary level variable (SUMLEV) and on some geographic code such as county or place. To find out what the SUMLEV codes are you can browse the Technical Documentation for the pl94-171 files at the Census Bureau (specifically, follow the bookmark to Chapter 4, Summary Level Sequence Chart). The MCDC did not keep summaries for the levels 710, 720 or 730 (because they can be readily aggregated from data at the 740 summary level).
You can also contact the MCDC about a custom extract. We can pull data for anywhere in the country and put it into just about any format you might need. A fee will be charged for custom programming.
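The kind of filter described above, on summary level and a geographic code, amounts to the following once the data are in hand. This sketch is not the xtract application; the COUNTY key and the sample rows are hypothetical, while the codes 050 (county) and 140 (census tract) are standard Census summary levels.

```python
def filter_extract(rows, sumlev, county=None):
    """Keep rows at one summary level, optionally within one county.

    rows is a list of dicts with a SUMLEV key and a hypothetical
    COUNTY key holding a county code.
    """
    return [
        row for row in rows
        if row["SUMLEV"] == sumlev
        and (county is None or row["COUNTY"] == county)
    ]


# Made-up rows for illustration.
rows = [
    {"SUMLEV": "050", "COUNTY": "019", "AreaName": "County A"},
    {"SUMLEV": "140", "COUNTY": "019", "AreaName": "Tract 1"},
    {"SUMLEV": "140", "COUNTY": "001", "AreaName": "Tract 2"},
]
tracts_in_county = filter_extract(rows, "140", county="019")
all_counties = filter_extract(rows, "050")
```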

Source Code

These reports were generated using the SAS software package, Version 8.1. The code for generating the reports can be viewed via the web at http://mcdc.missouri.edu/data/pl942000/Tools/. This will take you to a directory listing. The pertinent modules are basic_trend_report.sas for an example of code that writes these reports using the macro in pl94trnd.sas, and cnvt1state.sas for the code that converts the original Census Bureau data files and creates the SAS data set(s) (for 2000) used as input.

The pl9490tx data sets, which are the source of the 1990 data displayed in the reports, were created by aggregating 1990 data using setups in http://mcdc.missouri.edu/data/pl9490/Tools/. There are quite a few aggregation setups involved here and in the pl942000 Tools directory. One of the critical software tools used in that aggregation is our aggregation utility macro stored in http://mcdc.missouri.edu/pub/sasmacro/agg.sas.

Credits

The SAS code used to create these reports was written by John Blodgett of the Office of Social and Economic Data Analysis, part of University Outreach and Extension, located at the University of Missouri in Columbia. The work was done under a contract with the Missouri Census Data Center. Questions and comments can be addressed to the author (see below).