General Information
These estimates were commissioned by the National Center for Health Statistics (NCHS) and generated for them by the U.S. Bureau of the Census. They are essentially alternative versions of the Bureau's "casrh" series of estimates (county, age, sex, race and hispanic origin). These are annual population estimates at the state and county levels for the 4 demographic categories mentioned (age, sex, race and hispanic origin). The numbers have been generated for the years 1990 through (at the time of this writing) 2005, with new values released with an approximate 1-year lag. (For example, the estimates for July 1, 2006 should be available circa August 2007.) These estimates differ from the standard casrh estimates in 2 critical ways:
- The race categories are very different. The "bridged race" categories used on these files are:
- White
- Black or African American
- American Indian, Eskimo, Aleut
- Asian & Pacific Islander
There is no separating the Asians from the Native Hawaiians or other Pacific Islanders, and there is no "Other" race category -- these have all been assigned to one of the other 4 categories. And, perhaps most significantly, there are no multi-race categories. Using bridging techniques, all persons who indicated they were of multiple races were re-assigned to a single race group. Detailed methodology is available from the NCHS web site.
- While the commonly-available numbers in the casrh series use 5-year age cohort categories, these estimates are for single years of age, except for the 85 and over category.
These numbers are derived from the same basic source as the other official Census Bureau population estimates, and where comparable demographic categories are used, the numbers should match. (For example, if you sum all the estimates for hispanic persons for a given county across the Age categories 0 through 4, you should get the same number that appears in the casrh series for that 0-4 cohort, hispanic, for that county.) Detailed methodology descriptions are available both on our web site (see the DocumentationBridgedPostcenV2004.doc file in this directory) and at the official NCHS web page associated with these numbers.
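The consistency check described above can be sketched in a few lines of Python. This is only an illustration: the counts, the casrh_0_4 value and the cohort_total helper are invented for the example and are not taken from the actual files.

```python
# Hypothetical single-year bridged estimates of hispanic persons in one county.
# Keys are the 2-character Age codes; all counts are invented for illustration.
single_year = {"00": 120, "01": 115, "02": 118, "03": 121, "04": 117, "05": 119}

# Hypothetical value of the matching 0-4 cohort in the casrh series.
casrh_0_4 = 591


def cohort_total(estimates, low, high):
    """Sum single-year-of-age estimates for ages low..high inclusive."""
    return sum(estimates[f"{age:02d}"] for age in range(low, high + 1))


age_0_4 = cohort_total(single_year, 0, 4)
matches = age_0_4 == casrh_0_4
print(age_0_4, matches)
```

With real extracts, the same comparison would be made per county, year and demographic category.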
In addition to the estimates from the 1990's and post-2000 time periods, we also have data from the NCHS web site that contain bridged race summaries based on the 2000 census data. We stored all this information in a single national dataset that looks just like one of the estimates datasets, except that it has only a single numeric population count rather than a time series of estimates. A summary version of this 2k census dataset has also been created. The datasets are named usbridged2kcen and usbridged2k_sumry.
The Datasets
Each year we download a compressed file from the NCHS web site containing within it a huge txt file with the estimates for every county in the U.S. over the entire post-2000 time period for which the estimates are available. We run SAS conversion setups to (re)create a pair of datasets per state. They are as follows:
- A direct transcription of the raw input file. Each observation here represents a set of July 1 estimates starting with 2000 and going through the latest-available year (currently 2005) for a specific county, single year of Age, race, sex and hispanic origin. View the sample listing of the first 200 observations of the Missouri nchsbridged dataset. Note that it starts right out with data for the first county (Adair) in the state and has 16 rows/observations for each value of Age as it cycles through the 4 values of Race, the 2 values of Sex, and 2 values of Hispanic. The dataset has only 11 variables but a great many rows. It is summarized data but the detail is such that it almost resembles microdata.
- The second dataset is a direct derivative of the first and contains summaries and restructuring of the raw data. It has the same name as the first dataset but with _sumry appended; so the two datasets for California are canchsbridged20xx and canchsbridged20xx_sumry. We have placed a sample listing of part of one of our _sumry datasets in the nchsbri directory. The _sumry dataset has fewer rows and more variables than the original dataset. Important distinctions include:
- It contains summaries at the state level, aggregated up from county level data. A SumLev variable has been added to the _sumry dataset, and it takes on values of 040 to indicate a state level summary and 050 to indicate a county level summary.
- The category variables Sex, Race and Hispanic are gone; these 3 dimensions are now represented by variables Total, Male, Female, White, WhiteNH (white and non-hispanic), ...., Hispanic and NonHispanic. The variable Hispanic has gone from being a single-character category code ('1' just meant non-hispanic) to being a numeric variable giving us the count of hispanic persons. Note that in creating these summaries we lose cross-category detail: you cannot, for example, use the _sumry dataset to get race crossed with sex; the only crossing categories here are the 4 race by non-hispanic categories. The critical exception to this rule is Age, which remains a categorical row-identifier variable. Thus, you can get age crossed with any of the other demographic items from this dataset.
- Where the original dataset has the time dimension going across the row with a different variable for each year, in the _sumry dataset things have been transposed so that there is a category variable Year and each row represents data for the specified year.
- The Age_cohort1 and Age_cohort2 category variables have been added to the rows. Age_cohort1 takes on the 5 distinct values 00, 18, 25, 45 and 65 corresponding to the 5 broad age categories 0-17, 18-24, 25-44, 45-64 and 65+. The values of Age_cohort2 take on the 2-character lower limit of the (18) 5-year cohorts as used on the casrh datasets. So, for example, when the value of Age (indicating the single year of age value) is between 25 and 29 the value of Age_cohort2 is 25. These additional category variables are provided to make it possible to get data aggregated to these levels by using the aggregation feature of the Advanced Options section in Dexter.
- For each combination of geography and year there is a summary row/observation where all the Age variables (i.e., Age, Age_cohort1 and Age_cohort2) are blank. These rows are summaries across all ages. So if you want to find out the total number of hispanic persons by county for your state in 2004, code a filter (in Dexter) specifying that you want
- Year Equals 2004
AND
- Age Equals _
(enter the single underscore to indicate a blank value). Then be sure to keep the variable Hispanic, which will contain the estimated count of hispanic persons, regardless of age, for the geographic area indicated by SumLev and County for the year indicated by Year.
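The restructuring described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the SAS setup actually used: the per-year column names (pop2004, pop2005), the lowercase field names, the summarize helper and all counts are hypothetical, and only a few of the _sumry count variables are built.

```python
from collections import defaultdict

COHORT1_STARTS = (0, 18, 25, 45, 65)  # 0-17, 18-24, 25-44, 45-64, 65+


def age_cohort1(age_code):
    """Lower limit of the broad age category, as a 2-character code."""
    age = int(age_code)
    return f"{max(s for s in COHORT1_STARTS if s <= age):02d}"


def age_cohort2(age_code):
    """Lower limit of the 5-year cohort; Age '85' (85 and over) stays '85'."""
    return f"{(int(age_code) // 5) * 5:02d}"


# Hypothetical raw rows: one per county, age, race, sex and hispanic code,
# with one (assumed) column per estimate year. Counts are invented.
raw_rows = [
    {"county": "29001", "age": "26", "race": "1", "sex": "1",
     "hispanic": "1", "pop2004": 150, "pop2005": 152},
    {"county": "29001", "age": "26", "race": "1", "sex": "2",
     "hispanic": "2", "pop2004": 4, "pop2005": 5},
    {"county": "29001", "age": "70", "race": "2", "sex": "2",
     "hispanic": "1", "pop2004": 3, "pop2005": 3},
]


def summarize(rows, years):
    """Transpose years into rows and replace category codes with counts."""
    out = defaultdict(lambda: defaultdict(int))
    for row in rows:
        for year in years:
            n = row[f"pop{year}"]
            # One row per single year of age, plus an all-ages row (blank Age).
            for age in (row["age"], ""):
                rec = out[(row["county"], year, age)]
                rec["Total"] += n
                rec["Male" if row["sex"] == "1" else "Female"] += n
                rec["Hispanic" if row["hispanic"] == "2" else "NonHispanic"] += n
    return out


sumry = summarize(raw_rows, (2004, 2005))

# Equivalent of the filter "Year Equals 2004 AND Age Equals _".
all_ages_2004 = sumry[("29001", 2004, "")]
print(all_ages_2004["Hispanic"], all_ages_2004["Total"])
print(age_cohort1("26"), age_cohort2("26"))
```

The last lookup mirrors the Dexter filter above: it picks the all-ages row for 2004 and reads the Hispanic count from it.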
Code Values
These files use category codes that you need to know to interpret the data. These include both custom demographic category codes and standard FIPS geographic codes. Here are the variables, the codes used and their meanings.
- Age: 00=Less than a year old 01=1 year old ... 84=84 years old 85='85 and over'.
Note that these are stored as 2-byte character strings, not as numerics. Age can have a blank value on _sumry datasets to indicate a summary across all ages.
- Age_cohort1 and Age_cohort2 (_sumry datasets only): See explanation above.
- Sex: 1=Male 2=Female
- Race: 1=White 2=Black or African American 3=American Indian, Eskimo or Aleut 4=Asian or Pacific Islander.
- Hispanic: 1=Non Hispanic 2=Hispanic.
Note: On the _sumry datasets the variable Hispanic is a numeric count of hispanic persons rather than a category variable.
- SumLev (_sumry datasets only): 040=State 050=County.
- County : these are 5-character FIPS county codes. You can view these codes for any state by going to the Cure for the Common Codes home page and clicking on the state. Note that there is no format associated with this code so when you do an extract you will get the code instead of the name. If you prefer to get the county name you can specify that the (MCDC custom) format code $county be associated with this variable by entering
county $county.
in the Format text box in Section V.c of the Dexter query form.
Access the Data Via Uexplore/Dexter
Access the data in the /pub/data/popests/nchsbri data directory.
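Once an extract has been pulled from this directory, the category codes listed under Code Values can be translated into labels. The Python sketch below is only an illustration; it assumes the extract rows carry the character codes under the variable names Age, Sex, Race and Hispanic, as on the non-summary datasets, and the decode and age_label helpers are hypothetical.

```python
SEX = {"1": "Male", "2": "Female"}
RACE = {
    "1": "White",
    "2": "Black or African American",
    "3": "American Indian, Eskimo or Aleut",
    "4": "Asian or Pacific Islander",
}
HISPANIC = {"1": "Non Hispanic", "2": "Hispanic"}


def age_label(code):
    """Translate a 2-character Age code; blank means a summary over all ages."""
    code = code.strip()
    if code == "":
        return "All ages"
    if code == "00":
        return "Less than a year old"
    if code == "85":
        return "85 and over"
    n = int(code)
    return f"{n} year old" if n == 1 else f"{n} years old"


def decode(row):
    """Return a copy of the category fields with codes replaced by labels."""
    return {
        "Age": age_label(row["Age"]),
        "Sex": SEX[row["Sex"]],
        "Race": RACE[row["Race"]],
        "Hispanic": HISPANIC[row["Hispanic"]],
    }


example = decode({"Age": "85", "Sex": "2", "Race": "4", "Hispanic": "1"})
print(example)
```

Remember that on the _sumry datasets Hispanic is a numeric count, so the HISPANIC lookup applies only to the non-summary datasets.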
This file last modified Tuesday October 31, 2006, 10:40:47