This metadata pertains to the zcta_master dataset within the /pub/data/georef data directory of the MCDC data archive.
This dataset was created as a concatenation of the 51 state datasets stored in the zctams12 subdirectory of this (/pub/data/georef) data directory, from the setup (SAS code) in zcta_master12.sas in the Tools subdirectory. That conversion was originally done in early September, 2012. The geographic codes and area names identified in this dataset reflect (unless specifically indicated otherwise) the then-latest codes. So we say that this ZCTA master dataset is "vintage 2012". There are four primary sources of data used to generate the dataset:
Note that sources 1 and 2 are also the basis of the mable12 datasets, which can be found in this data archive in the /pub/data/mable12 directory, and which are used as the source for the MCDC's MABLE/Geocorr12 web application.
state
zcta5
zipname
stab
sumlev
county
county2
placefp
placefp2
cousubfp
cousubfp2
cd111
cd111_2
puma2k
puma12
necta
nectadiv
cnecta
cbsa
metdiv
csa
cbsatype
pcturban
ua
uatype
fipco
fipco2
Same idea here for the secondary county code. fipco2 will appear in output files as the code associated with county2, with the latter
appearing as a county name.
cnty2k
div
reg
In this section the variables are all percentages that measure the degree of intersection of the State-ZCTA with various geographic codes. The variable name is always of the form pct<geographic-variable>, where geographic-variable is the name of the geographic variable. So, for example, the variable pctcnty tells us what percentage of the ZCTA's 2010 population also lived within the county identified in variable cnty. Values are true percentages, not decimal fractions: "95.0", not ".950".
pctcnty
pctcnty2
pctcnty2k
pctcousub
pctplace
pctplace2
pctua
pctcd111
pctpuma2k
pctpuma12
pctnecta
pctnectadiv
pctcnecta
pctcbsa
pctcsa
pctmetdiv
pctcbsatype
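As a rough illustration of how these percentage variables might be used, the Python sketch below flags State-ZCTA records whose population is not entirely within the primary county. Only the variable names zcta5, pctcnty and pctcnty2 and the 0-100 percent scale come from this dataset; the sample records, the helper names, and the use of None for a missing secondary county are assumptions made for the example.

```python
# Hypothetical sample records; the ZCTA codes and values are illustrative only.
records = [
    {"zcta5": "00001", "pctcnty": 100.0, "pctcnty2": None},
    {"zcta5": "00002", "pctcnty": 72.5, "pctcnty2": 27.5},
]

def primary_share(rec):
    """Convert the true percentage in pctcnty (e.g. 95.0) to a fraction (0.95)."""
    return rec["pctcnty"] / 100.0

def is_split(rec, threshold=100.0):
    """True when less than `threshold` percent of the population is in the primary county."""
    return rec["pctcnty"] < threshold

# ZCTAs whose population is shared with at least one other county.
split_zctas = [r["zcta5"] for r in records if is_split(r)]
```

Lowering `threshold` (say, to 90.0) would restrict the list to ZCTAs where the primary county is a weaker match.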
Each of the variables in this section contains the name of an area whose code appears in the Geocodes section, above. The variable name here is just a concatenation of the geographic code variable (with the "fp" suffix dropped, if applicable) and "name". So, for example, we have placename as the variable containing the name of the area whose code is stored in the placefp variable.
placename
cbsaname
metdivname
csaname
nectaname
uaname
puma12name
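The naming rule described for this section can be sketched as a small Python helper. The function name `name_variable` is hypothetical and is not part of the dataset or its tools. Note that the rule only predicts what a name variable would be called: not every code variable has one (puma2k, for instance, does not), so the list above remains the authority.

```python
def name_variable(code_var: str) -> str:
    """Derive the expected name-variable for a geographic code variable.

    Drops a trailing "fp" (as in placefp) and appends "name".
    """
    base = code_var[:-2] if code_var.endswith("fp") else code_var
    return base + "name"

# Examples that match variables listed in this section.
examples = {v: name_variable(v) for v in ("placefp", "cbsa", "puma12")}
```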
When working with ZIP-based files (typically, address files of customers, patients, survey respondents, etc. -- we'll use the generic term "constituents" for entities associated with the ZIP codes) it can be very helpful to characterize the constituents as to basic demographic or economic indicators. Are we dealing with persons living in areas that are very poor or very wealthy, where there are very many or very few Hispanics, with a very high or very low median age, with a large group quarters population, etc.? This kind of information is available from the Census Bureau's American Community Survey. In this section we report variables taken from a recent release of ACS key indicators. For the initial release of the dataset (September, 2012) we use data estimated by the MCDC by allocating census tract level data to ZCTAs. There are a small number of ZCTAs for which we found no matching data on the ACS data file.
acsyears
esriid
intptlat
intptlon
landsqmi
areasqmi
totpop10
totpop00
totpopacs
medianage
pctunder18
pctover65
pctwhite1
pctblack1
pctasian1
pcthispanicpop
tothhs
medianhhinc
famhhs
medianfaminc
povuniverse
pctpoor
pctgrpquarters
pctincollege
pctbachelorsormore
pctforeignborn
tothus
occhus
pctrenterocc
medianhvalue
mediangrossrent
ziptotpop
nstates
psf
pctstate
FIPS state code. This code and the ZIP code (ZCTA5) together form the key for the dataset. In a few rare cases a ZCTA crosses state lines. In each of those cases there is a primary state where more than 95% of the ZCTA's population lives, and then a "sliver" of the ZCTA that crosses into another state. The psf ("primary state flag") variable is set to 1 to indicate that you are looking at a record for the primary state for the ZCTA. A value of 0 for psf means you are looking at the "sliver" portion that crosses into this state. All the geocodes in the observation (counties, places, CDs, etc.) are within this state. See also stab, the state postal abbreviation variable.
ZIP Census Tabulation Area (5-digit). We like to think of this as the "same thing" as a ZIP code but it really isn't. Close enough
for many/most applications though. These are proxies for residential ZIP codes as defined on Jan. 1, 2010 and "rounded" off to census blocks.
A name assigned to the ZIP code associated with the ZCTA. This name comes from the Snow directory file where it is the "City name" suggested by the USPS. This is the name that can be used as the last line of an address to get mail delivered to the ZIP. However, we think that name is not the best. ZIP code 63119 comes with a "city name" of "Saint Louis", which is close. This ZIP code is actually just outside the city of St. Louis and is almost entirely within the city of Webster Groves. Since we know what place (city) the ZIP (ZCTA) intersects with we are able to "improve" on the original name by assigning one based on the city (if any) that it intersects with. So you will find that the value of this variable for the 63119 observation is "Webster Groves, MO". If the ZCTA lies all or mostly within an unincorporated area (with no CDP assigned) then the original post-office city name is retained.
State postal abbreviation. For example, "MO". Values are in upper case.
Geographic Summary Level. It will always be 871 to remind users that we are dealing with State-ZCTA summaries, not complete ZCTAs.
The county in which the ZCTA is all or mostly contained. Over 90% of ZCTAs fall entirely within a single county.
The "secondary" county for the ZCTA, i.e. the county which has the 2nd largest intersection with it. Over 90% of the time this
value will be blank.
The place (city) with which the ZCTA has the largest intersection. This can be an incorporated municipality or a CDP (Census Designated Place). It can also have a value of 99999 to indicate an unincorporated area with no CDP assigned. If a ZCTA is 70% unincorporated and 30% within a city the value here will be 99999 and the city code will appear as the value of placefp2. "FP" in the name is short for FIPS.
The place/city with which the ZCTA has the second greatest intersection. See placefp, above.
The county subdivision (township, town, minor civil division, census county division, etc. - what these entities are called varies by state) with which the ZCTA has the largest intersection. This is a FIPS code, not a name. The code is unique within state.
The county subdivision with which the ZCTA has the second largest intersection. Frequently blank.
The Congressional District for the 111th Congress, as elected in 2008 and effective on 1-1-2010. We have since had a 112th Congress, elected in Nov. of 2010, and the states have all been reapportioned and redistricted for the 2012 election, when we'll elect the 113th Congress. At the time we created this dataset we were not aware of a good national source of the new 2013 boundaries. So this geocode is a little behind the times.
The secondary CD code for those ZCTAs that cross CD boundaries. Usually blank.
The Public Use Microdata Area as defined for use in the 2000 census PUMS datasets. This code is also being used to publish American Community Survey data through the 2011 vintage data. Even though we now have new PUMA codes for "2010" the Bureau has not yet started using them in their ACS products, nor have they yet released any 2010 PUMS files using the new codes. So these "old" PUMA codes are still the most useful, as of September, 2012.
Public Use Microdata Areas for 2010. We use "12" instead of "10" because these new codes were not defined until 2012. See note re puma2k, above.
New England City and Town area.
New England City and Town area division.
Combined New England City and Town area.
Core-Based Statistical Area. The new metro areas (since 2002 or so, replacing the old MSA/CMSA/PMSA system). There are 2 kinds of CBSAs: Metropolitan Statistical Areas and Micropolitan Statistical Areas. The former are based upon a metro core area of at least 50,000 population (latest census or official estimate), while the latter (Micropolitan areas) have a core area of at least 10,000 (and less than 50,000). The codes appearing here were those in effect at the time of the 2010 census. They can be, and are, updated throughout the decade, but no changes have been published since December, 2009 (through September, 2012). See http://www.census.gov/population/metro/data/metrodef.html for latest updates and explanations. Note that CBSAs are county-based - a county will never cross a CBSA boundary. (But a ZCTA can.)
Metropolitan Divisions are subsets of CBSAs. Most CBSAs do not have subdivisions.
Combined Statistical Area. Sometimes adjacent CBSAs such as Baltimore and Washington, DC are grouped into these larger units.
The CBSA type variable is blank to indicate not within a CBSA or will have a value of "Metro" or "Micro" to indicate that the CBSA is either a Metropolitan or Micropolitan statistical area.
The formal definition of urban (vs. rural; all areas are one or the other) is updated each decade using the detailed population data gathered in the census. There is a lag in getting the new definitions published, so the originally published summary files contain no data for the tables that report the urban/rural breakdown. We had to access the special TIGER system files that allowed us to classify each 2010 census block as either urban or rural. We then aggregated the block-level populations by U/R classification to derive this key measure.
The Urban Area code as defined for 2010. This 5-digit code can identify an Urbanized Area or an Urban Cluster. These are similar
areas, but are split into two types based on the size of the population cluster defined as the core of each UA. This code identifies
the urban area with which the ZCTA has the largest intersection. There is no secondary UA variable.
Contains a blank if
FIPS county code. Has the same value as County but we do not associate a format code with it so if you do a Dexter extract it will
appear in the output as a 5-character code rather than the name of the county.
This is the FIPS county code (3-digit) that was in effect in 2000.
Census Division
Census Region
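The placefp/placefp2 convention described above (99999 marks an unincorporated area with no CDP) can be handled along the lines of the following Python sketch. The helper name `city_code`, the sample records, and the assumptions that codes are stored as 5-character strings and that a missing secondary place is blank are all illustrative; check them against the actual file before relying on this.

```python
UNINCORPORATED = "99999"  # placefp value for an unincorporated area with no CDP

def city_code(rec):
    """Return the code of the largest actual place (city or CDP) for a record.

    Looks at placefp first, then placefp2; returns None when neither
    holds a real place code.
    """
    for var in ("placefp", "placefp2"):
        code = (rec.get(var) or "").strip()
        if code and code != UNINCORPORATED:
            return code
    return None

# Hypothetical records: mostly unincorporated, mostly in a city, no place at all.
mostly_rural = {"placefp": "99999", "placefp2": "12345"}
mostly_city = {"placefp": "54321", "placefp2": ""}
no_place = {"placefp": "99999", "placefp2": ""}
```

For the first record the function falls through to placefp2, which matches the 70%/30% case described under placefp.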
Percentages
Name Variables
Name of the primary place (city) associated with the ZCTA.
Name of the CBSA (metropolitan or micropolitan area) associated with the ZCTA.
Name of the metropolitan division.
Name of the combined statistical area.
Name of the New England City and Town area.
Name of the Urbanized Area/ Urban Cluster.
Name of the "2012" PUMA. Names were assigned to PUMAs for the first time following the 2010 census. (Hence there is no
name variable for the puma2k variable.)
ZCTA Data from the American Community Survey
This is a constant that identifies the time period for the ACS data. The value is "2006-2010" in the initial release.
An identifier string intended to be used as a key to link data to ESRI shape files (i.e. the mapping files used in ArcInfo/ArcGIS software).
Internal point latitude coordinate. (Estimated, for now).
Internal point longitude coordinate. Will be negative to indicate west longitude. (Estimated, for now).
Land area in square miles (estimated, for now).
Total area in square miles (estimated, for now).
2010 Census Total Pop count
Total Population 2000 census. Estimated.
Total Pop ACS Period Estimate. For example, when acsyears="2006-2010" this is the average of the population estimates for the five years 2006, 2007,..., 2010. All of the other ACS-based indicators are period estimates in the same way.
Median age in years
% Under 18 years of Age
% 65 years and over
% White alone
% Black or African American
% Asian
% Hispanic or Latino of any race
Total households
Median household income
Family Households
Median family income
Persons for whom poverty status is determined
% Persons Below Poverty
% Living in Group Quarters
% In College or graduate school
% Bachelor's degree or higher
% Foreign born
Total housing units
Occupied housing units
% Renter-occupied units
Median Home Value
Median Rent
Miscellaneous Variables
This is the ZCTA's 2010 total population across all states. Only of interest in those rare
cases where the ZCTA crosses state boundaries.
Almost always a value of 1 to indicate that the ZCTA intersects with only 1 state. A value of 2
says it intersects with two states.
Primary State Flag: a value of 1 says this is the primary state associated with the ZCTA. This means the
only state in 99+% of the cases. A value of 0 indicates a "sliver" case where the ZCTA has a small portion that
crosses a state line.
Pop2010 as % of ZipTotPop. Usually has a value of 100.
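A minimal Python sketch of how the miscellaneous variables relate to one another, using made-up records. It assumes that "Pop2010" refers to the totpop10 variable and that psf is stored as a number; the ZCTA codes, state abbreviations and population counts are invented for the example.

```python
# Hypothetical State-ZCTA records: ZCTA 00001 has a "sliver" in a second state.
records = [
    {"zcta5": "00001", "stab": "AA", "totpop10": 9800, "ziptotpop": 10000, "psf": 1},
    {"zcta5": "00001", "stab": "BB", "totpop10": 200, "ziptotpop": 10000, "psf": 0},
    {"zcta5": "00002", "stab": "AA", "totpop10": 5000, "ziptotpop": 5000, "psf": 1},
]

def pct_state(rec):
    """State portion's 2010 population as a percentage of the whole ZCTA (cf. pctstate)."""
    return 100.0 * rec["totpop10"] / rec["ziptotpop"]

# Keeping only primary-state records yields one record per ZCTA.
primary = [r for r in records if r["psf"] == 1]

# Number of states each ZCTA intersects (cf. nstates).
nstates = {}
for r in records:
    nstates[r["zcta5"]] = nstates.get(r["zcta5"], 0) + 1
```

Filtering on psf in this way is one option when a file with exactly one row per ZIP code is needed, at the cost of dropping the sliver populations.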