Recent Updates to the MCDC / OSEDA Public Data Archive
(To be Accessed Using the Uexplore/Dexter Web Software)
Rev. 05/14/2009
- 06/24/09 Substantial progress has been made on restoring Summary File 3 complete table files in the sf32000 data directory. We now have a whole new array of national data sets at various geographic levels including MCD/county subdivision, ZIP code (ZCTA), Urbanized Area (incl Urban Clusters), metro area (old pre-CBSA entities, including NECTAs), and places (of any size). We are also working on an experimental web app that will permit easy access to and display of the tables on these data sets. Watch for an announcement of this new data access tool on our home page some time in July.
- 06/04/09 Added the file laus_meric_mo_032009.csv to the bls_la directory (local area employment data from the BLS,
downloaded from the Meric web site).
The file (which is designed to become an Excel file) shows Missouri (un)employment stats for the month of March 2009 along with
data for the previous month and for the same month a year earlier. We converted it to create
laus_meric_mo_032009.sas7bdat, a SAS data set version of the above file. We only keep data for the latest month. SumLev and fipco geocodes were added. It has summaries for the US, the state, counties, major cities and CBSAs.
- 05/19/09 We re-aggregated Missouri county level data in the sf32000 data directory (complete sf3 tables) to create data set
sf32000.moregns06 with summaries at the rpc (Regional Planning Commission), ded (Dept of Econ Development regions), dot (MO Dept of Transportation planning districts), umx (U of Missouri extension regions), aaa (Area Agency on Aging regions) and 2006 CBSA (metropolitan and micropolitan statistical areas) levels. There are only 89 observations on this data set but lots of useful info. We already had the sf32000x version of these data (the standard extracts, accessible via our dp3_2k profiles application).
- 05/18/09 We re-converted the 2000 Census SF3 tables at the state legislative district levels as released by
the Census Bureau in 2007. See these in the slds2007 subdirectory of the sf32000 data directory (filetype).
- 05/14/09 We completed processing a new cycle of "casrh" (county age, sex, race and hispanic) population estimates from the Census Bureau. These data sets all go in the popests data directory and have names such as mocasrh08, ilcasrh08, etc. There is also a US states data set and a US 1-dimensional sums data set. This is a complex set of estimates. Nothing has changed from the approach we have been using for the last several years. The previous year's data (vintage 2007) were zipped and moved to the popests/archives folder.
- 05/13/09 We created a new filetype: gnis - the Geographic Names Information System data from the U.S. Geological Survey. There are rumours that the federal statistical world will be converting to USGS codes to supersede FIPS 55 codes. So we decided to grab some of their data and be prepared. We have only converted the raw data for 3 states - Missouri, Illinois and Kansas - but we have the raw csv files for the entire U.S. and can easily convert any state on request. We have focused on the "Federal Codes" data, which are mostly place-type data. Not all of the entities recognized by the USGS are recognized by the Census Bureau.
- 04/28/09 The latest release of the BEA's Regional Economic Information System (REIS) data was downloaded and converted,
totally replacing the 510-data set national collection within the beareis filetype directory.
- 04/14/09 We processed a new annual cycle of Missouri taxable sales data in the taxsales directory. The two downloaded
txt files are copies of the reports displayed on the DOR web site, and there are two corresponding data sets (one at the county level and one with state summaries). Each row/observation summarizes sales for a SIC-based type-of-business category.
- 04/08/09 We added a ustracts data set in the sf32000x filetype. Has the standard SF3 profile extract for every census
tract in the entire U.S.
- 04/07/09 Significant updates were made to the acs2007 Contents file. We created a Readme.shtml page for this subcollection.
- 03/28/09: We updated modules in the mable07 data directory. This will be of very little interest for Dexter-based access, due to the complexity of the data and the lack of detailed metadata. But it is important to know of the update because these are the data used by the MABLE/Geocorr web utility application. The updates affect the CBSA and related geographic items and an update of the latest population estimate from 2006 to 2008.
- 03/19/09: The latest set of state and county and CBSA population estimates thru 2008 from the Census Bureau were processed to
create several new data sets in the popests data directory. New data sets include uscom08, mocom08, mocomregns08, and uscomcbsas08. These are described in detail in the revised metadata accessible via
the Datasets.html page.
We also updated the geographic reference data sets related to Core Based Statistical Areas (CBSA, aka Metropolitan/Micropolitan Statistical Areas). There are 5 new data sets (cbsas, cbsacos, cbsapops, csacbsas and metrodivs) in the georef filetype directory.
- 03/06/09: We completed re-conversion of most of the sf32000 data sets. We converted the Missouri and US level data sets.
There is more work to be done but we think most users will find this replacement collection to have most of what they need.
We downloaded the latest bls unemployment data to get data through 12/08. Stored in the bls_la filetype.
A first pass effort was made to add more current data to the cntypage directory. The new cntypage.merged09 dataset has more
current data from the 2007 Census of Agriculture, the latest (12/08) unemployment data from the BLS, and the most recent poverty data from
the SAIPE data. More upgrades expected later this month (2008 pop estimates with components of change and, hopefully, the 2008 Kids Count data.)
- 02/24/09: This was the day the system crashed. A major disk failure brought the system down for 36 hours before it was mostly
put back together again. All except for one directory that we discovered was not being backed up: sf32000. The description for this subcollection on the /pub/data Contents page begins with "Summary File 3, 2000 census. Perhaps the most important data collection in the archive ..." . We shall be attempting to reconstruct at least the most important data sets in this collection but it is going to take some time. We not only lost the data files but the SAS code used to create most of those data files.
- 01/21/09: Two new data sets were added to the acspums collection. These were the 2005-2007 3-year samples for the state of Missouri, stored as mohrecs073yr and moprecs073yr. We tried to download and convert the US data sets but there were problems with the files. They were also extremely large and we have decided for now to forgo processing them. We can easily download data one state at a time, as demand dictates.
- 12/22/08: We created the popests.uscomnst08 data set with the latest (July 1, 2008) population estimates for the nation and its states with annual
components of change since 2000.
- 12/15/08: The Small Area Income and Poverty Estimates (SAIPE) program at the Census Bureau released new data for the
country this month. The new data are for calendar years 2006 and 2007. We downloaded data for all US states and counties and for US School Districts for both years. We ran our standard conversion setup which recreates two data sets within the
saipe data directory (accessible via Uexplore/Dexter software). These new data sets are named usstcnty20xx and usschldst20xx. So the latter now have data for years 2000 thru 2007. The corresponding Missouri-subset views (change the first 2 characters of the name from "us" to "mo") were also updated.
- 12/09/08: On the historic day when the Census Bureau released the first-ever ACS 3-year period estimates data, the MCDC
makes available a series of data sets based on these data. For most users the key new data set is the usmcdcprofiles3yr data set within the acs2007 data directory. Here you
will find almost 1000 variables (counting derived percentages and MOE measures) for over 14,000 geographic areas nationwide (those with populations of 20,000+). This makes
it (in our humble opinion) one of the more useful data resources available anywhere.
In addition to the MCDC profiles data set just described we have also downloaded and converted the corresponding collection of
detailed, or "base", tables released as part of the new 3-year estimates data. We have used essentially the same converting, naming,
and partitioning conventions used with the 1-year data that we describe just below (the 11/19 entry). You can access these very large
data sets via Dexter where you will be allowed to extract at the table rather than the variable level.
- 11/19/08: We have been spending some time trying to fathom the Census Bureau's relatively new summary-file approach to providing power users with large-scale access to the ACS detailed (aka "base") tables. We are not going to bother, bore or scare you with the details of what we have had to do to arrive at our results. Instead, we simply present you with those results. They are entailed primarily within the basetbls subdirectory of the acs2007 data directory ("filetype"). Following the model of what we did for the previous year's data (in acs2006/basetbls) we divided the collection of over 1200 tables comprised of about 30,000 data cells into a set of 6 subcollections. We grouped the tables based on the first 2 digits of the table number (a topic code) as follows:
- 00 to 07: 00 = Data re sampling rates, 01 = Age and Sex, 02 = Race, 03 = Hispanic or Latino Origin, 04 = Ancestry, 05 = Foreign Born, Citizenship, 06 = Place of Birth, 07 = Residence Last Year, Migration
- 08 (all by itself): 08 = Journey to Work, Worker Characteristics
- 09 to 16: 09 = Children, Relationship, 10 = Grandparents, Age of HH members, 11 = Households, Families, 12 = Marital status, 13 = Fertility, 14 = School enrollment, 15 = Educational attainment, 16 = Language spoken at home
- 17 to 20: 17 = Poverty, 18 = Disability, 19 = Income (Household, family), 20 = Earnings (Individuals)
- 21 to 24: 21 = Veteran status, 22 = Transfer Programs, Food Stamps, 23 = Employment status, 24 = Industry, Occupation, Class of Worker
- 25 & 26: 25 = Housing Characteristics, 26 = Group Quarters
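As a rough illustration of the grouping just listed, the following Python sketch maps a table ID to its topic-code range. The function name and the label format are assumptions made for illustration only; the label for the single-topic 08 group in particular is a guess, and the 21 to 24 range is inferred from the statement that there are 6 subcollections.

```python
# Topic-code ranges taken from the list above. The (21, 24) range is inferred,
# since the entry says the tables were divided into 6 subcollections.
SUBCOLLECTION_RANGES = [(0, 7), (8, 8), (9, 16), (17, 20), (21, 24), (25, 26)]


def subcollection_for(table_id: str) -> str:
    """Return a range label such as '00_07' for a table ID such as 'B01001'.

    Hypothetical helper: it assumes the two digits after the leading letter
    are the topic code, as in B25001 (topic 25).
    """
    topic = int(table_id[1:3])
    for low, high in SUBCOLLECTION_RANGES:
        if low <= topic <= high:
            # A single-topic group gets a one-part label (assumed format).
            return f"{low:02d}" if low == high else f"{low:02d}_{high:02d}"
    # Topics outside the listed ranges (for example 98 and 99) have no group.
    raise ValueError(f"no subcollection for table {table_id!r}")
```

Under these assumptions, `subcollection_for("B25001")` yields `"25_26"`, while a B98 or B99 table ID raises `ValueError`, since those tables were not converted.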
We did not convert quite all the tables. We omitted the B98 and B99 tables related to sampling info and imputation. Not because they are not important, but because we thought it unlikely that anyone would be needing to access these for large-scale processing, which is the reason for the existence of these tables here. We have also not converted any of the PR (Puerto Rico-only) tables.
We have -- for the first time ever -- downloaded, converted and made available the margin-of-error (MOE) data for these tables. We processed the MOE data separately but in parallel with the estimates data. Each of the 6 ustabs data sets described above has a companion data set with MOE values in the same directory. So we have the data set ustabs00_07.sas7bdat with all the estimates data for all tables numbered in the 00 thru 07 topic series, and we have the data set ustabs00_07_moes.sas7bdat that has the corresponding margin-of-error data. The latter are created using a naming scheme and observation sorting and keying so that they can be readily linked. Variable names on the estimates set use our standard naming convention: the k'th cell of table B25001 uses the name B25001i_k_ (where "_k_" just represents the cell number). The corresponding variable in the ustabs25_26_moes data set is called B25001m_k_. All we did was substitute the letter "m" (for "moe") for the usual "i" (for "item").
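The substitution of "m" for "i" can be sketched in Python as follows. This is only an illustration: the helper name is hypothetical, and it assumes the cell number is appended directly after the letter, so that cell 1 of table B25001 would be named B25001i1. The archive's actual variable names should be checked in the Varlabs data dictionary files.

```python
import re

# Assumed name shape: table ID (letter, 5 digits, optional uppercase suffix),
# then "i" for an estimate item, then the cell number.
_ESTIMATE_NAME = re.compile(r"^([A-Z]\d{5}[A-Z]*)i(\d+)$")


def moe_name(estimate_var: str) -> str:
    """Return the MOE variable name that pairs with an estimate variable name."""
    match = _ESTIMATE_NAME.match(estimate_var)
    if match is None:
        raise ValueError(f"not an estimate variable name: {estimate_var!r}")
    table, cell = match.groups()
    # Swap the "i" (item) marker for "m" (moe); table ID and cell are unchanged.
    return f"{table}m{cell}"
```

With this assumed shape, a variable picked from an estimates data set can be paired with its counterpart in the matching _moes data set by calling `moe_name` on its name.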
Uexplore/Dexter users wanting to access these data will find that while it is still not as easy as we would like it to be, it is a lot better than it was just a week ago. We have worked on our file labeling via updated Contents files and have created a Datasets data set in the new basetbls directory to assist users. We even have helpful keyvals defined for these big data sets (accessible via the detailed metadata links). We are providing table-level Dexter access to these very large data sets (capability added 11/24/08).
We have also added a Varlabs subdirectory under the acs2007/basetbls directory. It contains 6 data dictionary txt files that should aid users in getting a better handle on just what the tables are in each of the 6 sub-data sets and provides detailed data-dictionary info down to the data cell level.
- 10/11/08: The latest health insurance estimates (showing uninsured persons and rates for various groups) from the Census Bureau have been added in the saipe subdirectory.
The new datasets are sahie05 (data for the entire U.S. at the state and county level) and mosahie05 (just a Missouri subset of the national dataset).
- 09/30/08: Added 6 new data sets in the acspums directory. The usprecs07 and ushrecs07 data sets are now distributed as a pair of sets, the first containing data for the states of Alabama thru Mississippi and the other (ushrecsb and usprecsb) with data for Missouri thru Wyoming. We have 4 us data sets (h and p data, each split into two geographic-universe parts) and two Missouri-only subset data sets.
- 09/24/08: Numerous files and data sets were added in the acs and acs2007 directories. These were all files downloaded from the
Census Bureau web site, or SAS data sets that were created from the downloaded data. Basically, we just created data sets in the new
acs2007 directory that were comparable to those in the acs2006 directory created last year with the 2006 ACS data.
If you have only one data set to explore this year it might be the usmcdcprofiles_1yr dataset in the acs data directory. It has the
detailed profile data (almost 1000 variables, including percentages and MOEs) for about 5000 geographic areas (those that meet the 65,000
population threshold) for each of the 2 available years, 2006 and 2007. Our plan is to replace this data set each year, just adding more
rows/observations as the Bureau adds data years.
- 09/10/08: The latest round of population estimates by age (single year), race (4 "bridged" categories), sex and Hispanic origin for 2007 from the National Center for Health Statistics have been added to our collection. All stored within the
nchsbri subdirectory of the popests data directory. The new figures are within the datasets that have 20xx in their names.
- 09/08/08: A new subdirectory within the sf32000x data collection is named xxtrbgs.
It contains a collection of state-based files, two data sets per state, one containing complete census tract data and the other block group level data. We are going to be working on the menu system for our dp3_2k profile reports so that users will be able to display these data as with all the other geographic levels. (Prior to this we only had tract and bg data for a handful of states.)
- 08/30/08: The latest housing unit estimates by county for the US (with 2007 figures) were added to the
popests data directory, in datasets ushuests2007 and mohuests2007 (the Missouri subset of the US dataset, which contains data for every state and county in the country).
- 08/19/08: The latest data from the IRS indicating migration flows based on 2006/2007 tax returns have been added to the
irsmig directory. In addition to the SAS datasets for the entire country, we have copies of the original xls files for Missouri.
- 08/13/08: The economic and social indicators for Missouri counties stored in the
cntypage directory were updated to include more recent data. Specifically, data from the Census Bureau's estimates program (including age, race and hispanic detail) were added, as well as economic data from the BEA REIS program, the latest official population projections and several other items. You can access all the data in the merged08 dataset. You can also access the data in formatted report form by going to the
OSEDA County SEIR page.
- 08/07/08: The latest county population estimates with demographic detail by age, sex, race and hispanic were added to the popests directory. These are the "casrh07" datasets, and there is one per state. The Missouri data are in the mocasrh07.sas7bdat file. The casrhalt subdirectory contains alternate versions of the data in the format as released by the Bureau.
- 07/11/08: The latest sub-county population estimates added to the popests directory.
The new datasets are ussc07.sas7bdat (estimates for cities, MCDs, counties, states for the entire U.S.) and mosc07.sas7bdat, the Missouri subset of ussc07. Includes
estimates for July 1 of each year from 2000 to 2007.
- 06/27/08: County Business Patterns data updated with data through 2006 have been downloaded and converted in the cbp directory.
- 06/23/08: Taxable sales data for Missouri for 2007 have been added in the taxsales directory.
- 06/16/08: Data for June, 2007 was added to the bankdeps collection of bank deposit figures for branches in Missouri.
- 05/27/08: Two filetypes - stf420 and stf9s5 - both within the 1990 decennial census group were moved from the old OSEDA server to the active archive here on mcdc2. The Contents and Readme files were updated and, in the case of stf420, metadata in the form of a Datasets.html page was created. Both of these filetypes contain data related to commuting patterns, i.e. where people who live in a certain place work. The stf420 collection is unusual in that
it makes use of a set of 20 places of work custom-specified for each county (MCD in New England) in the country.
These were the last two filetypes being pointed to on the OSEDA box, thus completing the migration off that older system.
- 04/26/08: The long-awaited Missouri county population projections from the Missouri Office of Administration were released on April 25 and are available now in the data archive. They are in a new filetype/subdirectory, moprojs (Missouri projections). In addition to the raw data we have also created a directory of xls (Excel spreadsheet) files which include population pyramid graphics.
- 04/24/08:Almost all of the datasets in the beareis data directory were replaced with the latest versions as just released by the Bureau of Economic Analysis on 4-24-08. These data now go up to 2006 and some of the data from earlier years has been revised.
- 01/17/08: In the acs2006 directory we added key_indicators.xls, an Excel spreadsheet containing a set of key social, economic and demographic indicators for Missouri data regions (PUMAs). The data here are essentially the same as contained in the system data file of the same name (key_indicators.sas7bdat) but the spreadsheet file has been transposed (each column becomes a geographic area, and each row corresponds to a data item) and customized. Not all the data items are considered key indicators (they may be the numerators or the denominators used to derive the key indicator) and the ones that are key indicators are highlighted. We are in the process of creating a series of pdf file maps ("geopdfs") to display these key items.
The usmcdcprofiles dataset was modified, with several new variables added. This was related to creating the new key_indicators dataset since we wanted the key indicators to be a subset of the data in this profiles data set.
- 01/16/08:Three new datasets were added to the County Business Patterns collection (cbp filetype). These were all metro
area ("CBSA" - core based statistical area) data, one for each of the years from 2003 to 2005.
- 01/14/08: A new round of median household income and poverty estimates for the nation, states, counties and school districts was added to the saipe data directory.
- 01/02/08: Dataset uscomnst07 was added to the popests directory. Contains the national and state level estimates through July 1, 2007 with components of change. The county level version of these estimates should be released in the spring (usually around late March or early April).
- 12/26/07: Data in the cntypage
filetype (listed in the Compendia category) were updated with the latest available items for the time-series data. See the Missouri County Summary of Social and Economic Indicators web site for access to these data as formatted reports.
- 11/26/07: The 2006 ACS data have been under construction for about 2 months but have had the "Under Construction"
sign taken down now. We have a complete set of standard extracts (key data items) for all available geographic entities in the single dataset usmcdcprofiles (the source data for a new set of profile reports that are similar to the set of 4 profile reports available on the Census Bureau's American FactFinder site). We also have a complete collection of the detailed (base) tables for the entire U.S. in the summary subdirectory.
- 09/27/07: The 2006 ACS PUMS datasets have been added in the
acspums data subdirectory.
- 09/12/07: The 2006 ACS data for Missouri, stored in a series of Excel files organized by geographic
summary level, were added to the archive in the acs2006/profiles/Missouri subdirectory.
(Even though they are in the profiles subdirectory, these xls files also contain complete Base tables and Subject tables.)
We also processed the profiles sheets from national collections of these data to create the series of profile_data.csv files stored in the acs2006/profiles subdirectory. We intend to convert these to SAS datasets shortly.
- 08/20/07: We have added population estimates by age (single year), "bridged" race, sex and
hispanic origin for 2006 to our 20XX datasets in the popests/nchsbri subdirectory. These are the data used in
our Population Estimates by Age web application (which has been updated so that 2006 now appears and is the default selection on the Years select list).
- 08/11/07: Added the county level population estimates by age, sex, race and hispanic origin ("casrh") datasets to the popests data directory.
We received IRS migration data for the remaining states (see following note) and regenerated the usinmig0506 and usoutmig0506 datasets.
- 08/04/07: We have processed partial data from the IRS regarding county migration trends for tax years
2005-2006. We have data for states Alabama through Tennessee and will be adding data for Texas through Wyoming shortly
(awaiting data from IRS). These data are stored in the irsmig directory. We have updated the Migration_Profiles
menu pages subdirectory so that users can now generate profiles for this latest year - i.e. we have added year 2005-2006 to the
select list on the menu pages.
- 07/29/07: We have replaced most of the data sets in the beareis directory with the latest data as released by the BEA in 2007. The data are now reported through year 2005.
- 06/28/07: In the popests directory we added datasets ussc06 and mosc06 with sub-county (i.e place)
level estimates for the entire U.S. and for Missouri, respectively.
- 06/21/07: In the taxsales (taxable sales data by SIC and county for Missouri only from
DOR) directory we downloaded the final sales reports and converted to SAS datasets. These replace the preliminary versions done in March of this year when 4th quarter 2006 data were not yet available on the DOR
web site.
- 06/04/07: The County Business Patterns data for 2005 were added to the cbp filetype directory. These data sets
provide counts of establishments with total number of employees and annual payroll by various NAICS categories
(totals, major industries and more detailed, down to 6-digit NAICS code). These are reported for the nation, the states
and for counties. Lots of suppression for the more detailed NAICS categories at the state & county levels.
This continues a time series of such data sets going back to 1999. The county level data sets have over 2 million observations each.
- 04/24/07: We have a new filetype, irsmig, with
data from the IRS regarding county to county migration derived by matching consecutive years of tax returns. We have converted the data based on tax years 2004-2005 (the most recent data available) and going back to data based on tax years 1999 and 2000. We have data for the entire U.S., with Missouri subsets. A link to this filetype has been added to this directory page in the "Other" category. In addition to the archive data sets (accessible by Dexter) we have kept the original Excel spreadsheet files for Missouri only, and have also created csv files based on the original xls files for the entire U.S. and concatenated these to create a single file per year for In and Out migration.
- 03/22/07: Processed the Census Bureau's new county level population estimates with components of
change, 2000-2006. In the popests filetype directory created datasets uscom06 and mocom06, as well as
aggregated-to-CBSA summaries in uscomcbsas06.
- 03/08/07: Completed a major restructuring of the combined base tables in the acs2005 data directory. We have taken the hundreds of "base tables" associated with the ACS and combined them according to type of table. The Datasets.html page has been revised to reflect the new datasets.
A problem was found with the imputation tables stored in the acs2005/basetabs subdirectory. These have been corrected.
- 02/27/07: We added two new filetypes in the Compendia section. cntypage contains the data that OSEDA used in creating the Social and Economic profiles for Missouri extension people. It has a variety of data, with summaries for Missouri counties and the state, as well as for the 9 MU Extension regions. The data have recently been updated to get the most recently available data.
mosenior has data for Missouri counties and the state as used in the Senior Report web site and publication. As the name suggests, the emphasis is on indicators related to the elderly population.
We have eliminated the Agriculture category from the main directory box (above) and replaced it with a new Other category. This category has two filetypes in it: the old ag2002 type and the new movoters type (see below).
- 02/22/07: We have a new filetype, bls_la, with data from the Bureau of Labor Statistics local area employment statistics. These datasets provide a long time series (back to 1990 at the county level, to 1976 for states) with basic monthly employment data. We have downloaded all the BLS data in the series but we have thus far created SAS data tables only for state and county levels.
We added dataset movoters.cntyvotrsbyyr in the movoters filetype directory. It contains counts of
registered voters by Missouri counties for even-numbered years from 1992 to 2006.
- 02/14/07: Added 3 datasets in the georef collection related to New England City and Town Areas (NECTAs).
The new datasets are nectas (codes and names), nectadivs (these are the smaller NECTA divisions - there are only 9 of them) and nectamcds (basically an MCD to NECTA equivalency file).
We also created (again in the georef collection) dataset mopums00, which is a nice summary of information regarding the 2000 PUMAs (5% versions) for Missouri.
- 01/22/07: A new set of 2000 census SF3 tables for State Legislative Districts has been added,
with separate files for each state. The full sf3 table datasets are in the new
slds2007 subdirectory within the sf32000 data directory. The corresponding standard extract datasets are in the
new slds2007 subdirectory within the sf32000x data directory.
- 01/17/07: Added a set of 51 state-based block level datasets in the mable2k data directory. The files are named XXsldblks, where XX is the 2-character state postal abbreviation. Each row corresponds to a 2000 census block and the variables sldl02, sldh02 and cd109 contain up to date state legislative district codes and 109th congressional districts.
This file last modified Thursday October 01, 2009, 09:25:08