MCDC Data Archive via Uexplore/Dexter: accessible via a link in the navy blue navigation box on the left side of most MCDC pages.
The MCDC data archive comprises over 20,000 datasets, with over 80 GB of data organized into approximately 60 data directories, which we call filetypes. The Uexplore home page links to a number of overview documents and tutorials. Following these links will give you background information as well as detailed instructions on using the applications, especially the data extraction module, Dexter. For this introductory tour, we are simply going to demonstrate the system by leading you through two relatively simple and typical data extractions.
Example 1: Extract Poverty Data for all Counties in Your State
The poverty data will be based on the 2000 census. This is sample (also known as "long form") data, summarized in a Census Bureau data product known as Summary File 3 (SF3). In our data archive, we have two filetypes (data directories) associated with SF3:
- sf32000 contains the complete-table datasets. These datasets are usually quite large, typically with thousands of variables (columns) per observation (row).
- sf32000x contains the standard extracts derived from the full detail of the sf32000 datasets. Notice that the hyperlink for sf32000x (on the Archive Directory / Uexplore Home Page) is displayed in bold while the link for sf32000 is not. Bolded links signify collections ("filetypes") that we consider our "house specialties": ones that we and our users have found particularly useful and where, in most cases, we have been the most diligent in creating helpful metadata.
For this application, we'll use the filetype sf32000x, which happens to be the first one listed on the archive directory page. We navigate to it by clicking on the link to the Decennial Census Data: 2000 major category (in the navigation box with the blue-gray background at the top).
Click on the sf32000x link (on the Uexplore home page, which you currently have open in a separate browser window, right?). This invokes Uexplore and tells it which directory we want to explore.
There is a lot to see on this page (the directory page generated by Uexplore for filetype sf32000x). It has the names of all the files in the directory, with brief descriptions of most of them. A few of the files are actually subdirectories (the ones with the folder icons); clicking on those will invoke Uexplore again and display the files in that subdirectory. Two very important files at the top are displayed in bold to catch your attention. The Readme.html file is what you might expect from a file with that name: it contains explanatory material about the collection. As a casual user, you may find it contains more information than you really need just now, but it's worth a quick look. The second bolded file is Datasets.html. This is the one you want to click on. (Do it!) It provides enhanced access to the datasets in this directory.
We use the term dataset to indicate a special kind of file, the kind that we can extract data from using Dexter. Datasets are rectangular data tables made up of rows and columns (sometimes also referred to as observations and variables). When you access a directory with Uexplore, you see all the files for that directory, not just the datasets. The files are arranged in alphabetical order by name (with a couple of exceptions), and there is minimal information telling you what to expect inside them. But when you access a Datasets.html directory page, all you will see are the actual data files (datasets); they will be sorted in a logical order, and there will be important metadata (data about the datasets) to help you find and access the data you are looking for.
The Datasets.html page displays the datasets in a table where each row represents a dataset and each column contains attribute information or a link related to the dataset. Note the column labeled Geographic Universe. It turns out that this and the Units column are frequently the keys to understanding the data in a directory. We have dozens of datasets here, and what distinguishes them from one another is the geographic universe they cover and the geographic units summarized in their rows. Each row in a dataset corresponds to a geographic area (a state or a county, for example), while the columns represent the characteristics or attributes of that area (the total population, the median household income, the poverty rate, etc.). Because we are a Missouri data center, we list datasets with a universe of Missouri first. These are followed by datasets with a universe of the entire United States. (Next come datasets for the bordering states Illinois and Kansas, followed by datasets with a universe of any other state, in alphabetical order by state; but this is probably way too much detail for a quick tour.)
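The ordering rule just described can be expressed as a sort key. This is a hypothetical illustration, not MCDC's actual code; it assumes the universe column holds plain names such as "Missouri" or "United States", whereas the real page shows longer labels (for example "United States ...").

```python
# Hypothetical sketch of the Datasets.html row ordering described above:
# Missouri first, then the entire United States, then the bordering states
# Illinois and Kansas, then every other state in alphabetical order.
# The exact universe strings are an assumption.
PRIORITY = {"Missouri": 0, "United States": 1, "Illinois": 2, "Kansas": 3}


def universe_sort_key(universe: str) -> tuple[int, str]:
    """Return a key that reproduces the ordering described in the tour."""
    # Known universes get a fixed rank; all other states share rank 4
    # and are then ordered alphabetically by name.
    return (PRIORITY.get(universe, 4), universe)


universes = ["Texas", "Kansas", "United States", "Alabama", "Missouri", "Illinois"]
ordered = sorted(universes, key=universe_sort_key)
print(ordered)
```

Using a (rank, name) tuple lets Python's tuple comparison handle both the fixed priority and the alphabetical tie-break in one pass.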
Scroll down to the section where the Universe becomes "United States ..." (around row 23; this may change as we add new datasets to the directory). Find the row that has usstcnty in the first column. The Label column tells you that this dataset has "State and county summaries for entire United States".
This is more than you need: you need only county-level data, and only for your state. But this application is all about making it relatively easy for users to select just the data they need from these datasets. We have decided that this is the dataset we want to extract data from, so click on its name (in the first column), which is a hyperlink. This will invoke the uex2dex Dexter preprocessor, which displays the Dexter input form customized for the dataset we'll be extracting from. We need to fill out the form and then click on one of the Extract Data buttons to invoke Dexter.
Note the link to the Dexter Quick Start Guide at the top of the page. This quick tour is an alternative to going through that brief getting-started document, but we strongly encourage first-time users to work through the guide as well. It is intended mostly for people who may be suffering from "Dexter shock" when they first encounter the application, with its rather lengthy input form, and it tries to show that Dexter can handle very basic and useful extractions that require very little input from the user.
Another important thing to note on the Dexter query form is the line just below the Quick Start Guide link, which says "Section headers are links to online help". What does this mean? "Section headers" are the large bold titles with yellow backgrounds that begin each of the five roman-numeral-numbered sections of the page. These headers are links to an online help document: the ultimate detailed Dexter documentation. There is really just one rather long help document, but it is divided into five sections corresponding to the five parts of the form. Each section header link takes you directly to the location within the help page that explains how to fill out that section of the form. This will be very helpful when you are doing your own extracts. But now let's get back to the guided tour. (If you have followed any of the section header help links to see what they look like, now is the time to return to the Dexter query form page. In the future you might want to right-click on the section headers and open the link in a new window, so that you can view the help and the extraction form at more or less the same time.)
Click on the detailed metadata link at the top of this page. Peruse all the useful information on the page that displays next. Of particular importance, look at the row of Key variables. Click on the link to the SumLev variable.
We see which values occur for this variable in this dataset, and how often. It tells us, for example, that SumLev takes on the value 040 on 52 rows and that the meaning of this code is "State". On the other 3219 rows of the dataset, SumLev takes on the value 050, which indicates a county-level summary.
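The values-and-counts listing on the metadata page is essentially a frequency table. A minimal Python sketch, using a made-up column that reproduces the counts reported above (52 rows with 040, 3219 with 050); the code-to-meaning map contains only the two codes mentioned here and is an assumption, not the full list of Census summary levels.

```python
from collections import Counter

# Hypothetical code-to-meaning map covering only the two codes in this tour.
SUMLEV_MEANINGS = {"040": "State", "050": "County"}


def tabulate(values):
    """Count how often each value occurs, like the key-variable values page."""
    counts = Counter(values)
    # Sort by code and attach the meaning ("?" for any unknown code).
    return [(code, counts[code], SUMLEV_MEANINGS.get(code, "?")) for code in sorted(counts)]


# A stand-in SumLev column with the frequencies reported for usstcnty.
sumlev_column = ["040"] * 52 + ["050"] * 3219
for code, n, meaning in tabulate(sumlev_column):
    print(f"{code}  {n:>5}  {meaning}")
```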
Use your browser's Back button to return to the detailed metadata page, and now click on the State values link in the Key variables row. You should see a page listing the 2-digit FIPS code for each of the U.S. states and Puerto Rico, with the name of each state and a count of how many times it occurs on the dataset. This count should equal the number of counties in that state plus 1, since the dataset has a summary for each county and one for the state as a whole. Note (i.e., write down or memorize) the code for your state; you'll need it for the filtering step below.
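The "counties plus one" rule gives a quick sanity check you could also apply to a downloaded copy of the data. This sketch assumes hypothetical (State, SumLev) pairs as input; the toy rows are invented and far smaller than the real dataset.

```python
from collections import Counter

# Invented toy rows of (State, SumLev); the real dataset has 3271 rows.
rows = [
    ("29", "040"), ("29", "050"), ("29", "050"), ("29", "050"),
    ("06", "040"), ("06", "050"), ("06", "050"),
]


def check_state_counts(rows):
    """Return, per state, whether its row count equals its county count + 1."""
    total = Counter(state for state, _ in rows)
    counties = Counter(state for state, sumlev in rows if sumlev == "050")
    return {state: total[state] == counties[state] + 1 for state in total}


print(check_state_counts(rows))
```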
Return from the detailed metadata page to the Dexter extraction page. We are not going to go into detail about all the features of Dexter here. Instead, we'll just tell you how to use the page to handle the sample extract. The page is divided into five roman-numeral-labeled sections, of which only the first three are really necessary. In Section I, we suggest you click on the HTML radio button in the second row. This tells Dexter that you want it to generate your extract as a report in HTML format (in addition to a comma-delimited file, which is pre-selected in the first row of radio buttons; clicking on the button labeled none in that row would turn off that default).
Section II is by far the hardest part of using Dexter. This is where you specify which rows/observations of the dataset you want on your extracted output. For this dataset you can get data for any state or Puerto Rico, summarized at either the state or the county level. But you don't want all that: you want only county-level data, and only for one state. Build a "filter" by modifying the first two (of five) rows of the filter specifications section. In the first row, leftmost column, click on the little pull-down menu arrow and you will be presented with a list of variables; click on the SumLev - Geographic Summary Level entry. In the first row, middle column ("Operator"), click on the pull-down arrow and select Equal To(=). The rightmost column ("Value") is a little harder because there is no menu to pull down; you have to know the value and type it in. Type 050 to specify that you want county-level summaries.
The first row of the filter specifies the summary level we want. In the second row you tell Dexter which state you want (if you don't do anything in the second row, you'll get data for the entire country). In the first column of the second row, select the variable State from the pull-down. From the middle column, select Equal To(=). The value you enter in the rightmost column depends, of course, on which state you are interested in. I would enter 29 to get data for Missouri, or 06 if I wanted data for California (yes, the leading zero is significant and required, because these codes are stored as character strings, not numbers). After entering the state FIPS code, you have finished with Section II and can scroll down to Section III.
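The two filter rows act like a single condition joined by a logical AND, which is what the result described here implies. The sketch below mimics that logic on a tiny illustrative CSV sample; the column names SumLev, State and AreaName are assumed to mirror the dataset's variables, and this is not how Dexter itself is implemented. It also shows why "6" and "06" are not interchangeable.

```python
import csv
import io

# Tiny illustrative sample shaped like usstcnty. SumLev and State are read
# as character strings, so leading zeros are preserved and significant.
SAMPLE = """SumLev,State,AreaName
040,29,Missouri
050,29,Adair County
050,29,Andrew County
040,06,California
050,06,Alameda County
"""


def dexter_like_filter(rows, sumlev, state):
    """Keep rows where SumLev == sumlev AND State == state (string comparison)."""
    return [r for r in rows if r["SumLev"] == sumlev and r["State"] == state]


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
missouri_counties = dexter_like_filter(rows, "050", "29")
print([r["AreaName"] for r in missouri_counties])

# "6" is not the same string as "06", so this finds nothing.
print(dexter_like_filter(rows, "050", "6"))
```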
Choosing the columns you want to keep on your output is not difficult conceptually, but it can get a little tedious if the dataset has a lot of variables and you want to be picky about which ones you keep, as in our example. Dexter displays two side-by-side select menus, Identifiers and Numerics. You can choose multiple values (or none) from either list. When there are many variables on the dataset (as in this case), you will see some additional boxes just below the Numerics select list, which can be used to filter the variable selection list. In this case, you can either search through the entire drop-down select menu looking for variables related to poverty, or type the string poor|poverty into the box labeled "Filter by regular expression" and then click on the "filter" button.
This tells the program to reduce the selection list by displaying only entries that contain (in their name or label, as displayed) the string "poor" or the string "poverty" (the vertical bar, |, is the logical "or" indicator). You still have to select the variables of interest from the reduced select list.
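The regular-expression filter can be mimicked with Python's re module. This is only a sketch: the variable labels are invented for illustration, and matching against the name and label joined together, case-insensitively, is an assumption about how the "Filter by regular expression" box behaves.

```python
import re

# Hypothetical (name, label) pairs standing in for the Numerics select list.
VARIABLES = [
    ("TotPop", "Total persons"),
    ("Poor", "Persons below poverty level"),
    ("PctPoor", "Percent of persons below poverty level"),
    ("VeryPoor", "Persons below 50 percent of poverty level"),
    ("PctVeryPoor", "Percent below 50 percent of poverty level"),
    ("MedHHInc", "Median household income"),
]


def filter_select_list(variables, pattern):
    """Keep entries whose name or label matches the regular expression."""
    # Case-insensitive matching is an assumption, not documented Dexter behavior.
    regex = re.compile(pattern, re.IGNORECASE)
    return [name for name, label in variables if regex.search(f"{name} {label}")]


print(filter_select_list(VARIABLES, "poor|poverty"))
```

In the pattern, `|` is alternation: an entry is kept if either "poor" or "poverty" appears anywhere in the searched text.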
Choose GeoCode and AreaName from the Identifiers list and Poor, PctPoor, VeryPoor and PctVeryPoor from the Numerics list. Click the Extract Data button at the bottom of Section III. If all goes well, you should get your results back in just a few seconds. The initial output screen (assuming there are no errors) will provide links to a Summary page and to each of the output files requested in Section I.
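If you download the comma-delimited file, any CSV reader can process it. A minimal sketch, assuming the file has a single header row of variable names; the rows and numbers below are placeholders, not real poverty figures.

```python
import csv
import io

# Placeholder text standing in for the downloaded comma-delimited file.
# Column names follow the variables chosen above; the rows are invented.
EXTRACT = """GeoCode,AreaName,Poor,PctPoor,VeryPoor,PctVeryPoor
00001,County A,1000,10.0,400,4.0
00002,County B,3000,15.0,1200,6.0
"""

rows = list(csv.DictReader(io.StringIO(EXTRACT)))
# csv returns every field as a string: GeoCode keeps its leading zeros,
# and numeric columns must be converted before comparing.
highest = max(rows, key=lambda r: float(r["PctPoor"]))
print(highest["AreaName"], highest["PctPoor"])
```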
There are numerous other options in Sections IV and (especially) V that can be used to control the details of the output, but they are beyond the scope of this introduction. See the online help for details if and when you feel the need.
Example 2: Accessing Population Estimates Data for Counties Within a Metro Area
In this example you'll access the population estimates (popests) filetype (data directory) and look for the latest county-level estimates showing components of population change since the 2000 census.
If you don't know what a "component of change" is, don't worry about it. Suffice it to say that these components will be represented by variables on the dataset we'll be accessing. You should be able to follow the exercise without actually knowing and understanding the details of the data being extracted.
Begin by returning to the Uexplore/Dexter home page. From there, click on the Pop. Estimates major category, and then on the first link within this section (there are only two choices, and you want the current estimates, not the historical ones from the 1990s). Uexplore displays the popests data directory.
The first file shown is Datasets.html, and as before you'll want to click on that link to get the enhanced view of the data in the directory. There are a number of choices available, based on the geographic universe and the geographic/demographic detail of the estimates. Look for the dataset named uscom05, containing (per the Label column) "County estimates with components of change for 2000 thru 2005". [This was the most recent dataset at the time this was written; if you see a later year on the list, choose it.] Click on the name of the dataset in the leftmost column to invoke Dexter (uex2dex, to be precise) for this dataset. On the Dexter form page that comes up, click on the detailed metadata link at the top.
On the resulting detailed metadata page, focus again on the Key variables section. We want to display data for all counties in the current Kansas City metropolitan area. There are two kinds of metro area codes available on the dataset, named msacmsa and cbsa. The former represents the codes and definitions in place at the time of the 2000 census, while the latter represents the newer (current) Core Based Statistical Area codes, which include both Metropolitan and Micropolitan Statistical Areas.
Does this seem a little overcomplicated and confusing? Welcome to the wonderful world of Census Geography.
Click on the cbsa link to view the complete list of codes and their meanings. Scroll or search (using your browser's text search feature, usually triggered by typing Ctrl-F) to find the value for Kansas City. The answer turns out to be 28140. Write this down or, if you like, select it and copy it to the clipboard (Ctrl-C should do it) so that you can paste it (Ctrl-V) into the text box on the Dexter page later.
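The Ctrl-F search on the code list amounts to a case-insensitive substring lookup. The two-entry dictionary below is a hypothetical excerpt of the cbsa list, and find_codes is an illustrative helper, not part of Uexplore or Dexter.

```python
# Hypothetical two-entry excerpt of the cbsa code list (code -> title).
CBSA_CODES = {
    "28140": "Kansas City, MO-KS",
    "41180": "St. Louis, MO-IL",
}


def find_codes(code_list, text):
    """Return the codes whose title contains text, ignoring case (like Ctrl-F)."""
    needle = text.lower()
    return [code for code, title in code_list.items() if needle in title.lower()]


print(find_codes(CBSA_CODES, "kansas city"))
```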
Return to the Dexter page for the dataset (use the browser Back button twice). Choose csv and HTML as output formats in Section I.
In Section II, tell Dexter you want to see data only for Kansas City by specifying the filter:
cbsa (selected in column 1)   Equal To(=) (selected in column 2)   28140 (typed or pasted in column 3)
We do not have to specify that we just want SumLev values of 050 (county) - do you know why not?
In Section III, choose County, State and Cnty from the Identifiers drop-down list. From the Numerics list you can choose whatever you like, but we suggest Pop00C, Popest, PopChang, PctChang, Births, Deaths, NatrlInc and NetMig as an interesting set of numbers (some of these qualify as "components of change").
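Several of the suggested numerics are related by simple arithmetic: natural increase is births minus deaths, and population change is the latest estimate minus the census-based population. The sketch below assumes that Pop00C, Popest, PopChang and PctChang have those meanings (check the detailed metadata to confirm). It also computes a residual, because natural increase plus net migration need not add up exactly to the total change in Census Bureau estimates. The class name and all numbers are invented.

```python
from dataclasses import dataclass


@dataclass
class CountyEstimate:
    """Hypothetical record holding a few uscom05-style numerics."""
    pop00c: int   # census-based population (assumed meaning of Pop00C)
    popest: int   # latest estimate (assumed meaning of Popest)
    births: int
    deaths: int
    netmig: int   # net migration

    @property
    def natrl_inc(self) -> int:
        # Natural increase: births minus deaths.
        return self.births - self.deaths

    @property
    def pop_chang(self) -> int:
        # Assumed definition of PopChang: estimate minus census base.
        return self.popest - self.pop00c

    @property
    def pct_chang(self) -> float:
        # Assumed definition of PctChang: change as a percent of the base.
        return 100.0 * self.pop_chang / self.pop00c

    @property
    def residual(self) -> int:
        # Change not accounted for by natural increase plus net migration.
        return self.pop_chang - (self.natrl_inc + self.netmig)


# Invented numbers, for illustration only.
c = CountyEstimate(pop00c=100_000, popest=105_000, births=7_000, deaths=4_000, netmig=1_800)
print(c.natrl_inc, c.pop_chang, round(c.pct_chang, 2), c.residual)
```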
Type in a title of your choice in Section IV, and in Section V, under b. Advanced Report Formatting Options, click to select the Use variable labels as column headers in reports option. (This is probably the least advanced and the most frequently used of all the options in the Advanced Options section.) Click an Extract Data button (there's one at the end of each of Sections III, IV and V).
On the output screen, click on the HTML report link (or, if you prefer to see the results in Excel, click on the Delimited File link). With the results displayed in your browser, look at your browser's address bar. You should see the URL of the page that you and Dexter have just created. You can select that URL and copy it to the clipboard (Ctrl-C), then go to your e-mail application and send a note to a colleague or significant other you would like to share your results with (and impress). Just paste the URL (Ctrl-V) into your e-mail message and tell your recipient to click on it to view the report. (These files remain on the MCDC web site for only 48 hours, so tell them not to wait.) Of course, you can always right-click on the links to these files and choose to save them to your local machine.