Summary: We are interested in seeing how we are doing within our state (Missouri, of course, but this can easily be changed) in terms of some common measures of economic well being. We came across this article on the MCDC web site - Measures of Income in the Census which includes a nice table at the end of it that does a comparitive summary of seven different common measures of economic well being. What we'd like to do is get a report and an Excel spreadsheet that display these seven indicators for each city and county in the state, using the most recently available public data. We understand that the Census Bureau has begun releasing such data based on the American Community Survey so that this is a reasonable request. In addition to the 7 economic variables, we would also like to see data on the number of persons, households and families in each area, since the economic measures are based on these universes. Finally, we would like to see these data for the state (totals) and the nation for comparison purposes.
Data Set Accessed: acs2007.usmcdcprofiles3yr
Inputs: Dexter Query Form (scrollable image) | Detailed metadata page | SumLev values
Outputs: Dexter Summary log | csv (comma-delimited) | report(html) | Saved query file
Related Queries: none
What You Need to Know (Broad, Common Knowledge)
The Census Bureau is generally accepted as the best source of such data, at least in the public domain. There are many commercial purveyors of demographic / economic data who will be glad to sell you such data. But if you are like us - poor and skeptical - you would prefer something coming almost straight from the horse's mouth and with no fees involved. Until the advent of the American Community Survey, most such data were only available based on decennial census surveys. There were Per Capita Income estimates for governmental units when they were required for allocating federal revenue sharing dollars not too long ago. And there have been Median Household Income estimates for states and counties coming from the Census Bureau's SAIPE (Small Area Income and Poverty Estimates) group for quite some time. But these were largely viewed as statistical best-guess figures and only available for limited kinds of geography. With the flow of data from the American Community Survey this has totally changed.
There are entire web sites and conferences, hundreds if not thousands of published articles and books on the subject of the American Community Survey (ACS). You do not need to be an expert on the ACS in order to be able to access and make good use of the data produced as a result of it. Unless you are a data professional with a need to access such data on a regular basis, you may find that trying to wade through all the information on the various web sites (such as ours - see the American Community Survey link in the Navy Blue navigation bar to the left on our MCDC home page; or go straight to the official site at the Census Bureau - http://www.census.gov/acs/www/ ) . What you may discover if you spend a few hours or weeks exploring the ACS web site and then the American FactFinder site trying to access ACS data, is that while there are lots of ways to access the results (i.e. the published summary data) of these surveys, there are no simple tools created by the Bureau or otherwise widely available that can handle this data request. At least none that we are aware of (in January of 2009; the Bureau is always adding capabilities to their FactFinder site so we expect this to change some day.) It is relatively easy to access these data one-geography-at-a-time via the Census Bureau's (or the MCDC's) ACS Profile products. But being able to get the specific data items for specific geographic areas or types of areas is a different matter. This is where Dexter can help.
Navigating the Archive (how to get to this Dexter page) You can arrive here via the usual Uexplore navigation path by starting at the Uexplore/Dexter home page and following the links to the American Community Survey major catetgory and the acs2007 specific filetype (data directory) and finally to the Datasets.html page for that subcollection. But, as Yoda would say, there is another way. Many (probably most) users will come across ACS data on the MCDC web site by following the link to our ACS Profile Reports web application, found as the first line of the Quick Links menu box on the right side of most major MCDC web pages, including the home page. So we can navigate from there:
What's In The Data Set
The data set has over 14,000 rows and just under 1000 columns (variables). Each row represents a geographic area (the nation, the rural portion of the nation, a state, a county, a city, etc.), and each column contains some information about that area. The key to getting a better understanding is to take advantage of the detailed metadata page that has been created for it. Right click on the detailed metadata link at the top of the page and then choose the
Open in New Window (or tab) option so you follow along with this discussion.
The top of the metadata page has the short label for the data set: MCDC profile extract for all US areas with 20,000+ population based on 2005-2007 surveys . This is meant to convey to you that this data set is the source of the MCDC ACS Profile reports and contains data for all ACS areas in the entire U.S. based on data collected over the 2005-2007 time period. Notice that the entry for
Units: ends with etc , meaning there are too many to mention here. But you can get detailed information about what kind of geographic entities are included in the data set by going to the Key variables section and clicking on the entry there labeled sumlev. On that page you will see entries such as
010=United States (N=17) This tells us that the variable SumLev contains the value
010 on 17 observations in this data set. The code's value label is "United States"; so when a row has the value
010 for this variable it indicates a nation-level summary. The 4th line on this page tells us that the code for state level summaries is 040 and this code occurs 599 times on the data set.
It is important that you understand the purpose of the Key variables section of the metadata pages; they are the keys to helping you code your query filters in most cases. They tell you what values you need to enter in the critical third column ("Value") in Section II.
Usage Note(s) section of the metadata page has a number of items worth noting. We recommend you read them carefully before proceeding. Reading these and then following the link near the bottom of the metadata page to the Variables.pdf file should give you a good handle on what to expect from this data set. Notice that while there are about 990 variables, about 30 of these are identifiers (geographic and time-period) and the numeric variables mostly appear in triplet. There is the basic estimate variable, the estimate percentage and the estimate Margin-of-error variable. For example:
Age0_4 PctAge0_4 Age0_4_MOE. You will see this pattern consistently as you go through the data set. What this means is that the data set actually only contains about 330 distinct data items, but most items have 3 variables asscociated with them (variables such as means and median obviously do not have corresponding Pct variables).
Coding the Query Filter
Section II of the Dexter query form is where we can specify which rows/observations we want to include in our extract. Since the rows in this data set correspond to geographic areas it means this is where we get to specify which of those we want. Which is:
This would be a pretty straighforward filter were it not for the extra added complications having to do with Geographic Component summaries. Because these special special sub-geography summaries occur at the state and nation level on this data set and because we do not want to see them for this extract, we need to add something to filter them out. Our filter is going to consist of 3 parts:
- United States totals
- State totals (Missouri)
- County and place level data (Missouri)
We use four of the five available rows in Section II of the DQF.
- Select a row if it is a nation level summary.
- Select a row if it is either a state, county or place level summary AND it is in the state of Missouri.
- Add the condition that it NOT be a geographic component summary. This condition applies in conjunction with each of the other 2 conditions.
The trickiest part of the condition involves the use of an extra set of parentheses to specify the order in which Dexter is to combine the pieces of the logical query. Normally complex conditions (those that specify multiple conditions) are evaluated in a top-to-bottom order except that "AND'ing" always precedes "OR'ing". So normally it would combine (using the AND operator) the conditions from rows 3 and 4 before it would OR that result with the result of ANDing rows 1 and 2. Which would give us all cases which were Missouri state, county or place summaries and would also give us Nation level summaries that were not geographic components. This is close to what we want but not exactly. It turns out that there are also geographic component summaries at the state level, so allowing Dexter to do the default logic processing would mean that the geographic component "filter" would not apply to the Missouri selections. The way to get around this is to use parentheses to control the order of logical expression evaluation. You should note that there is a checkbox to the left and right of each row, displayed next to a left or right parenthesis. By checking the left parenthesis box in row 1 and the right parenthesis box at the end of row 3 we are specifying that the logical expression based on these 3 rows should be evaluated first, and then combined with the expression from row 4. Which is exactly what we want. <
- In the first row we specify that the State code should be that of Missouri (
State Equal to (=) 29.
- In the second row we add the condition that the summary level code must be one of the 3 values specified in the Value text box (
SumLev In List 040:050:160).
- The 3rd row is where we add the condition saying we are interested in the nation level summary (
SumLev Equal To (=) 010). Note that we have also checked the
örradio box between between the 2nd and 3rd rows. This directs Dexter to evaluate the conditions in rows 1 and 2 to see of both are true (since the default AND logical connector applies between rows 1 and 2); it then evaluates the expression in the 3rd row and "ors" that with the previous compound condition. The result is TRUE (and the row is selected) provide either of the conditions is true: either it is a Missouri state, county or place level summary, OR it is a nation level summary.
- The 4th row is where we add the special overall qualifier that says that we do NOT want a geographic component summary. The geocomp variable contains the geographic component code, and a value of 00 means it is not really a compnents. So:
Geocomp Equal To (=) 00
Conceptually simple but practically tedious we come to Section III of the DQF where we want to select our variables. What do we need? From the ID variables list we get to select those data items that will allow us to identify the rows of our output tables. We see that we chose the first 3: SumLev, geoid and AreaName. We know about SumLev from the previous discussions. Geoid is the geographic key provided by the Census Bureau to uniquely identify each geographic area. It actually includes the SumLev code as its first 3 characters. The name of the geographic area being summarized is what most users will use to make sense of the data. Choosing from the Numerics list is where things are not quite so simple. Looking at our specs, we have specified seven economic indicator variables and three more universe-size measures of interest. How do we know where these variables are within the 990-elelent select list? One good way to know ahead of time the names of interest on these data sets is to take advantage of the linkable metadata while viewing the ACSProfile report. When you see an item within a table of that report that you want you can right on the sub-table heading and be taken to the detailed metdata page and specifically to the portion of that page dealing with that table. From here it is easy to see and record the variable names associated with the items of interest. You can also go the Variables.pdf file and do searches on it to rather easily find what you want in most cases. Old-times might even resort to printing off this report and circling the items of interest with a felt tip pen. However you choose to do it, you should be able to come up with a list of the names of the varibles you want. Or, you can always guess. You could guess that the variables for Median Household Income would have the word "Median" in its name and/or label, and that the poverty rate would contain the word "poverty" or "poor". We took advantage of the
Filter by regular expression:feature located just below the Numerics select list to help us find the names of the variables we wanted. We just entered keywords separated by OR symbolds (vertical bars) and clicked the Filter button. Doing so results in getting a fresh select list that shows only those variables matching the expression entered. We could then use the cntl key and the mouse to select our variables off the menu. But this would require that we get all 10 or our variables displayed at once and would require careful clicking. And the result would not be viewable in the scrollable image documenting this query. So, we opted to take advantage of the seldom-used "3rd way" of specifying what variables we wanted. We simply typed in their names in the txt box provided just below the variable select lists. Of course, the first time we did this we entered
totpopand so we had to go back and fix that. This takes only a couple of extra seconds provided the browser does not "forget" our specs when we use the Back button to return to it (which seems to be almost always the case with Firefox and less so but still most of the time with IE).
For those of you who are trying to "follow along" with this query in a live session you might initially want to cheat a little by just using the list of variables that we typed in. You may think you can do this by selecting the list from the DQF form and doing a copy-paste into your form. But that will not work because you cannot select text from a graphics image file. What you can do, however, is utilize the links above in the Outputs: section to access the Dexter Summary log, where you should see the list of variables we specfied in a form that can be selected, copied and pasted into your live DQF window.
As usual, nothing too difficult in Section IV). Just a title and subtitle to help label the output report. Just be sure when and if you decide to tweak the query to do your own thing that you remember to modify these titles to reflect your changes. So if you change the state code in the first row of Section II from 29 to 06 then be sure to change the word "Missouri" in the title value to "California".
The only thing we did in Section V was the section b. checking of the Use variable labesl as column headers in reports option.
Questions, Exercises (with alternate outputs)
- Modify the setup to do a comparable report for your state.
- Modify the setup so that the report would be sorted by descending Mean Poverty Ratio (or any of the other six key measures) within each geographic summary level.
- Modify the query to look at data for the nation and for state level summaries excluding Puerto Rico. Use single-year data from 2007 instead of 3-year period estimates. Output to a pdf file and sort by descending Mean Poverty Ratio. See outputs: report (pdf) | Saved query file
- Generate a report showing the top 40 Metropolitan Statistical Areas in the country as measured by Median Household Income. (Hints: Use
cbsatype = Metroas a filter condition. Run it twice; the first time you'll see the "cutoff value" for MedainHHInc which you can then plug into the filter on the final run. See outputs: report(pdf) | Saved query file
- Generate a report showing these indicators for each of the PUMAs in your state. If you do this for Missouri you'll see that the PUMA areas have been assigned meaningful labels to help the user see the data. Consider creating your own set of PUMA AreaName values that you could combine with the output of such a query to create PUMA level reports that someone might actually be interested in reading. While you're at it, see if you can do a choropleth map at the PUMA level as well. Pick your favorite indicator to map. See outputs: report (html) | csv file | Saved query file
- Where would you go to look for equivalent data from the 2000 decennial census so you could look at trends?