Missouri Census Data Center

Dexter Online Help

V. 2.0, July, 2010

For a simpler overview of this material (with pictures) you may want to check out our Dexter Quick Start Guide.

The Stuff at the Top of the Form

We reference the portion of the page above the first horizontal bar, i.e. prior to Section I. There are two links at the very top, to the Dexter Quick Start Guide and to this Online help document.

Data Set ID Line

The line a thte top of the page identifying the selected Data Set is divided into three parts:

  1. The data directory (path) is displayed and doubles as a hyperlink in case you want to return to a Uexplore page referencing the current data collection (directory).

  2. A text box containing the name of the data set (without the extension). This has not always been a text box, it used to just display the name as text. The reason we made it a text box now is because we wanted to allow the user to modify the value. You do that by simply editing the value and hitting enter. Doing so will not change anything you can see on the page. What it will do is change links on the page (for example, to detailed metadata, just below) and, most importantly, the input dataset that is going to be passed to dexter. Be aware that this is a relatively new (i.e. not as thorougly tested) feature and that there is little or no checking to make sure that what you type into the box is valid. For this to work you must enter the name of a dataset that really exists in the specified data directory (spelling definitely matters, case does not; the path/directory cannot be changed from this page) and the new dataset should contain essentially the same variables as the current dataset. The current page contains various select lists (in Sections I and III, for example) that have been generated based on variables (/tables) present in the original dataset. These select lists do not get changed to reflect the variables in the new dataset you have entered. The program does not even check to see if the dataset whose name you just typed in even exists. You will only find this out when you Submit the query and get the message saying Dataset specified does not exist. (from Dexter). So type very carefully. This feature is typically useful when you have a somewhat complicated query that you want to run for a series of similar datasets. You could run it for dataset motracts, then go back and change the dataset and run the identical query for iltracts and again for kstracts, etc. -- the same query to access tract level data for Missouri, Illinois and Kansas.

  3. <.# rows>. rows/observations, <.# cols>. columns/variables This is fairly straightforward, useful information about the dataset selected (if you overtype the dataset name in the preceding text box, it does not cause this information to be updated). If the dataset is a view (i.e. a "virtual" dataset that is stored as a program reference rather than a normal dataset) then the rows/observations count will be missing (the observation count is stored as part of a regular dataset but not as part of a view).
Detailed Metadata Link

There may or may not be a line just below the Data Set: line that invites you to "See detailed metadata for this dataset.". MCDC staff have been creating metadata modules for many of our most frequently-used and important datasets but we still have many datasets without such a module. If the link is there and you are new to this dataset (or at least this type of dataset) then you really owe it to yourself to follow the link and see what you can learn about the data. Note that one of the most important aspects of these meta files is to provide you with Key Values pages that tell you know the meaning of important identifier codes used on the dataset. Such knowledge is critical for coding data filters (see Section II).

Rankster Option

This is another relatively new feature, added to the application early in 2010. The rankster dynamic web application can be used to create reports and data extracts where the focus is on ranking the data based on specified variables. In many ways rankster parallels dexter in the way that it accesses and processes datasets in the archive. There are two basic ways you can invoke the rankster application, and they both require using this line. One way to do it is to just skip dexter and take the specified dataset (in its entirety, i.e. all rows and all columns) and go straight to rankster. You can do this by clicking the link at the end of this line. The other way to invoke rankster is by specifying (via the Rankster option: select list) one of the two options other than the default "Not interested" value. The two "yes" options allow you to use dexter as a preprocessor for rankster. You can specify that you want to view the dexter results and then have an option to pass the results of your dexter query to rankster; or, you can take the more I-know-what-I'm-doing approach and skip viewing the dexter results and just go straight to the rankster query form with the results of the dexter query specified as the input source for rankster. By having dexter serve as a front end module for rankster means we can take advantage of its nice filtering and variable selection features without having to reimplement them within the rankster application; it also is a way of sharing the uexplore navigation tool to choose the rankster data source.

Quick Look

The button can be used to generate a quick and easy report showing selected variables for the first few rows of the dataset. You would typically hit this button when you first start your query to see if it helps you get a handle on what the dataset looks like. By default the report will contain all the Identifier variables (as displayed in Section III of the left side select list) for the first 100 rows (unless, of course, the dataset has fewer than this). These defaults can be overriden as follows:

I. Choose Output Format(s)

This section of the form lets you choose up to 3 output formats. Each of the 3 lines here contain a set of Radio buttons that lets you choose whether you want any output of the specfied type and, if so, which specific format.

The Delimited File buttons let you choose a comma or tab-delimited output file. By default you will get a comma-delimited ("csv") file. Note that "csv" is the filetype extension associated with this format; it stands for "comma separated values" and most browsers - IE especially - will be configured to automatically invoke Excel as a plug-in to handle a file of this type. If you choose Tab-delimited instead (since these are radio buttons you can choose only 1 option from each line) then the fields are delimited by tab characters rather than commas and the file generated by Dexter has a ".txt" filetype extension (instead of the default ".csv"); this will usually result in different behavior by your browser when you click on the output file. In most cases it will just display it in your browser and you can then use File-Save to capture it.
These files will begin with 2 rows of identifiers before the actual data lines begin. The first row contains the names of the variables and the second row contains their labels. (In some files, and for some variables, there will be no labels assigned in which case the variable name is repeated as the default label.)
Choosing "none" in this row indicates that no delimited file is to be generated. (Just be sure that if you make this choice, that you select some option other than "none" from at least one of the remaining 2 rows; otherwise, you will have selected no output at all.) If you are expecting to use this format to load your data into a spreadsheet (and we know that is what most people are doing these days) you need to keep in mind that spreadsheet packages have limits on the number of rows and columns they can handle. Many (older) versions of Excel cannot handle more than 255 columns nor more than 65,000 rows.

The Listing/Report choices allow you to specify an output file that is in a format designed for reading rather than for storage and further manipulation. The rows/observations from the dataset become the rows of the report, and the variables become the columns. If you choose plain text or pdf formats then the program will limit the number of columns to display across the page, and will start a new page (with titles and column headings after every so many lines). An HTML format report only displays the headings at the very top and has an unlimited page width, so all variables are displayed on one line. You really should not request a report format if your extract contains many variables, especially if those variables may be long character strings. Plain text format is the best choice for faster response, since it requires the fewest resources to generate; pdf takes by far the most resources to generate. But pdf is the best choice if you want a report that you want to print and/or share with others.
Most of the options in Section IV and those in Section V-b are relevant to the details of this output file; things such as titles, footnotes, column headers, and even complete styles (font and color choices). Tip: Consider the first checkbox in Section V-b; using labels instead of variable names as column headers can greatly enhance the readability of a report. More ambitious users can combine this with use of the Label option in Section V-c to assign their own variable labels.

The Database File option is probably the least often used. The venerable dbf format is still widely used and is especially popular for use within GIS and desktop mapping systems. These files can also be read into Excel and other spreadsheets, although if that is what you want you are better off going with the default - a csv file. The latter requires less space and also is able to store a row of variable labels, which dbf format does not support.
The SAS dataset option is for those who want to access the data using the SAS(c) software package. We write this dataset in the latest (v9) format for the Windows platform. Many of our datasets have variables that are linked to custom SAS format codes. For example, a variable containing a 5-character FIPS county code may be linked to a $county format code, which causes it to display as the county name associated with the code. In the other output formats you just get the results of applying the format, but with the SAS dataset you get the 5-character variable with the associated format code. This can cause problems when accessing from a SAS environment where $county is not recognized. You can avoid the problem by specifying the SAS system option nofmterr. If you would like to recreate that custom format code for your local SAS environment you can access our public formats library and download the source code. Go to http://mcdc2.missouri.edu/cgi-bin/uexplore?/pub/sasfmats and look for the file with the name corresponding to the format name. Character type formats (which are what we use most of the time) will be named starting with an upper-case "S" instead of a "$"; so the code for the $county format code is in the file Scounty.sas. Once you access the code in your browser you'll need to either save it to a local file or you can copy and paste it directly into your SAS program that accesses the dataset. You may need to add a Proc Format statement, since some of these modules will not include that statement. If you plan to use this format for working with many different data sets you may want to look into the possibility of storing the format code in a permanent SAS catalog. But that is beyond the scope of this help file.
We used to alert people to the option of creating SAS datasets that could be viewed with the SAS Viewer utility from SAS Institute. The viewer was a very useful utility that we thought was akin to the Adobe reader software, a free and easy-to-use viewer for people who did not have access to the parent software. But SAS Institute has apparently not viewed it this way. It has become difficult to impossible to find the software to download, and it requires you to be a "registered user" to do so. We cannot even find a version that will consistently read our latest downloaded datasets. So we no longer recommend this option.
The check box at the end of this section is rarely used. Normally when you invoke Dexter it writes things directly to your browser that summarize the query and provide links to the various output files. But if you are only generating a single output element and that element is either a plain text or HTMl report file then checking this box will cause Dexter to send your report file directly to the browser with no itermediate page.

II. Choose rows (observations) to keep by specifying a filter

This is the section where you get to tell Dexter which rows you want to keep for your output file(s). If you skip this section you get all the rows. If you just fill in the box at the top limiting the number of observations/rows then you just get that many output rows (report lines) from the top of the dataset (which is a good way to run a quick test and see what your output is going to look like.) As a general rule (there are important exceptions) the rows of our datasets correspond to geographic entities such as states, counties, cities, etc. Typically what you are doing in this section is specifying the geographic areas for which you want data. You do this by specifying 1 or more conditions which have to be met before the row is chosen. This process is known as filtering. The form allows for specifying up to 5 conditions and for specifying logical operators (And, OR & And Not) for connecting them; it even allows you to indicate parentheses to insert into the "filter" to control the order in which the conditions are to be evaluated and combined. But it is rare that a filter uses more than 2 lines and even rarer that you ever need to specify any logical operators or parentheses. Each condition is comprised of 3 parts: the Variable/Column, the Operator and the Value. The first 2 of these you select from drop-down menus, but the Value column is where you need to enter something. A typical filter would involve accessing a dataset that has census summary data for every county in the country. You would like to just get the data for California. You do this by selecting the variable State from the Variable/Column select list, then selecting the Equal to(=) value for Operator and then entering the code 06 in the Value text box. This tells Dexter to choose only rows where the value of the State code has a value of 06. How would you know to do this? You need to know 3 things:

  1. That the variable State contains the FIPS code for the state being summarized on each observation.
  2. That 06 is the FIPS code for California
  3. How this condition stuff works.

Item 1 is fairly easy. Whenever you see a variable named State in this archive, you can be almost certain that it contains a 2-character FIPS state code. There may be a label associated with the variable explicitly saying something to this effect (the variable labels are displayed following the variable names separated by a dash on the drop-down Variable select list) but usually this can just be assumed.

The hardest part is knowing what the code is for California (or whatever state you are interested in). These are readily available in lots of places on the web, including our Cure for the Common Codes site. Another likely source, which will show you just the values and their meaning for the current dataset, can be accessed by looking at the top of the Dexter input form page to see if there is a link labeled "detailed metadata" for this dataset. Following that link takes you to a page that provides a good deal of information about the current dataset, including a section labeled Key variables. This section consists of a row of hyperlinks referring to variables within the dataset that we think will be the most likely to be needed to create filters for the dataset. State is almost always a key variable, at least for datasets that have data for more than 1 state. Clicking on the State link under Key Variables will display a page showing what values are associated with this variable for the current dataset, and even displays a count of how many times each value occurs.

The third item, knowing "how this condition stuff works", seems to be what separates the people who will become happy Dexter users and those who will not. To the former group, it all seems obvious and reasonable, while to the latter it will forever remain a mystery. These people often hire research assistants to extract their data for them (if they are academics) or pay way too much to buy the data "prepackaged" from a commercial vendor (if they work for a company).

Value of Operator

The values that drop down when you click the down arrow on the select lists under Operator (the middle column of this section) are mostly straightforward relational operators. Equal to, less than, greater than -- these are pretty standard. But a few may not be so obvious:

Entering Case Sensitive Values

Normally, whenever the variable selected in Section II is of character type then whatever value is entered in the Value text box will be automatically converted to upper case by Dexter and compared to upper-cased values on the dataset. Once in a long while, you may want to override this default behavior and tell Dexter that you want the program to consider case. To do this you must enter a tilde character (~) as the first character in the Value box followed by the actual value. Thus if I select SumType as the variable, Equal to as the Operator and enter ~r as the Value, Dexter will only select rows where the value of the variable SumType is a lowercase "r"; it will NOT select a row if the value if SumType is an uppercase "R". If you omit the tilde from the value then the generated condition is

upcase(sumtype) = upcase("r")

and you would select rows that had a value of either lowercase or uppercase R.

Specifying Blank Values

You may have occasion to want to select using a condition requiring that a character variable has a blank value. Dexter has a special way of recognizing blank values: entering a single underscore (_) will be interpreted as a blank value when the Operator is Equal To, Not Equal To or In List. When a list of values is being specified (i.e. when the Operator is In List) then use the single underscore within the list. For example typing

01:02:_:04

in the Value text box would result in the 3rd list value being a blank. (Note that the length of a blank character field is not important; do NOT try entering multiple underscores.) Also, note that this only pertains to completely blank values, not to blank characters within text fields. You would enter "San Antonio", NOT "San_Antonio".

Logical Operators and Parentheses

Note: This section can easily be skipped by novice and casual users; learning how to use these features will yield a relatively small gain.

Every once in a long while you have a situation where the filtering condition requires more than the typical simple specifications. Something more than just give me the data for this state or this county or this geographic summary level. Tools are provided to permit creating such complex conditions. These are

  1. The ability to specify logical operators connecting your Variable-Operator-Value conditions. By default the logical operator is And.
  2. Checkboxes down the left and right sides allow specifying opening and closing parentheses. These allow you to create groupings of your conditions so that the logical operators that are inside parentheses get applied first.

The only good way to explain this is not to try, but just to give an example:


This file last modified Monday April 07, 2014, 09:41:35

Site Map    |    Our URL    |   
The Missouri Census Data Center is a sponsored program of the Missouri State Library within the office of the Missouri Secretary of State. The MCDC has been a partner in the U.S. Census Bureau's State Data Center program since 1979.

Questions/Comments regarding this page or this web site are strongly encouraged and can be sent to