For a simpler overview of this material (with pictures) you may want to check out our Dexter Quick Start Guide.
We reference the portion of the page above the first horizontal bar, i.e. prior to Section I. There are two links at the very top, to the Dexter Quick Start Guide and to this Online help document.
The line a thte top of the page identifying the selected Data Set is divided into three parts:
enter. Doing so will not change anything you can see on the page. What it will do is change links on the page (for example, to detailed metadata, just below) and, most importantly, the input dataset that is going to be passed to dexter. It is important to be aware that there is little or no checking to make sure that what you type into the box is valid. For this to work you must enter the name of a dataset that really exists in the specified data directory (spelling definitely matters, case does not; the path/directory cannot be changed from this page) and the new dataset should contain essentially the same variables as the current dataset. The current page contains various select lists (in Sections I and III, for example) that have been generated based on variables (/tables) present in the original dataset. These select lists do not get changed to reflect the variables in the new dataset you have entered. The program does not even check to see if the dataset whose name you just typed in even exists. You will only find this out when you Submit the query and get the message saying Dataset specified does not exist. (from Dexter). So type very carefully. This feature is typically useful when you have a somewhat complicated query that you want to run for a series of similar datasets. You could run it for dataset motracts, then go back and change the dataset and run the identical query for iltracts and again for kstracts, etc. -- the same query to access tract level data for Missouri, Illinois and Kansas.
A new wild card dataset specification capability was added to Dexter (November 2015). You can now enter a value containing two consecutive underscores (__) to specify you want all datasets where two characters can replace these underscores. So if a directory contains a collection of data sets for all states in the country you can access all of them (in sequential alphabetical by dataset name order) by replacing the state abbreviation portion of the name with the two underscores. For example in data directory sf12010x there are datasets alslectedinv, akselectedinv, ... wyselectedinv. You could initially select any of these 52 datasets (we have a PR dataset in this collection) to invoke Dexter but then replace the dataset name in the box with __ as the first 2 characters and your query would then access all 52 data sets. If you only wanted to process say 5 of the 52 you could use the wildcard feature and then code a state-based filter specification in Sec. II of the form to filter all but the states actually desired. This will only work if all these data sets have the same variables as the data set you specified when you invoked dexter. None of the datasets accessed can be SAS views (which should not be a problem when accessing a collection with one dataset per state or a time series with one per time period). You cannot specify the "Go straight to rankster" option when using wildcards. You should avoid using the Quicklook feature or following links to metadata after entering the wildcard value. Probably other restrictions will pop up that we have not tested for yet. Let us know if something does not work.
<.# rows>. rows/observations, <.# cols>. columns/variablesThis is fairly straightforward, useful information about the dataset selected (if you overtype the dataset name in the preceding text box, it does not cause this information to be updated). If the dataset is a view (i.e. a "virtual" dataset that is stored as a program reference rather than a normal dataset) then the
rows/observationscount will be missing (the observation count is stored as part of a regular dataset but not as part of a view).
There may or may not be a line just below the Data Set: line that invites you to "
See detailed metadata for this dataset.". MCDC staff have been creating metadata modules for many of our most frequently-used and important datasets but we still have many datasets without such a module. If the link is there and you are new to this dataset (or at least this type of dataset) then you really owe it to yourself to follow the link and see what you can learn about the data. Note that one of the most important aspects of these meta files is to provide you with Key Values pages that tell you know the meaning of important identifier codes used on the dataset. Such knowledge is critical for coding data filters (see Section II).
This is another relatively new feature, added to the application early in 2010. The rankster dynamic web application can be used to create reports and data extracts where the focus is on ranking the data based on specified variables. In many ways rankster parallels dexter in the way that it accesses and processes datasets in the archive. There are two basic ways you can invoke the rankster application, and they both require using this line. One way to do it is to just skip dexter and take the specified dataset (in its entirety, i.e. all rows and all columns) and go straight to rankster. You can do this by clicking the link at the end of this line. The other way to invoke rankster is by specifying (via the
Rankster option: select list) one of the two options other than the default "Not interested" value. The two "yes" options allow you to use dexter as a preprocessor for rankster. You can specify that you want to view the dexter results and then have an option to pass the results of your dexter query to rankster; or, you can take the more I-know-what-I'm-doing approach and skip viewing the dexter results and just go straight to the rankster query form with the results of the dexter query specified as the input source for rankster. By having dexter serve as a front end module for rankster means we can take advantage of its nice filtering and variable selection features without having to reimplement them within the rankster application; it also is a way of sharing the uexplore navigation tool to choose the rankster data source.
The button can be used to generate a quick and easy report showing selected variables for the first few rows of the dataset. You would typically hit this button when you first start your query to see if it helps you get a handle on what the dataset looks like. By default the report will contain all the Identifier variables (as displayed in Section III of the left side select list) for the first 100 rows (unless, of course, the dataset has fewer than this). These defaults can be overriden as follows:
This section of the form lets you choose up to 3 output formats. Each of the 3 lines here contain a set of Radio buttons that lets you choose whether you want any output of the specfied type and, if so, which specific format.
The Delimited File buttons let you choose a comma or tab-delimited output file. By default you will get a comma-delimited ("csv") file. Note that "csv" is the filetype extension associated with this format; it stands for "comma separated values" and most browsers - IE especially - will be configured to automatically
invoke Excel as a plug-in to handle a file of this type. If you choose Tab-delimited instead (since these are radio buttons you can choose only 1 option from each line) then the fields are delimited by tab characters rather than commas and the file generated by Dexter has a ".txt" filetype extension (instead of the default ".csv"); this will usually result in different behavior by your browser when you click on the output file. In most cases it will just display it in your browser and you can then use File-Save to capture it.
These files will begin with 2 rows of identifiers before the actual data lines begin. The first row contains the names of the variables and the second row contains their labels. (In some files, and for some variables, there will be no labels assigned in which case the variable name is repeated as the default label.)
Choosing "none" in this row indicates that no delimited file is to be generated. (Just be sure that if you make this choice, that you select some option other than "none" from at least one of the remaining 2 rows; otherwise, you will have selected no output at all.) If you are expecting to use this format to load your data into a spreadsheet (and we know that is what most people are doing these days) you need to keep in mind that spreadsheet packages have limits on the number of rows and columns they can handle. Many (older) versions of Excel cannot handle more than 255 columns nor more than 65,000 rows.
The Listing/Report choices allow you to specify an output file that is in a format designed for reading rather than for storage and further manipulation. The rows/observations from the dataset become the rows of the report, and the variables become the columns. If you choose plain text or pdf formats then the program will
limit the number of columns to display across the page, and will start a new page (with titles and column headings after every so many lines). An HTML format report only displays the headings at the very top and has an unlimited page width, so all variables are displayed on one line. You really should not request a report format if your extract contains many variables, especially if those variables may be long character strings.
Plain text format is the best choice for faster response, since it requires the fewest resources to generate; pdf takes by far the most resources to generate. But pdf is the best choice if you want a report that you want to print and/or share with others.
Most of the options in Section IV and those in Section V-b are relevant to the details of this output file; things such as titles, footnotes, column headers, and even complete styles (font and color choices). Tip: Consider the first checkbox in Section V-b; using labels instead of variable names as column headers can greatly enhance the readability of a report. More ambitious users can combine this with use of the Label option in Section V-c to assign their own variable labels.
The Database File option is probably the least often used. The venerable dbf format is still widely used and is especially popular for use within GIS and desktop mapping systems. These files can also be read into Excel and other spreadsheets, although if that is what you want you are better off going with the default - a csv file. The latter requires less space and also is able to store a row of variable labels, which dbf format does not support.
The SAS dataset option is for those who want to access the data using the SAS(c) software package. We write this dataset in the latest (v9) format for the Windows platform. Many of our datasets have variables that are linked to custom SAS format codes. For example, a variable containing a 5-character FIPS county code may be linked to a $county format code, which causes it to display as the county name associated with the code. In the other output formats you just get the results of applying the format, but with the SAS dataset you get the 5-character variable with the associated format code. This can cause problems when accessing from a SAS environment where $county is not recognized. You can avoid the problem by specifying the SAS system option nofmterr. If you would like to recreate that custom format code for your local SAS environment you can access our public formats library and download the source code. Go to http://mcdc.missouri.edu/cgi-bin/uexplore?/pub/sasfmats and look for the file with the name corresponding to the format name. Character type formats (which are what we use most of the time) will be named starting with an upper-case "S" instead of a "$"; so the code for the $county format code is in the file Scounty.sas. Once you access the code in your browser you'll need to either save it to a local file or you can copy and paste it directly into your SAS program that accesses the dataset. You may need to add a Proc Format statement, since some of these modules will not include that statement. If you plan to use this format for working with many different data sets you may want to look into the possibility of storing the format code in a permanent SAS catalog. But that is beyond the scope of this help file.
We used to alert people to the option of creating SAS datasets that could be viewed with the SAS Viewer utility from SAS Institute. The viewer was a very useful utility that we thought was akin to the Adobe reader software, a free and easy-to-use viewer for people who did not have access to the parent software. But SAS Institute has apparently not viewed it this way. It has become difficult to impossible to find the software to download, and it requires you to be a "registered user" to do so. We cannot even find a version that will consistently read our latest downloaded datasets. So we no longer recommend this option.
The check box at the end of this section is rarely used. Normally when you invoke Dexter it writes things directly to your browser that summarize the query and provide links to the various output files. But if you are only generating a single output element and that element is either a plain text or HTMl report file then checking this box will cause Dexter to send your report file directly to the browser with no itermediate page.
This is the section where you get to tell Dexter which rows you want to keep for your output file(s). If you skip this section you get all the rows. If you just fill in the box at the top limiting the number of observations/rows then you just get that many output rows (report lines) from the top of the dataset (which is a good way to run a quick test and see what your output is going to look like.) As a general rule (there are important exceptions) the rows of our datasets correspond to geographic entities such as states, counties, cities, etc. Typically what you are doing in this section is specifying the geographic areas for which you want data. You do this by specifying 1 or more conditions which have to be met before the row is chosen. This process is known as filtering. The form allows for specifying up to 5 conditions and for specifying logical operators (And, OR & And Not) for connecting them; it even allows you to indicate parentheses to insert into the "filter" to control the order in which the conditions are to be evaluated and combined. But it is rare that a filter uses more than 2 lines and even rarer that you ever need to specify any logical operators or parentheses. Each condition is comprised of 3 parts: the Variable/Column, the Operator and the Value. The first 2 of these you select from drop-down menus, but the Value column is where you need to enter something. A typical filter would involve accessing a dataset that has census summary data for every county in the country. You would like to just get the data for California. You do this by selecting the variable State from the Variable/Column select list, then selecting the Equal to(=) value for Operator and then entering the code 06 in the Value text box. This tells Dexter to choose only rows where the value of the State code has a value of 06. How would you know to do this? You need to know 3 things:
Item 1 is fairly easy. Whenever you see a variable named State in this archive, you can be almost certain that it contains a 2-character FIPS state code. There may be a label associated with the variable explicitly saying something to this effect (the variable labels are displayed following the variable names separated by a dash on the drop-down Variable select list) but usually this can just be assumed.
The hardest part is knowing what the code is for California (or whatever state you are interested in). These are readily available in lots of places on the web, including our geographic codes lookup web application. Another likely source, which will show you just the values and their meaning for the current dataset, can be accessed by looking at the top of the Dexter input form page to see if there is a link labeled "detailed metadata" for this dataset. Following that link takes you to a page that provides a good deal of information about the current dataset, including a section labeled Key variables. This section consists of a row of hyperlinks referring to variables within the dataset that we think will be the most likely to be needed to create filters for the dataset. State is almost always a key variable, at least for datasets that have data for more than 1 state. Clicking on the State link under Key Variables will display a page showing what values are associated with this variable for the current dataset, and even displays a count of how many times each value occurs.
The third item, knowing "how this condition stuff works", seems to be what separates the people who will become happy Dexter users and those who will not. To the former group, it all seems obvious and reasonable, while to the latter it will forever remain a mystery. These people often hire research assistants to extract their data for them (if they are academics) or pay way too much to buy the data "prepackaged" from a commercial vendor (if they work for a company).
The values that drop down when you click the down arrow on the select lists under Operator (the middle column of this section) are mostly straightforward relational operators. Equal to, less than, greater than -- these are pretty standard. But a few may not be so obvious:
06:41:54. The colons are used to separate the list of values. The condition is true if the value of the variable matches any of the values in the list.
631as the Value entry. The "(Char Vars Only)" qualifier indicates that this operator only works on character type variables. This should rarely be an issue since we would never (well, hardly ever) store a geographic code such as a ZIP code in numeric format. We have a very rigorously followed convention of storing all our geographic and other identifying codes in character format (so that they occur in the Identifiers list in Section III, rather than the Numerics - this is a quick and easy way to make sure that a variable is stored as a character string).
(CDP)as the Value.
10000:50000in the Value box.
Normally, whenever the variable selected in Section II is of character type then whatever value is entered in the Value text box will be automatically converted to upper case by Dexter and compared to upper-cased values on the dataset. Once in a long while, you may want to override this default behavior and tell Dexter that you want the program to consider case. To do this you must enter a tilde character (~) as the first character in the Value box followed by the actual value. Thus if I select SumType as the variable, Equal to as the Operator and enter
~r as the Value, Dexter will only select rows where the value of the variable SumType is a lowercase "r"; it will NOT select a row if the value if SumType is an uppercase "R". If you omit the tilde from the value then the generated condition is
upcase(sumtype) = upcase("r")
and you would select rows that had a value of either lowercase or uppercase R.
You may have occasion to want to select using a condition requiring that
a character variable has a blank value. Dexter has a special way of recognizing blank values: entering a single underscore (_) will be interpreted as a blank value when the Operator is
Not Equal To or
In List. When a list of values is being specified (i.e. when the Operator is
In List) then use the single underscore within the list. For example typing
in the Value text box would result in the 3rd list value being a blank. (Note that the length of a blank character field is not important; do NOT try entering multiple underscores.) Also, note that this only pertains to completely blank values, not to blank characters within text fields. You would enter "San Antonio", NOT "San_Antonio".
Note: This section can easily be skipped by novice and casual users; learning how to use these features will yield a relatively small gain.
Every once in a long while you have a situation where the filtering condition requires more than the typical simple specifications. Something more than just give me the data for this state or this county or this geographic summary level. Tools are provided to permit creating such complex conditions. These are
The only good way to explain this is not to try, but just to give an example:
If you were to simply enter these 4 conditions and leave the relevant logical operators set at the default
SumLev Equal To(=) 040On the second line create the condition
TotPop Greater Than 1000000On the third line create the condition
SumLev Equal To(=) 050and on the fourth line the condition
TotPop Greater Than 50000
Andvalues without any parentheses specified, what you would be creating is a logically impossible condition (and hence a failed query where no rows are selected). This is because the value of the SumLev variable cannot be equal to both 040 and 050 (as specified on the 1st and 3rd lines). Clearly, what you have here is not a simple set of 4 necessary conditions, but rather a pair of sufficient conditions, each consisting of 2 necessary conditions. What we would like to do is this:
What we have here are two compound conditions, and we want to select the observation if either of those conditions is met. So we need to use an
Or logical connector. We do this by checking the small circle next to the word
Or between the 2nd and 3rd rows to choose it as the logical operator. To insure that the program first evaluates the compound expression based on the first 2 lines we check the left parentheses box at the left of line 1 and the right parentheses box at the right of line 2. Similarly we use the left paren and right paren boxes on lines 3 and 4, respectively, to insure that these 2 conditions are logically combined (with the And operator) first, before applying the Or operator. The Or operator is then applied to the result of And'ing the conditions on lines 1 and 2 (the first condition for the Or operator) and the result of And'ing the conditions on lines 3 and 4. If either of these compound conditions is true, the row is selected. Translated into SAS code our condition looks like this:
(SumLev=040 And TotPop > 1000000 ) Or ( SumLev=050 and TotPop > 50000 );
01001then the value you will see in your output file(s) will be
Autauga AL. The $county format code "looks up" the value and returns its label. (You can modify the format associated with a variable in Section V-c.)
In Section II you specify what rows you want to keep; here in Section III you specify the columns (variables) you want to keep. It is a much easier process than coding a filter, which usually involves nothing more than clicking (selecting) values from select lists. It can get tedious when working with datasets that have a lot of variables, but it is conceptually simple. The simplest think to do is check the box at the top that says you want all the variables. It may be simple, but it is not usually what you really want. Most users most of the time will want to be selective. Think about what you'll be doing with the results - do you really need or want hundreds (or thousands in some cases) of variables? Remember the 255 column limit if you are intending to take your extract into Excel.
Note that Section III is the only one where you are required to enter something. All the other parts of the form have builtin defaults that apply if you ignore them. But there is no default for your variable selections - you must do something here, even if it just checking the box saying you want all the variables.
We provide two column select lists, identifiers on the left and numerics on the right. Typically, you will nned to choose just a handful of identifiers - be sure to take enough that you will know what your data pertains to. It is the Numerics list that can sometimes be a challenge, depending on how many there are on the dataset. In a few special cases involving mostly complete-table decennial census files, the Numerics list (which allows you to select variables) is replaced with a Tables list. When this is the case you are only allowed to select entire tables - at least using the select lists. This is a trade off. We determined after considerable experience trying to work with some of these over-sized datasets that going with variables organized by table was the way to go. Examples of filetypes that support table selections are sf32000 and sf12000. An example of a filetype that should support such selections but does not (yet - because we have not created the required metadata files for it) is sf42000. These are all 2000 census Summary Files.
Other filetypes, such as our standard extracts based on 2000 SF3 tables, sf32000x, have enough variables (between 100 and 2000 is the magic interval; for fewer than 100 it is not worth the bother, and for over 2000 it takes too long to respond to searches) to justify some special extra tools for making it easier to find and choose what you want. These tools (select list "filtering") are described in the following paragraph.
When you see the
Filter by regular expression box below the Numerics select list you have the option of typing something in this text box - something called a regular expression - that will be used to modify what appears on the Numerics select list. For example, if I am processing a typical sf32000x dataset with its over 400 variables I might be looking for creating an extract of variables related to income or to poverty. I can enter the value
poverty | income
and click on the
Filter button. Within seconds the Numerics select list will be regenerated and will now only include variables where the specified keywords "poverty" or "income" are present in either the variable name or its label. The vertical bar (|) in the expression is an "or" operator that indicates that you want to select variables that contain either word. You can now make your choices off the much shorter list. If you do not like the results of your filter simply click on the
Clear buttom to restore the list to its original all-variables-shown state. Click the
Case-sensitve buttom in the unlikely case that you want your regular expression matching to take into account case in matching strings - so that "Poverty" does NOT match "poverty".
Extract Databutton, is a text box where you are invited to enter your "own variable list". Most of you will not (and probably should not) ever do this. It is easy to mess up when you type in your names, that is why we have pull-down select lists. But there are cases where this method can constitute a shortcut, especially for users who are familiar with the syntax of SAS variables lists. Entering a value in this box will cause Dexter to ignore any selections you may have made - the list you type is your entire select list; it is not an addition to previously-selected items. (But if you checked the "ALL columns" box that takes precedent and any list you enter here will then be ignored.)
A reason for using this "third way" is that it lets you control the order in which your variables will appear on all your outputs. You need to code an @ symbol as the first character entered here. Normally the order is based on the order of the variables in the source dataset and cannot be modified. So if I were to enter
@name address age city state zip
Perhaps the most common use of this text box is to allow selecting specific variables from a dataset where you get a Tables select list instead of a Numerics select. Say I were accessing the SF3 complete-tables dataset for all P and H tables for ZIP codes (ZCTAs) in the United States (file uszipsph.sas7bdat in data directory /pub/data/sf32000) and I had studied the detailed metadata which pointed me to a
varlabs file that was the best tool for seeing what variables contained what information. Furthermore, let's say I was interested in calculating a poverty rate for children aged 0-17 for every ZIP, and the data cells I needed were in table p87. I could just select the entire table (which has data on poverty by age with a total of 17 data cells, of which only 8 were relevant to the number I wanted.) I would type in the text box something like this:
geocode state p87i3 p87i4 p87i5 p87i6 p87i11 p87i12 p87i13 p87i14
This is fine since we only had to enter the names of 8 data cells. But similar requests in the real world could possibly have many more data cells required. This is where knowing how to use special list notation comes in handy. There are basically three key variable list shortcuts that can be used:
p87i3 p87i4 p87i5 p87i6(used in our last example) can be written as
and all of the variables corresponding to the 7 tables would be specified (kept on output).
p12i:in my list I am saying I want all variables with names starting with the characters "p12i". Our table-variable naming convention guarantees that this is exactly the set of variables comprising table p12. The variable names are p12i1, p12i2, ..., p12i31. The two lists
p12i1-p12i31yield the same result, but the former is easier to code because you don't have to look up the number of cells in the table as you would to code a single-dash interval.
This short section consists of a set of text boxes where you can specify some useful options. The first 3 pertain only to the Listing/Report output type. The sort option applies to all outputs. Special characters, including quotes (apostrophes or double quote symbols), percent sybmols and ampersands, will be removed from any values entered for titles or footnotes in this section.
state county -pctpoorwould result in output(s) sorted first by State; within state by county; and within county by descending values of pctpoor. If you were to insert a minus sign just before
statein the list it would result in the output sorted by descending state (so that Wyoming might appear first, with Alabama at the end, reversing the normal ascending order).
All variables used for the sort options must be selected for keeping on the output in Section III; you cannot sort on a variable that is not present on the dataset. If you are doing aggregation of the data (Section V-a) then the dataset must be sorted by the aggby variables specified there.
State CBSA. That would tell the program to take the initial extract, sort it by CBSA within state and call an aggregation utility that will produce summary rows by aggregating numeric variables. For now, this will only work for character type variables; numeric variables cannot be used in an aggby list. (For the most part, character variables are those that appear in the Identifiers lists (as opposed to the Numerics lists); use the Qmeta report to get the variable type for absolute certain, from the
Typecolumn.) This is usually not a problem, since we are very careful to make sure that category variables on our datasets are character type rather than numeric.
Note that the order of the variables in the aggby list is significant here. If you had typed in
CBSA State (with CBSA entered first instead of second) then the summaries produced would be for State-within-CBSA (a "level 1" summary) and for CBSA's (regardless of State). That's an important difference.
aggby=State CBSA Countyas the aggby parameter and
Agg Level=3then on your output files the variable _lvl_ would take on a value of 1 for County summaries (within State and CBSA), and would take on the value 2 for CBSA (within State) summaries and a value of 3 for State summaries. When requesting multiple levels you might want to keep the _lvl_ variable to make it easier to select subsets and understand your results. The _nag_ variable could also be useful; in our example where we were aggregating county data to get summaries for CBSA's, for example, the value of _nag_ would tell you how many counties there were within the CBSA.
1234567). The value you enter needs to be in the format
variable-name(s) format-code [repeat]. For example:
totpop over65 comma9. pctpoor 5.1causes the value of the variables totpop and over65 to display in comma9 format, (x,xxx,xxx) ; and the value of the variable pctpoor to display in 5.1 format, which is xxx.x (it will be rounded to the 1 decimal place). Note well the use of periods in the format codes - the syntax requires these (they allow the program to distinguish variable names from format codes). It is beyond the scope of this document to go into all the possible format codes available in SAS, but 99% of the time all users will be interested in is controlling the insertion or omission of commas and/or the number of decimal digits. Some common format codes that can be used include:
7.0- variable displays with 7 digits, no commas and no decimal point, i.e. xxxxxxx. Leading blanks will not be retained on delimited output files. (You might specify this format if the variable had a comma. format associated with it and you wanted to override that so that the commas did not occur in your csv file. You could also use it to round off values with decimal portions to integer values.)
dollar10.This could be used for a variable that is a dollar amount and would cause the value to display in $x,xxx,xxx format. If there was a decimal portion you could use the code
dollar12.2which would display in $x,xxx,xxx.xx format.
$5.This format should be used only with character type variables (to see what variables are of type character vs. numeric you should use the View qmeta Metadata report links at the end of Section II; for the most part, variables appearing in the ID list will be character type, while all variables in the numeric list will be of type numeric.) The $5. format code indicates to simply display the value as 5 characters with leading blanks removed (use the $char5. format if you want to preserve leading blanks). The most common reason for using such a code would be for a variable that had a custom format code associated with it, which you wanted to override. For example, on many of our datasets you will find the character variable county which contains a 5-character FIPS county code, and which has the custom format code $county. associated with it. This custom format code is used to translate the codes to the name - e.g. "01001" displays as "Autauga AL". If you wanted to see the code instead of the name you could do so by specifying the $5. format for the county variable.
$state.If there is a variable that contains a 2-character FIPS state code you can specify the custom $state. format code to cause the value displayed to be the name associated with the code, i.e. instead of "29" you would get "Missouri". Similarly, as discussed above, you could specify a format code of
$county.with a variable that contained a 5-character FIPS county code in order to have it display the name of the county rather than the code.
current-name=new-name. For example:
fipco=County Areaname=Area_Summarizedwould result in the variable fipco being renamed to County, and the variable Areaname being renamed Area_Summarized. (These names can occur in header rows of Excel spreadsheets created from Dexter-generated csv files, or as column headers in reports.) When assigning new values you need to keep in mind these rules for SAS variable names:
variable-name='label-text'. Note that the label has to be enclosed in single quotes. For example:
MetroDiv='Metropolitan Division' popjl13='Current Estimate'
For this to work you have to start by checking the box to indicate that the output is to be transposed.
The by variables option is rarely used. It requires that the data being extracted have the same set of rows repeated for the values of these by variables. Maybe if the dataset contained time series data that repeated a cycle for several values of a variable Year, you could specify year as a by variable for the transposition. It would then remain a column instead of a row, and you get a new row for each variable, one for each value of year.
The Name variables on output ... line lets you specify what the names of the transposed variables will be. You do this by specifying an alpha prefix and then specifying a variable in the dataset whose value will be appended to that alpha prefix to form the name. For example, if you were transposing a dataset with county level summaries you might specify a value of "c" as the prefix and the variable fipco for the 2nd option. Then if an observation on the dataset had a value of 01001 for the variable fipco then all the values from that row would be transposed into a column and that column would be named c01001. Note that the resulting name must be unique - there can be only one observation on the extracted dataset with a given value of fipco.
The Assign labels to the output ... line lets you specify the name of a variable in the dataset whose value will become the label of the variables after they are transposed. A frequently used variable for this function is Areaname, i.e. it is a character variable the value of which describes (identifies) the (old) row. The variable label appears as the second row (line) of a csv output file, with the variable name appearing in the first row. In a report output the column headers are by default the variable names, but you can override this by checking the option in section V-b (Advanced Report Formatting) saying to use variable labels as column headers in reports.
Transposing is a tricky feature. As a practical matter it requires that you have a set of numeric data items with similar characteristics and just one or two ID variables which are used to name and/or label the resulting columns.
This section can be used to provide parameters used to select observations based upon distances from a specified point (calculated by Dexter). This feature only works provided that
_distance Less Than 100
The two text boxes labeled Latitude and Longitude can be used to enter the coordinates of a location that will be used to calculate the _distance values. These values should be entered as decimal fractions, with as many digits after the decimal point as you think are needed. A typical value for the Latitude would be 38.9898, and for Longitude -92.313400 . The leading minus sign on the longitude coordinate is optional and will be assumed if not entered. (The program only works for locations in the western hemisphere. ) The trailing 0's on the Longitude value here make no difference and could be left off.
You can also enter the coordinates in degree-minutes-seconds notation rather than as decimal fractions. To do so you must use dd.mm.ss format, where dd is the number of whole degrees, mm is the number of minutes and ss the number of seconds. Both mm and ss should include leading 0s if necessary. For example 38.40.04 would represent 38 degrees, 40 minutes and 4 seconds north latitude. A value of 41.40.64 would be invalid; the value of MM and SS must be between 00 and 59. The program will recognize that you are using this format by checking for and seeing two decimal points (periods) in the entered value.
Alternatively, you can enter a 5-digit ZCTA/ZIP code in the Latitude box. The program will recognize a 5-character string without a decimal point as a ZIP value. It will then do a lookup of the internal point coordinates for that ZIP code in a reference data set, and will use those values to do _distance calculations.
There are 2 check-box options that follow. The first can be checked in oder to specify that you want to keep the calculated _distance variable as well as the intptlat and intptlon coordinate variables which may have been looked up. The second checkbox lets you decide how you want to handle cases where the coordinates of the current observation cannot be determined. Do you want to keep those observations or filter them out? The default is to filter them out, but you can easily override that by checking this box.
Reset Defaultsbuttons) are intended for use by the persons responsible for developing and debugging the software. Please ignore this section. This does not include the very bottom section starting with
Questions and comments .... We strongly encourage you to take advantage of that section.