We'll present a set of simple examples to give you a flavor of the kinds of things that geocorr can do for you. You should spend some time with the extensive HELP module available for the application (it's all in a single HTML file, so it can easily be printed and read off-line if you prefer). If you are not familiar with one or more of the geographic units shown in the geocode selection lists, follow the hyperlinks to the MAGGOT geographic glossary page. (Both the HELP and MAGGOT pages can be reached from the row of reference links, beginning with "Examples", across the top of the main form.)
As you go through these examples we suggest you enter the specifications as described and run the application, so you can see for yourself what the results look like. You might even want to get daring and create your own examples. After reviewing the results of a run, use the back key to return to the form page, change one of the parameters, and run the request again to see what difference it makes.
It is important to note that while geocorr, with its many options, may appear overly complex for a casual user, the most typical applications (as illustrated, we hope, by these examples) require the user to specify only a small number of them. You can begin by doing simple correspondences and geocode reference listings and then, when you have mastered these basic applications, proceed to the more advanced options involving things such as circular areas, bounding boxes and the much misunderstood "Concentric Ring Pseudo-Geocodes". You can get 80% of the functionality of the application using about 20% of the options. Casual and first-time users should begin by focusing primarily on the Input Options and Output Options sections. They are the only required specifications.
Option/parm         Specify Value         Comments
-----------         -------------         ----------
Input Options:
state               Missouri              Use scroll bar to get to it in select
                                          box at top of Input Options section.
                                          You MUST select a state.
SOURCE geocodes     County (1990) (D)
                    Census Tract/BNA      If you select this without selecting
                    (1990) (D)            "County", geocorr will select it for
                                          you, since a tract/BNA without a
                                          county makes no sense.
TARGET geocodes     ZIP code. (D)         You should have lots of questions
                                          about this choice. What's the source?
                                          How current, complete? See the MAGGOT
                                          entry to get more information.
Weighting Var.      Population (D)        Shows up in output report and file.
Ignore zero ...     (check it)            You should almost always choose this.
Output Options:
Have weighted..     (check it)            See HELP for details. Puts an x-y
                                          coord. pair on each output
                                          line/record.
Generate a CSV..    (check it) (D)        Program will know to generate a comma
                                          separated value file with "results".
                                          Leave the "Just Codes" selection in
                                          the select box for the CSV file.
Generate a listing  (check it) (D)        Generates a report-format output file.
file...                                   This is what the human looks at. (The
                                          .csv file is what a program looks
                                          at.) Check on "Codes and Names" option
                                          here. Then you'll get names for the
                                          counties and the ZIP codes (but not
                                          for the tracts -- census tracts are
                                          not named.)
Point-and-Distance Options: (skip)
Bounding Box Filter Options: (skip)
Geographic Filter Options:
County codes text   019                   This is the 3-digit FIPS county code
                                          for Boone co. Note that the "County
                                          codes" string is a hyper-link to code
                                          pages showing all FIPS county codes.
                                          Try it! Could also enter "29019", but
                                          since we have selected just one state
                                          we can get by with just the 3-digit
                                          county code.
Click on the "Run Request" button to initiate processing. A Perl script will examine what you entered on the HTML form to verify that there are no invalid or potentially dangerous characters being passed. It will then invoke SAS(r) and pass it the form information and tell it to run the special geocorr SAS program. Geocorr should take anywhere from a few seconds to several minutes to execute, depending on system load and on your request. This example should take about 20 seconds if there is a normal load on the system. The hour glass will appear while it is running. When it finishes you should be presented with a menu screen labeled "Results of Query". This menu page will list the 5 output files produced by the request with a brief description of what is contained in each. You can almost always ignore the first of these files (the SAS program log). The summary.log should always be checked for any warning or other messages regarding the query. It will provide information about when and where the application ran, what parameters were specified, the number of records written to the output files, how long it took, etc.
The important outputs are the geocorr.lst and geocorr.csv files. Click on each to view them in your browser. Use the browser to save them to a local file and/or to print them. Notice the dramatic difference in their formats, but the nearly identical nature of their contents. Because we asked for "codes and names" on the listing file and not on the CSV file, there will be some content difference in this case. But the basic data content is the same.
Look very carefully at the first two data lines (after the title and column header lines) of the listing file. These two lines have information about the first value of the "source" geocodes we requested -- the first county-tract. They show each of the values for the "target" geocode(s) that intersect with this area. In the example, we see that tract 0001.00 is partly in ZIP 65201 and partly in 65203. The degree of the intersection is measured by the weighting variable, 1990 total population. This small tract has only 430 people in it, and of these, about 408 lived in 65201 in 1990 and the other 22 lived in 65203. The AFACT (allocation factor) column shows the decimal portion of the source area contained in the target area -- ".949" in the first line means that 94.9% of the tract is contained in the ZIP code for that line. This is based on 1990 population. If we had chosen Housing Units or land area for our weighting variable, we'd see a different value for this factor.
Notice that many of the tracts appear on only one line -- they correspond entirely to a single ZIP code. And notice that the values of the AFACT column always sum to 1.0 for all the lines corresponding to one tract.
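The allocation-factor arithmetic described above can be sketched in a few lines of Python. This is only an illustration, not the geocorr SAS code: the block records are made up (chosen so the totals match the 408 and 22 in the example report), and the function name `allocation_factors` is hypothetical.

```python
from collections import defaultdict

# Hypothetical block-level records: (source tract, target ZIP, 1990 population).
# The individual values are invented; only the totals mirror the example report.
blocks = [
    ("0001.00", "65201", 250),
    ("0001.00", "65201", 158),
    ("0001.00", "65203", 22),
    ("0002.00", "65201", 900),
]

def allocation_factors(records):
    """Return {(source, target): (weight, afact)} for each intersection."""
    pair_weight = defaultdict(int)    # weight summed per source-target pair
    source_weight = defaultdict(int)  # weight summed per source area
    for source, target, weight in records:
        pair_weight[(source, target)] += weight
        source_weight[source] += weight
    # AFACT = weight of the intersection / weight of the whole source area.
    # Assumes zero-weight source areas were dropped ("Ignore zero" checked).
    return {
        pair: (weight, weight / source_weight[pair[0]])
        for pair, weight in pair_weight.items()
    }

factors = allocation_factors(blocks)
for (tract, zip_code), (pop, afact) in sorted(factors.items()):
    print(f"{tract}  {zip_code}  {pop:6d}  {afact:.3f}")
```

Because every block of a tract falls in exactly one ZIP code, the factors for one tract necessarily add up to 1.0, which is the pattern noted above.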
The columns labeled "INTPTLNG" and "INTPTLAT" are poorly named. They appear because we checked the option to have "weighted centroids" calculated and kept on the output file(s). Where do these come from? The geocorr program works with a database that has observations at the 1990 census block level. Each observation has "internal point" coordinates indicating where the spatial centroid of the census block is located. When the program generates a line of the output files, it is really just combining information from all the blocks that are in the intersecting areas. The first line of the report comes from looking at all blocks that are in tract 0001.00 and in ZIP 65201 and summing the 1990 populations of those blocks. At the same time the program takes the latitude-longitude coordinates of each block centroid and weights each by multiplying it by the population of that block. After processing all blocks for a tract-ZIP pairing, and prior to output, the program divides the weighted coordinate sums by the population total for the area, creating this "weighted centroid". This location is biased towards where the people actually live within the area, rather than being based only on the geometry of the census blocks. (If land area is chosen for the weighting variable, the resulting weighted centroids are more of a spatial center.)
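As a rough sketch of the weighted-centroid calculation just described (again in Python rather than the SAS that geocorr actually runs), the function below accumulates weighted coordinate sums and divides by the total weight. The name `weighted_centroid` and the sample coordinates are assumptions made for illustration.

```python
def weighted_centroid(blocks):
    """blocks: iterable of (longitude, latitude, weight) tuples.

    Returns (longitude, latitude) of the weighted centroid,
    or None when the total weight is zero.
    """
    total = sum_lng = sum_lat = 0.0
    for lng, lat, weight in blocks:
        sum_lng += lng * weight   # weight each coordinate by the block's weight
        sum_lat += lat * weight
        total += weight
    if total == 0:
        return None
    return (sum_lng / total, sum_lat / total)

# Two hypothetical blocks: the more populous one pulls the centroid toward it.
sample = [(-92.30, 38.90, 300), (-92.40, 39.00, 100)]
centroid = weighted_centroid(sample)
print(centroid)
```

Passing land area instead of population as the weight gives the more purely spatial center mentioned above.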
The comma delimited (".csv") file can be browsed and then saved to your local disk with your browser's "save as" command. (You might even want to configure your browser to invoke a helper application to customize processing.)
Once you save the file, you should be able to open it for processing by most spreadsheet
and data base programs in Windows. Notice that the first line of the file
contains the names of the fields -- when you import these data into Excel or
Lotus you'll see that these names appear as the first row of the spreadsheet. To
get a more detailed description of what these variables are you can browse the
varlst.lst file, the last entry on your Query Results page. This is
usually a very short file, and in most cases we ignore it (because we already
know what the fields are -- but you may find it very helpful in trying to
interpret what you have.)
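If you would rather read the saved file with a program than with a spreadsheet, a minimal Python sketch follows. It relies only on the fact stated above, that the first line holds the field names. The column names used here (`county`, `tract`, `zip`, `pop`, `afact`) and the sample rows are assumptions; check the header of your own file, or varlst.lst, for the real ones.

```python
import csv
import io

# Stand-in for a saved geocorr.csv; the header names are assumed, not confirmed.
sample_csv = io.StringIO(
    "county,tract,zip,pop,afact\n"
    "29019,0001.00,65201,408,0.949\n"
    "29019,0001.00,65203,22,0.051\n"
)

def read_correspondence(handle):
    """Read a correspondence file whose first line holds the field names."""
    rows = []
    for row in csv.DictReader(handle):
        # Keep geocodes as text so leading zeros ("0001.00") survive.
        row["pop"] = int(row["pop"])
        row["afact"] = float(row["afact"])
        rows.append(row)
    return rows

rows = read_correspondence(sample_csv)
for row in rows:
    print(row["tract"], row["zip"], row["pop"], row["afact"])
```

With a real download you would pass `open("geocorr.csv", newline="")` instead of the in-memory sample.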
Option/parm         Specify Value         Comments
-----------         -------------         ----------
Input Options:
State               Missouri and          Will need to hold down Ctrl key to
                    Illinois              select two items from the Select list
                                          (with most browsers, at least).
Source geocodes     Place: city...
Target geocodes     County                Output will show places (cities)
                                          related to counties.
Weighting var       Housing Units         Instead of default, population.
Output Options:
Weighted            Turn off.             We really don't use these too much.
centroids
Generate AFACT2     Turn on.              This will cause the program to do
                                          double work in terms of "allocation
                                          factors". Now we get the portion of
                                          the source codes in the targets, and
                                          the portion of the targets in the
                                          source areas.
Generate CSV file   Turn off.             It'll run faster without it, so if you
                                          don't plan to read it with a
                                          program...
Geographic Filtering options:
County codes.       (blank)               Will not be filtering at county level.
Metro areas         7040                  Selects St. Louis MSA. To see all the
                                          metro codes you can enter, note that
                                          the "Metro Area codes" heading is a
                                          hyperlink.
Hit one of the Run Request buttons to submit the new request. Follow the usual procedure to view your output elements by clicking on the filenames on the Query Results page. The key output is the listing file. What you should see, if you entered the options as specified, is a rather long report that lists all of the cities (places) in the St. Louis MSA (including the Illinois side). It is not really much of a report in terms of showing any geographic correlation. Mostly, it simply tells you what county each place is located in. There are a few cases of a place being in more than one county, in which case it shows you what portion of the place is in each of the counties. Note that the value of AFACT2 represents the portion of the county that is in the place (so we see that about 14% of Madison county, Ill, measured here by housing units, is in the city of Alton.)
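The difference between AFACT and AFACT2 in this example can be sketched as follows. The place and county names, the housing-unit counts, and the function name `two_way_factors` are all hypothetical; the only thing taken from the text is the definition of the two factors.

```python
from collections import defaultdict

# Hypothetical records: (source place, target county, housing units).
records = [
    ("Place A", "County X", 140),
    ("Place B", "County X", 60),    # Place B straddles two counties
    ("Place B", "County Y", 40),
    ("Rest of X", "County X", 800),
    ("Rest of Y", "County Y", 160),
]

def two_way_factors(rows):
    """Return {(source, target): (afact, afact2)}.

    afact  = share of the source area that lies in the target area
    afact2 = share of the target area that lies in the source area
    """
    pair = defaultdict(float)
    by_source = defaultdict(float)
    by_target = defaultdict(float)
    for source, target, weight in rows:
        pair[(source, target)] += weight
        by_source[source] += weight
        by_target[target] += weight
    return {
        (s, t): (w / by_source[s], w / by_target[t])
        for (s, t), w in pair.items()
    }

factors = two_way_factors(records)
for (place, county), (afact, afact2) in sorted(factors.items()):
    print(f"{place:10s} {county:9s} {afact:.3f} {afact2:.3f}")
```

Here Place A lies wholly in County X (AFACT of 1.0) but holds only 14% of that county's housing units (AFACT2 of 0.14), the same kind of relationship the report shows for a city and its county.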
Remember that everything is frozen in the 1990 time-frame. The data you see for O'Fallon, Mo. is based on the boundary of that city as defined for the 1990 census; it is not the current definition of that place. Likewise, of course, the housing unit (weight variable) counts are from the 1990 census.
If you are familiar with the St. Louis metro area you might expect to find
the cities of Troy and Warrenton, Missouri in this report. These cities are
in Lincoln and Warren counties, which were added to the official metro area (MSA)
definition in 1992. But the metro codes stored in the MABLE database are as
of the 1990 census so these two counties will not be selected. You could fix
this by going back and entering the FIPS codes for the two "missing" counties
in the box provided for filtering by county.
Option/parm         Specify Value         Comments
-----------         -------------         ----------
Input Options:
state                                     Remember, you MUST select at least
                                          one. You could, of course, select
                                          more than one.
SOURCE geocodes     County (1990)         You are requesting a listing of the
                    Metro Area: ...       counties (or county equivalents) and
                                          the corresponding MSA/CMSA areas. If
                                          you are unfamiliar with the MSA/CMSA
                                          concept go to the MAGGOT file and
                                          read the explanations there.
TARGET geocodes     Entire Universe.      Basically, this says you don't want
                                          any target layer(s): you just want to
                                          know about the source geographic
                                          areas in their entirety.
Weighting Var.      Population (D)        Shows up in output report and file.
Ignore zero ...     (check it)            You should almost always choose this.
Output Options:
Generate a CSV..    (uncheck it)          Program will NOT generate a comma
                                          separated value file.
Generate a listing  (check it) (D)        Generates a report-format output file.
file...                                   (You MUST check either the .CSV file
                                          option or this one - otherwise you
                                          have no output!)
Codes and Names..   (select this)         From the Select list for the listing
                                          file -- IMPORTANT for this kind of
                                          request. You want to see both the
                                          codes and the names associated with
                                          those codes.
Leave all other parameters and options unspecified.
Click on the "Run Request" button to initiate processing. Wait patiently for the "Results of Query" page to come back to you.
The important output is the geocorr.lst report file. Click on it to see the report. It should be sorted by (state and) county. The column labeled "COUNTY" contains the 5-digit FIPS code and the column labeled "COUNTYNM" has the name of the county (including the state abbreviation). These columns are followed by the MSACMSA and MSANAME fields with comparable data (code and name) for the metropolitan area. For counties or portions of counties (only in New England) falling outside any metropolitan area you'll see the code '9999' with "Non-metro" for the name. The POP column contains the 1990 complete count population for the county/metro area. For all but a few counties in New England this figure will represent the population of the entire county. The AFACT (allocation factor) column is a constant "1.000", as it always will be when "Entire Universe" is specified for the target geocode.
As an optional exercise for the more serious geocorr user you might try
rerunning this request but with the following changes:
In this case what you will see is a report very much like the one you
just generated in that it will be counties within metro areas on each line of
the report. But the AFACT values will now almost all be less than 1.0. Do you
understand why? If not, remember that AFACT is defined as the "portion of the
area defined by the source geocodes contained within the area defined by the
target areas". In this instance, it becomes the decimal fraction of the
county population which is also included in the metro area. (In the original
example, AFACT represented the portion of the county-metro combination that was
contained within the Entire Universe.)
Option/parm         Specify Value         Comments
-----------         -------------         ----------
Input Options:
State               Missouri.             If the n-mile circle went outside the
                                          state we would not pick up those
                                          pops.
Source geocodes     County                When you choose county, state is
                                          implied (selected by the program.)
Target geocodes     Entire Universe       We're not really doing a "correlation
                                          list" in this example. We just want
                                          the sum of our weight variable - the
                                          1990 total population - for the
                                          circle we'll specify.
Output Options:
Select "Codes and Names" for the          So you'll know what the counties are.
listing file.
Point and Distance Options:
Coordinates of point:                     Go down a little on the form to where
  Latitude:         38.545881             there are a series of links provided
  Longitude:        91.019346             to help you determine point
                                          coordinates. Click on the link to
                                          "Gazetteer" (at the Census Bureau).
                                          On the form presented enter the city
                                          as Washington, the state as Mo and
                                          the ZIP as 63090 (optional but
                                          helps). The application will present
                                          you with the coordinates (among other
                                          useful things such as links to maps
                                          and census data.) Write the
                                          coordinates down, hit "back" several
                                          times to get back to the Geocorr form
                                          and type in the coordinates. A
                                          leading "-" on the longitude is
                                          optional. West longitude is assumed.
Label of point      Washington, Mo        Not required but useful.
Value for Radius    30                    This means 30 miles. If you click the
or Largest Ring                           option box above it would mean 30
                                          kilometers.
Hit the Run Request button to run the job. You have told geocorr to find 1990 census blocks whose centroids are within 30 miles of a specified point which we hope is near the center of the city of Washington, MO. We have specified that we want to look at the relationship of counties to the Entire Universe for this geographic area. If you read the fine print (the Note: at the bottom of the Point and Distance Options section), you'll be told to expect some extra items in your report when you specify a point and radius. The intptlng and intptlat variables contain the weighted average of the block centroid coordinates for all census blocks that were aggregated to create the output summary line. These are of value only as a general indicator of the "center" of this geographic intersection. The distance variable is the distance (in miles or kilometers, depending on the option you selected on the form -- miles, in our example) between the specified point and (intptlat,intptlng). It thus represents a sort of "average" distance.
The POP item on the output file represents the sum of the block
populations for all the census blocks used to create the geographic summary
area. In this case the output line for Franklin county has a POP figure that
is the total of 1990 population for all blocks that are both in Franklin
county and within 30 miles of our point. To get the overall total population
for the 30-mile circle we shall need to add all POP figures from our output
report. Or, we could go back and rerun the application and choose STATE
instead of COUNTY as our source geocode; then we would get only a single output
line -- the 30-mile circle intersected with the state.
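The circle selection and the summing of POP can be sketched as below. How geocorr itself measures distance is not stated here, so the haversine great-circle formula is swapped in as an assumption, as is the Earth radius constant; the block records and function names are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an assumed constant

def miles_between(lat1, lng1, lat2, lng2):
    """Great-circle distance in miles (haversine formula)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def population_within(center, radius_miles, blocks):
    """Sum population by county for blocks whose centroid is inside the circle."""
    totals = {}
    for county, lat, lng, pop in blocks:
        if miles_between(center[0], center[1], lat, lng) <= radius_miles:
            totals[county] = totals.get(county, 0) + pop
    return totals

# The point from the example; west longitude written as a negative number.
center = (38.545881, -91.019346)

# Hypothetical block centroids: (county, latitude, longitude, population).
blocks = [
    ("Franklin MO", 38.55, -91.00, 120),  # close to the point
    ("Franklin MO", 38.45, -91.10, 80),   # still well inside 30 miles
    ("Boone MO", 38.95, -92.33, 200),     # far outside the circle
]

totals = population_within(center, 30, blocks)
circle_total = sum(totals.values())       # overall total for the circle
print(totals, circle_total)
```

Adding up the per-county figures, as in the last step, corresponds to adding the POP values from the output report by hand.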
Option/parm         Specify Value         Comments
-----------         -------------         ----------
Input Options:
State               Missouri.             Maybe we should make this the default.
Source geos         Block group           County and tract will also be selected
                                          for you.
Target geos         Concentric Ring..     Geocorr will assign the "ring" code
                                          dynamically based on x-y coordinates
                                          and series of ring values specified
                                          below.
Ignore block..      Turn on.              Almost always saves time to ignore
                                          blocks with no "weight".
Output Options:
Generate listing.   Turn off.             Only want CSV file, do not want
                                          report.
Sort by target      Turn on.              So the output is sorted by the ring
geocodes, then..                          numbers first, then by block group.
Use tabs on CSV     Turn on.              Output file will have tabs between
                                          fields instead of commas.
Point and Distance Options:
Coordinates:                              Leading "-" is OK, but not required.
  latitude          38.70763              So how did we get this? Go down to
  longitude         -90.31118             the Yahoo Map Server link, just below.
                                          Click on it & wait for page to
                                          appear. Type in 8001 Natural Bridge
                                          Rd in first box. Then type St. Louis,
                                          Mo 63121 in City, State box. It may
                                          take a minute for map to appear. Put
                                          cursor over "Print Preview" and read
                                          coordinates from the URL that appears
                                          in the box at the bottom of the
                                          browser window. Do NOT click the
                                          button. This is a secret trick. It
                                          also works for street intersections
                                          ("6th & Locust") or just city names
                                          ("Cool Valley, MO" with address box
                                          left blank.) ZIP code is optional.
Label of point      UMSL.                 University of Missouri St. Louis.
Custom list of      #1: 1                 Fill in the ring radii in ascending
ring radii          #2: 3                 order. Note that we do NOT enter the
                    #3: 5                 radius value or the "# of
                                          equi-distant rings".
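To make the concentric ring idea concrete, here is a small Python sketch of assigning a ring code from a distance and the custom radii 1, 3 and 5 entered above. The coding scheme (labeling each ring by its outer radius, and treating a distance exactly on a boundary as belonging to the inner ring) is an assumption, as are the block records and the name `ring_code`; geocorr's own ring codes may look different.

```python
from bisect import bisect_left
from collections import defaultdict

RADII = [1, 3, 5]  # ring radii in miles, in ascending order

def ring_code(distance, radii=RADII):
    """Return the outer radius of the ring containing distance.

    Returns None when the distance is beyond the largest ring.
    """
    i = bisect_left(radii, distance)   # first radius >= distance
    return radii[i] if i < len(radii) else None

# Hypothetical blocks: (block group, miles from the point, population).
blocks = [
    ("BG-1", 0.4, 50),
    ("BG-1", 1.2, 150),   # same block group, but in the next ring out
    ("BG-2", 2.9, 300),
    ("BG-3", 4.1, 120),
    ("BG-3", 6.0, 80),    # beyond the largest ring, so it is dropped
]

ring_pop = defaultdict(int)
for bg, distance, pop in blocks:
    ring = ring_code(distance)
    if ring is not None:
        ring_pop[(ring, bg)] += pop

# Sorted by ring first, then by block group, like the "Sort by target" option.
for (ring, bg), pop in sorted(ring_pop.items()):
    print(ring, bg, pop)
```

Block group BG-1 shows up in two rings here, which is the situation where the AFACT values matter when allocating block group data to rings.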
If you were going to do this a lot or you were in a big hurry you could make
this example run somewhat faster by telling the program which county or counties
your circles fall within. In this case, we could have gone down near the bottom
of the form to the Geographic Filtering Options section. There, in the text box
for the County codes we could have entered
to specify that we wanted to restrict our query to the 2 counties with these FIPS codes. These are the codes for St. Louis county and St. Louis city (which we could have "looked up" using the link there had we not already known these.) Do not do this, of course, unless you are sure that your circle will not go beyond the counties entered (or unless you don't care and want to limit the search to these counties anyway.)
A typical use of such an output file would be to save it from your browser
to a file
and then to bring it into another program where you would use it to select all
the block groups it contains from a data extract file (from STF3, for example).
Then you could sum the numbers for those block groups (multiplied by the
values of the AFACT variable to "allocate" the data when a BG is in more than
one ring.) We hope some day to enhance this application to allow this kind of
post-processing to be integrated into a system which uses the geocorr