The Geocorr geographic correspondence engine generates files and reports showing the relationships between a wide variety of geographic coverages for the United States. It can, for example, tell you with which county or counties each ZIP code in the state of California shares population. It can tell you, for each of those ZIP/county intersections, what the size of that intersection is and what portion of the ZIP's total population is in that intersection. The application permits the user to specify the geographic scope of the correspondence files (typically, one or more complete states, but with the ability to specify counties, cities, or metropolitan areas within those states), and, of course, the specific geographic coverages to be processed. The latter include virtually all geographic units reported in the 1990 U.S. Census summary files, and several special "extension coverages" such as 103rd Congress districts, PUMA areas used in the 1990 PUMS files, Labor Market and Commuting Zone areas, and even hydrological unit codes (watersheds). The application creates a report file and, by default, a comma-delimited ASCII file, which the user can then browse and/or save to their local disk.
The output files created by this application are referred to as correlation lists. Other commonly used terms for such entities are equivalency files, crosswalk files, and geographic correspondence files. A correlation list consists of a set of source geocodes specifying the geographic coverage to be related (i.e., the "known" geographic coverage), and a set of target geocodes specifying the geographic coverage to which we want to relate the source areas. Frequently (always, in the case of files generated by this application) the correlation list will include a variable to measure the absolute "size" of the correspondence (such as the land area of the intersection or the number of persons living in the intersection). When such an absolute measure is present, there may also be an allocation factor variable that indicates what portion of the source area is located within the target area. An entry in a census tract-to-ZIP correlation list (i.e., a list with census tract as the source coverage and ZIP as the target) might contain the population living in the tract/ZIP intersection and a number indicating what decimal portion of the tract's total population also lives within the ZIP. The sum of these allocation factors for any specific value(s) of the source geocode(s) should always be 1.0. For example:
COUNTY  TRACT    ZIP     POP  AFACT
29510   1101.00  63109  1250  .500
29510   1101.00  63110   625  .250
29510   1101.00  63111   625  .250
Here we see three entries from a tract-to-ZIP correlation list. All three entries are for the same source code — census tract 1101.00 within county 29510 (city of St. Louis, MO). The entries show that the tract intersects with three different ZIP codes (estimate based on 1990 census) and show the absolute and relative sizes (POP and AFACT, respectively) of the intersections. Note that if we add the three POP values we get the total population for the tract (2,500), while if we add the three AFACT values we get (as always) 1.0.
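The two checks described above (POP values adding up to the tract total, AFACT values adding up to 1.0) can be sketched in a few lines of Python. This is only an illustration: the function names `source_totals` and `bad_sources` and the tolerance are assumptions made for this example, and the rows are simply the three entries shown above.

```python
import math
from collections import defaultdict

# The three example entries: (county, tract, zip, pop, afact)
rows = [
    ("29510", "1101.00", "63109", 1250, 0.500),
    ("29510", "1101.00", "63110", 625, 0.250),
    ("29510", "1101.00", "63111", 625, 0.250),
]

def source_totals(rows):
    """Sum POP and AFACT for each (county, tract) source area."""
    totals = defaultdict(lambda: [0, 0.0])
    for county, tract, _zip_code, pop, afact in rows:
        totals[(county, tract)][0] += pop
        totals[(county, tract)][1] += afact
    return {key: tuple(value) for key, value in totals.items()}

def bad_sources(rows, tol=1e-3):
    """Return the source areas whose AFACT values do not sum to 1.0.

    The tolerance is a hypothetical allowance for AFACT values that
    have been rounded to three decimals on output.
    """
    return [
        key
        for key, (_pop, afact_sum) in source_totals(rows).items()
        if not math.isclose(afact_sum, 1.0, abs_tol=tol)
    ]

print(source_totals(rows))
print(bad_sources(rows))
```

A check of this kind is a quick way to confirm that a downloaded list covers each source area completely before using the allocation factors.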
Typically (always in this application, unless overridden with an option) correlation lists are sorted first by the source geocodes and then by the target geocodes within the source codes.
MABLE is an acronym for Master Area Block Level Equivalency. This is the database used by Geocorr to create the correlation lists. "Block" here refers to census blocks, the smallest geographic units used in tabulating the decennial census. The block was chosen as the base unit for the application because the Census Bureau used these blocks as the "atomic unit" for all other census-based geographies in 1990. Thus, census blocks will never cross a place (city) or MCD (county subdivision, township, New England town) boundary. While they can and do cross ZIP code boundaries, for the sake of this application (and based on the Census Bureau's official 1990 Block-ZIP Equivalency file) each block is assigned to a unique ZIP code (vintage October 1991). The MABLE database is actually a collection of 51 state-level datasets containing a total of just under seven million block entries.
The form has gotten rather large as we have added more features. But the basic features are still all contained in the Input Options and Output Options sections.
These are the options that control the basic nature of the correlation list. Here you specify the states, the geocodes, and the weighting variable. Most items have been assigned default values, so you need to at least consider each one. If you do not, then the default value remains in effect and you need to be sure that this is acceptable. If you do not specify the weighting variable (for example), the program assumes POP and does so without any dialogue with the user.
Click on one or more state names in this select box to indicate the state or states that you wish to process. You must specify at least one state (if you don't, the application will abort the run). Note: If you need to run the application for a large number of states, please do so on a weekend or during off hours.
This application makes use of a lot of different levels of geography (coverages), most of them corresponding to standard Census Bureau defined areas. To help users who are not familiar with these types of geography, we have created this glossary file with more detailed descriptions of each of the area types. Please check the geographic glossary if you have questions about any of the geographies before you send a note asking for additional explanation.
These two side-by-side select boxes are used to tell Geocorr which geographic units you are interested in. Note that we have pre-selected some defaults for you (census tract to ZIP code). The "Entire Universe" selection can be used when you just want to relate one kind of geography to the whole selected area. This option is most useful in conjunction with the point-and-radius or bounding box features. For example, you can select "County" as the source and "Entire Universe" as the target with Population as the weighting variable. This is an extremely inefficient way to have Geocorr simply print a report with the total populations of these counties. (If this is all you want, we have a link in the Geographic Filters section that will give you this information a lot more quickly.) But if you also specify point-and-radius values, Geocorr will then produce output showing the population of each county within the specified radius. It will only process counties in the selected states, of course.
Click on one or more geographic codes you want to use for the source portion of the correlation list. The output will normally be sorted by the values of these codes. For example, if you select "County" and "Tract" (the defaults), then the output file will be sorted by county and then tract. The sort order is the order in which the variables appear in this select list. Certain geocodes occur in a hierarchy, so that selecting them automatically triggers selection of a higher-level qualifying variable. These are MCD (implies county), tract (implies county), block group (implies county and tract), and block (implies county and tract; block group is not selected but it is implicitly present as the first digit of the block). Note that county is a five-digit code that includes the state code. State is usually added to the output file as an extra ID variable, even if it is not explicitly selected as one of the source or target geocodes; this is not the case if county is chosen, since the county codes allow you to derive the state.
Most codes are FIPS (Federal Information Processing Standard) if defined, or census codes otherwise. Be sure you understand that selecting multiple source geocodes means that the source areas are the intersection areas formed by looking at values for all the source codes. Thus, if you select County, MCD, and ZIP for the source geocodes, then the source areas represented on the resulting correlation lists are formed by the intersection of these three area types: i.e., a portion of a ZIP code within an MCD within a county. If what you actually want is a correlation list for MCDs and a (separate) correlation list for ZIP codes, then you need to invoke the application twice; Geocorr creates only one list per run.
Most of what was said for the source geocodes, above, applies equally to the target geocodes. Do not select the same geocode in both lists. The codes you select here define an area formed by the intersection of those areas. The correlation list defines the relationship of the source codes to these target areas. The default value (what you'll get if you do not click in this select box at all) is ZIP, the 1991 five-digit ZIP code as defined in the Census Bureau's ZIP-Block equivalency file. See the geographic glossary file for details.
When you choose this geocode (in either the source or target list), then you must provide some additional information in the Point and Distance Options section. By selecting this geocode, you are creating a new geographic coverage based on geometry rather than any predefined boundaries. Think of a dartboard target, with a small circle at the center, and then concentric circles surrounding that; this is what these CRGs look like. The precise location and size of these areas are specified later in the form. Each block-level entry in MABLE has an x-y coordinate associated with it, and these internal points can be mathematically assigned to these ring areas using a simple distance formula. You can use this feature to do applications such as examining a 100-mile radius of a city or ZIP code, and finding out approximately how many persons (in 1990) lived within a 25-mile radius, in the 25-to-50 mile ring, and in the 50-to-100 mile ring. For further discussion see the Point and Distance options section, below.
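As an illustration of the ring assignment described above, the sketch below computes a great-circle distance from a specified point to a block's internal point and then picks the ring that the block falls in. The haversine formula, the Earth-radius constant, and the function names `distance_miles` and `ring_index` are assumptions made for this example; the text above says only that a simple distance formula is used.

```python
import math
from bisect import bisect_left

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an assumption of this sketch

def distance_miles(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in miles between two points.

    Both longitudes must use the same sign convention; only their
    difference matters.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def ring_index(distance, outer_radii):
    """Return the 0-based index of the ring containing `distance`.

    `outer_radii` is an ascending list of outer radii, e.g. [25, 50, 100].
    Returns None when the distance lies beyond the largest ring.
    """
    i = bisect_left(outer_radii, distance)
    return i if i < len(outer_radii) else None

# Hypothetical block one degree of latitude north of the specified point.
d = distance_miles(38.0, 90.0, 39.0, 90.0)
print(round(d, 1), ring_index(d, [25, 50, 100]))
```

Summing the weight variable of every block that lands in the same ring gives the per-ring totals, such as the population within 25 miles, between 25 and 50 miles, and between 50 and 100 miles.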
This option lets you specify the level of detail for the HUC (aka watersheds) codes. For a more detailed description of HUCs and the meaning of the four levels, see the geographic glossary file.
Select a single variable to be used to measure the amount of intersection between the source and target geocodes on the output file. By default, the decennial complete-count population will be used. On the output, this variable will contain the sum for all the blocks used in creating the output record. While the primary reason for processing such a weight variable is to provide some measure of the degree of intersection of two sets of geocodes, you can also take advantage of this to have Geocorr simply count people, housing units, or square miles in selected jurisdictions. Want to know the population (1990) of labor market areas in your state? Just select your state, then LMA90 as the source code and state (or entire universe) as the target code, with pop90 as the weighting variable. Each line of the output list will contain the 1990 total population of one LMA in your state.
There are many census blocks that occupy space but have no population. When building a correlation list with population as the weighting variable, you may find that leaving these blocks in results in output lines showing a correspondence that has a value of zero for the population and allocation factor variables. This indicates some spatial overlap between the areas, but no population in that overlap. If you check this box, then those lines with zero values for the weight variable will not be present on your output. It will also make processing slightly faster, since the program will have fewer observations to process.
These options specify details about the output generated by Geocorr. In many cases, you will be able to accept all defaults for these options.
Each of the census block entries in the MABLE database has a pair of latitude, longitude coordinates for an internal point of the census block. (This is the geometric centroid of the block, except in those few cases where the true centroid is not within the block, in which case it is moved to a location just inside the block.) When you select this option, Geocorr keeps these coordinate values, and as it processes the blocks within the source/target geocode groups, it takes a weighted average of their values (using the weight variable specified in the Input Options section — usually population.) The result is that on the output files, you will have two extra columns of data: INTPTLNG and INTPTLAT. They will be in degrees, with up to six digits after the decimal point kept. West longitude is assumed, no minus signs.
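The weighted average described above can be sketched as follows. The function name `weighted_centroid` and the sample coordinates are hypothetical; the rounding to six decimals mirrors the output precision mentioned above, and returning `None` for a zero total weight is an assumption of this sketch rather than documented Geocorr behavior.

```python
def weighted_centroid(blocks):
    """Weighted average of block internal points.

    `blocks` is a sequence of (latitude, longitude, weight) tuples,
    with west longitude expressed as a positive number.
    """
    blocks = list(blocks)
    total = sum(weight for _lat, _lng, weight in blocks)
    if total == 0:
        return None  # no weight, so no meaningful average
    lat = sum(lat * weight for lat, _lng, weight in blocks) / total
    lng = sum(lng * weight for _lat, lng, weight in blocks) / total
    return (round(lat, 6), round(lng, 6))

# Two hypothetical blocks; the second holds three times the population,
# so the centroid is pulled three quarters of the way toward it.
print(weighted_centroid([(38.0, 90.0, 100), (39.0, 91.0, 300)]))
```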
The standard output correlation list from Geocorr has a single AFACT allocation factor variable, which indicates the decimal portion of the source geocodes contained within the target geocodes. It may also be useful to know how this works going in the other direction, i.e., to know what portion of the target area (the complete target area, not just the part within the source area) is contained in the source geocodes. Selecting this option causes Geocorr to do the extra processing and calculations required to create such a dual-factored list. The best way to see how it works is to select the option once and study the AFACT2 values.
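To make the two directions concrete, here is a minimal sketch that computes both factors from a list of source/target intersections. The function name `dual_factors` and the tract and ZIP labels are hypothetical. The sketch assumes that the records cover the complete target areas; if blocks have been filtered out, the target totals (and therefore AFACT2) would be understated.

```python
from collections import defaultdict

def dual_factors(records):
    """Add AFACT and AFACT2 to (source, target, weight) records.

    AFACT  = weight / total weight of the source area
    AFACT2 = weight / total weight of the target area
    """
    source_totals = defaultdict(float)
    target_totals = defaultdict(float)
    for source, target, weight in records:
        source_totals[source] += weight
        target_totals[target] += weight
    out = []
    for source, target, weight in records:
        afact = weight / source_totals[source] if source_totals[source] else 0.0
        afact2 = weight / target_totals[target] if target_totals[target] else 0.0
        out.append((source, target, weight, afact, afact2))
    return out

# Hypothetical tract-to-ZIP intersections with their populations.
records = [("T1", "Z1", 300), ("T1", "Z2", 100), ("T2", "Z2", 300)]
result = dual_factors(records)
for row in result:
    print(row)
```

In this example tract T1 puts a quarter of its population in ZIP Z2 (AFACT of 0.25), and that same intersection also happens to be a quarter of Z2's population (AFACT2 of 0.25), while T2 lies entirely in Z2 but makes up only three quarters of it.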
Normally the output file is sorted by the source geocodes, then by the target geocodes within the source. This option lets you override the default and have it sorted by the target codes first. An example of where you might want to use this option would be in creating a ZIP to CD103 list. You want to look at which ZIPs and what portions of those ZIPs make up each Congressional District. But you want the results organized by CD first, so that you can focus on the portion of the report relevant to the district you want to mail to. Specifying this option causes the output to be sorted by CD first, then show all the ZIPs within each CD together with allocation factors indicating what portions of the ZIP are in the CD. (If you specified CD103 as the source and ZIP as the target, you would get the sort the way you wanted, but then the AFACT allocation factor values would show the portion of the CD that was within the ZIP, which is typically a very small and relatively useless number.)
There are two basic output files available. Each is optional and each has its own options for specifying what it contains.
This option is selected by default. Generally this is the option to use if you want to do processing of the correlation list back on your platform using your favorite software package. The .csv file extension is a standard that is recognized by most Windows programs, making it easier to import the data into those applications. Note that this file will have the variable names as values in the first line (the header record), which when imported into a spreadsheet such as Excel or Lotus will become the first row. If you have no interest in obtaining such a file (you only want the report format), click on this box to turn off the option. It will save processing time.
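If you would rather process the file with a script than a spreadsheet, the header record makes that straightforward. The following sketch reads a small in-memory sample with Python's standard `csv` module. The column names and values are taken from the tract-to-ZIP example earlier in this document and are an assumption here; the header of an actual file depends on the geocodes and options you selected.

```python
import csv
import io

# Hypothetical sample in the layout of the tract-to-ZIP example.
sample = """COUNTY,TRACT,ZIP,POP,AFACT
29510,1101.00,63109,1250,.500
29510,1101.00,63110,625,.250
29510,1101.00,63111,625,.250
"""

def read_correlation_list(fileobj, numeric_fields=("POP", "AFACT")):
    """Read a correlation list, using the header record for field names.

    Geocodes are kept as strings so that leading zeroes survive; only
    the fields named in `numeric_fields` are converted to numbers.
    """
    rows = []
    for record in csv.DictReader(fileobj):
        for name in numeric_fields:
            record[name] = float(record[name])
        rows.append(record)
    return rows

rows = read_correlation_list(io.StringIO(sample))
print(rows[0]["ZIP"], rows[0]["POP"], rows[0]["AFACT"])
```

Keeping the geocodes as text matters because many of them are codes with leading zeroes rather than quantities.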
In many cases it will be convenient to carry along names to go with the codes on your output file. If you select "Codes and Names" or "Just Names" then, for any geocode for which Geocorr has a name table and that you select as either a source or target geocode, the program will add a new variable (with name ending with "NM", e.g., PLACENM, COUNTYNM, etc.) to the output CSV file. Usually, if you want names, you should select the "Codes and Names" option, rather than asking for just the names.
Some software packages accept tab characters to delimit fields, just as commas serve as field delimiters in CSV files. An advantage of using tabs as the delimiter is that when you browse the file, the tab characters make the data fields start on tab stops, so it's easier to see what the data values are.
You'll normally want to leave this option selected so you can at least see a nicely formatted version of your output. (The CSV file is intended more to be program-readable, although you can browse it and count commas.) This is the preferred format for using as a reference report. The lines can be up to 157 characters across, and it will print 240 lines before generating a page break with fresh column headers. Source geocodes will always appear first (leftmost) on the report, and consecutive duplicate values of the source geocodes will be blanked out to emphasize breaks in the value of the source codes. This will normally be the largest output file. If you do not need or want it, then you can save processing time by deselecting this option.
See the discussion, above, of names for the output CSV file. Generally, you are more likely to want names on the listing output than on the CSV file. The default is Just Codes, so you have to select this option to get the names included.
All the options that remain have to do with limiting the set of blocks that will be processed by Geocorr by specifying lists of county, place, or metro-area level codes that will be used to limit the areas processed.
Geocorr allows you to specify lists of three types of geographic areas that can be used to further limit the geographic universe.
By default, all filters in this section will be combined, keeping only those areas that satisfy all criteria (AND). For example, if you specify three counties and a metro area, you will only get data based on blocks that are in both the counties and the metro area (i.e., the intersection of the selected areas). You may override this default and choose geographies that satisfy any of your filtering criteria (OR).
This option specifies that if multiple types of geocodes are used to filter, they each should be considered as sufficient rather than necessary conditions for inclusion. For example, if I select "AND" and I then enter a value in the place codes box for Kansas City, MO and a value in the county codes box for Jackson County, MO, then the universe would be limited to the portion of Kansas City within Jackson County. But when I select "OR", I will get all blocks that are either in the city of Kansas City or in Jackson County. So now I get all of the city (which I did not before), plus I get the parts of Jackson County that are not inside the city.
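The difference between the two modes can be sketched as a small predicate applied to each block. The function name `passes_filters`, the dictionary layout, and the symbolic labels used in place of real FIPS codes are all assumptions made for this illustration.

```python
def passes_filters(block, filters, mode="AND"):
    """Decide whether a block survives the geographic filters.

    `block` maps a geocode type to the block's value for it;
    `filters` maps a geocode type to the set of accepted values.
    Filter types with no values entered are ignored.
    """
    active = {kind: values for kind, values in filters.items() if values}
    if not active:
        return True  # nothing to filter on
    checks = [block.get(kind) in values for kind, values in active.items()]
    return all(checks) if mode == "AND" else any(checks)

# Symbolic labels stand in for real place and county codes.
filters = {"place": {"KANSAS_CITY"}, "county": {"JACKSON"}}
blocks = [
    {"id": "b1", "place": "KANSAS_CITY", "county": "JACKSON"},
    {"id": "b2", "place": "KANSAS_CITY", "county": "CLAY"},
    {"id": "b3", "place": "OTHER", "county": "JACKSON"},
    {"id": "b4", "place": "OTHER", "county": "CLAY"},
]

and_ids = [b["id"] for b in blocks if passes_filters(b, filters, "AND")]
or_ids = [b["id"] for b in blocks if passes_filters(b, filters, "OR")]
print(and_ids, or_ids)
```

With "AND" only the block that is both in the city and in the county is kept; with "OR" the rest of the city and the rest of the county are kept as well.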
To limit the universe to one or more counties, enter their FIPS codes in the box provided. Be careful to enter full five-digit codes when processing multiple states; three-digit codes are OK if you only selected a single state for processing. Specifying a code for a state that was not selected will cause an error and Geocorr will not complete processing. If you need to look up a county code, click on the "County codes" hyperlink. You'll have to note what the codes are and enter them after returning from the linked-to code pages.
Enter five-digit Core-Based Statistical Area codes here to filter based on the Metropolitan/Micropolitan statistical areas. You may use the special value "-99999" to select only those places which are outside any metro or micropolitan area. Note that we no longer support filtering using the old (vintage 2000) four-digit MSA/CMSA/PMSA codes. Examples: "27620 42740" selects the Jefferson City Metropolitan and Sedalia Micropolitan Statistical Areas, MO; "-99999" selects all geographic areas that are not within a CBSA.
Enter four-digit FIPS metro area codes with leading zeroes separated by blanks. MSA, CMSA and PMSA codes may be used (4-digit only). The 2000 definitions will be used. Your output will be limited to the metro areas specified. New England NECMA codes cannot be used. A code of "9999" selects non-metro areas (alone or as part of a list of metro area codes). A code of "-9999" excludes all non-metro areas. The -9999 code must be entered as the entire value for the list.
Alternatively, you can enter five-digit CBSA codes here to filter based on these more current and more inclusive (they include micropolitan as well as metropolitan) areas. You can use the special value "-99999" to select only those places which are outside any metro or micropolitan area. See our Metro Pops report (for the larger Metropolitan SAs) or Micro Pops report (for the smaller Micropolitan SAs) to obtain these codes. (Or, use the geographic codes lookup web app.)
Enter five-digit FIPS urbanized area or urban cluster codes with leading zeroes separated by blanks. Your output will be limited to the UA/UC areas (which are mutually exclusive) specified. You may use the special value "-9999" to specify that you want to exclude all non-urban (i.e., rural) areas. The -9999 code must be entered as the entire value for the list. Examples: "00415 02062" selects the Ada, OK Urban Cluster and the Ann Arbor, MI Urbanized Area (only works if you have also selected the appropriate two states); "-9999" selects only areas that are urban, using the current definition.
Enter seven-digit FIPS place codes with leading zeroes separated by blanks. (You may enter five-digit codes if you selected only one state.) Output will be limited to the official city limits of these cities, as of the 2010 census. Enter a value of "-99999" to indicate that you want to exclude all areas that are not inside any place. You will get all areas that are unincorporated and not within a Census Designated Place. Examples: "70520 70545 70550 53780" selects Saginaw City, Saginaw Township North and South, and Midland, MI; "-99999" selects all geographic areas that are not within a place.
Be sure to specify leading zeroes in all codes.
Geocorr permits the selection of geography based on geographic coordinates. Specifically, blocks from the MABLE database can be filtered (excluded from further processing) by using their internal point coordinates to determine their location. You can limit processing to a specified point and a radius about that point, or to a rectangular bounding box.
If you enter values for a specified point location as decimal degrees of longitude and latitude, you are telling Geocorr that you want it to calculate the distance between that point and the internal point of each census block on the MABLE database that is otherwise selected for processing (i.e., that first passes the other geographic filters mentioned above). The variable distance will be added to your output files. The distance is that between the weighted centroid of each target area and the location specified in the point coordinates here. If you specified "Concentric Ring Pseudo-Geocode" as either a source or target geocode, then distance and weighted centroid values will not be calculated or stored (because weighted centroids of donuts are misleading).
Note that the longitude value entered is assumed to be west longitude and the leading minus sign is optional; if entered, it is ignored. Entering a value of "92.6543" is interpreted as 92.6543 degrees west longitude. Geocorr expresses all coordinates with this convention: Longitudes on output files are also expressed as positive values for west longitudes. Many GIS programs will require these values to be negated if these coordinates are to be processed.
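The sign convention can be captured in two small helpers, sketched below. The function names are hypothetical; the first mimics how the form is described as treating a longitude entry, and the second produces the negative (east-positive) value that many GIS programs expect for locations in the United States.

```python
def parse_west_longitude(text):
    """Interpret a form entry as degrees of west longitude.

    Any leading minus sign is ignored, so "92.6543" and "-92.6543"
    both mean 92.6543 degrees west.
    """
    return abs(float(text))

def to_signed_longitude(west_positive):
    """Convert a west-positive longitude to the signed convention
    in which west longitudes are negative."""
    return -abs(west_positive)

lng = parse_west_longitude("-92.6543")
print(lng, to_signed_longitude(lng))
```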
You cannot enter just one of the coordinate values — if you specify a longitude value then you must specify a latitude value as well.
This box can be used to enter a descriptive label for the point. This label is picked up and used as part of the descriptive label for the distance variable which is included on the output file when you specify a point. This is an optional entry.
This check box can be used if you want the value in the following box to be interpreted as kilometers rather than in miles.
This entry is used in conjunction with the coordinates of the specified point in three possible ways:
Using this option has a dramatic effect on the way you interpret the entries in the output correlation list, since everything there has to be qualified by the initial filtering options. Typically, this option is used with a very large target area (such as a complete state or metro area), and the real correlation is between the n-mile circular area and that large target area or areas. For example, you could specify a place-to-state correlation list (source geocodes=place, target geocodes=state), with a metro area filter (only the portions of the places within the specified metro area are processed) and the coordinates of the metro airport entered with a radius of three miles specified. The result is that only blocks within three miles of the airport are selected. On output, the POP figure shows the total persons living in the specified places and also in blocks that are within three miles of the airport, and the AFACT variable will typically be 1.0, since all of the blocks in the selected place will be associated with the same state. It is critical to remember that the population figure shown is not the total population of the place, but only the population of the portion of the place within three miles of the specified point. If you need to know what portion of the total population of the place is within this circle, you will have to do some special postprocessing, since this figure is not readily generated directly by Geocorr.
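One possible form of that postprocessing, offered only as a sketch and not as a documented Geocorr procedure, is to run the same place-to-state request twice (once with the point and radius, once without) and divide the two POP figures for each place. The function name `share_within_radius`, the place labels, and the population values below are all hypothetical.

```python
def share_within_radius(filtered_pop, total_pop):
    """Fraction of each place's population that lies inside the circle.

    `filtered_pop` maps place code -> POP from the run with the radius;
    `total_pop` maps place code -> POP from the run without it.
    Places absent from the filtered run have no population in the circle.
    """
    return {
        place: filtered_pop.get(place, 0) / total
        for place, total in total_pop.items()
        if total > 0
    }

# Hypothetical POP values taken from two separate runs.
total_pop = {"PLACE_A": 1000, "PLACE_B": 400}
filtered_pop = {"PLACE_A": 250}
shares = share_within_radius(filtered_pop, total_pop)
print(shares)
```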
Note that whenever you specify the point option, a DISTANCE variable is added to the output file. This distance is in miles (or kilometers if you checked that box) and represents the approximate distance from the calculated weighted centroid of the output area (source/target intersection) and the specified point. When you are using the point-and-radius options strictly as a filter, you may well have no interest in this item, but it is included in the output nonetheless.
This DISTANCE variable and the weighted x-y coordinates of the output areas will not be kept if you have requested ring pseudo-codes. We decided to force these items to be dropped in this case, because when dealing with a donut-shaped area, the meaning of a weighted average of the coordinates of the blocks making up such an area is at best meaningless, and at worst confusing and misleading.
There are two mutually exclusive ways in which you may specify the ring-shaped areas that you want Geocorr to determine for you. You can specify a value for the number of equidistant rings; this value must be a positive integer (no fractional portion allowed) between one and 10. Geocorr will divide the radius by this number to determine the size (distance between the inner and outer radii) of each ring. Geocorr will process at most 10 such rings. If your areas are not of uniform size, then you may explicitly enter the outer radius of each ring in the boxes provided. In this case, you should not enter a value for the number of areas. Enter the values in ascending order, starting with box #1. The last value entered should be the same as the value you entered for the radius of the largest ring (or you can omit that value altogether when entering these explicit values).
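The two ways of specifying rings can be sketched as a single helper that returns the outer radius of each ring. The function name `ring_outer_radii`, its parameters, and the error handling are assumptions made for this example; how the actual form reports invalid input is not described here.

```python
def ring_outer_radii(radius, n_rings=None, explicit=None):
    """Return the ascending outer radius of each ring.

    Give either `n_rings` (1 to 10 equal-width rings) or `explicit`
    (ascending outer radii), not both. The overall radius is appended
    to an explicit list when it was omitted.
    """
    if n_rings is not None and explicit:
        raise ValueError("give a ring count or explicit radii, not both")
    if n_rings is not None:
        if not isinstance(n_rings, int) or not 1 <= n_rings <= 10:
            raise ValueError("ring count must be an integer from 1 to 10")
        width = radius / n_rings
        return [width * i for i in range(1, n_rings + 1)]
    radii = list(explicit or [])
    if radii != sorted(radii):
        raise ValueError("explicit radii must be in ascending order")
    if radii and radii[-1] > radius:
        raise ValueError("last ring cannot extend beyond the radius")
    if not radii or radii[-1] < radius:
        radii.append(radius)  # final value omitted: the radius closes the last ring
    return radii

print(ring_outer_radii(100, n_rings=4))
print(ring_outer_radii(100, explicit=[25, 50]))
```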
We have provided a number of links to web sites that can be of some assistance in determining the lat-long coordinates of the location you have in mind. Our personal favorite is the Census Bureau's Gazetteer application. It lets you type a ZIP code or a city name or a county name (all within state) and will return you a list of geographic entities that match your request. This returned page will have links on it to allow you to jump to the TMS map server to view the area you want to study. We have also provided a direct link to the TMS (Tiger Map Server) application, but in most cases you'll be better off using the Gazetteer as a way to access this application with the area you have in mind already displayed. Otherwise you have to wait for TMS to display a map of the Washington, D.C. area, which is unlikely to be what you want.
The Yahoo Map Server is excellent for getting down to street address or street intersection level and seeing the area of interest. It does not explicitly return latitude-longitude coordinates. However, with your address showing on the map, move your mouse to the "Printer Preview" button and then observe the URL corresponding to it at the bottom of your browser window: it contains the coordinates you need. Write them down so you can enter them in the form when you return to Geocorr.
Finally, we provide a link to Michigan State's Weather Station location page.
This is simple. You enter the low and high latitude and longitude values that define a rectangular area, or bounding box, to which you want Geocorr to restrict itself. Only blocks with internal coordinates inside this box will be selected. Coordinates should be in degrees, with decimal notation (no minutes or seconds). If you accidentally switch the low/high values, the program will check for this and switch them back for you.
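The bounding box test, including the swap of accidentally reversed low/high values, can be sketched as follows. The function names `normalize_box` and `in_box` are hypothetical, and treating points exactly on the edge as inside the box is an assumption of this sketch.

```python
def normalize_box(lat_a, lat_b, lng_a, lng_b):
    """Return (lat_low, lat_high, lng_low, lng_high), swapping any
    pair that was entered in the wrong order."""
    return (min(lat_a, lat_b), max(lat_a, lat_b),
            min(lng_a, lng_b), max(lng_a, lng_b))

def in_box(lat, lng, box):
    """True when a block's internal point lies inside the box."""
    lat_low, lat_high, lng_low, lng_high = box
    return lat_low <= lat <= lat_high and lng_low <= lng <= lng_high

# Latitudes entered high-then-low are put back in order.
box = normalize_box(39.0, 38.0, 90.0, 91.0)
print(box, in_box(38.5, 90.5, box), in_box(40.0, 90.5, box))
```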
When your request is successfully processed, you should see a screen with a series of filenames and descriptions, with each of the filenames being a hyperlink to the file itself. There are four possible output files, depending on what options you select.
This file gives a very brief summary of what you requested and a little about what the program did to satisfy the request. It tells you how many census blocks were selected for processing and how many lines (records, observations) actually made it to the output files. The first line on this file tells you what your "Process id" was for this request. If you have any problems with your request you need to be sure to save this key number and report it to the authors with a description of what went wrong.
This is your listing (i.e., report format) file. It is usually the largest of the output files and often the most important. Note its size before attempting to print or save it to your desktop, since it may be quite large. If you filled in that box on the form that let you specify a name other than "Geocorr," you should see that name here instead. The same applies for the CSV file, next.
This is your comma-delimited ASCII file. You might want to browse/preview it, but you'll most likely want to save this back on your local disk. You should be able to easily load the file into a spreadsheet for further local processing.
This is a very short file that simply provides a little extra information about the variables, as specified in the header record, on your CSV file. If you did not request a comma-delimited file, then you will not get this file either; they are a matched set. The report lists each of the variables (fields) on your file and adds a descriptive label to help you identify what each means. You'll note that the variables have a consistent order in this report and on the CSV file, with the source geocode fields appearing first, followed by the target codes and then the weight variable, allocation factor(s), and any x-y coordinate and distance-to-specified-point items. If you did not explicitly specify "state" as one of your geocodes, you will nonetheless see it added to this file as well, usually after the last target geocode and just before the weighting variable.
These files are all stored in a temporary directory and will remain there for a period of several hours or so. But you should retrieve them to your local system before exiting the application.
If you do not receive any output, or you get output that you feel is not consistent with what you requested, please be sure to record the date and the process id number associated with the query before reporting the problem.
The actual processing is fairly simple. Once the program has determined the geographic universe that the user specified, as well as the source and target geocodes and the weighting variable, it is a matter of extracting these items from the appropriate entries in the MABLE database. This yields a set of census blocks for the geographic area specified, each one identified by the source and target geocodes and with a measure of its size (population, land area, or number of housing units). Building the correlation list outputs (listing and/or CSV file) is then a relatively simple process of sorting and aggregating. What this amounts to is using the census blocks as a kind of "geographic pixel," or indivisible geographic unit. All correlations are "rounded off" to the census block level. For a majority of the geographic codes, the roundoff error is zero, since most of them are never split by blocks. The resulting file is similar to the sort of result you can get from a GIS by doing a polygon intersection operation. But it goes much faster, because we have already determined all the spatial correspondences and stored the results in MABLE. All we need to do is pull out the subset of the approximately seven million predefined answers and aggregate them.
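The sort-and-aggregate step can be illustrated with a short Python sketch. This is not the application's code (which, as noted below, was written in SAS); the function name `build_correlation_list`, the record layout, and the block-level populations are assumptions chosen so that the result reproduces the tract-to-ZIP example shown earlier.

```python
from collections import defaultdict

def build_correlation_list(blocks, source, target, weight, drop_zero=False):
    """Aggregate block records into (source, target, weight, afact) rows.

    `source` and `target` are tuples of field names. Rows come back
    sorted by source codes, then by target codes within the source.
    """
    sums = defaultdict(float)
    for block in blocks:
        key = (tuple(block[f] for f in source), tuple(block[f] for f in target))
        sums[key] += block[weight]
    source_totals = defaultdict(float)
    for (src, _tgt), w in sums.items():
        source_totals[src] += w
    rows = []
    for (src, tgt), w in sorted(sums.items()):
        if drop_zero and w == 0:
            continue  # spatial overlap only, nothing to allocate
        afact = w / source_totals[src] if source_totals[src] else 0.0
        rows.append((src, tgt, w, afact))
    return rows

# Hypothetical blocks; the last one is unpopulated.
blocks = [
    {"county": "29510", "tract": "1101.00", "zip": "63109", "pop": 700},
    {"county": "29510", "tract": "1101.00", "zip": "63109", "pop": 550},
    {"county": "29510", "tract": "1101.00", "zip": "63110", "pop": 625},
    {"county": "29510", "tract": "1101.00", "zip": "63111", "pop": 625},
    {"county": "29510", "tract": "1101.00", "zip": "63116", "pop": 0},
]
rows = build_correlation_list(blocks, ("county", "tract"), ("zip",), "pop", drop_zero=True)
for row in rows:
    print(row)
```

Because every block carries all of its geocodes, changing the source or target coverage only changes which fields form the grouping key; no geometry has to be recomputed.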
The Geocorr application was written in SAS and uses Perl interface scripts to handle the forms output. The MABLE database is a series of SAS datasets and views with auxiliary tables (SAS format codes), which are used to look up some of the codes during Geocorr processing. Most of the SAS code (and all of the database design) was done by John Blodgett of the Urban Information Center, University of Missouri-St. Louis, under a contract with SEDAC/CIESIN. The Perl interface routines were written primarily by Hendrik Meij of CIESIN. The HTML design and coding have been a joint effort.