Comparison Group Selection Service

Contact Linda Leyba at (303) 497-0314 or

Most recent data year available:  2009-10 (as of June 2012); 2010-2011 available soon

  1. Overview of Comparison Group Selection Service
  2. How Does the Process Work?
  3. Cautions Regarding the CGSS Database
  4. Completing the Criteria Form - Part I: Selection Criteria
  5. Completing the Criteria Form - Part II: Weighting Criteria
  6. Interpreting the Reports
  7. What Next?

I. Overview of Comparison Group Selection Service

The Comparison Group Selection Service (CGSS) is designed to aid institutions in selecting a group of institutions which are similar in mission to be used in comparative data analyses. CGSS has been in use at NCHEMS since 1982 and has been used by hundreds of institutions.

CGSS consists of two primary components. The first is a large database containing indicator variables on each of more than 3,500 higher education institutions. This database is constructed from data files derived from the various surveys which make up the IPEDS (Integrated Postsecondary Education Data System) survey system administered by the National Center for Education Statistics (NCES, a part of the U.S. Department of Education in Washington, D.C.).

The indicator database contains variables covering institutional characteristics, faculty, finance, degrees awarded, enrollments, and other miscellaneous data. Since NCES releases these data over a period of many months and the indicator database requires all of these data files, the indicator database will always lag a few years behind the current academic year. For example, the current CGSS database is using data from the 2007-2008 academic year since it is the most recent one for which all of the survey data are available. Experience has shown that, except in rare instances, the indicators used for the CGSS are stable enough that this lag is not a negative factor. Institutions which have experienced large shifts in enrollments, program mix, or other indicators in recent years should consult with the Information Services staff for guidance.

The second component of the CGSS is a set of software programs designed to condense the 3,500+ institutions in the indicator database down to a useable list for a particular institution. This software uses a set of criteria (discussed below) supplied by the target institution to determine which institutions appear on the possible comparison institution list and their relative rankings within the list.

The CGSS yields a list of possible comparison institutions. This list will typically be well over 100 institutions in length with each institution assigned a ranking based on the criteria used. It is the responsibility of the target institution to choose the final list of ten to twenty institutions to become the actual comparison group. NCHEMS does not choose the final list as part of this service, but can do so at additional cost.

The sections below discuss how the selection process works and give guidance in completing the form and interpreting the reports you will receive from NCHEMS. In addition, a complete list of the variables in the database along with information about how they were calculated and guidelines for using them is available from NCHEMS.


II. How Does the Process Work?

The first step in using the CGSS is to contact NCHEMS Information Services and request a copy of the criteria form with the indicator values for the target institution (the institution to be compared against the group). The form, with these values filled in, is available free of charge. Although you may have much of the data included in the form available on campus, the CGSS database performs some rather specialized calculations which often give slightly different results than your institutional data systems. Therefore, it is necessary to use the values in the CGSS database for the target institution as the starting point. See the last page for a criteria form example.

When you receive the form from NCHEMS it will have the values for the target institution in the column labeled "Your Institution". If you notice a gross discrepancy between the data in the form and what you know to be true about the target institution you should contact NCHEMS for assistance.

You should examine the values in the form and supply the appropriate ranges and/or levels of importance for each variable. The procedure for doing this is discussed below.

When the form is complete to your satisfaction you should return it to NCHEMS by mail or FAX. When the form is received at NCHEMS it will be used to create an initial list of possible comparison institutions. This list will be mailed to you for your review. You are encouraged to change some (or all) of the criteria ranges and/or weightings and make additional runs after reviewing the initial list. Three runs (including the initial) are included in the base price of the service. Additional runs beyond the first three are available at a nominal charge but are rarely necessary.


III. Cautions Regarding the CGSS Database

The CGSS database is composed entirely (except for the Carnegie Classification variable) of data derived from several of the IPEDS survey datasets provided by NCES. There are many problems associated with any large-scale data collection effort like IPEDS but, in spite of these problems, IPEDS remains the single best source of national level comparative data for most of the indicators of interest. There are several issues involved in using these data in the way that the CGSS uses them that users of CGSS should keep in mind.

First, the accuracy of the datasets is controlled by the institutions which respond and the staff at NCES, not NCHEMS. NCHEMS receives the data from NCES. NCHEMS conducts only very cursory checks for data validity since NCHEMS does not possess the staff or budget resources to check each response. Occasionally users of CGSS will decide that the values in the CGSS database for their institution are "incorrect" for one reason or another. The Information Services staff can modify the values for the target institution if necessary but at additional cost beyond the normal CGSS price.

Second, the timeliness of the datasets is also controlled by the institutions which respond and the staff at NCES. The CGSS uses data from the Finance, Institutional Characteristics, Completions, Enrollments, and Faculty Salaries surveys. Therefore, all of these datasets must be available before a new CGSS database can be constructed.

Third, keep in mind that some of the indicators in the database are based on different academic years than the main database. For example, the Carnegie Class variable is based on the 2000 dataset from the Carnegie Commission which was the latest available at the time the database was constructed. Other indicators, like minority enrollments, are only collected every other year or even less frequently. When this is the case the CGSS Database Data Element Dictionary will identify the year that was used.

Do these problems mean that the IPEDS data are useless? Some would argue that the age of the data precludes using them for any meaningful analysis. However, those who regularly work with these kinds of data recognize that, for the most part, movements in key indicators like FTE, instruction expenditures per student, etc. occur rather gradually over time, particularly when examined in the aggregate. This points to the importance of using comparison groups that are large enough (seven or eight institutions as a minimum) to even out shifts in individual institutions.

Finally, some users of CGSS believe that the database does not include the right variables or enough variables to derive a good comparison group. There are many other indicators that would be useful to include but these indicators are not available in machine-readable form at a reasonable cost for non-profit users like NCHEMS. However, the original research conducted to create the first CGSS database, and research conducted outside NCHEMS, indicates that the variables used in the CGSS database are the most statistically potent variables of the ones available.


IV. Completing the Criteria Form - Part I: Selection Criteria

The CGSS is designed to reduce the universe of 3500+ higher education institutions down to a much smaller list of institutions which are similar to the target institution. This is accomplished in two ways. The first, Selection Criteria, is discussed here. The second, Weighting Criteria, is discussed in the next section.

The Selection Criteria section of the criteria form is used to exclude from consideration institutions which differ from the target institution in several different institutional characteristics. The most common example of this is institutional control. Often the mission, enrollment, and funding patterns of public and private institutions are different enough that public institutions should be compared only with other public institutions and private institutions should be compared only with other private institutions. If you believe that institutional control is important then you would check the box labeled "Very Important".

Another example of a selection criterion is Region. This variable categorizes institutions based on a region code assigned by NCES. In most cases you will want to generate a national comparison group so you would check the box labeled "Not Important".

The selection criteria work by serving as a series of gates. An institution that does not satisfy one of the selection criteria is excluded from further consideration as a comparison institution if the Very Important box is checked for that criterion. If the Not Important box is checked then that criterion is not used; in other words, the gate is open for all institutions. For an institution to be included in the list it must be able to pass through all of the gates that you specify. If you specify too many gates then very few institutions will make it through.
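The gate behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the NCHEMS software: the institution records, field names, and code values below are hypothetical. A criterion marked Not Important is simply left out of the set of gates, so it excludes no one.

```python
# Hypothetical records; the field names and values are illustrative only
# and do not reflect the actual CGSS database layout.
institutions = [
    {"name": "College A", "control": "public", "region": 7, "landgrant": False},
    {"name": "College B", "control": "private", "region": 8, "landgrant": False},
    {"name": "College C", "control": "public", "region": 3, "landgrant": True},
]

# One gate per criterion marked "Very Important": variable -> accepted codes.
# Criteria marked "Not Important" are omitted, so their gate is open to all.
gates = {
    "control": {"public"},
    "region": {7, 8},
}

def passes_all_gates(institution, gates):
    """Return True only if the institution satisfies every active gate."""
    return all(institution[var] in accepted for var, accepted in gates.items())

survivors = [i["name"] for i in institutions if passes_all_gates(i, gates)]
print(survivors)  # ['College A']
```

Each additional gate can only shrink the list, which is why a long set of Very Important selection criteria leaves very few institutions.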

The selection criteria are very useful for quickly reducing the size of the universe of possible comparison institutions. For example, the Control variable by itself will usually reduce the universe from 3500+ to 1500 or so. If this variable is combined with others then the list can become short rather fast. The Landgrant variable is one that has an even more dramatic effect. There are only 62 Land Grant institutions in the U.S. If this variable is specified as "Yes" and "Very Important" then the universe of 3500 is reduced to 62 immediately regardless of other criteria specified. As you can see, these variables must be used carefully to prevent reducing the list too quickly to make the process worthwhile.

The criteria form has tables of the codes that are used in the City Size, Region, and Carnegie Class variables. The codes for Control, Landgrant, and Medical School are specified on the appropriate line.

The City Size, Region, and Carnegie Class variables all have several values and you may want to select institutions in more than one category. To do this you should list the codes you want included on the appropriate line of the form. For example, if you want only institutions from the Far West and Rocky Mountain regions included in your list you would specify codes 7 and 8 on the Region line of the form.

Experience has taught that you should specify only the minimum number of selection criteria necessary on the initial run. This will keep the list as broad as possible and will allow you to see how a wide range of institutions are ranked by the weighting criteria in Part II. The Control variable is often the only one needed for an initial run by most institutions which are not land grant institutions and do not have a medical school. The Carnegie Class variable should also be left as Not Important for the initial run. Inclusion of the Carnegie Class as a selection criterion might exclude from the list institutions which are good comparison institutions but which have a different Carnegie Class from the ones specified. Use of this variable may also cause institutions to be excluded from the list simply because the Carnegie Foundation has not assigned them a code.


V. Completing the Criteria Form - Part II: Weighting Criteria

Once the universe of possible comparison institutions has been reduced by the selection criteria specified in Part I of the form, the Weighting Criteria can be used to rank the remaining institutions. This part of the process is the most important since it will determine the ordering of the comparison group listing and will greatly affect which institutions will be selected for the final group.

There are two ways that the variables in this section affect the rankings of possible comparison institutions. The first way is through the specification of a range for each variable. The range is set by the target institution to be any set of values desired. An institution which falls within the range is not affected by that variable in terms of its placement on the comparison institution listing. An institution whose value for a particular variable falls outside of the range specified will accumulate points and will be moved lower in the listing than an institution which falls within the range.

The second way that weighting variables have an effect is through the level of importance assigned. The number of points assigned to an institution for being outside of a particular range depends on the level of importance specified for that variable. For example, an institution which falls outside of the range on a variable which has been assigned a level of importance of "Very Important" will receive 100 points. Since institutions are ranked in ascending order by the number of points they accumulate, this institution will appear lower on the list than an institution which has accumulated no points. An institution which falls outside of the range on a variable labeled with an importance level of "Important" will also accumulate points but only 50 points in this case. These 50 points will also move the institution lower in the list but not as much as if it had missed a Very Important range.

Note that a variable which is assigned an importance level of "Not Important" will have no effect on the placement of an institution on the list regardless of whether the institution is within the range or not. This means that if you specify a range for a variable but then assign it a Not Important level this variable will have absolutely no effect on the rankings. The Not Important column is included primarily to allow an institution to positively indicate which variables to exclude from affecting the comparison group listing. It is not necessary to specify a range for these variables. Checking Not Important will have the same effect as if you had left the line completely blank.
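The scoring described above can be sketched in Python as follows. This is a minimal illustration, not the NCHEMS software: the variable names, the example ranges, and the treatment of range endpoints as falling inside the range are assumptions. Only the point values (100 for Very Important, 50 for Important, none for Not Important) come from the description above.

```python
# Point values as described in the text; the key names are illustrative.
POINTS = {"very_important": 100, "important": 50, "not_important": 0}

# Hypothetical criteria: variable -> (low, high, importance).
criteria = {
    "fte_enrollment": (8000, 15000, "very_important"),
    "pct_masters": (20.0, 40.0, "important"),
    "pct_business": (10.0, 30.0, "not_important"),
}

def distance_score(values, criteria):
    """Sum the points for every range the institution falls outside of."""
    score = 0
    for variable, (low, high, importance) in criteria.items():
        points = POINTS[importance]
        if points == 0:
            continue  # Not Important: the range is ignored entirely
        # Assumption: the endpoints themselves count as inside the range.
        if not (low <= values[variable] <= high):
            score += points
    return score

inside = {"fte_enrollment": 12000, "pct_masters": 25.0, "pct_business": 90.0}
outside = {"fte_enrollment": 20000, "pct_masters": 45.0, "pct_business": 90.0}

print(distance_score(inside, criteria))   # 0
print(distance_score(outside, criteria))  # 150
```

Both example institutions are far outside the Percent Business range, but because that variable is marked Not Important it adds nothing to either score.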

The Comparison Group Selection Service is designed to generate comparison group lists for both two and four year institutions. As a result, the form includes some variables which may not be appropriate for your type of institution. On the form those variables which are based on certificate and associate awards have the notation "2-yr" after the name of the variable. If your institution grants both two and four year awards then you may want to use both types of variables for ranking institutions. Otherwise, you should simply ignore the variables which do not apply to your institution even though they will appear in your report. If you do not specify a range and level of importance for these variables then they will have no effect on your list of possible comparison institutions.

In selecting variables and setting ranges for the weighting criteria you should focus on a few key variables rather than trying to set ranges for every possible variable. Your institution may be a large, primarily graduate institution with a heavy emphasis on business programs. By specifying ranges for the FTE Enrollment, Percent Masters, Percent Doctorates, and Percent Business Degrees variables you will probably generate an acceptable list for the initial run. You can then use the second and third runs to refine the list and perhaps add a few additional variables. The wide variety of variables on the form is intended to accommodate the needs of many types of institutions so you should not expect to use every variable.

Institutions are often interested in adding variables to the database for use in generating their comparison group. The variables used in the comparison group database were selected based on the results of an extensive factor analysis conducted in 1981 at NCHEMS. These indicators were found to have enough statistical power to be of use in attempting to group institutions into similar groups. As a result of this research, as well as the work of other researchers, it was determined that these indicators were the most worthwhile for use in determining comparison groups. While there are certainly some additional variables that would be of use they are generally not available as part of any national data collection effort. As a result, NCHEMS does not have ready access to these data and cannot include them in the database.


VI. Interpreting the Reports

As you interpret the reports you should remember that the threshold methodology used by NCHEMS for generating the comparison group reports relies on subjective judgement at several stages. Judgement is needed to set the initial criteria and ranges and then to interpret the reports, make adjustments to the ranges, and finally to choose the actual comparison group. Although the method may appear less "scientific" than some more heavily statistically-oriented methods it has generally produced comparison groups which are at least as good as, and in many cases superior to, those produced by other methods. You should also remember that choosing good comparison institutions is as much an art as a science.

The comparison group reports consist of six parts. Each part lists the values in the comparison group database for a group of the variables included on the criteria form. The variables on the report are in the same order as they appear on the form except for the Carnegie Class which appears on the last page of the report.

Each part of the report contains three common variables which appear as the first three columns on each page. These are the Distance Score, ID Number, and Institution Name. The ID Number and Institution Name are included to help you identify the institutions. The ID Number is the Unit ID assigned by the National Center for Education Statistics.

The Distance Score is the total number of points an institution has accumulated because it has missed one or more ranges that you have specified. For example, if an institution missed two ranges that you specified as Very Important then the institution would have a Distance Score of 200. An institution which fell outside of the range for one Important and one Very Important variable would have a Distance Score of 150. Typically the ranges will have been specified so that the target institution appears first on the list with a Distance Score of zero. Occasionally there may be a few other institutions which also fall completely within the ranges specified and they too will have a zero Distance Score. Institutions which have higher Distance Scores will be grouped in ascending order by their scores and will appear after those with zero scores.
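The ordering described above, with institutions grouped by Distance Score in ascending order, might be sketched like this. The institution names and scores are hypothetical, and sorting alphabetically within a score group is an assumption made only so the example has a stable order.

```python
from itertools import groupby

# Hypothetical Distance Scores for a handful of institutions.
scores = {
    "Target U": 0,
    "College A": 150,
    "College B": 0,
    "College C": 200,
    "College D": 150,
}

# Sort by score (ascending), then by name as an assumed tie-break.
ranked = sorted(scores.items(), key=lambda item: (item[1], item[0]))

# Collect institutions that share a score into one Distance Score group.
groups = {
    score: [name for name, _ in members]
    for score, members in groupby(ranked, key=lambda item: item[1])
}

for score, names in groups.items():
    print(score, names)
# 0 ['College B', 'Target U']
# 150 ['College A', 'College D']
# 200 ['College C']
```

Counting the members of the lowest-score groups in this way is a quick check on whether the ranges are discriminating enough.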

Generally, any institution in the first several Distance Score groups could be an acceptable comparison institution. If you are planning to make additional runs after modifying your criteria then you should examine the institutions within these groups and attempt to modify your criteria to better discriminate among the top institutions. The best strategy is to try to make the ranges do as much work for you as possible by setting them just narrow enough to divide the institutions with the lowest Distance Scores into several small groups (5 to 10 institutions per group). If the lowest score groups have more than ten institutions then it is best to narrow a key range or two to reduce the size of the groups. You should also expect to have several institutions with relatively low distance scores in the first group after the target institution. If the lowest distance score obtained is more than 50 or 100 points then you have probably specified too many weighting variables and/or set the ranges too narrow. Keep in mind that many variables are complementary. If you specify a range of 90-100% for percent bachelor degrees and a range of 0-10% for percent master’s degrees then an institution outside one of the ranges will almost certainly be outside the other and will accumulate points for both variables.

When interpreting the Distance Scores remember that an institution can accumulate points in a variety of combinations. Ten institutions may be listed in a group with 350 points and they may have accumulated those points in ten different ways by falling outside of the ranges for different variables. You can easily tell how an institution was scored by examining the ranges you specified and then looking at the values for each institution.
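As a rough illustration of how two institutions can reach the same score by different routes, the sketch below lists which ranges each one missed. The variable names, ranges, and values are hypothetical, and the endpoints are assumed to count as inside the range.

```python
# Hypothetical ranges, all assumed to be marked Very Important (100 points each).
ranges = {
    "fte_enrollment": (8000, 15000),
    "pct_masters": (20.0, 40.0),
    "pct_doctorates": (0.0, 5.0),
}

def missed_ranges(values, ranges):
    """List the variables whose value falls outside the specified range."""
    return [
        variable
        for variable, (low, high) in ranges.items()
        if not (low <= values[variable] <= high)
    ]

college_a = {"fte_enrollment": 16000, "pct_masters": 30.0, "pct_doctorates": 8.0}
college_b = {"fte_enrollment": 9000, "pct_masters": 45.0, "pct_doctorates": 9.0}

print(missed_ranges(college_a, ranges))  # ['fte_enrollment', 'pct_doctorates']
print(missed_ranges(college_b, ranges))  # ['pct_masters', 'pct_doctorates']
```

Both institutions miss two Very Important ranges and would therefore share a Distance Score of 200, yet they differ from the target in different ways.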

Once you have made all of the report runs that you plan to make you can determine your final comparison group. NCHEMS recommends that your final group include no more than fifteen institutions for most purposes. To arrive at this final list you should pick institutions from the three or four lowest score groups. At this point in the process your subjective judgement is the most important factor in determining whether an institution should be included or not. At this stage of the process it is also useful to examine other data sources, if available, that may help to distinguish one possible comparison institution from another.

When selecting the final list of institutions you should keep several things in mind. First, a typical next step in using comparison groups is to contact the other institutions you have selected and set up an informal data exchange. Some institutions are more willing, and able, to participate in these exchanges than others and this might sway your choice of them as one of your comparison institutions. If you intend to establish such an exchange then it might be wise to contact the other institutions before publicizing the comparison group list. Second, remember that, except in rare cases, the list of comparison institutions will be scrutinized by other people at your institution, at one or more governing bodies, and other external groups. The list you select should be defensible as being rational and reflective of the actual mission of your institution unless the group's clear intended use is as an aspirational group rather than a peer group.


VII. What Next?

Once the comparison group is established then you can begin to make use of it in conducting various types of comparisons. Although a complete discussion of comparative analysis is beyond the scope of this document a few hints in that direction are in order.

First, plan on collecting and examining as much current information about the comparison group as possible. This will help verify the accuracy of the comparison group and establish a baseline for future comparisons. Second, historical trend data is often very revealing when working with comparison groups. Sometimes there will be large differences between the trends observed for the target institution and the comparison group average over time. This might be indicative of problems with the comparison group selected or it might reflect changes in data reporting over time. There also may be fundamental differences in the way the target institution has performed over time versus the comparison group. If these differences affect the validity of the comparison group then you will need to adjust the group accordingly.

Third, as mentioned above, join an existing data exchange that includes your comparison institutions (or at least the majority of them), form your own data sharing group, or both. The nationally-available data sources like IPEDS cannot match the timeliness or depth of institutionally-based data sources.

Finally, you should consider checking the continued validity of the comparison group every one to two years. This validation may be a bonus of data sharing activities. In any case, you should check the assumptions that the comparison group is based on to be sure that no major changes have occurred in the target institution or one or more of the comparison institutions.
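The trend comparison suggested above, the target institution against the comparison group average over time, might be sketched as follows. All names, years, and FTE figures here are hypothetical and serve only to show the arithmetic.

```python
from statistics import mean

# Hypothetical FTE enrollment by academic year.
years = ["2007-08", "2008-09", "2009-10"]
target = [10000, 10400, 11000]
comparison_group = {
    "College A": [9800, 9900, 10000],
    "College B": [10200, 10100, 10000],
}

# Average of the comparison group for each year.
group_avg = [
    mean(values[i] for values in comparison_group.values())
    for i in range(len(years))
]

# Gap between the target and the group average, year by year.
gaps = [t - g for t, g in zip(target, group_avg)]

for year, gap in zip(years, gaps):
    print(year, gap)
```

A gap that widens steadily, as it does in this made-up data, is the kind of pattern that should prompt a second look at either the comparison group or the way the data were reported.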
