Annual Reports

On an annual basis, candidate performance reports and test statistics reports are produced for the MTEL program. These reports are described in the following sections, and annotated samples of the reports are provided. Links to the most recent set of annual reports can be found within the associated sections.


Candidate Performance Reports

The following three candidate performance reports are produced annually for the MTEL:

  • Annual Pass Rate Report
  • Annual Test Results by Category
  • Total Scaled Score Distribution by Test Field (All Forms)

These reports are designed to provide information about testing outcomes for each test field, including overall candidate pass rates, average scores, results by candidate reporting group, and distribution of scores. The reports are described below.

Interpreting Candidate Performance Reports

  • Extreme caution should be used in interpreting data for small numbers of candidates.
  • The candidates whose results are presented in annual reports may not be representative, in type or capability, of the population of candidates who will take the tests in the future.
  • The results presented in annual reports may not be comparable to results in previous or future reports. Regular test updating leads to variations in test form design and may lead to changes in passing scores, which in turn affect the percent passing for a test. Extreme caution should therefore be used when comparing data from annual reports for different program years.
  • Reporting group designations for gender, ethnicity, and primary language are based on self-reported information provided by candidates during the registration process. The questions about gender and ethnicity are labeled "optional" for registration purposes and some candidates do not respond to these questions.

Annual Pass Rate Report

The Annual Pass Rate Report includes information regarding pass rates and candidates' average score for each test field for the program year. Pass rates are provided for first-time test takers and for all test takers. See MTEL Annual Pass Rate Report 2021–2022 PDF for the report generated for the September 2021–August 2022 program year.
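
As a rough illustration of the difference between these two pass rates, the following Python sketch computes a pass rate based on each candidate's first attempt and a pass rate based on whether any attempt during the year met the passing score. The attempt records, candidate IDs, and tabulation rules shown here are invented for illustration and are not the actual MTEL reporting rules.

```python
from collections import defaultdict

PASSING_SCORE = 240  # MTEL minimum passing scaled score

# Invented attempt records: (candidate_id, attempt_date, scaled_score).
attempts = [
    ("C01", "2021-10-02", 232), ("C01", "2022-01-15", 244),
    ("C02", "2021-11-13", 251),
    ("C03", "2022-03-05", 238), ("C03", "2022-06-11", 239),
    ("C04", "2022-04-23", 260),
]

by_candidate = defaultdict(list)
for cand, date, score in attempts:
    by_candidate[cand].append((date, score))

# First-time pass rate: based on each candidate's earliest attempt in the program year.
first_passes = sum(min(recs)[1] >= PASSING_SCORE for recs in by_candidate.values())
# Overall pass rate: a candidate counts as passing if any attempt met the passing score.
any_passes = sum(any(s >= PASSING_SCORE for _, s in recs) for recs in by_candidate.values())

n = len(by_candidate)
print(f"First-attempt pass rate: {100 * first_passes / n:.0f}%")               # 50%
print(f"Pass rate counting any passing attempt: {100 * any_passes / n:.0f}%")  # 75%
```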

Annual Test Results by Category

The Annual Test Results by Category Report includes information regarding pass rates for each test field for the program year, reported by gender, ethnicity, and primary language. Pass rates are based on each candidate's best attempt during the program year. See MTEL Annual Test Results by Category 2021–2022 PDF for the report generated for the September 2021–August 2022 program year.

Total Scaled Score Distribution by Test Field (All Forms)

The Total Scaled Score Distribution by Test Field (All Forms) provides information about the distribution of candidates' scores above and below the minimum passing score. For the MTEL, results are reported on a scale ranging from 100 to 300, with a scaled score of 240 representing the minimum passing score for each test. This report is provided for fields with 10 or more attempts during the program year. See MTEL Total Scaled Score Distribution by Test Field (All Forms) 2021–2022 PDF for the report generated for the September 2021–August 2022 program year.
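
As a rough illustration of how such a distribution can be summarized, the following Python sketch tallies a set of hypothetical scaled scores into 10-point bands around the 240 passing score. The scores and band width are invented for illustration and do not reflect actual MTEL data or the exact layout of the published report.

```python
from collections import Counter

PASSING_SCORE = 240  # MTEL minimum passing scaled score on the 100-300 scale

# Hypothetical scaled scores (invented data).
scores = [212, 228, 235, 239, 240, 244, 251, 258, 263, 271, 284, 293]

def band(score, width=10):
    """Label a score by the 10-point band it falls in, aligned to the passing score."""
    low = PASSING_SCORE + ((score - PASSING_SCORE) // width) * width
    return f"{low}-{low + width - 1}"

distribution = Counter(band(s) for s in scores)
passing = sum(s >= PASSING_SCORE for s in scores)

for label in sorted(distribution, key=lambda b: int(b.split("-")[0]), reverse=True):
    print(f"{label}: {distribution[label]}")
print(f"Percent at or above {PASSING_SCORE}: {100 * passing / len(scores):.1f}%")
```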

Test Statistics Reports

The following two test statistics reports are generated annually for the MTEL program:

  • Test Form Statistics Report
  • Open Response Statistics Report

These reports are designed to provide information about the statistical properties of MTEL tests, including the reliability/precision of the tests.

Standard 2.0: Appropriate evidence of reliability/precision should be provided for the interpretation for each intended score use.

Standard 11.14: Estimates of the consistency of test-based credentialing decisions should be provided in addition to other sources of reliability evidence.

Standards for Educational and Psychological Testing (AERA, APA, & NCME, 2014)

Statistical Measures Used

As indicated by the Standards above, it is important to provide evidence of the reliability/precision of the MTEL scores for their intended use to make pass/fail classifications for the purpose of educator credentialing (i.e., to confer a Preliminary or Initial educator license). The Standards define "reliability/precision" as the "consistency of scores across instances of the testing procedure." Standard 11.14, specific to credentialing tests, indicates "the consistency of decisions on whether to certify is of primary importance."

A number of statistical measures are used to estimate the reliability of MTEL test scores. Measures of reliability are reported for the total test, the multiple-choice section, and the open-response section. However, because pass/fail decisions are based on the total test score only, it is the total test reliability that is the focus of interest; measures of reliability for portions of the test are presented as supplemental information. When considering reliability indices for a single test section (multiple-choice or open-response), keep in mind that a section, because it contains fewer test items than the total test, may show lower reliability statistics.

The statistics of primary focus are those that describe the consistency of pass/fail decisions on the total test and the error of measurement associated with the total test, as follows:

  • Total test decision consistency. Total test decision consistency (Breyer and Lewis) is a reliability statistic that describes the consistency of the pass/fail decision. This statistic is reported in the range of 0.00 to 1.00; the closer the estimate is to 1.00, the more consistent (reliable) the decision is considered to be. The statistic is reported for test forms with 100 or more attempts during the program year. Test forms are considered to be identical if they contain identical sets of scorable multiple-choice items, regardless of the order of the items.
  • Total test Standard Error of Measurement (SEM). The Standard Error of Measurement (SEM) is a statistical measure that provides a "confidence band" around a candidate's score: if a candidate retook the test, the candidate's score would likely fall within the reported score plus or minus the SEM. The smaller the SEM, the closer a candidate's score could be expected to be to the reported score upon repeated testing. This statistic is reported for each test form with at least 100 attempts. (A computational sketch follows this list.)
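
As an illustration of how an SEM-based confidence band works, the following Python sketch uses the standard classical test theory relationship SEM = SD * sqrt(1 - reliability). The standard deviation, reliability, and score values are invented, and this sketch is not a description of how the MTEL SEM or the Breyer and Lewis decision consistency estimate is actually computed.

```python
import math

def standard_error_of_measurement(score_sd, reliability):
    """Classical test theory SEM: score standard deviation times sqrt(1 - reliability)."""
    return score_sd * math.sqrt(1.0 - reliability)

def confidence_band(reported_score, sem):
    """One-SEM band around a reported score, i.e., score plus or minus the SEM."""
    return reported_score - sem, reported_score + sem

# Hypothetical values on the 100-300 MTEL reporting scale (not actual data).
sem = standard_error_of_measurement(score_sd=20.0, reliability=0.90)
low, high = confidence_band(reported_score=238, sem=sem)
print(f"SEM = {sem:.1f}; band: {low:.1f} to {high:.1f}")  # SEM = 6.3; band: 231.7 to 244.3

# A band that spans the 240 passing score signals that a retest could plausibly
# move the candidate across the pass/fail boundary, which is why the consistency
# of the pass/fail decision is reported alongside the SEM.
```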

Additional supplemental statistics for the total test, multiple-choice section, and open-response section of MTEL tests are provided in the MTEL test statistics reports, as follows:

  • Stratified alpha. Stratified alpha is an estimate of total test reliability for a test containing a mixture of item types (e.g., multiple-choice and open-response). This statistic is reported in the range of .00 to 1.00, with a higher number indicating a greater level of consistency (reliability).
  • Multiple-choice section Standard Error of Measurement (SEM). This statistic is similar to the total test Standard Error of Measurement described above, but it is applied to a candidate's score for the multiple-choice section of a test. It is reported for each test form with at least 100 attempts. There are two versions: Keats' estimated SEM (Keats, 1957) and the computed SEM.
  • KR20. The KR20 (Kuder-Richardson index of homogeneity) is a measure of internal consistency of the multiple-choice test items. KR20 is reported in the range of .00 to 1.00, with a higher number indicating a greater level of internal consistency (that is, the degree of consistent performance on items intended to measure the same construct). This statistic is reported for each test form with at least 100 attempts. (A computational sketch of KR20 and related statistics follows this list.)
  • G coefficient. The G (generalizability) coefficient indicates the degree to which the variability in scores for the open-response section is attributable to differences among candidates (e.g., in subject area knowledge) rather than to measurement error. It is reported in the range of .00 to 1.00, with a higher number indicating a greater level of dependability (or accuracy of the generalization from observed score to universe score). This statistic is reported for test forms with at least 100 attempts.
  • Pearson Correlation of multiple-choice and open-response section scores. The Pearson Correlation indicates the correlation (relationship) between the scaled scores for the two sections of a test. It reports the degree to which candidates who do well on the open-response section also do well on the multiple-choice section. This statistic is reported in the range of 0.00 to 1.00 for each test form with at least 100 attempts.
  • Scorer agreement. For each test form with open-response items, information is reported on scorer agreement regarding the individual raw scores assigned to each candidate's response to an open-response item. The following information is reported: the percent of cases in which the first two scorers were in agreement (i.e., assigned identical scores or scores that differ by only 1 point, also called adjacent scores), the percent of identical scores, and the percent of adjacent scores.
  • Inter-rater reliability. For each test form with open-response items, inter-rater reliability reports the degree to which different raters assign the same score to the same response.
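
The following Python sketch shows standard textbook computations for three of these supplemental statistics: KR20 for a set of dichotomously scored multiple-choice items, the Pearson correlation between two sets of section scores, and the percent of identical and adjacent scores for the first two scorers of an open-response item. The data are invented, and the statistics as computed for the MTEL reports may involve adjustments (e.g., scaling or sampling rules) not shown here.

```python
from statistics import mean, pvariance

def kr20(item_responses):
    """KR20 for dichotomous (0/1) items; rows are candidates, columns are items."""
    k = len(item_responses[0])
    p = [mean(col) for col in zip(*item_responses)]            # proportion correct per item
    total_var = pvariance([sum(row) for row in item_responses])
    return (k / (k - 1)) * (1 - sum(pi * (1 - pi) for pi in p) / total_var)

def pearson(x, y):
    """Pearson correlation between two equal-length lists of section scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sum((a - mx) ** 2 for a in x) ** 0.5 * sum((b - my) ** 2 for b in y) ** 0.5)

def scorer_agreement(scores1, scores2):
    """Percent identical and percent adjacent (differ by 1 point) for two scorers."""
    n = len(scores1)
    identical = sum(a == b for a, b in zip(scores1, scores2))
    adjacent = sum(abs(a - b) == 1 for a, b in zip(scores1, scores2))
    return 100 * identical / n, 100 * adjacent / n

# Invented data: 5 candidates x 4 multiple-choice items, two section score lists,
# and raw scores assigned by two scorers to the same five open responses.
responses = [[1, 1, 0, 1], [1, 0, 0, 0], [1, 1, 1, 1], [0, 1, 0, 1], [1, 1, 1, 0]]
print("KR20:", round(kr20(responses), 3))
print("Pearson r:", round(pearson([230, 245, 260, 238, 251], [225, 250, 255, 240, 248]), 3))
ident, adj = scorer_agreement([3, 2, 4, 3, 1], [3, 3, 4, 2, 1])
print(f"Identical: {ident:.0f}%  Adjacent: {adj:.0f}%  In agreement: {ident + adj:.0f}%")
```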

Factors Affecting Reliability Measures

Reliability measures for MTEL tests may be influenced by many factors, including the following:

  • Number of candidates. To be interpreted with confidence, statistical reliability estimates must be based on adequate numbers of candidate scores that represent a range of candidate knowledge and skill levels and that provide variance in candidate score distributions. Statistical reliability estimates based on few candidate scores may be very dependent on the characteristics of those candidates and their scores. For this reason, reliability estimates are calculated for MTEL tests that are taken by 100 or more candidates.
  • Variability of the group tested. In general, the larger the variance or true spread of the scores of the candidate group (i.e., the greater the individual differences in the level of knowledge and skills of the candidates), the greater will be the reliability coefficient. Reliability estimates tend to be higher if candidates in the group have widely varying levels of knowledge, and lower if they tend to have similar levels of knowledge. The range and distribution of candidate scores for each test field can be seen in the report Total Scaled Score Distribution by Test Field (All Forms), described previously.
  • Self-selection of candidates by test administration date. MTEL tests are administered throughout the year, and candidates can select when to take and retake the tests. The composition, ability level, and variability of the candidate group may vary from one test form to another as a result of the time of year that different test forms are administered.
  • Number of test items. Longer tests generally have higher reliability estimates (see the sketch following this list). Some MTEL tests consist of two or more subtests that candidates must pass separately, and candidates retake only the components they did not pass. Because pass/fail decisions for these subtests are based on fewer test items than in a total-test model, KR20 and other reliability evidence cannot be expected to reach the levels found in longer, single-component tests.
  • Test content. Reliability estimates are typically higher for tests that cover narrow, homogeneous content than for tests that cover a broad range of content. MTEL tests typically test a broad base of knowledge and skills that pertain to educator licenses that will apply in a wide range of educational settings, grade levels, and teaching assignments.
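
The effect of test length on reliability noted above can be illustrated with the Spearman-Brown prophecy formula, a standard classical test theory result sketched below in Python. The reliability values are hypothetical, and the formula is shown only to illustrate the general relationship, not how MTEL subtest reliabilities are actually derived.

```python
def spearman_brown(reliability, length_factor):
    """Projected reliability when a test is lengthened or shortened by length_factor."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

# Hypothetical: a full-length test with reliability 0.90 split into half-length subtests.
full_test = 0.90
half_test = spearman_brown(full_test, 0.5)
print(f"Full-length test: {full_test:.2f}; half-length subtest: {half_test:.2f}")  # 0.90 vs 0.82
# The half-length subtest's projected reliability drops to about 0.82, illustrating why
# shorter subtests cannot be expected to match the reliability of a longer total test.
```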

Aids to Interpreting the MTEL Statistics

The following interpretive aids and cautions should be kept in mind while considering the MTEL test statistics reports.

  • MTEL scores are reported to candidates as scaled scores with a lower limit of 100, a passing score of 240, and an upper limit of 300. This is the scale used in reporting all scaled score statistics.
  • Some tests may not be taken by any candidates during a reporting period. Data for such fields are not available to be reported.
  • Statistical information on the MTEL should be interpreted with the understanding that the tests taken by candidates are a composite of multiple-choice items and open-response items, and this may affect psychometric characteristics of the test.
  • For MTEL tests with short-answer scoring (e.g., language structure items in some language tests, sentence-correction items in the Communication and Literacy Skills test), the scores for those items are included in the multiple-choice section of the test.
  • Information that is based on the test performance of relatively small numbers of candidates (i.e., fewer than 100 candidate test attempts) may not be indicative of the performance of larger numbers of candidates.

Test Form Statistics Report

The Test Form Statistics Report provides information regarding the statistical properties of MTEL test forms with at least 10 attempts during the program year. See MTEL Test Form Statistics Report 2021–2022 PDF for the report generated for the September 2021–August 2022 program year.

Open Response Statistics Report

The Open Response Statistics Report provides selected statistics for the open-response items for test fields with at least 100 attempts during the program year. See MTEL Open Response Statistics Report 2021–2022 PDF for the report generated for the September 2021–August 2022 program year.

