Assessing Test Item Quality
It’s important to get feedback on the quality of your test items so that you can make any changes necessary to improve the test’s reliability and validity. Two methods for gathering this feedback are (a) using a self-review checklist and (b) obtaining your students’ evaluations of the quality of the test questions. You can use the information gathered from either method to identify strengths and weaknesses in your item writing.
Method 1
A self-review checklist for evaluating test items
Mark the suggestions that you feel you have followed. The more you can check, the better.
Multiple-Choice Test Items
____ When possible, I stated the stem as a direct question rather than as an incomplete statement.
____ I presented a definite, explicit and singular question or problem in the stem.
____ I eliminated excessive verbiage and irrelevant information from the stem.
____ I included in the stem any word(s) that might otherwise have been repeated in each alternative.
____ I used negatively stated stems sparingly. When used, the negative words were underlined and/or capitalized.
____ I made all alternatives plausible and attractive to the less knowledgeable or skillful student.
____ I made the alternatives grammatically parallel with each other and consistent with the stem.
____ I made the alternatives mutually exclusive.
____ When possible, I presented alternatives in some logical order (e.g., chronologically, most to least).
____ I made sure there was only one correct or best response per item.
____ I made alternatives approximately equal in length.
____ I avoided irrelevant clues such as grammatical structure, well-known verbal associations, or connections between stem and answer.
____ I used at least four alternatives for each item.
____ I randomly distributed the correct response among the alternative positions throughout the test, so that positions A, B, C, D, and E each served as the correct response in approximately equal proportion.
____ I used the alternatives "none of the above" and "all of the above" sparingly. When used, such alternatives were occasionally the correct response.
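A practical way to satisfy the answer-position suggestion above is to generate the answer key programmatically before writing the items. The sketch below is illustrative, not part of any required procedure; the item count of 40 and the option letters A–E are assumptions.

```python
import random
from collections import Counter

def make_answer_key(num_items, options="ABCDE"):
    """Assign correct-answer positions so each option letter is used
    in roughly equal proportion, then shuffle to randomize the order."""
    # Repeat the option letters enough times to cover every item,
    # then trim to the exact test length.
    key = (list(options) * (num_items // len(options) + 1))[:num_items]
    random.shuffle(key)
    return key

key = make_answer_key(40)
# For 40 items and 5 options, each letter appears exactly 8 times,
# in random positions.
print(Counter(key))
```

Writing the key first and then fitting each item's correct alternative to its assigned position avoids the common drift toward overusing middle positions.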
True-False Test Items
____ I based true-false items upon statements that are absolutely true or false, without qualifications or exceptions.
____ I expressed the item statement as simply and as clearly as possible.
____ I expressed a single idea in each test item.
____ I included enough background information and qualifications so that the ability to respond correctly did not depend on some special, uncommon knowledge.
____ I avoided lifting statements from the text, lecture or other materials.
____ I avoided using negatively stated item statements.
____ I avoided the use of unfamiliar language.
____ I avoided the use of specific determiners such as "all," "always," "none," "never," etc., and qualifying determiners such as "usually," "sometimes," "often," etc.
____ I used more false items than true items (but not more than 15% additional false items).
Matching Test Items
____ I included directions which clearly stated the basis for matching the stimuli with the responses.
____ I explained whether or not a response could be used more than once and indicated where to write the answer.
____ I used only homogeneous material.
____ When possible, I arranged the list of responses in some systematic order (e.g., chronologically, alphabetically).
____ I avoided grammatical or other clues to the correct response.
____ I kept items brief (limited the list of stimuli to under 10).
____ I included more responses than stimuli.
____ When possible, I reduced the amount of reading time by including only short phrases or single words in the response list.
Completion Test Items
____ I omitted only significant words from the statement.
____ I did not omit so many words from the statement that the intended meaning was lost.
____ I avoided grammatical or other clues to the correct response.
____ I included only one correct response per item.
____ I made the blanks of equal length.
____ When possible, I deleted the words at the end of the statement after the student was presented with a clearly defined problem.
____ I avoided lifting statements directly from the text, lecture, or other sources.
____ I limited the required response to a single word or phrase.
Essay Test Items
____ I prepared items that elicited the type of behavior I wanted to measure.
____ I phrased each item so that the student's task was clearly indicated.
____ I indicated for each item a point value or weight and an estimated time limit for answering.
____ I asked questions that elicited responses on which experts could agree that one answer is better than others.
____ I avoided giving the student a choice among optional items.
____ I administered several short-answer items rather than one or two extended-response items.
Grading Essay Test Items
____ I selected an appropriate grading model.
____ I tried not to allow factors which were irrelevant to the learning outcomes being measured to affect my grading (e.g., handwriting, spelling, neatness).
____ I read and graded all class answers to one item before going on to the next item.
____ I read and graded the answers without looking at the student's name to avoid possible preferential treatment.
____ I occasionally shuffled papers during the reading of answers.
____ When possible, I asked another instructor to read and grade my students' responses.
Problem Solving Test Items
____ I clearly identified and explained the problem to the student.
____ I provided directions which clearly informed the student of the type of response called for.
____ I stated in the directions whether or not the student must show work procedures for full or partial credit.
____ I clearly separated item parts and indicated their point values.
____ I used figures, conditions, and situations which created a realistic problem.
____ I asked questions that elicited responses on which experts could agree that one solution and one or more work procedures are better than others.
____ I worked through each problem before classroom administration.
Performance Test Items
____ I prepared items that would elicit the type of behavior I wanted to measure.
____ I clearly identified and explained the simulated situation to the student.
____ I made the simulated situation as "life-like" as possible.
____ I provided directions which clearly informed the students of the type of response called for.
____ When appropriate, I clearly stated time and activity limitations in the directions.
____ I adequately trained the observer(s)/scorer(s) to ensure that they were fair in scoring the appropriate behaviors.
Method 2
Students' evaluation of test questions
You can easily use ICES questions to assess the quality of your test questions. The items below are presented with their original ICES catalogue numbers. You are encouraged to include one or more of them on the ICES evaluation form to collect student opinions.
102--How would you rate the instructor's examination questions? (Excellent ... Poor)
103--How well did examination questions reflect content and emphasis of the course? (Well related ... Poorly related)
114--The exams reflected important points in the reading assignments. (Strongly agree ... Strongly disagree)
117--Examinations mainly tested trivia. (Strongly agree ... Strongly disagree)
119--Were exam questions worded clearly? (Yes, very clear ... No, very unclear)
115--Were the instructor's test questions thought provoking? (Definitely yes ... Definitely no)
125--Were exams adequately discussed upon return? (Yes, adequately ... No, not enough)
116--Did the exams challenge you to do original thinking? (Yes, very challenging ... No, not challenging)
118--Were there "trick" or trite questions on tests? (Lots of them ... Few if any)
122--How difficult were the examinations? (Too difficult ... Too easy)
123--I found I could score reasonably well on exams by just cramming. (Strongly agree ... Strongly disagree)
121--How was the length of exams for the time allotted? (Too long ... Too short)
109--Were exams, papers, reports returned with errors explained or personal comments? (Almost always ... Almost never)