Writing Essay Test Items

The essay test is probably the most popular of teacher-made tests. In general, a classroom essay test consists of a small number of questions to which the student is expected to demonstrate in his/her response his/her ability to (a) recall factual, conceptual, or procedural knowledge, (b) organize this knowledge, and (c) interpret the information critically in a logical, integrated answer to the question. An essay test item can be classified as either an extended-response or a short-answer. The latter calls for a more restricted or limited answer in terms of form or scope. An example of each type of essay item follows.

  • Extended-Response: Explain the difference between the S-R (Stimulus-Response) and the S-O-R (Stimulus-Organism-Response) theories of personality. Include in your answer (a) brief descriptions of both theories, (b) supporters of both theories and (c) research methods used to study each of the two theories. (10 pts. 20 minutes)
  • Short-Answer: Identify research methods used to study the S-R (Stimulus-Response) and S-O-R (Stimulus-Organism-Response) theories of personality. (5 pts. 10 minutes)

Advantages & limitations

Essay items have several advantages. They:

  • are easier and less time consuming to construct than are most other item types.
  • provide a means for testing student's ability to compose an answer and present it in a logical manner.
  • can efficiently measure higher order cognitive objectives (e.g., analysis, synthesis, evaluation).

Essay items also have several limitations. They:

  • cannot measure a large amount of content or large number of learning objectives.
  • generally provide low test reliability and low grader reliability.
  • require an extensive amount of instructor's time to read and grade.
  • generally do not provide an objective measure of student achievement or ability (subject to bias on the part of the grader).

Suggestions for writing essay test items

  1. Prepare essay items that elicit the type of behavior you want to measure.

    Learning Objective: The student will be able to explain how the normal curve serves as a statistical model.
    Undesirable: Describe a normal curve in terms of: symmetry, modality, kurtosis and skewness.
    Desirable: Briefly explain how the normal curve serves as a statistical model for estimation and hypothesis testing.

  2. Phrase each item so that the student's task is clearly indicated.

    Undesirable: Discuss the economic factors which led to the stock market crash of 1929.
    Desirable: Identify the three major economic conditions which led to the stock market crash of 1929. Discuss briefly each condition in correct chronological sequence and in one paragraph indicate how the three factors were interrelated.

  3. Indicate for each item a point value or weight and an estimated time limit for answering.

    Undesirable: Compare the writings of Bret Harte and Mark Twain in terms of settings, depth of characterization, and dialogue styles of their main characters.
    Desirable: Compare the writings of Bret Harte and Mark Twain in terms of settings, depth of characterization, and dialogue styles of their main characters. (10 points 20 minutes)

  4. Ask questions that will elicit responses on which experts could agree that one answer is better than another.
  5. Avoid giving the student a choice among optional items, as this greatly reduces the reliability of the test.
  6. It is generally recommended for classroom examinations to administer several short-answer items rather than only one or two extended-response items. Doing this prevents a student's grade being based on his or her performance on only one item.

Suggestions for scoring essay items


Choose a scoring model. Two of the more common scoring models are ANALYTICAL SCORING and GLOBAL QUALITY.


Each answer is compared to an ideal answer and points are assigned for the inclusion of necessary elements. Grades are based on the number of accumulated points either absolutely (i.e., A=10 or more points, B=6-9 pts., etc.) or relatively (A=top 15% scores, B=next 30% of scores, etc.)


Each answer is read and assigned a score (e.g., grade, total points) based either on the total quality of the response or on the total quality of the response relative to other student answers.

Examples Essay Item and Grading Models 

"Americans are a mixed-up people with no sense of ethical values. Everyone knows that baseball is far less necessary than food and steel, yet they pay ball players a lot more than farmers and steelworkers." 

WHY? Use 3-4 sentences to indicate how an economist would explain the above situation. 

Analytical Scoring

Necessary Elements to be Included in Response


Salaries are based on demand relative to supply of such services.


Excellent ball players are rare.


Ball clubs have a high demand for excellent players.


Clarity of Response



9 pts.

Global Quality

Assign scores or grades on the overall quality of the written response as compared to an ideal answer. Or, compare the overall quality of a response to other student responses by sorting the papers into three stacks: Below average, average  and above average

Read and sort each stack again divide into three more stacks, and then repeat. In total, nine discriminations can be used to assign test grades in this manner. The number of stacks or discriminations can vary to meet your needs.

  • Try not to allow factors which are irrelevant to the learning outcomes being measured affect your grading (i.e., handwriting, spelling, neatness).
  • Read and grade all class answers to one item before going on to the next item.
  • Read and grade the answers without looking at the students' names to avoid possible preferential treatment.
  • Occasionally shuffle papers during the reading of answers to help avoid any systematic order effects (i.e., Sally's "B" work always followed Jim's "A: work thus it looked more like "C" work).
  • When possible, ask another instructor to read and grade your students' responses.