So, is a randomised test fair?

It depends.

A randomised test can be totally fair but it can also be biased.

A test is biased when the results have consequences that unfairly advantage or disadvantage test takers.

Is it possible to determine whether a test is fair? Whether it is equally difficult for all candidates?

Yes it is. But only in hindsight.

An analysis of the average p-value of each test form is a great help in establishing fairness. (In item analysis, an item's p-value is simply the proportion of candidates who answer it correctly, so it is a measure of how easy the item is.) When the average p-values are spread across a broad range, it is highly likely that the generated tests varied in difficulty.

[Figure: p-value]
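
As a minimal sketch of what such an analysis might look like, the snippet below compares average p-values across a few test forms. The form names and values are invented, and it assumes your testing platform lets you export each delivered form's item p-values.

```python
# Minimal sketch: compare average p-values (proportion correct) across
# randomly generated test forms. Assumes a hypothetical export of item
# p-values per delivered form; all numbers here are made up.
from statistics import mean

forms = {
    "form_A": [0.85, 0.72, 0.64, 0.90, 0.55],
    "form_B": [0.48, 0.52, 0.61, 0.70, 0.58],
    "form_C": [0.82, 0.79, 0.75, 0.88, 0.69],
}

form_means = {form: mean(pvals) for form, pvals in forms.items()}
for form, avg in sorted(form_means.items()):
    print(f"{form}: average p-value = {avg:.2f}")

# A wide gap between the easiest and hardest form is a warning sign that
# candidates did not receive tests of comparable difficulty.
spread = max(form_means.values()) - min(form_means.values())
print(f"Spread between forms: {spread:.2f}")
```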

But… finding out in hindsight is too late!

You want certainty about the fairness of your test before it is delivered to your candidates.

It will save you some considerable headaches afterwards 🙂.

So, how is this done?

How does one create a test that is fair to each and every candidate?

The key condition is that you have a clear understanding of the level of knowledge and/or skills of your candidates.

Your candidates really do not have to know the answers to all of the questions. In fact, some things are learnt by doing and through gaining experience.

So what does this have to do with randomised tests?

Everything!

Because when designing a randomised test, you want to ensure that candidates who come well prepared are presented with questions they can answer.

You want to be able to distinguish competent candidates from those who require further education. In other words, you work your way back from the norm you have set.

So what is the goal? What do you want to measure?

You want to be able to assess a candidate’s knowledge and insight at a predetermined level of the subject matter. You want to set a standard. For this to work you need to be a subject matter expert: during an item review you apply your knowledge of the subject matter and sort all of the items into buckets of easy, moderate and hard questions. This method of standard setting is the foundation of a good randomised test. The approach is known as the Angoff method.
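
As a rough illustration of an Angoff-style review, the sketch below averages hypothetical expert estimates per item, sorts the items into easy, moderate and hard buckets, and derives a suggested cut score. The ratings, thresholds and item ids are all assumptions made for the example, not prescribed values.

```python
# Sketch of an Angoff-style item review with invented data. Each expert
# estimates, per item, the probability that a borderline (just-competent)
# candidate answers it correctly.
from statistics import mean

ratings = {                      # item id -> one estimate per expert
    "item_01": [0.90, 0.85, 0.80],
    "item_02": [0.55, 0.60, 0.50],
    "item_03": [0.30, 0.25, 0.35],
}

def bucket(p):
    """Rough difficulty buckets; the thresholds are an assumption, not a rule."""
    if p >= 0.75:
        return "easy"
    if p >= 0.45:
        return "moderate"
    return "hard"

estimates = {item: mean(vals) for item, vals in ratings.items()}
for item, p in estimates.items():
    print(f"{item}: Angoff estimate {p:.2f} -> {bucket(p)}")

# The classic Angoff cut score is the sum of the per-item estimates.
print(f"Suggested cut score: {sum(estimates.values()):.1f} out of {len(estimates)}")
```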

Depending on the testing solution you use, you then ensure that all levels of difficulty are reflected in your test specification matrix, or blueprint. Questionmark Perception has this to say on the subject.
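
What such a blueprint could look like, expressed as data rather than as a document, is sketched below. The topics and counts are invented; the point is simply that every generated form draws the same mix of items per topic and difficulty.

```python
# Illustrative blueprint: how many items every form must contain per
# (topic, difficulty) cell. Topics and counts are made-up examples.
blueprint = {
    ("anatomy",    "easy"):     4,
    ("anatomy",    "moderate"): 4,
    ("anatomy",    "hard"):     2,
    ("physiology", "easy"):     3,
    ("physiology", "moderate"): 5,
    ("physiology", "hard"):     2,
}

total_items = sum(blueprint.values())
print(f"Every generated form contains {total_items} items, in the same mix.")
```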

Seeding items

Andriessen’s Sisto offers the possibility of seeding pretest items. These items do not count towards a candidate’s final score, but you can use them to determine their p-values and rit values. Is the question hard, moderate or easy?

When you have collected sufficient information about a pretest item, you decide whether it can be included in the live test. This allows you to remove an item that is not performing well, or to amend it and include it as a new pretest item in the next release of your test.
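
A minimal sketch of that analysis is shown below. It assumes you can export, per candidate, the score on the pretest item and the total score on the scored items; the numbers are invented. The p-value is the proportion of candidates answering the item correctly, and the rit value is the item-total correlation.

```python
# Minimal sketch: p-value and rit value for one pretest item, from an
# assumed export of (pretest item score, total score on scored items).
from statistics import mean, correlation   # correlation needs Python 3.10+

responses = [
    (1, 34), (0, 21), (1, 40), (1, 29), (0, 18),
    (1, 37), (0, 25), (1, 31), (1, 36), (0, 22),
]

item_scores = [item for item, _ in responses]
totals = [total for _, total in responses]

p_value = mean(item_scores)             # proportion correct: higher = easier
rit = correlation(item_scores, totals)  # item-total correlation: discrimination

print(f"p-value: {p_value:.2f}")
print(f"rit:     {rit:.2f}")
```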

Taking this approach to the design and further development of your test allows you to improve its quality.

Your item bank will increasingly reflect items of a similar difficulty level.

Should you wish to use items with significantly different levels of difficulty, then you will want to label your items or use a test matrix that is designed to distribute these items fairly.
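
One way to picture that, without claiming this is how any particular product implements it, is a selection routine that draws a fixed number of items per difficulty label for every form. The item bank and quotas below are made up.

```python
# Sketch: draw a random form while enforcing a quota per difficulty label,
# so every candidate receives a comparable mix. Purely illustrative.
import random

item_bank = {   # hypothetical item id -> difficulty label
    "q01": "easy", "q02": "easy", "q03": "easy", "q04": "easy",
    "q05": "moderate", "q06": "moderate", "q07": "moderate", "q08": "moderate",
    "q09": "hard", "q10": "hard", "q11": "hard", "q12": "hard",
}
quota = {"easy": 2, "moderate": 2, "hard": 1}   # items per form, per label

def draw_form(bank, quota, rng=random):
    form = []
    for level, count in quota.items():
        pool = [item for item, label in bank.items() if label == level]
        form.extend(rng.sample(pool, count))
    rng.shuffle(form)
    return form

print(draw_form(item_bank, quota))
```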

Randomised tests: the advantages

  1. They decrease the value of exam or item theft. Every test is different!
  2. They make it easy to swap pretest items in and out.
  3. They allow you to gradually grow your item bank, increasing the randomisation of items.
  4. They present every candidate with a unique test.

So, is a randomised test fair?

Yes.

But it requires work and maintenance. Particularly in the area of item difficulty.

Multiple choice is king

Computer-based testing tools all offer the well-known and highly maligned multiple choice question, also known as the MCQ item type.

But did you know that testing (or examination) tools offer many other item types? And that most of these are based on closed questions?

Candidates’ responses to closed questions can be automatically marked.

In my view this is a great example of the benefits that testing software offers versus the classic paper and pencil test.

Providing the candidate with the result of his or her test does not require manual intervention. The result can be automatically sent to the candidate or the institution that sponsors the test.

Types of closed questions

Examples of closed questions include:

  • Multiple Choice Single Response: One of the answers is correct
  • Multiple Choice Multiple Response: More than one answer is correct
  • Drag and drop (matching): Drag and drop an object onto an image or piece of text
  • Ranking: Put lines of text or images in the correct order
  • Fill in the blank: Enter the correct word or combination of words into a text box
  • Hotspot: Place a marker on the correct spot in an image or video
  • Numerical: Enter the answer to a numerical or mathematical question

All of these questions can be marked automatically.
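
To illustrate why this works, here is a small marking sketch for two of the item types above. The answer formats are an assumption for the example, not any vendor's actual data model.

```python
# Illustrative automatic marking for two closed item types.
def mark_single_response(candidate_choice, correct_choice):
    """Multiple Choice Single Response: full mark if the single key is chosen."""
    return 1 if candidate_choice == correct_choice else 0

def mark_multiple_response(candidate_choices, correct_choices):
    """Multiple Choice Multiple Response: full mark only for the exact set of keys."""
    return 1 if set(candidate_choices) == set(correct_choices) else 0

print(mark_single_response("B", "B"))                         # 1
print(mark_multiple_response({"A", "C"}, {"A", "C", "D"}))    # 0
```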

But are all these item types used?

Well no, not really.

Have a look at this data that we pulled from Sisto:

 

Item type                                Volume   Share
Hotspot (single marker)                   3,988    0.0%
Hotspot (multiple markers)                4,248    0.0%
Fill in the blank (single)               37,043    0.1%
Fill in the blanks (multiple)            56,546    0.1%
Multiple Choice Single Response      38,692,327   90.2%
Multiple Choice Multiple Response     1,273,913    3.0%
Numerical                               898,283    2.1%
Essay (computer based)                1,428,539    3.3%
Ranking                                  81,307    0.2%
Essay (on paper)                        185,965    0.4%
Matching                                 10,930    0.0%
Speaking                                231,100    0.5%
Upload                                      166    0.0%
Total                                42,904,355  100.0%

Clearly the multiple choice item type is the most popular. By a long way.

So why do the technical specifications of tenders or requests for proposal often put such a strong focus on the types of items that a vendor is able to support?

It really is not as important as we are led to believe.

Take it from me: The multiple choice question is king!