So, is a randomised test fair?

It depends.

A randomised test can be totally fair but it can also be biased.

A test is biased when the results have consequences that unfairly advantage or disadvantage test takers.

Is it possible to determine whether a test is fair? Whether it is equally difficult for all candidates?

Yes it is. But only in hindsight.

An analysis of the average p-value of each delivered test is of great help in establishing its fairness. When the average p-values are spread across a broad range, it is highly likely that the generated tests varied in difficulty.
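As an illustration, here is a minimal sketch in Python of such a hindsight check. It assumes you can export, per generated test form, the p-values (proportion of correct responses) of the items that form contained; the form names and numbers are invented.

```python
# Minimal sketch: compare the average p-value (proportion of correct
# responses per item) of each randomised test form after delivery.
# Assumes a hypothetical export: form id -> p-values of its items.

from statistics import mean

form_p_values = {
    "form_A": [0.82, 0.75, 0.68, 0.90, 0.71],
    "form_B": [0.55, 0.48, 0.62, 0.59, 0.51],
    "form_C": [0.80, 0.77, 0.73, 0.85, 0.69],
}

averages = {form: mean(p) for form, p in form_p_values.items()}
spread = max(averages.values()) - min(averages.values())

for form, avg in sorted(averages.items()):
    print(f"{form}: average p-value {avg:.2f}")

# A wide spread suggests that some candidates received a noticeably
# harder or easier test than others.
print(f"Spread between easiest and hardest form: {spread:.2f}")
```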


But…in hindsight is too late!

You want certainty about the fairness of your test before it is delivered to your candidates.

It will save you some considerable headaches afterwards 🙂.

So, how is this done?

How does one create a test that is fair to each and every candidate?

The key condition is that you have a clear understanding of the level of knowledge and/or skills of your candidates.

Your candidates really do not have to know the answers to all of the questions. In fact, some things are learnt by doing and through gaining experience.

So what does this have to do with randomised tests?

Everything!

Because when designing a randomised test, you want to ensure that candidates who come well prepared are presented with questions that they can answer.

You want to be able to distinguish the competent candidates from those who require further education. Naturally, you work your way back from the norm: the standard you have set.

So what is the goal? What do you want to measure?

You want to be able to assess a candidate’s knowledge of and insight into the subject matter at a predetermined level. You want to set a standard. For this to work correctly you need to be a subject matter expert. You apply your knowledge of the subject matter during an item review and classify all of the items into buckets of easy, moderate and hard questions. This method of standard setting is the foundation of a good randomised test and is known as the Angoff method.
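To make the method concrete, here is a simplified, hypothetical illustration of an Angoff-style review in Python: each subject matter expert estimates the probability that a minimally competent candidate answers an item correctly, the per-item averages suggest the easy, moderate and hard buckets, and their sum suggests a cut score. All item ids, ratings and thresholds below are assumptions.

```python
# Simplified illustration of an Angoff-style item review (hypothetical data).
# Each expert estimates the probability that a minimally competent candidate
# answers the item correctly; the per-item averages drive the difficulty
# buckets and, summed, suggest a cut score for this set of items.

from statistics import mean

ratings = {  # item id -> estimates from three subject matter experts
    "item_01": [0.85, 0.80, 0.90],
    "item_02": [0.55, 0.60, 0.50],
    "item_03": [0.30, 0.35, 0.25],
}

def bucket(p: float) -> str:
    """Classify an item as easy, moderate or hard (assumed thresholds)."""
    if p >= 0.75:
        return "easy"
    if p >= 0.45:
        return "moderate"
    return "hard"

item_estimates = {item: mean(vals) for item, vals in ratings.items()}
for item, p in item_estimates.items():
    print(f"{item}: estimated p = {p:.2f} -> {bucket(p)}")

# The sum of the estimates is a candidate cut score for these items.
print(f"Suggested cut score: {sum(item_estimates.values()):.1f} "
      f"out of {len(item_estimates)}")
```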

Depending on the testing solution that you use, you ensure that all levels of difficulty are reflected in your test specification matrix, or blueprint. Questionmark Perception has this to say on the subject.

Seeding items

Andriessen’s Sisto offers the possibility of seeding pretest items. These items do not count towards a candidate’s final score, but you can use them to determine their p- and Rit-values. Is it a hard, moderate or easy question?

Once you have collected sufficient information about a pretest item, you decide whether it can be included in the live test. You can remove an item that is not performing well, or amend it and include it as a new pretest item in the next release of your test.
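A rough sketch of what that evaluation could look like, assuming you can export each candidate’s score on the pretest item together with their total test score. The response data, the Rit calculation as a simple item-total correlation, and the decision thresholds are all illustrative assumptions, not fixed standards.

```python
# Sketch of evaluating a seeded pretest item (hypothetical response data).
# p-value = proportion of candidates answering the item correctly.
# Rit     = correlation between the item score (0/1) and the total test score.

from statistics import mean, stdev

def pearson(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return cov / (stdev(xs) * stdev(ys))

item_scores  = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]    # 0/1 per candidate
total_scores = [34, 18, 30, 28, 15, 33, 20, 29, 31, 27]

p_value = mean(item_scores)
rit = pearson(item_scores, total_scores)
print(f"p-value: {p_value:.2f}  Rit: {rit:.2f}")

# Example decision rule (thresholds are assumptions): a very low or very
# high p-value, or a weak Rit, argues for revising the item rather than
# promoting it to the live item bank.
if 0.3 <= p_value <= 0.9 and rit >= 0.2:
    print("Item looks usable for the live test.")
else:
    print("Revise the item and seed it again in the next release.")
```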

Taking this approach to the design and further development of your test allows you to improve its quality.

Your item bank will increasingly consist of items of a similar difficulty level.

Should you wish to use items with significantly different levels of difficulty, then you will want to label your items or use a test matrix that is designed to distribute these items fairly.
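A minimal sketch of such a test matrix, assuming a hypothetical item bank in which every item is labelled easy, moderate or hard: each generated test draws the same number of items from each bucket, so the difficulty mix stays constant while the selected items vary.

```python
# Minimal sketch of a blueprint that fairly distributes difficulty:
# every generated test draws the same number of items from each
# difficulty bucket. Item ids and counts are hypothetical.

import random

item_bank = {
    "easy":     ["E01", "E02", "E03", "E04", "E05", "E06"],
    "moderate": ["M01", "M02", "M03", "M04", "M05", "M06"],
    "hard":     ["H01", "H02", "H03", "H04"],
}

blueprint = {"easy": 3, "moderate": 3, "hard": 2}  # items per bucket

def assemble_test(bank, spec, rng=random):
    """Randomly draw the specified number of items from each bucket."""
    test = []
    for bucket, count in spec.items():
        test.extend(rng.sample(bank[bucket], count))
    rng.shuffle(test)
    return test

print(assemble_test(item_bank, blueprint))
```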

Randomised tests: The advantages:

  1. They decrease the value of exam or item theft. Every test is different!
  2. It is easy to swap pretest items in and out.
  3. They allow you to gradually grow your item bank, increasing the degree of randomisation.
  4. Every candidate is presented with a test that is unique.

So, is a randomised test fair?

Yes.

But it requires work and maintenance. Particularly in the area of item difficulty.

Consider this: video clips in tests

One of the advantages of computer based testing is the ability to use multimedia files.

From a technical perspective, the inclusion of video clips is not a problem. Almost all vendors of computer-based testing solutions offer this in some shape or form. There is a variety of low-cost tools that allow you to place your videos online and embed them in your test.


YouTube offers a simple, reliable and cost-efficient way of embedding video in your test.

By default YouTube videos are public, which means that anyone can watch them.

Keep this in mind when you include video in a test.

The use of video enhances the candidate experience and can add flavour to formative tests. With summative tests you want to be more careful: it is important to establish that candidates who have already seen the video before the test have no advantage over those who haven’t.

YouTube

YouTube offers the option of making a video unlisted.

This means that only those people that have the link to the video can view it. Unlisted videos don’t show up in YouTube’s search results unless someone adds your unlisted video to a public playlist.

A good alternative to YouTube is Vimeo. Vimeo Plus or Pro subscriptions are very affordable (approximately $60 or $200 per year respectively) and offer features such as video password protection, domain-level privacy and advanced viewing statistics. Furthermore, you can add your own logo to the video player – a nice touch!

Consider this when using (online) video:

  1. What is the impact if a candidate has already watched the video before the start of the test?
  2. Do you have the rights to use the video in your test?
  3. Is the bandwidth sufficient for all candidates to view the video simultaneously?
  4. Can YouTube, Vimeo or another video player be accessed from the test station?

Sources: https://support.google.com/youtube/answer/157177?hl=en and https://vimeo.com/upgrade

An example of embedded video can be found in the English Example.

The quest for distractors: Getting the wrong answers right

I need distractors. And not because I am bored. No, I am looking for information on distractors in multiple choice questions.

The word distractor is used for the alternatives to the correct answer. So we are talking about the baddies, the wrong ones. But how do I get those right?

The literature is certain about one thing in relation to distractors. And anyone who has ever developed a test with multiple choice questions will agree:

It is difficult to create good distractors.

Naturally, writing a good question (or item) is an art. But we get them right most of the time. However, creating good distractors – nice alternatives – is a real challenge.

Here’s some help.

An important condition for the answer alternatives is that all distractors must be plausible. That is, each distractor should be a potential answer to the question.

But if you are not an expert in the subject matter, then you will most likely see all of the response options as equally logical or probable.

The purpose of good distractors is to distinguish the good test takers from the bad ones. No doubt you want a good candidate to answer all of the questions correctly, while the candidate who has not studied hard enough is thrown off balance by the answer alternatives. He will start guessing what the correct answer is and, hopefully, he will eventually fail the test. A candidate guesses when he is unsure, which means he will try to reason which answer best fits the question.

Recommendations for creating great distractors:

  1. All distractors must be equally plausible, and grammatically and factually correct.
  2. All distractors must be of similar length; keep them brief and provide relevant information in the question instead of in the answer alternatives.
  3. Make use of counterexamples when creating distractors and do not use (double) negatives.
  4. All distractors must be written in the same style. If possible, avoid jargon and watch out for vague descriptions.
  5. Use a limited number of distractors. Three answer alternatives are as good as four. In practice, one out of four answer alternatives is rarely selected.

And on that last point: it is best practice to analyse the performance of your items and verify whether all of your answer alternatives are actually used by candidates. Have a look at this pie chart.

Pie chart: distribution of candidates’ responses across the answer alternatives.

You can see that alternative D is never selected and that alternative C is selected by only two of the 201 candidates.

Conclusion: there is work to be done.

In any case, alternative D can be deleted.
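For completeness, a small sketch of what such a distractor analysis could look like in code, using invented counts that match the chart above (the split between A and B is assumed); alternatives that are almost never chosen are flagged for review.

```python
# Sketch of a distractor analysis (hypothetical counts: 201 candidates,
# alternative D never chosen, C chosen twice; the A/B split is assumed).

from collections import Counter

responses = ["A"] * 150 + ["B"] * 49 + ["C"] * 2   # D: 0 selections
counts = Counter(responses)
total = len(responses)

for alternative in ["A", "B", "C", "D"]:
    n = counts.get(alternative, 0)
    flag = "  <- review: (almost) never selected" if n / total < 0.05 else ""
    print(f"{alternative}: {n:4d} ({n / total:5.1%}){flag}")
```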

Last but not least: Four-eyes principle

Always remember that the best way to create good questions and answers is to apply the four-eyes principle.

When you create a question and its answer alternatives, have a colleague review it. Two pairs of eyes see more than one.

Good luck with making good distractors.

Want to read more?

Writing good multiple choice test questions

Multiple choice is king

Computer based testing tools all offer the well-known and highly maligned multiple choice question, also known as the MCQ item type.

But did you know that testing (or examination) tools offer many other different item types? And that most of these are based on closed questions?

Candidates’ responses to closed questions can be automatically marked.

In my view this is a great example of the benefits that testing software offers over the classic paper-and-pencil test.

Providing the candidate with the result of his or her test does not require manual intervention. The result can be automatically sent to the candidate or the institution that sponsors the test.

Types of closed questions

Examples of closed questions include:

  • Multiple Choice Single Response: One of the answers is correct
  • Multiple Choice Multiple Response: More than one answer is correct
  • Drag and drop (matching): Drag and drop an object onto an image or piece of text
  • Ranking: Put lines of text or images in the correct order
  • Fill in the blank: Enter the correct word or combination of words into a text box
  • Hotspot: Place a marker on the correct spot in an image or video
  • Numerical: Enter the answer to a numerical or mathematical question

All of these questions can be marked automatically.
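As an illustration, here is a minimal sketch of how automatic marking of a few of these closed item types could work. The scoring rules, keys and tolerances are assumptions for the example, not the rules of any particular testing tool.

```python
# Illustrative sketch of automatic marking for a few closed item types.
# Keys and tolerances are assumptions; real testing software applies its
# own scoring rules.

def mark_single_choice(response, key):
    return response == key

def mark_multiple_response(response, key):
    # All correct options, and nothing else, must be selected.
    return set(response) == set(key)

def mark_numerical(response, key, tolerance=0.01):
    return abs(response - key) <= tolerance

def mark_fill_in_blank(response, accepted):
    return response.strip().lower() in {a.lower() for a in accepted}

print(mark_single_choice("B", key="B"))                           # True
print(mark_multiple_response(["A", "C"], key=["C", "A"]))         # True
print(mark_numerical(3.14, key=3.14159))                          # True (within tolerance)
print(mark_fill_in_blank(" Amsterdam ", accepted=["Amsterdam"]))  # True
```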

But are all these item types used?

Well no, not really.

Have a look at this data that we pulled from Sisto:

 

Item type                            Volume        Share
Hotspot (single marker)                   3,988      0.0%
Hotspot (multiple markers)                4,248      0.0%
Fill in the blank (single)               37,043      0.1%
Fill in the blanks (multiple)            56,546      0.1%
Multiple Choice Single Response      38,692,327     90.2%
Multiple Choice Multiple Response     1,273,913      3.0%
Numerical                               898,283      2.1%
Essay (computer based)                1,428,539      3.3%
Ranking                                  81,307      0.2%
Essay (on paper)                        185,965      0.4%
Matching                                 10,930      0.0%
Speaking                                231,100      0.5%
Upload                                      166      0.0%
Total                                42,904,355    100.0%

Clearly the multiple choice item type is the most popular. By a long way.

So why do the technical specifications of tenders or requests for proposal often put such a strong focus on the types of items that a vendor is able to support?

It really is not as important as we are led to believe.

Take it from me: The multiple choice question is king!