When you have built and implemented a test and you think you are done…then think again!
You are not done yet.
Computer-based tests that consist of auto-marked items generate heaps of management information.
The testing tool records the input of all candidates, and this data can be turned into valuable information. P-, D- and Rit-values are examples of useful psychometric data. By analysing the statistics and making appropriate changes, you can improve the test.
Despite the fact that the testing system provides all of the statistical data, improving test items is a manual task.
It is important that you understand the data, so it can be interpreted correctly. The process of reviewing the performance of items is what we call psychometric analysis.
So, let’s imagine you are running a testing program and you have extracted the P-, D- and Rit-values.
Let’s have a look at what these values represent.
The P-value represents the share of candidates that answered the question correctly. In practical terms: if an item has a high P-value, chances are that the question was fairly easy; most candidates who were presented with this item provided the correct response. Items that are either too easy or too difficult do not add great value to a test. In fact, easy questions are likely to distract good test takers because the candidate assumes that the question cannot be that simple to answer. The optimal P-value for an item lies between 0.3 and 0.8.
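As a rough illustration, the P-value of each item can be computed from a table of marked responses. The sketch below assumes the testing tool can export a simple 0/1 score per candidate and item; the function names and the sample data are made up for this example, and the 0.3 to 0.8 band is the range mentioned above.

```python
from typing import List


def p_values(responses: List[List[int]]) -> List[float]:
    """Return the share of correct answers per item.

    responses[c][i] is 1 if candidate c answered item i correctly, else 0
    (a hypothetical export format, not a specific tool's output).
    """
    n_candidates = len(responses)
    n_items = len(responses[0])
    return [
        sum(row[i] for row in responses) / n_candidates
        for i in range(n_items)
    ]


def outside_range(p_vals: List[float], low: float = 0.3, high: float = 0.8) -> List[int]:
    """Return the indexes of items whose P-value falls outside [low, high]."""
    return [i for i, p in enumerate(p_vals) if p < low or p > high]


# Made-up scores: five candidates (rows), three items (columns).
scores = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 1],
    [1, 0, 0],
]

p = p_values(scores)
flagged = outside_range(p)
print(p)        # P-value per item
print(flagged)  # items that are candidates for review
```

In this sample the first item is answered correctly by everyone and the third by only one candidate, so both fall outside the band and would be worth a second look.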
The D-value represents the distribution of the candidates’ choices across the available distractors. The question that we want to see answered is: were all of the answer options (A, B, C etc.) used by the candidates? The purpose of distractors is to offer plausible answers to test takers who have not sufficiently mastered the subject. Performing a distractor analysis will demonstrate that multiple choice questions really only need three answer alternatives. If you use more than two distractors (note: two distractors plus the correct answer equals three answer alternatives), you will find that some distractors are not selected at all. You may then decide to remove these distractors altogether.
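A distractor analysis of this kind comes down to counting how often each answer option was chosen. The following is a minimal sketch, assuming the chosen option per candidate is available as a list of letters; the function names, the option labels and the sample answers are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def option_shares(choices: List[str], options: List[str]) -> Dict[str, float]:
    """Return the share of candidates that picked each answer option."""
    counts = Counter(choices)
    total = len(choices)
    return {opt: counts.get(opt, 0) / total for opt in options}


def unused_distractors(choices: List[str], options: List[str], key: str) -> List[str]:
    """Return the distractors (options other than the key) nobody selected."""
    shares = option_shares(choices, options)
    return [opt for opt in options if opt != key and shares[opt] == 0]


# Made-up answers of eight candidates to one item; "A" is the correct answer.
answers = ["A", "B", "A", "A", "C", "A", "B", "A"]
all_options = ["A", "B", "C", "D"]

shares = option_shares(answers, all_options)
unused = unused_distractors(answers, all_options, key="A")
print(shares)  # share per option
print(unused)  # distractors that were never chosen
```

Here option D is never selected, which makes it a candidate for removal. With small candidate numbers a zero count may simply be chance, so it is worth checking across several test sessions before dropping a distractor.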
The Rit-value reflects the performance of the item versus the test as a whole. It tells us to what extent an item contributes to isolating the good candidates from the entire pool of test takers. In short: the Rit-value demonstrates the discriminating properties of an item.
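The Rit-value is commonly calculated as the correlation between the score on one item and the total test score. The sketch below takes that approach with a plain Pearson correlation. It assumes 0/1 item scores and a total score that includes the item itself; your testing system may use a corrected variant, so treat the function names and the sample data as illustrative only.

```python
from math import sqrt
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equally long lists; NaN if one is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # An item everyone got right (or wrong) cannot discriminate.
        return float("nan")
    return cov / sqrt(var_x * var_y)


def rit_values(responses: List[List[int]]) -> List[float]:
    """Correlate each item's scores with the candidates' total scores."""
    totals = [sum(row) for row in responses]
    n_items = len(responses[0])
    return [
        pearson([row[i] for row in responses], totals)
        for i in range(n_items)
    ]


# Made-up scores: five candidates (rows), three items (columns).
scores = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 0],
]

rit = rit_values(scores)
print(rit)  # one correlation per item, between -1 and 1
```

A value close to zero, or a negative value, suggests that strong candidates do no better on the item than weak ones, which is a reason to review it.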
Combining P-value and Rit-value
The items in the orange boxes need to be reviewed.
It is striking to see that this test contains a high number of easy items (high P-values). The accepted norm for Rit-values is debatable. In general, scientific research papers make the following recommendation:
The devil is in the detail.
And clearly there is much to be gained from a thorough analysis of the performance of items.
As such, the delivery of your test is not the final step in the process.
It is merely the start of improving the validity of your testing program.