Test Intentionally and Strategically, Part One
Monday, 15 May 2017
I have a friend who has done some temping in the past. Apparently, temp and staffing companies like to use commercially available computer-based tests to assess the capability of new applicants.
If I understand it correctly, the tests are set up to give you a task to complete. For example,
- Create a 4-column x 4-row table.
- Put “xyz” in the top row.
- Now make it bold.
- <Insert next micro-step here and continue>
Basically, the test leads you through steps, instead of telling you the end goal and assuming you can figure out the steps. The software keeps track of where you click and, if you click the wrong place, it is counted as wrong. If you click the right place, you get the points. Sounds pretty slick, doesn’t it? You can screen out people who don’t know what they are doing, the results are indisputable, and it doesn’t require any management time to review and score the tests. What could go wrong?
Actually, quite a bit. It’s always worth some skepticism when something seems that easy. But there are a couple of specific and serious logical flaws in the approach described above. And lots of companies are using similarly flawed testing strategies.
Specific to this screening strategy, the biggest issue is understanding how people use software. But this can be generalized to other performances as well. The issue is output vs. process. Or, results vs. task. Here is what I mean.
Think about how you might go about building a table in a document using Microsoft Word. First you do some mental planning to figure out what the table needs to look like, for example, whether there should be borders, the number of columns and rows, headers, etc. When you start to build the table, you might or might not look under the right menu heading on your first attempt but, if you know what you are looking for, you will find the right option fairly quickly. In fact, there are a couple of ways to create a table (for example, you can start from the icon or the menu) and, as long as you end up with a table at the end of it all, your approach should be acceptable. The thing to measure in this case is not the process but the output. The criteria for what constitutes a “good” table can be specified. For example, correct margins, width of borders, number of columns, correct settings for the title row, etc. There might even be some things that can only be evaluated by looking at the file (vs. a printed document), such as making sure the user didn’t use tabs and hard returns instead of setting up columns. These tests never get that far though.
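To make the idea of specified criteria concrete, here is a minimal sketch in Python of what checking the output might look like. Everything in it is an assumption made for illustration: the `Table` record, its field names, and the particular criteria (a 4 x 4 table, a bold header row, 0.5 pt borders) are hypothetical and do not come from any real testing product or from Word itself.

```python
from dataclasses import dataclass


@dataclass
class Table:
    """Hypothetical summary of a finished table, however it was built."""
    rows: int
    columns: int
    header_bold: bool
    border_width_pt: float
    built_with_tabs: bool  # True if tabs and hard returns fake the columns


def evaluate_table(table: Table) -> dict[str, bool]:
    """Check the finished table against each (assumed) output criterion."""
    return {
        "has 4 rows": table.rows == 4,
        "has 4 columns": table.columns == 4,
        "header row is bold": table.header_bold,
        "border is 0.5 pt": abs(table.border_width_pt - 0.5) < 1e-9,
        "uses a real table structure": not table.built_with_tabs,
    }


def passes(table: Table) -> bool:
    """The table passes only if every criterion is met."""
    return all(evaluate_table(table).values())
```

Notice that nothing here asks which menu or icon the person used. The last criterion is the kind that can only be checked by looking at the file rather than a printout.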
Basically, the test confirms that you can create a table if someone tells you every step along the way. In any case, the order and location of your clicks don’t really determine the effectiveness of your performance. Looking under the wrong menu heading, realizing your mistake, and then going to another doesn’t mean you can’t do the task. In a way, it means you can do the task…because you know what you are looking for, just not specifically where to find it. Sure, at some point it matters if you take too long, but it needn’t be a primary concern even if the person had to consult help…chances are that if it were a task that is performed frequently on-the-job, they would learn it better and get fast enough, soon enough. If it isn’t a frequently performed task, consulting help is perfectly acceptable.
But, if you are tracking where and in what order someone clicks, you are evaluating what they do, the process, instead of the output or result. In a case where there is no one right process, the test is invalid. You are checking whether someone remembers the steps…not whether they can produce the result. Instead, the tester needs to find a way to evaluate the output. Think of it this way: if you send your teenager to run an errand, is it better that they get to the store and come back with the right groceries or that they used a specific route? (Okay, maybe you want them to stay off the highway or not swing by their friend’s house but still…)
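The difference between the two scoring approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the click names, the `EXPECTED_CLICKS` path, and the `result` dictionary are all made up, and I have no idea how any actual screening product is implemented.

```python
# Hypothetical scripted path the test expects (an assumption for illustration).
EXPECTED_CLICKS = ["Insert", "Table", "4x4", "Bold"]


def score_by_process(clicks: list[str]) -> bool:
    """Pass only if the clicks match the scripted path exactly."""
    return clicks == EXPECTED_CLICKS


def score_by_output(result: dict) -> bool:
    """Pass if the finished table meets the goal, whatever path led there."""
    return (
        result.get("rows") == 4
        and result.get("columns") == 4
        and result.get("header_bold") is True
    )


# A candidate who looks under the wrong menu first, then recovers.
clicks = ["Format", "Insert", "Table", "4x4", "Bold"]
result = {"rows": 4, "columns": 4, "header_bold": True}

print(score_by_process(clicks))  # False
print(score_by_output(result))   # True
```

Same person, same finished table, two opposite verdicts. The process score punishes the detour through the wrong menu, while the output score only asks whether the table is right.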
Ultimately, when you are designing any test, it is critical to start by defining the performance you want to evaluate and then determining a strategy to evaluate it. Avoid being led astray by solutions that are simple to implement — it is always easy to measure unimportant data. (This seems to happen a lot in the world of computer/web-based training because computers record every transaction so it is easy to count them.) Decide whether process or output is important.
Sometimes both the output and the process need to be tested, but in general, if you can sufficiently assess capability by evaluating the output, doing so is both more efficient and more valid. Usually, the people being tested prefer this approach as well, because it allows them to be assessed based on their ability to get something done. It measures something closer to their eventual job performance.
Certainly in some cases, it is important to standardize and evaluate the process. Maybe some key performances aren’t visible in the result. The example we use a lot when talking about performance testing is cooking a turkey. Sure, the result (that is, the output) has to look and taste good, but it is probably a good idea to monitor the process as well, to ensure safe food-handling techniques were used and that there was no cross-contamination. Even this could be evaluated by the result (i.e., verify that no one became ill or test samples of the food for bacteria) but the risk involved makes it appropriate to expend the effort to evaluate the process in addition to the output. Because the consequences are significant, it becomes worth the extra effort and cost of testing the process. But, if you start with output testing and strategically backfill with process testing only where needed, you can at least reduce non-value-added testing time and costs. And avoid making decisions based on faulty data.