Commonly Used Training Evaluation Models: A Discussion with Dr. Will Thalheimer


Dr. Will Thalheimer is one of the most respected learning researchers out there. And that’s especially true when it comes to issues regarding learning evaluation.

We were excited to be able to talk with Dr. Thalheimer about four common learning evaluation models, and we’ve got the recorded video for you below. If you prefer your learning evaluation information in written form, just scroll down for the transcript of our discussion. And if you’d like to read other discussions we’ve had with Will, click these links to learn more about spaced practice, the effectiveness of elearning, smile sheets, and learning myths v. learning maximizers.

Many thanks to Will for participating in this discussion on learning evaluation and for everything he does. Please be sure to check out his other materials and offerings at his website. And when you finish this discussion, know that we had a follow-up in which Dr. Thalheimer explained his new LTEM learning evaluation model as well.

Below is the transcription of our discussion with Dr. Will Thalheimer about learning evaluation. Enjoy.

Training Evaluation Methods–An Introduction

Convergence Training: Hi there, everybody, and welcome. This is Jeff Dalto of Convergence Training and Vector Solutions, and we have a really special guest today. I’m very excited, we have Dr. Will Thalheimer. Will is the owner of Work-Learning Research and writes at the Will at Work blog and also has some great books out.

You probably already know Will on your own—Will’s a big name. Hi, Will. If you follow the Convergence Training blog, you know that we’ve interviewed Will multiple times, and we are constantly referring to him and to his work on any number of topics. So obviously, we’re excited to have Will.

Will, how are you today?

Dr. Will Thalheimer: I’m pretty good. It’s Friday here.

Convergence Training: It’s Friday here too. Imagine that. We’re on the same side of the international dateline. Well, thanks again for coming on.

And today we’re going to talk about training or learning evaluation. We’re going to have two interviews, actually. This will be the first one. And we’re looking at some common training evaluation models. And then we’ll come back and have a second one on a model you just created called LTEM (listen to that LTEM discussion here).

But I wonder, just to kind of set the scene, Will, if you could let people know what we’re talking about when we’re talking about training evaluation.

Dr. Will Thalheimer: Sure.

And, you know, Jeff, you sent me a whole bunch of questions that you might ask me and so what I did, I took the liberty of creating some visuals, some slides, because sometimes a picture paints 1,000 words. So if you allow me to jump in there…

Convergence Training: Yeah, please do, please do get it started.

Dr. Will Thalheimer: And I’m going to start looking at some big-picture ideas because learning is really complicated, right? Much more complicated than the hard sciences, like rocket science, because human beings are so complicated. And then you put the layer of learning evaluation on that, and it becomes even more complex. So I want to go over some big picture things first. Just so that we are all on the same page.

Convergence Training: Yeah, please.

Dr. Will Thalheimer: Do you see my whole screen or just…

Convergence Training: The full screen, not just one PowerPoint slide. There we go—bingo.

Dr. Will Thalheimer: Hopefully none of my secrets are there.

Convergence Training: I’m sure people will be like sponging that up.

Dr. Will Thalheimer: Okay. So look, one of the things that people have told me, or that I’ve seen out there, is a lot of our learning evaluation experts are telling us that learning evaluation is really hard and we’re not doing it very well. But also we as learning practitioners sort of believe the same thing.

Here’s some research I did with the eLearning Guild just last year. And when we asked people in general, are you able to do the learning measurement you want to do, you can see 52% of them, over half, said no, I wish we could substantially change what we’re doing now. So we as practitioners have this unease about learning evaluation.

By the way, I’m going to have a bunch of links for you. And Jeff’s going to post those nearby. So don’t worry about capturing all the links or stopping the video or anything like that. We’re going to give them to you all later. But this research study is available at the eLearning Guild. If you’re a member, you can get it, and you can get the executive summary if you’re not.

One of the things that I focused on is some of the common mistakes that we make. And last year, I thought I would capture maybe 15 to 20 of these, but I came up with 54. You can see there are some problems out there. And again, I’ll share those with you, you can take a look.

One of the things we ought to realize is that when we’re talking about training, that’s just part of the performance ecosystem. There are all other kinds of factors that get involved there. And sometimes we want to evaluate those. Sometimes we don’t. But it’s good to keep that in mind.

Because there’s been so many confusions about learning measurement, I created this sort of three-part…I don’t really want to call it a model…but three ways to measure learning. And people find it really helpful. So I’ll share it with you. And we can measure learning to demonstrate the value of the learning, we can do it to support our learners in learning, and we can also improve the learning as well.

Now, I did some research on this with the eLearning Guild a long time ago, and I continue to ask clients about it. When I ask learning practitioners, “What does your organization want from you? Let’s put these in order,” it almost always goes like this: the first thing our organization wants us to do is demonstrate the value, then support learners, and only down at the bottom is the focus on improving the learning.

Now my thing is that this is the most important, this is where we can leverage things. Because if we’re not improving the learning, we’re not going to maximize the way we can support learners, we’re not going to maximize their learning, and we’re certainly not going to maximize the value that we can create. So improving the learning is the linchpin of all of this.

Convergence Training: Yeah, sounds like a little parable of a cart and a horse there.

Dr. Will Thalheimer: Exactly. I like that.

So there we are, good-looking learning professionals. Some of us are not quite as well coiffed as these folks.

But anyway, when we think about data and analysis, you know, we need to collect data that’s accurate, valid, relevant, highly predictive of what we care about, what’s important, that’s also cost-effective. This is something we sometimes forget about, but we don’t want our evaluations to cost so much that it hurts the overall cost benefit.

The most important thing though, is when we do our learning evaluation we want to help. We want that data to help us make our most important decisions. And I’m sure you all have your own thoughts on what our most important decisions are. But here’s some of the things that I consider some of our most important decisions.

  • Is the learning method, or methods, we’re using working or should we use another one?
  • Is this skill content or teaching useful enough to teach?
  • Are we doing enough to give learners support in applying learning?
  • Are we sufficiently motivating our learners to inspire them to act, to take the learning and actually implement it on the job, overcome the obstacles with implementing learning, etc.?
  • And then is training useful?
  • Or should we provide other or additional supports–we know that training doesn’t work in a vacuum and so are there other supports we can make?

So there are some things, like this, that are crucial to our performance as learning professionals. If we can use our learning evaluation to get feedback on these, we ought to do that.

A couple of other things. One of the things to think about in learning evaluation is the inputs and the outputs. So we’ve got learning interventions, right? Whether it’s classroom training, whether it’s people learning in the workflow, hands-on, or whether it’s elearning–doesn’t matter, we can evaluate it.

When we evaluate the outputs, what are the results? We can also evaluate the inputs. So the outputs look like things we’ve seen before right? Smile sheets, the learner perceptions, learner knowledge, what have they understood? Are they able to make decisions? Are they able to remember? Can they perform on the job and are they sharing what they’ve learned as well? There’s going to be many types of outputs. These are the effects of learning interventions. But we can also evaluate the inputs.

So one of the things we can do is research benchmarks, so we could look at our learning designs. And I do this a lot in my consulting work. So people come to me and say, “Will, we’re not sure about our learning designs, we want to make them better. Can you do a learning audit for us?” So I compare it to some research benchmarks, some best practices. I use the decisive dozen, but there are other things people can use. So we can also look at our designs, our analysis, our assumptions, and look at and get information about our learning by looking at the inputs as well as the outputs.

Now most people, when we think about learning evaluation, are thinking about the output side. And that’s perfectly legitimate. I just want to emphasize here that sometimes we can look at the inputs as well.

Four Common Learning Evaluation Models–Kirkpatrick, Kaufman, Phillips & Brinkerhoff

Convergence Training: So I think that’s a great intro.

Our next question that we talked about was just discussing some of the more commonly used training evaluation methods. And I wonder if you could tell us a couple of the most common ones and then we’ll, once we have them kind of on the map, we’ll drill down and talk about each one.

Dr. Will Thalheimer: Sure. Well, of course, the most common, the most well-known, is the Kirkpatrick four-level model. And it’s been the dominant model in our field for a long time.

There are also other models. People talk about the Phillips model (aka ROI), the Kaufman model, Roger Kaufman’s work, and Rob Brinkerhoff as well through the success case methodology. Those are the big four.

Of course, I’m a little biased, I would add LTEM, the new model that I worked on, as well.

Convergence Training: All right, great. And again, for everybody listening, we are going to talk briefly about LTEM near the end of this one discussion, and then we’ll have an entire detailed discussion on LTEM in a second recorded discussion. So hold on for that. (Note: Here’s that second conversation on LTEM.)

And okay, if we can just walk through each of the four models you talked about–Kirkpatrick, Phillips, Kaufman, and Brinkerhoff–and maybe you can explain to people, especially people who may not have heard of any of these, what they are and what are some pros and cons of each.

And I’ll interrupt you after we discuss Kirkpatrick just to get a little interesting background, because I know you’ve done some research on that as well.

The Kirkpatrick Four-Level Training Evaluation Model

Dr. Will Thalheimer: Okay, so we’ll go back to the picture that paints 1,000 words. So this is a picture of Donald Kirkpatrick.

And most of you know the Kirkpatrick model. It’s level one, reaction…these are sort of the learner feelings about it. Level two is learning. Level three is the behavior of learners when they get back to the workplace and start implementing and applying what they’ve learned. And level four is the results that they get from that behavior.

Okay, so fairly straightforward. Now, I think I’m going to anticipate one of your questions.

Convergence Training: OK, fair enough.

Dr. Will Thalheimer: So I have now started calling this the Kirkpatrick-Katzell four-level model because as it turns out, Raymond Katzell is really the originator of the four-level idea. And we’re going to give you this link later. And if you’re interested in that, you can read an article that I wrote, sort of a little piece of investigative journalism, if you will. And it talks about Katzell’s role in creating the model.

Here’s Will’s article on the origins of the “Kirkpatrick-Katzell Four-Level Training Evaluation Model.”

So that’s the four-level model. Now, one of the things I talk about in my work is how to evaluate a model, right? Models are good if they help us. So I talk about the messaging that the model sends, really. You can think of this in terms of sort of the behavioral economics notion of nudging. What does the model push us to do? And models push us in good ways and bad ways.

Well, the Kirkpatrick model is the same as all models, it has some beneficial messages. So the most important thing that it does, it tells us that we shouldn’t just focus on learning. That we should focus on results too. And our whole field over the last…starting 30-40 years ago, began to move from a focus on the classroom to a focus on performance, and the Kirkpatrick model was aligned with that result.

The other really beneficial message that it sends is that learner opinions, learner surveys, are not that important. So they put those down at level one. That’s important to know because too often we default to those.

By the way, in the report that goes along with the LTEM model, I go into a lot more beneficial messages and harmful messages of the Kirkpatrick model, but here I’m just highlighting a few of the top ones.

Here’s Will’s report on LTEM.

Convergence Training: Will, before you get started on harmful messages…and just to give a plug. While level one, the evaluation surveys and smile sheets, are the least important level, I do want to mention that Will has written a great book to help you get more value out of that. So keep your eyes open for that and apologies for the interruption.

Learn more about Will’s book on smile sheets here.

Dr. Will Thalheimer: No, no worries.

Well, and actually let me point out that, even though Will has written a book on smile sheets, notice he’s saying that these are not the most important things.

Okay, so one of the things that’s harmful. The four-level model does not warn us against ineffective evaluation practices.

Now if I asked you what you think the most common way that we evaluate learning is, you’re going to tell me it’s smile sheets. But that’s not actually correct. The most common way is that we measure attendance or completion. So the Kirkpatrick model doesn’t warn us against those. And I think a good model should warn us against things that we should not do, not just things we should do.

The Kirkpatrick four-level model also ignores the role of remembering. So we as designers of learning, we want to support people in being able to learn, but we also want to support them in being able to remember. And from a learning design perspective, those are two different things. So from a learning evaluation perspective, those things should be taken into account separately as well.

And probably the biggest problem with the Kirkpatrick model is that it puts all types of learning measurement into the level two bucket. They’re all mashed together. So we can measure things like trivia, the regurgitation of trivia. That could be in level two. It could be the recall of meaningless information, or the recall of meaningful information, or it could be the ability to make decisions, or the ability to actually perform a task. All these things are related to learning. But because we put those all in one bucket, sometimes that means that we just sort of default: “Oh, you know what, we need a level two!” “Oh, okay, let’s do a knowledge check.” And that really creates a lot of problems. It gives us bad information when we evaluate, but it also pushes us as learning designers into creating learning that’s not really that effective.

So some beneficial messages, some harmful messages.

Convergence Training: Could I ask you a couple questions on the right hand column there?

Dr. Will Thalheimer: Sure, absolutely.

Convergence Training: Can you tease out what you’re talking about, about ignoring the role of remembering?

Dr. Will Thalheimer: Okay. So we can teach somebody something. But if they forget it within a couple days, then we haven’t really done our job. We haven’t done our job well enough, because most of the time, people need to remember what they’ve learned over at least a little bit of time. And even if you teach 20 things on a Monday, how many people are going to use all those 20 things on Tuesday? Well, not many, probably. Maybe they’ll use five things on Tuesday. Five more things on Wednesday, and then you know, the rest of the week, maybe a few things. But you can see some of those 20 things are not going to be needed for a week or more, two weeks or three weeks. So remembering is really critical.

Convergence Training: So what would be your response to somebody who said like, but that’s implicit in the level three evaluation on the job observations about the job behavior, which presumably would accomplish some of that memory, right?

Dr. Will Thalheimer: So we have to be careful here, though. Let’s say you try to measure behavior on the job. And let’s say they fail. Well, why did they fail?

One of the most important things is they might have failed because they forgot. They could have failed for other reasons as well. But if we’re not measuring remembering, then that causal chain is broken. We don’t know what went wrong.

Convergence Training: Okay. Yeah. Cool. Thanks.

And then level two learning is mashed up. How much of that problem could be solved by simply having better level two assessments instead of restating trivia, like you said?

Dr. Will Thalheimer: Well, obviously it could. But it doesn’t.

What I’m saying is because we’ve had this model that has level two as one big thing, we have not been pushed, we have not been nudged, we have not been sent the message, that there’s better learning measurements to do.

Convergence Training: Right, good.

Dr. Will Thalheimer: And we’re going to see this again in LTEM because I basically designed it to be a response to the weaknesses of these other models.

Convergence Training: Cool. Thank you very much.

Dr. Will Thalheimer: So just so you don’t think that I’m the only one being critical of the Kirkpatrick model or the four-level model. This is a scientific review in a top-tier scientific journal, and they evaluated the Kirkpatrick framework and they said: “It has a number of theoretical and practical shortcomings. It is antithetical to nearly 40 years of research on human learning…leads to a checklist approach to evaluation. And by ignoring the actual purpose for evaluation, risks providing no information of value to stakeholders.”

So that’s pretty damning. Now, that’s not the only one. This was just published this year: “Kirkpatrick’s framework is not grounded in theory. And the assumptions of the model had been repeatedly disproven over the past 25 years.” And they showed a number of research reviews of that. And that was published by Tracy Sitzmann and Weinhardt.

Dr. Will Thalheimer: So let me sort of summarize.

So the Kirkpatrick framework has been our dominant model for a long time. It’s got some good benefits. It’s got some things that are not as useful as they could be.

In all fairness, it was designed back in the 1950s. This is before the cognitive revolution in psychology, before we really compiled a lot of the most important research on learning, so you wouldn’t expect that it would be integrated with that science of learning stuff.

So now we have an opportunity to go beyond the Kirkpatrick model.

The Phillips/ROI Learning Evaluation Model

Convergence Training: All right, great, and you’re right, you did anticipate my question about the interesting history of the Kirkpatrick model. I’ll include some additional links for people who want to go down that rabbit hole, and maybe now can you turn your attention to the second model? I think we decided we’re going to talk about the Phillips ROI model now.

Dr. Will Thalheimer: Sure. Okay, so the Phillips model basically takes the Kirkpatrick four-level model and adds ROI on to the end of it. And I’m going to go into ROI, but it means return on investment, just like you would learn in business school. We’ll get into that.

But I just want to emphasize that even though Jack Phillips is known for the ROI model, he does a lot of evaluations, most often with his wife Patty. They do great work, they’re out there evaluating all the time, they’re not just teaching workshops. They do a lot of great stuff.

So this is what the ROI methodology looks like. And I’m just giving you the high points. But here’s how you do it. So based on the training, what have you done differently, if anything? (These are the questions you ask of learners who have gone through a training.) Based on the training, in what ways has the organization benefited? Based on your accomplishments, enabled by the training, what is the monetary value to the organization? And make sure you explain the basis, so they’re really pushing them. What percentage of the improvement was due to the training, from 0 to 100? What confidence do you have in your estimate? Again, 0 to 100. And then, was the investment in training an appropriate investment?

And then the calculation is very simple, right? You take the benefits and the costs. But what Jack really adds to this is that, as you can see in questions four and five (the percentage due to the training and your confidence in the estimate), you’re really getting a way to be very conservative about your estimate. So, you might estimate that you saved the organization $100,000. Well, what percentage of the improvement is from the training? Wow, maybe 50%. So you can see that’s going to cut it down to $50,000. And then what confidence do you have, and that’s again going to cut it down. So the benefit side of the equation, the top part of the equation, is a very conservative estimate.
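
(Note: For readers who want to see the arithmetic, here is a minimal sketch of the adjustment Will describes. It is an illustration only, not the official Phillips worksheet. The $100,000 estimate and the 50% attribution come from Will’s example. The 80% confidence figure and the $25,000 training cost are made-up numbers, the function names are hypothetical, and the ROI formula used here, net benefit divided by cost, is the conventional definition and is assumed rather than quoted from the discussion.)

```python
def adjusted_benefit(estimated_value, pct_due_to_training, pct_confidence):
    """Discount a learner's dollar estimate by attribution and confidence.

    Both percentages are on the 0-100 scale used in the survey questions.
    """
    for pct in (pct_due_to_training, pct_confidence):
        if not 0 <= pct <= 100:
            raise ValueError("percentages must be between 0 and 100")
    return estimated_value * (pct_due_to_training / 100) * (pct_confidence / 100)


def roi_percent(total_benefit, total_cost):
    """Conventional ROI: net benefit as a percentage of cost (assumed formula)."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return (total_benefit - total_cost) / total_cost * 100


# $100,000 and 50% are from the example above; 80% and $25,000 are hypothetical.
benefit = adjusted_benefit(100_000, pct_due_to_training=50, pct_confidence=80)
roi = roi_percent(benefit, total_cost=25_000)

print(f"Adjusted benefit: ${benefit:,.0f}")
print(f"ROI: {roi:.0f}%")
```

With these illustrative inputs, the $100,000 estimate shrinks to $40,000 before it is compared with the cost, which is the conservative effect of questions four and five. In practice the adjusted estimates from many learners would be added together before computing ROI for a program.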

Convergence Training: Will, just to clarify something and maybe anticipate something you might say soon–these are questions asked of the learner. Is that correct?

Dr. Will Thalheimer: That is.

So the strengths of the model are that it’s framed in terms that some of our organizational stakeholders, some of our business stakeholders, really care about: ROI, return on investment. Also, one of the strengths of it is it takes a very conservative estimate.

Now, on the weakness side, you can see that this relies on the subjective input of the learners, and learners may not be very good at estimating how much benefit there is. Sometimes they can, but it depends on the type of learning.

The way I see ROI is that there are times when we have a particular type of stakeholder for whom this is going to be very important. But, you know, we probably don’t want to do this as the only thing we’re doing.

Convergence Training: I anticipated and you made the point, but just to underline it, how often do you really think a learner could put some kind of ROI estimate and do that in an accurate sense? What are your personal feelings about that?

Dr. Will Thalheimer: Well, there are some areas where it’s easier than others.

So if you’re a salesperson, right, and you take a sales training on a new product, or you know, on an old product, and you see your sales increase, well, you probably have a pretty clear idea about that.

Now, something like leadership training, I used to be a leadership trainer, right? You ask how much does this improve your people’s productivity and how does that relate to how much money they’re making for your organization. That’s really not so easy to do.

Now, even then, you could argue that there are times when you have some stakeholders, for whom this kind of thing is very important. We don’t work in a vacuum. We’re trying to do two things at once. We’re trying to create excellence in learning and performance. But at the same time, we need to maintain our budget and not get fired, things like that. You know, even Jack says, you probably don’t want to do this on that high of a percentage of your programs. I think he estimates 5% or something like that. So now there’s some real times when it’s valuable, and there’s other times when it’s less viable or really not needed.

The Kaufman Learning Evaluation Model

Convergence Training: All right, great. Kaufman is our third model, are you ready to talk about that?

Dr. Will Thalheimer: Sure.

Convergence Training: Cool. Thank you.

Dr. Will Thalheimer: Alright, so there’s Roger Kaufman. He has a five-level model too, and he’s sort of based it on the Kirkpatrick model a little bit but not too much.

And by the way, he’s working with some other folks. I’m not sure I know all the folks, but John Keller’s one of them, Ryan Watkins, I know he’s done work with Ingrid Guerra.

But basically, he talks about level one, but it’s not just reaction. He wants to add stuff about the inputs, the learning supports, all the kind of organizational things. Roger’s real focus is…well, he’s really got two focuses.

One is that this should not just be about learning. Remember, I showed you the performance ecosystem before? And so when he talks level two, it’s not just about learning. It’s about acquisition of learning, and resources, and everything you need to get your results. You can see that in application, too, so it’s not behavior based on learning, it’s application. And that can be not just learning application, but all the other resources you’re applying as well. Level four is results because results are results. And then level five, and this is one of Roger’s great contributions to our field, he calls it “mega,” but it’s really the societal impact, the impact that goes beyond just our focus on our organizations.

Convergence Training: A sidetrack here…you’ve written a kind of interesting article about elearning and global warming, haven’t you, that maybe partly relates to that?

Dr. Will Thalheimer: Well, yeah, I tried to look at the big picture.

So one of the big pictures is…my wife just sent me this article yesterday to calculate how damaging flying is. These planes, they go up into the, I don’t know, stratosphere or somewhere up there, and they’re spewing a lot of pollutants into the air and causing global warming and things like that. So I just asked the question, should we look at our travel budgets and our traveling as part of a big-picture ethical kind of concern?

So yeah, it’s very interesting. I published that and some people really liked it. But there was a lot of silence. Because I think it’s hard for us, this is what we’ve been doing.

But it does suggest to me that if we can create really good elearning, then we should try to do that, not just because it can be well designed–in fact, some research I did a couple years ago showed that in the wild, if you just have a classroom training versus elearning, elearning tends to be better designed. Because we just follow the old methods in the classroom, we lecture basically. It doesn’t have to be like that, both can be better designed. But yeah, so there are some opportunities there.

Jeff Dalto, Senior Learning & Performance Improvement Manager
Jeff is a learning designer and performance improvement specialist with more than 20 years in learning and development, 15+ of which have been spent working in manufacturing, industrial, and architecture, engineering & construction training. Jeff has worked side-by-side with more than 50 companies as they implemented online training. Jeff is an advocate for using evidence-based training practices and is currently completing a Master’s degree in Organizational Performance and Workplace Learning from Boise State University. He writes the Vector Solutions | Convergence Training blog and invites you to connect with him on LinkedIn.
