On the afternoon of 13th January 2016, I spent a very enjoyable and insightful 90 minutes on the phone with Ron Kaye. Until late last year, Ron was the leader of the Human Factors PreMarket Evaluation Team within the FDA’s Office of Device Evaluation.
To many people in the industry, Ron has been the founding father of Human Factors for Medical Devices, and many of the procedures in place today at the authorising body are the ones that Ron developed and implemented during his career there. Also, most of the people currently working in the Device Evaluation team owe their understanding of this social science to Ron.
What follows is a lightly edited transcript of the conversation we had.
Moderator: Sean Warren (SW) Respondent: Ron Kaye (RK)
SW: Ron, thanks for your time today. Let’s start by you introducing yourself to the audience.
RK: I have a Master’s degree in Applied Psychology and I got into human factors in the early ’80s. The technology I began working with had always been high-risk technology, where user error could have serious consequences.
I later began to work with military weapons systems and communications systems. Following that, I also got into human factors for air traffic control systems. I moved to MITRE, a high-tech think tank, and there I worked with air traffic controller systems and transport aircraft cockpit collision avoidance systems, specifically one called TCAS, which I believe they’re still using today. I’m sure it’s several generations in, being an automated system, but that system would advise aircraft pilots how to take specific actions to avoid mid-air collisions in the terminal space (after take-off and during landing approaches).
Then I came to the FDA. Because there was no pre-market review process for human factors in place and “use error” was becoming an increasing concern for the agency, I ultimately developed the pre-market review process for human factors for new medical device applications coming into the FDA.
I guess the reality of the environment that first struck me was that there were accounts of doing human factors for medical devices, but it was uncommon and rarely focused on risk associated with medical device errors. Questions were directed only at assessing the opinion of the device users: how easy it was to use a device or how much they liked it. What was missing was a systematic evaluation of the critical tasks that users do. I noted that some devices for which use error became a problem in post-market had either no human factors done on the device by the manufacturer, or that which was done was superficial and did not address the actual problems that users had while using them.
For instance, to consider the safety and effectiveness of the design of a piece of technology from a perspective of human factors, whether it is an aircraft cockpit advisory system or medical device, the user interface design is the element of consideration. The human factors assessment is essentially meaningless if the focus of the assessment does not consider the most important interactions that users make with that technology. This requires a systematic consideration of critical user tasks. A critical task can be thought of as a user action (or inaction) that, if you don’t do it right, could result in some kind of serious problem, such as inadvertently hurting or even killing someone. If we assume that a critical task could be the calibration of a medical device, this and likely other critical user tasks would not be considered in the inadequate human factors assessments. There was too much emphasis put on things that were mundane, everyday types of tests, and a lot of emphasis on rating scales, with the criteria for success of the test built around those scales, which were all so variable. Some in the medical device industry simply preferred not to acknowledge that risk-prioritised human factors testing methods existed. Some, even, still have that view. In addition, the kinds of errors, and why and how these were made, were also not considered. Most often, however, human factors was simply ignored in those days.
In the light of this, a colleague and I developed the initial guidance at the FDA, the 2000 Guidance, which is still in use today, to try to get human factors, the practice of human factors, incorporated with the risk management process.
The main message of that guidance was that users of medical devices can make errors when using them. Perhaps they’re confused, or for whatever reason, but because of their interaction with the design of the user interface of that device, those “use errors” can happen, the results of which are as serious as the results of device malfunction from causes other than use. If somebody gets an inadvertent overdose, let’s say from an infusion pump, they’re just as much killed as they would be if some electrical, software or mechanical component had failed. Whether it’s a use error or the quality of the manufacture of the device itself, the problem is just as bad either way.
SW: Yes, the outcome is the same.
RK: Therefore, human factors should be part of risk management and the use of the device should be evaluated for safety purposes, just as the quality of the manufacture of the device should be assessed.
SW: Okay, in summary of that, Ron, originally the world was all about subjective opinions and ‘we think our device is pretty’, and what you’ve done is you’ve turned it into an objective assessment focused on a variety of different risks rather than a subjective assessment.
RK: I understand how you could see it that way and it’s a perfectly logical conclusion, however, it’s actually a little different from that.
There’s nothing wrong with subjective opinion, and in fact subjective data can tell us a lot; in fact, it is a necessary part of the data that is expected for review in new device submissions to the FDA. We (the reviewers I mean) need both objective and subjective data in those. The difference is the orientation of the test.
RK: For instance, if I needed to know what your favourite colour was and I started asking you about how long your commute was or what your favourite rock group is, I might not ﬁnd out what your favourite colour is, because I’m asking the wrong questions.
In the same way, if you need to ﬁnd out what is causing users difﬁculty or confusion when they’re using a medical device, (for instance, the failure of a “critical task” in simulated use testing), you need to ask them speciﬁcally about that and what they say is important. But this means the assessor doesn’t ask the user questions that don’t have anything to do with safety necessarily, like “how much do you like it?” or “how easy is it?”. A lot of times people will like a device, like the looks of it or they like certain things about using it or what it does, but if you talk to them and if you ask the right questions carefully, you can ﬁnd out if the design of the user interface is causing difﬁculty, confusion and errors or if it is allowing errors to manifest as serious consequences too easily.
A user might say, ‘Well, there was this point where I was using it and you know, the other devices that I’ve ever used that were similar to this, I’d do it this way, but on this one it’s different and I almost didn’t do that right’ or ‘I realised from what they said in the training that we took yesterday that I know how to do this, but it’s not the way I would normally do it, and I think a lot of people might have problems with that’. You can get answers like that, which might sound rather vague in this discussion, but in the context of testing and task performance, they result in information that is quite helpful and important for establishing whether the device’s user interface design will support safe and effective use (or not).
SW: That’s almost like a functional insight of what they are experiencing with that device, so their response is a function of how they are able to use the thing, and so that kind of subjective data becomes very important in terms of understanding how users will really interact with the device.
RK: Yes, and particularly important for the more critical user tasks involved in the use of the device.
There have been a lot of ways of looking at the measurement of human factors and trying to develop testing that will lead to some kind of a numerical result that is derived from the data, with a cut-off score to indicate an acceptable result. Quantitative approaches are good for a lot of kinds of testing: if you’re counting molecules or electrons, you’ve got apparatus that will do that. If you want to know how much pollution there is in drinking water and there is a certain level that’s acceptable for some molecule of interest, you can test for that.
With people and the use of technology, given the variability between individuals, the variability between types of users and the variability in how a user might approach the device, you can’t get a consistent or meaningful result from numerical scores if use error and safety are to be evaluated. Even if you approach the human factors evaluation in a quantitative way, and even if your numbers might help understanding of the user interface design, you have to create tools to generate the numbers as well as the criteria you are going to use to establish the acceptability of the numerical score on the test. Experience shows that across multiple practitioners, the numerical scales, their meaning and the cut-off scores are variable. It just doesn’t work. Rating scales and counting yeses and nos leading to numerical scores are a cheap and ineffective way to make an assessment of user interface design quality for safety-critical systems.
First of all, there is whether the question you are asking is focussed on priority or not, and often it isn’t. If it’s not, then it doesn’t matter anyway if you’re concerned about safety and effectiveness of use, because you’re not going to get there. But even if you could start getting at the more important aspects with respect to the risk of using the device, if you approach it that way and you create these numbers and these criteria, it’s going to be unreliable when different people do it. If they approach it primarily from a rating scale perspective, that’s just a kind of cheap mechanism for producing numerical results.
SW: So what you’re saying is it’s a balance between the two?
RK: Well, it’s still a different kind. “Subjective” is a broad word; it has to do with the perspective of the user. Now, the perspective of the user can be quite helpful, but if somebody says, ‘Well, I’ll rate that a five on a seven-point scale’ of, let’s say, intuitiveness of use or something like that, what does that really mean? If I say ‘it’s a five’, or you say ‘it’s a five’, my wife says ‘it’s a five’ and my kid says ‘it’s a five’, that’s pretty consistent, but what does saying something is a “five” really mean? Or a seven, or a 3.8?
SW: Yes, exactly.
RK: Does that mean that there’s something wrong with the design? When it comes right down to it, all the FDA is concerned about is whether or not the device interface has been designed in a way that retains some kind of a flaw, where the “flaw” in the design of the interface would be something that would cause the user to make an error that could cause harm. Let’s go back to the calibration example and say that users in a hypothetical human factors study perform a calibration task incorrectly and, as a result, they mis-dose themselves or perhaps mis-dose their patients. The subjective data that is valuable here derives not from rating scales, but rather from open-ended questions such as asking the user, ‘You seem to have had difficulty when you were doing the calibration task. Can you give me your perspective of what you were thinking and what the difficulty was when you were doing this?’ To which the user might say, ‘Well, when the green light lit up, I pushed the green button because green usually means go’. Or who knows what they might say, but regardless, what the user provides in response to the right question, and the right kind of question, can provide understanding of the problem with the design of the user interface.
When you get to that level, the user is talking about their experience with the interface of the device and particularly, when difficulties or task failures arise, about the characteristics of the design that were involved with the problem. When you get down to that level of assessment, you can also ask the user, ‘Okay, well if this design element is difficult, what might make it better?’ What the FDA human factors reviewer is really concerned with is whether there is some aspect of the design that isn’t optimal, which might be pushing users towards errors or making errors likely to occur, so it should be clear why this kind of subjective data, in addition to performance data, is necessary for the reviewer to make his or her determination of the adequacy of the device. But if you step back a little bit more and look at this kind of data a bit more abstractly, hopefully its value is clear to users, patients, FDA reviewers and manufacturers (although they might not realise it): you can demonstrate good safe design in this way. As long as the test collects the necessary data, including data that would point to any dangerous design flaws in the medical device user interface, and if you give the test to representative users and under representative conditions of use, the test is capable of detecting flaws in the interface. If there are no problems found (or often, there are a few problems but they are minor and don’t indicate a design inadequacy) then you can confidently conclude that the device will be safe for intended users, uses and environments. That is what the FDA wants. That should be what everyone wants. I’ll repeat that because it’s sometimes hard to get the first time. If you can apply a test that is powerful enough that it can identify design flaws like that if they exist, and you don’t find them, or you don’t find much, then the human factors testing for the device has “passed”. It’s really pretty simple.
RK: That’s how to pass the test, it’s not about achieving an average score of 7.9 or 5.2, or any of this nonsense. So it’s a matter of performance, of words, of discussion of ﬁndings and a description of how the users are interacting with the device.
SW: Okay, I understand that.
RK: Yes, yes, okay.
SW: I understand that completely. That’s very similar to the sorts of things that we do on our innovation side.
We have a front end innovation process that we teach to global companies. We started off by having lots of things that were balanced score cards with numbers and just as you said, we got rid of them all because when you came to present the data, whether you’re presenting it to a regulating authority or your boss, customers had no idea why they scored something at ﬁve and whether that ﬁve was the same as somebody else’s ﬁve, because there was no commonality there.
RK: Right, right. And how you deﬁne your terms of measurement and assessment, are they really relevant?
SW: Yes, absolutely.
SW: Yes, and I forget the number of times I say to people, ‘Don’t answer the question you think I’m asking, answer the question that I actually am asking, because that’s the one I want the answer to’, because it doesn’t matter who you talk to, whether it’s a patient or a chief executive, or chief technology ofﬁcer at a global organisation, everyone wants to please everybody and they try and give you the answer they think you want by answering a different question, and the whole thing about this is to, ‘Just answer the question I’m asking you’.
RK: Human factors is really a social science that’s very much blended with engineering, but engineering has much to do with modifying things, you know. They’re very clever and good with measurements and maths and numerical assessment. Statistics, which of course I took a lot of when I was in grad school, taking experimental design, and statistical analysis of behavioural science etc. I was marinated in quantitative assessment and techniques. The problem is when you get out into the real world those doing human factors testing don’t have time, don’t have the necessary control to really evaluate or “test” in a way that produces meaningful quantitative results.
RK: So, in some of the early human factors efforts that we saw at the FDA, in addition to rating scale data, we also saw a lot of studies containing timing data, specifically how long it takes a user to do a task. In essence, they were saying ‘We’ve got a stopwatch and we know how to use it and it’s digital and we can write down the numbers’. So they would time things like how long it takes somebody to take an infusion pump out of the box, how long it takes them to find the cord and plug it into the wall, and that sort of thing, getting tons of data on this type of rubbish that doesn’t mean much at all regarding the adequacy of the data.
SW: So possibly not relevant then?
RK: Yes, far from relevant for establishing safety and effectiveness of use. Testing like this is essentially running in circles and generating lots of data and often doing some elaborate analysis of it. Quite often in my job, I would practically want to weep just thinking about the number of staff hours that went into studies of this kind that are completely worthless.
SW: So I guess that’s where the risk thing comes in?
RK: Well, one second, let me clarify something before we move on. You asked me about subjective data. What I’m saying is that it seems more “scientific” to focus on “objective” rather than “subjective” data. What I’m trying to get across is that for device use, the subjective experience is also important if it is “good” subjective data. Again, a user rating a device a 3 or 5 on a 7-point scale of “ease of use” is clearly “subjective” data, but it provides, at best, only a tiny bit of information. But open-ended questions that focus on important aspects of device use, like critical user tasks, known problems with use, and failures or difficulties encountered during simulated use, yield subjective data that illuminates the user’s experience.
That subjective data is so important because it reflects the user’s experience with the user interface. It is also important because it can show what the design problem is and often makes any necessary design modification more understandable. Let me explain another reason that subjective data is important. Let’s consider a new model of infusion pump that will ultimately be used perhaps tens of thousands of times a day. If there happens to be a flaw in the device user interface, that won’t cause a use error for every use or even every user. Such flaws will only impact a small fraction of the uses, or the users.
So if you are running a test with, let’s say, fifteen people and have each one of them use the pump for an hour, even if that device does have a design problem, you might not find it by observing them because they might not make an observable error. They’ll work around it. They’ll be more vigilant perhaps, or if the specific circumstances that would lead them down that path aren’t part of the simulated use scenarios, the error isn’t going to occur and you won’t know. But if you talk to them and ask them about their experience of the device, often if there is such a problem, there’s a very good chance that they will tell you about it if you ask the right questions.
RK: So the subjective data is therefore a very powerful source of information for situations in which the power of the objective performance data is limited in its ability to detect a problem. For medical device use, this is quite often the case. I want to be very clear about why subjective data is important: it’s important because it tends to fill that gap. It’s really absolutely essential and that’s why review of human factors for new medical device submissions is considered to be incomplete if subjective data of this kind is not provided.
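Editor's illustration: the point about fifteen participants can be sketched with some simple arithmetic. This is not something Ron presented and it is not an FDA method. It assumes, purely for illustration, that each participant independently has the same small probability of making an observable error caused by a given design flaw; the probabilities used below are hypothetical.

```python
def chance_of_observing(p_error: float, participants: int = 15) -> float:
    """Probability that at least one participant makes an observable error.

    Assumes each participant errs independently with the same probability
    p_error (a simplifying, hypothetical assumption for illustration only).
    """
    if not 0.0 <= p_error <= 1.0:
        raise ValueError("p_error must be between 0 and 1")
    # P(at least one error) = 1 - P(no participant errs)
    return 1.0 - (1.0 - p_error) ** participants


if __name__ == "__main__":
    # Hypothetical per-participant error rates, not taken from any study.
    for p in (0.01, 0.05, 0.10):
        print(f"error rate {p:.0%}: chance of seeing it at least once "
              f"in 15 people = {chance_of_observing(p):.0%}")
```

Under these assumptions, a flaw that trips up one user in a hundred would show up as an observed error in only about one test in seven with fifteen participants, which is why the follow-up interview questions carry so much of the weight.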
SW: Okay, that’s brilliant.
RK: But you were going in a different direction, so.
SW: No, I was going to ask you to just comment a little bit about risk, because you gave a great example of measuring how long it takes to take it out of the box and plug it in and throw the packaging away.
RK: Right, yes.
SW: And, I can see how people could get drawn to looking at that sort of information because it’s easier to capture, but as you said, it’s not relevant, so how does risk, or risk assessment, dovetail into the whole human factors and application process?
RK: Right, well, certainly risk assessment is absolutely necessary, but there are some considerations about it, with respect to its inherent limitations that need to be acknowledged. I think I can best answer this by talking about what often goes wrong or how it can be used ineffectively.
RK: And so I guess the main thing is that not doing risk analysis, and not incorporating it into the human factors assessment approach, is the worst thing.
SW: Yes, exactly, yes.
RK: Certainly, and I’ve certainly seen plenty of that, where “human factors testing” doesn’t mention risk at all! Then there are cases in which risk analysis is incorporated. But that can have problems too. To understand this, it’s helpful to understand a bit about how risk analyses are done and what, therefore, risk analysis actually is. I have some actual experience in my past from doing risk assessments. It can be done in a variety of ways but it comes down to a few smart people trying to “anticipate” specific use errors, trying to evaluate their priority and likelihood and then incorporating the results into the structure of the human factors testing and evaluation approach. You might start with the question ‘Well, what might somebody have a problem with when using this device?’ or you can work backwards from known or anticipated use errors, let’s say somebody gets an overdose from an infusion pump, then try to establish how that result might happen and whether there are multiple ways it could happen. Even though the risk analysts may be expert in doing this work, and it’s certainly good work, still the analysts are not representative of the users.
Their abilities are different and most often they are engineers, who are very clever and capable with technology and very familiar with the medical device. Actual use errors are often counterintuitive or even bizarre from the perspectives of the risk analysts and therefore some kinds of error (or even use error scenarios) can be difﬁcult or impossible for the analysts to even imagine. Similarly the “frequency” component of risk associated with a use error scenario that has been anticipated during the risk analysis, can be grossly underestimated. Therefore, risk analyses can produce results that are not reﬂective of the actual risks associated with real users and real uses of the device and therefore the results can be incomplete. In addition, the results of risk analysis typically generate numbers. Like rating scales and checklists, risk priority numbers are helpful but when they are incomplete or inaccurate, they can be misleading.
Typically, risk is a product, a mathematical product of likelihood multiplied by severity (with numerical “estimates” applied to each). What do these numbers really mean? For instance, how likely is likely? I said previously that even some of the more dramatic cases in which interface design has led to problems still only impacted a small minority of users and uses. The point is, though, that once such flaws are found, they can be corrected and the problem goes away. But if the use error scenario is dismissed from evaluation because the priority estimate is a low number, the flaw will likely not be detected.
RK: So, this can be hard to explain, so let’s go back to the calibration task. If you did a risk analysis and you looked at the risk associated with calibration, you might’ve concluded ‘there are some serious risks that could cause harm, so we’ll assign a high number to the associated user tasks for “severity”’. Then, for “likelihood”, we say extremely remote, or some value that is very low. This happens quite often. The analysts rate the likelihood of occurrence so low that when you multiply those numbers together, severity of harm and likelihood (remember these aren’t facts, these are estimates), the low likelihood of occurrence number times the harm number yields a low product, a low Risk Priority Number (RPN). Often then, the manufacturer will apply a “threshold” for the RPN and everything below the threshold is not included in further evaluation or testing. They say, in effect: ‘Identified risks associated with RPN values under this threshold, we’re not going to evaluate. We’re only going to evaluate risks and associated use errors in scenarios associated with RPN values that are above this threshold.’
RK: But, at the same time, we know that when these things happen, they’re fairly infrequent anyway, so there’s a serious problem in the application of RPNs to use error scenarios. Lots of design flaws have been ignored in this way. This is why, for the FDA human factors reviews, it doesn’t matter how you rate the likelihood: if it is possible that the user is going to have a problem with this that could lead to serious harm, it should be included in the human factors evaluation and testing. If it is left out for this reason, the FDA human factors reviewers would call this a deficiency. And it is a deficiency.
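Editor's illustration: the RPN cut-off problem Ron describes can be shown with a small sketch. Everything in it is hypothetical: the 1 to 5 scales, the threshold of 8, the severity level treated as "serious harm" and the three scenarios are assumptions chosen for illustration, not values from ISO 14971, the FDA or any manufacturer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UseErrorScenario:
    name: str
    severity: int    # hypothetical scale: 1 (negligible) to 5 (death or serious injury)
    likelihood: int  # hypothetical scale: 1 (extremely remote) to 5 (frequent); an estimate, not a fact

    @property
    def rpn(self) -> int:
        """Risk Priority Number: severity multiplied by likelihood."""
        return self.severity * self.likelihood


RPN_THRESHOLD = 8  # hypothetical cut-off a manufacturer might apply
SERIOUS_HARM = 4   # hypothetical severity level counted as serious harm


def selected_by_rpn_cutoff(scenarios):
    """The practice criticised above: test only scenarios at or above the threshold."""
    return [s.name for s in scenarios if s.rpn >= RPN_THRESHOLD]


def selected_by_possible_serious_harm(scenarios):
    """Keep any scenario that could cause serious harm, whatever its estimated likelihood."""
    return [s.name for s in scenarios
            if s.severity >= SERIOUS_HARM or s.rpn >= RPN_THRESHOLD]


scenarios = [
    UseErrorScenario("miscalibration leading to overdose", severity=5, likelihood=1),
    UseErrorScenario("display font hard to read", severity=2, likelihood=4),
    UseErrorScenario("slow unpacking", severity=1, likelihood=5),
]

if __name__ == "__main__":
    print("RPN cut-off keeps:     ", selected_by_rpn_cutoff(scenarios))
    print("Serious-harm rule keeps:", selected_by_possible_serious_harm(scenarios))
```

With these made-up numbers, the miscalibration scenario scores an RPN of 5 and falls below the cut-off, so the most dangerous task is the one dropped from testing. Selecting on possible serious harm keeps it in regardless of how remote the analysts estimated it to be.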
RK: Risk analysts estimate harm better than they do frequency of occurrence in my experience, but still the number associated with harm is analytically derived and sometimes does not coincide with reality. For instance, harm can be pretty low for a given use error, but if that error occurs repeatedly, the individual incidents could result in an accumulation of harmful effect over time, which might not have been taken into account when assigning a numerical value to the harm associated with a specific use error scenario.
RK: If frequency of occurrence is necessarily omitted from the calculation of RPN, analysts might say, and I’ve been asked questions like this, ‘What about the risk of having an earthquake right when somebody is controlling the device and their finger shakes? The machine shakes, or an asteroid falls through the ceiling and hits them on the head and they make an error because of it?’, and all these things are possible, right? They’re non-zero and yes that’s true, but the idea is not to go down the road of psychosis and schizophrenia with this. We don’t have to be distracted by this kind of thing when there are plenty of potential use scenarios involved with the use of the device that can be meaningfully considered in the analysis.
SW: Yes, exactly.
RK: The idea is to keep it real and talk about what really might happen, and that’s how I feel about that argument. So, in terms of risk management and risk analysis, it has those drawbacks regarding frequency and these are discussed quite well in ISO 14971, Application of risk management to medical devices.
This is the risk management standard for medical devices and is probably the most well-known standard; it is recognised by the FDA and widely applied. It originally started out talking in this way and it still has that language, because you do have to do this. You have to do risk analysis, but it warns, in Annex D I believe, specifically about this: about taking risks out of consideration based on a product that includes a low-frequency number.
RK: But still manufacturers often do this and they do it for use error scenario risks. The traditional way of doing risk analysis is that you do do that, that’s the game, that’s what you do and so that’s often not observed, but it’s particularly important for human factors not to do that.
Even if people are doing their best efforts, they are human and they are variable. You can get a group together and you can average the ratings, and I’ve been part of that too. But as far as the FDA pre-market review is concerned, and how all this applies to the human factors testing that will be reviewed, it’s a cardinal sin to eliminate user tasks based on risk assessments that include a low probability of occurrence and the subsequent numerical cut-off process that I was talking about. You can’t do that. That would be a deficiency.
SW: So it’s almost like robustness in your thought process. We considered it and we thought about it and we didn’t do anything with it because it’s such low probability, but I guess the approving body needs to see that you’ve had some depth to your risk analysis, rather than just saying, ‘We’ve only thought of the three things that are generally going to happen, rather than the thirty-eight low probability, but high outcome events that we also thought about as well’.
RK: Oh yes, I guess maybe the way I would summarise it would be that you use your risk analysis as an important development tool, but you don’t overweight it in terms of what it’s telling you. The part that is based on analysis by, you know, what I call a bunch of smart, expert people sitting around a table: those are estimates or approximations, not facts.
SW: Exactly. Yes, exactly right. So for human factors there is a real need for risk analysis?
RK: Oh yes, risk analysis should be done and, to the extent possible, any use error scenario that could result in error and possible harm should be included. But the data should be considered to be preliminary. It should be subject to refinement or modification based on formative testing (better) or human factors validation testing, where new use error scenarios might be, and often are, identified when representative users interact with the device (or prototype) through simulated use.
In retrospect, the analyst might say: ‘Well, we thought this was very low risk because we thought it was very infrequent’, then we do some formative testing and we get some test participants saying ‘I just about screwed that up and that would have been a big problem.’ Then you reassess your risk and your risk priority, based on this information or add a new use-related risk scenario. In evaluating human factors, you can do a certain amount by expert reviews, or smart people sitting around a table, but there’s always the unknown because you’re dealing with human behaviour and you don’t really know for sure until you grab them and watch them use the device and talk to them about it. That’s why you go and get people and bring them into a room and play that human factors game with them, because you can get the necessary information that way. It’s simply doing it right.
RK: Regarding what results can be found in simulated use testing and what should be done about it, the potential for ﬁxing the problem should be part of the consideration moving forward. For example, if a design feature is causing some problems with use and the ﬁx is readily available (for instance maybe the font is too small and difﬁcult to read and it could be easily made larger) that should be done. Even if a frequency of occurrence estimate value is small. If the potential harm includes injury or death, even if the design solution is difﬁcult to understand or implement, focused attention should be applied to the ﬁx and subsequent evaluation, as part of the human factors process. If it is found that some use scenarios that could result in harm are simply unmanageable or not able to be completely prevented – for instance “off-label use,” these should be described in the human factors report.
SW: Okay, I’m with you.
RK: So risk analysis is an important part of human factors evaluation and I suppose it also works the other way too. You do your preliminary risk analysis and formative evaluation or testing; certainly you want to focus your analyses on what you believe, and what your analysis shows, are the more critical aspects of use, but you might find out once you start getting users to interact with your device, or its prototype, that certain things are more or even less important than you thought, and you can adjust that going forward into summative or human factors validation testing. You maintain as a priority that the test itself is robust enough that, even if you didn’t anticipate something, if a critical error is made, the test will detect it: it likely won’t go unacknowledged by a well-designed simulated use-based test.
SW: Okay, I’m with you, I understand.
RK: I hope I’m not wearing you out.
SW: No, this is brilliant. It’s so good because it’s giving us all clarity on what is required and what’s expected, and on the fact that there are so many similarities between the way that products are developed in non-medical-device, non-healthcare fields and the way the ones that you’ve been talking about are developed, so it’s really useful.
It is a case of understand, talk about it, talk to your customers about it, test it once, go back and test it again, to demonstrate beyond reasonable doubt that you’ve done everything possible to eliminate not only the catastrophic, but generally improbable problems that may actually occur, rather than just saying, ‘Oh, it’s never going to happen because we’re really clever people’ or ‘We have a really clever bit of technology’. Humans always find a way, ‘nature always finds a way’, to quote Jurassic Park!
RK: That’s true. You might get the impression that it means that you need to do more and more exhaustive human factors testing, to try to grind this out, but really if you approach the testing in a way that you are collecting data that is meaningful, this testing can be quite effective. If this is done in formative testing and you do it right, you have a very good chance of finding a design flaw or flaws without a whole lot of resources being expended, and it can point out modifications that can be used to make use problems go away.
People ask, how did the FDA come up with a minimum number of users for an acceptable validation test? I saw a test that had only two people and that worried me, so we came up with the number fifteen. Now, in terms of biomedical research, that’s a small number.
If you’re looking at patients and looking at their improvement from before and after intervention, or even a drug, those numbers tend to be very large, but that’s because they use quantitative measures and that is appropriate.
With human factors testing, once you get up to about fifteen that’s a pretty good size, because you’re going to go deep with your data collection. You’re not just going to do a bunch of quick survey responses and generate some numbers and generate some averages. To do it like that is like proceeding with a blindfold on. You’re only going to see maybe what’s right in front of you, but what’s worse is you can often get a result that would indicate nothing is wrong when there may be plenty wrong.
SW: I was going to wrap up by just asking, what would be, in all the years of experience that you have at the FDA and all the different submissions that you’ve seen, what are the top reasons for failure of a submission? Which are the ones that you had a good laugh about around the coffee machine, when you say, ‘You wouldn’t believe what this company did’?
RK: Right, well I guess the number one and the most peevish, is not doing human factors at all.
RK: One is, it’s just not mentioned, not thought about, not contained in the submission, and no-one ever even did anything on it. Now that was fairly common earlier on in my career and it got less common. But the other species of that same issue is a rationale for why human factors should not be done, that it isn’t appropriate. An example of that would be, for instance, ‘the pump and the motor and the electronics, everything here is all the same as it was, we just changed the user interface, we added a couple of functions, we changed from monochrome to colour screen and everything, and instead of mechanical controls it’s a touch-screen now’.
SW: Okay, I’m with you.
RK: ‘But everything else is the same, so do we need to do human factors?’ Yes, well you do because you just completely changed the user interface.
SW: You changed the one bit that your patient is actually going to interact with.
RK: Right, right. That one bit being the entire user interface. Yes, so not doing it at all is always the worst and that could be because it’s just ignored, or it could be because they used, you know, a rationale that just doesn’t make sense.
RK: It doesn’t support the decision well, in the context of patient and user safety, but that would be one: not doing human factors at all. The second one would be basing it on checklists, rating scales, that sort of thing, and often applying some sort of usability objectives or usability objective criteria. I guess this is almost the first; it’s hard to put these in order. But not coherently associating the human factors testing with the risk priority of use error, or not being clear whether the user task set that’s involved in the testing and evaluation is comprehensive, so that would definitely be another one. Another one is simply ignoring subjective data: not asking open-ended questions of test users as part of testing. The tester brings the test participants in, they do the test, they observe their task performance, but then they let them go and they don’t have the subjective side, so that’s incomplete, and I think these days that’s probably the more common failing. And such an unfortunate missed opportunity to do it right. Eliminating consideration based on RPN numbers, as I’ve discussed, of course. So there’s that. Let’s see, now if you’re satisfied with those three, we’re pretty good.
SW: That was absolutely brilliant! Thank you so much. I think that certainly taught me an awful lot and it reaffirms some of the assumptions I had made from my history with medical devices a long time ago and from what else we do at Pure Insight.
This is a transcript of the interview between Sean Warren and Ron Kaye. It should not be copied or reproduced in any way without written permission of the copyright holders. ©Pure Insight and Ronald D Kaye 2016
For more information, please contact
Pure Insight Ltd 3D Enterprise House Valley Street Darlington DL1 1GY United Kingdom +44(0) 1325 526000 email@example.com