What does out-of-tolerance mean? An out-of-tolerance or non-compliant condition can mean different things to different people depending on where in the chain of traceability it occurs. Calibration is a comparison of a metrology laboratory’s standard, which has a known value and uncertainty, with the unknown behavior of the unit-under-test (UUT). In the higher-level metrology laboratories in the traceability hierarchy, this comparison data is all that is needed. It is up to the owner of the instrument to analyze the data and determine the status of the UUT and the associated impact on their measurement process during the most recent usage cycle, between the UUT’s previous calibration and the current calibration. This process is relatively easy for highly knowledgeable metrology professionals who are responsible for a limited number of artifacts, standards, and equipment. However, for the end user who is responsible for a significant quantity of test and measurement equipment, monitoring the behavior of each individual piece of equipment is impractical at best! Fortunately, the manufacturers of test equipment have done most of the analysis work for us through their published specifications, which describe the behavior that can be expected of the majority of the units manufactured, constrained by a typical recommended time interval between periodic calibrations. It has been said that there are no perfect measurements; consequently, there are no perfect instruments, and even new instruments can measure inaccurately.
It is from the Original Equipment Manufacturer’s (OEM) published specifications that end users make their purchasing decisions. It is also from these published specifications that a commercial calibration provider will most likely determine the test limits, or allowable tolerances, for the calibration process. It is entirely up to the customer requesting the calibration services to inform the laboratory which specifications should be applied to the calibration process. Many commercial calibration providers offer a default service that uses the OEM’s published specifications; however, no standards document mandates a default service such as this, so it is not a requirement. A customer can request that their equipment be calibrated against any specification they provide. At a minimum, it is good business practice for a calibration lab (internal or external) to talk with its customers to understand their expectations for tolerance limits and/or test points; this is also a specific requirement under ANSI/ISO/IEC 17025. Once the calibration specifications have been agreed upon, the laboratory can calculate the test limits against which the laboratory results are compared, and a statement of compliance can be determined.
The typical customer who uses a commercial calibration provider is looking for the laboratory to make a statement of compliance on the As-Found condition of the UUT. On the surface, making this determination appears straightforward; upon closer examination, however, it becomes more complex, since all measurements have some uncertainty. The issue is how to deal with the laboratory’s measurement uncertainty with respect to the test limits. ANSI/ISO/IEC 17025 only requires laboratories to make statements of compliance “…where necessary for the interpretation of the test results…” and “where relevant…”. There are many different ways to interpret the necessity and relevance of making a compliance statement. Some labs will not make a statement at all; some will mark the data that does not meet the limits with an asterisk but not make a compliance statement; and some will make a compliance statement, quantify the results with an uncertainty value, and provide an associated probability of compliance with the specification. This can be a very complex topic and will not be covered in this paper. In any case, it is critical for the customer to understand the decision rules the laboratory uses in making any compliance statements.
The statement As-Found: In-Tolerance is generally assumed to mean that the entire instrument (all functions, parameters, ranges, and test points) is within the calibration specifications at the time of calibration, for the stated conditions at the location where the calibration took place. The in-tolerance condition is a good indication that the UUT was performing within expectations since its last calibration. For the commercial calibration customer who has hundreds or thousands of calibrated items, the statement of compliance may be the single most important piece of information on a certificate. In essence, the metrology laboratory, staffed with measurement experts, has completed an initial data evaluation and concluded that the unit is performing within the agreed-upon specifications, so the customer does not have to spend much additional time reviewing the calibration. Likewise, an As-Found: Out-Of-Tolerance (OOT) condition indicates that at least one data point in the report drifted or shifted beyond the allowable tolerance limits, and the measurements the unit was providing may not have been accurate at some point since its previous calibration. Again, the laboratory’s measurement experts have indicated that this unit had a problem and needs further analysis by the customer. The As-Found: Out-Of-Tolerance statement of compliance is the flag or trigger for many quality or manufacturing engineering departments to start an investigation, evaluation, or analysis.
The first thing to do when faced with an out-of-tolerance unit is to read through the calibration certificate and data to get a firm understanding of WHAT specifically failed calibration. This is also the point where a complete set of As-Found and As-Left calibration data becomes essential. A calibration certificate without data is never a good idea, and it is especially useless when faced with an out-of-tolerance unit, because the information needed to conduct an impact analysis is not available. If the metrology laboratory provides an out-of-tolerance report that shows only the out-of-tolerance data, you have something on which to conduct an evaluation, but even this limited information does not provide a complete picture. It is like having a photograph of a forest and erasing all but two trees. All of the calibration data should be reviewed to identify which functions, parameters, ranges, and test points were found out-of-tolerance. For example, let’s say we have a balance with a full-scale range of 1.1 kg, a resolution of 0.1 g, and an accuracy of ± 0.5 g. The unit was found to read 1.0008 kg with a 1.0 kg standard applied near full scale (out-of-tolerance), and it was in-tolerance at all the other test points, which were taken every 0.2 kg. This means that during the use of the balance over its most recent cycle, any measurements above 0.8000 kg (the last passing test point) up to the full scale of 1.1000 kg were likely giving erroneous values to the user of the balance. Again, a full set of data will be very helpful at this point in answering questions such as: How many points within a range were out-of-tolerance? Was the entire range out-of-tolerance? Were all the ranges even checked? Was there a linearity issue? Was only the zero, or only the full-scale reading, out-of-tolerance? Were other relevant test points close to or at their limits? The quality of the calibration and the quantity of data available can have a tremendous impact on narrowing the scope of the evaluation at this point.
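The WHAT step amounts to comparing each As-Found reading with its test limits (nominal ± tolerance) and, for each failure, finding the nearest passing test points on either side to bound the suspect span. The sketch below is a minimal illustration under stated assumptions, not a prescribed method: only the 1,000.8 g reading at the 1,000 g point comes from the example; the other readings are hypothetical placeholders standing in for "in-tolerance", and the function names are invented for illustration.

```python
from decimal import Decimal

TOLERANCE_G = Decimal("0.5")     # balance accuracy from the example, in grams
FULL_SCALE_G = Decimal("1100")   # 1.1 kg full-scale range, in grams

# (nominal, As-Found reading) in grams. Only the 1000 g point is from the
# example; the other readings are HYPOTHETICAL placeholders for "in-tolerance".
AS_FOUND_G = [
    (Decimal("0"), Decimal("0.0")),
    (Decimal("200"), Decimal("200.0")),
    (Decimal("400"), Decimal("400.0")),
    (Decimal("600"), Decimal("600.0")),
    (Decimal("800"), Decimal("800.0")),
    (Decimal("1000"), Decimal("1000.8")),
]

def find_oot_points(points, tolerance):
    """Return (nominal, reading, error) for every point outside nominal +/- tolerance."""
    return [(n, r, r - n) for n, r in points if abs(r - n) > tolerance]

def suspect_span(points, failed_nominal, tolerance, full_scale):
    """Bound a failure by the nearest passing test points below and above it.

    With no passing point above, the span runs to full scale; with none
    below, it starts at zero.
    """
    passing = [n for n, r in points if abs(r - n) <= tolerance]
    low = max((n for n in passing if n < failed_nominal), default=Decimal("0"))
    high = min((n for n in passing if n > failed_nominal), default=full_scale)
    return low, high

for nominal, reading, error in find_oot_points(AS_FOUND_G, TOLERANCE_G):
    low, high = suspect_span(AS_FOUND_G, nominal, TOLERANCE_G, FULL_SCALE_G)
    print(f"{nominal} g read {reading} g (error {error} g); suspect span {low} g to {high} g")
```

For the balance this yields a suspect span from 800 g (exclusive, since that point passed) to the 1,100 g full scale; spans from neighboring failures would simply overlap and could be merged.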
We now have a thorough understanding of the out-of-tolerance conditions.
Armed with a complete picture of the instrument’s out-of-tolerance conditions, the next action should be to identify the time frame during which (WHEN) questionable measurements may have been taken. The objective is to identify a specific time when the instrument was last known to be taking correct measurements. Typically, this is the previous calibration date, which the historical calibration certificate will show; the last time the metrology lab verified the unit’s performance, it was known to be measuring correctly in its As-Left condition. This provides a starting point to work from, and most likely the longest period to examine. If you are fortunate enough to have a well-developed measurement assurance program, you might have collected additional data during the period in question that can shorten the evaluation time frame. Most metrology laboratories follow good metrology practices (GMetP) and conduct mid-cycle checks, tests, and inter-comparisons, also called cross-checks, to determine the “health” of their measurement processes and provide confidence in their quality. If these checks are documented and have data supporting the measurements, you may be able to reduce the period of questionable measurements. For example, let’s say our balance in a production cell was found out-of-tolerance during its annual calibration, but you have a process in which a precision check mass is used to verify the performance of the balance every quarter. A review of this data may allow you to conclude that the balance was performing accurately 3 months ago, so the questionable period is only 3 months instead of 6, 9, or 12 months (or whatever interval has been assigned to the instrument). A schedule of cross-checks and inter-comparisons is often developed for critical measurements or high-volume processes (with these out-of-tolerance situations in mind) in order to reduce risk, liability, and evaluation time.
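The WHEN step reduces to taking the most recent documented evidence of correct performance (the As-Left calibration, or a later passing cross-check) as the start of the questionable period. A minimal sketch, assuming you already have the dates of passing check-mass verifications; the dates and the function name are hypothetical.

```python
from datetime import date

def questionable_period(last_calibration, oot_found, passing_checks):
    """Return (start, end) of the window in which measurements are suspect.

    The window starts at the latest passing check made after the last
    As-Left calibration and before the OOT finding; with no such check it
    falls back to the last calibration date. Only checks that PASSED
    belong in `passing_checks`.
    """
    evidence = [d for d in passing_checks if last_calibration <= d < oot_found]
    return max(evidence, default=last_calibration), oot_found

# Hypothetical dates: annual calibration with quarterly check-mass verifications.
start, end = questionable_period(
    last_calibration=date(2023, 1, 10),
    oot_found=date(2024, 1, 10),
    passing_checks=[date(2023, 4, 10), date(2023, 7, 10), date(2023, 10, 10)],
)
print(start, end, (end - start).days)  # 2023-10-10 2024-01-10 92
```

Without the quarterly checks, the same call with an empty list falls back to the full year since the last calibration.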
At this point, we understand the degree of the out-of-tolerance condition and a time frame to begin identifying where the potential impacts might be found.
The next step is to identify WHERE the out-of-tolerance instrument was used. This is where the really big challenges start, because this is where the last link in the chain of traceability, the one linking the actual measurement instruments to the products and services provided, is often broken. The ease of identifying potentially impacted product depends upon the design of the end user’s processes and systems. The objective at this point is to identify where this instrument was used during the questionable period. In a large facility, test equipment can move around without its location being tracked. This is especially true of handheld and bench-level instruments. A robustly designed system with strict instrument control procedures will be able to identify exactly where any given instrument was during any given time frame. For example, let’s say your company tags each instrument with an identification string consisting of three distinct fields: Department-Instrument Identification-Location. The department field could stand for a specific work center, cell, or department, while the location could stand for a specific workbench or station. The instrument identification is usually a unique identifier that the company has placed on each instrument. An example of an instrument identification tag string would be: (Production-#123456-X45 Cell). While this is a good system, it still needs a method or a log to track any changes in the department and location of the instruments and the date of each change. Again, I stress strict adherence to maintaining the integrity of the log; any gap or missing location data will bring an evaluation to a halt. Imagine a facility with 50 identical instruments that move among production cells without any control. It would be impossible to identify what measurements or products any one of them touched and what errors went undetected.
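A location log of the kind described above can be queried mechanically: parse each tag into its three fields and keep every location whose active interval overlaps the questionable period. This is a sketch under assumptions: the log format (an effective date plus the full tag string, in date order) and the function names are hypothetical, and while the running example's balance never moved, the move to a second cell, "X46 Cell", is invented here to show the overlap logic.

```python
from datetime import date
from typing import NamedTuple

class Tag(NamedTuple):
    department: str
    instrument_id: str
    location: str

def parse_tag(tag):
    """Split 'Department-Instrument Identification-Location' on the first two hyphens."""
    department, instrument_id, location = tag.split("-", 2)
    return Tag(department.strip(), instrument_id.strip(), location.strip())

def locations_during(log, start, end):
    """Return the tags active at any time in [start, end].

    `log` is a date-ordered list of (effective_date, tag); each entry stays
    in force until the next entry's effective date.
    """
    active = []
    for i, (effective, tag) in enumerate(log):
        next_effective = log[i + 1][0] if i + 1 < len(log) else None
        if effective <= end and (next_effective is None or next_effective > start):
            active.append(parse_tag(tag))
    return active

# Hypothetical log: the balance moves from cell X45 to cell X46.
LOG = [
    (date(2022, 5, 1), "Production-#123456-X45 Cell"),
    (date(2023, 11, 15), "Production-#123456-X46 Cell"),
]

for t in locations_during(LOG, date(2023, 10, 10), date(2024, 1, 10)):
    print(t.department, t.instrument_id, t.location)
```

Note that the code trusts the log completely; it cannot detect a missing entry, which is exactly why the integrity of the log matters.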
For our running example, the instrument was in the Production Department and operated in the X45 Product Cell and was not moved at all. Now, with a robust tracking system that indicates if and when this instrument moved, you should be able to identify where this instrument was at any given time. We now know the following:
- WHAT: the instrument measurements to be concerned about (from the calibration data report)
- WHEN: the time frame to investigate (from the last calibration date or cross-check)
- WHERE: to look for the potential process problems in which the OOT data may have impacted the product (from the instrument tracking log system)
The last step in the out-of-tolerance evaluation process is HOW; that is, to identify how the out-of-tolerance instrument was being used: exactly what measurements were being made at a given location during the time frame in question. This information will likely be found in the end user’s procedures, the operator’s work instructions, or an engineering specification. The objective of this step is to determine whether the out-of-tolerance instrument could have affected any of the products manufactured or services provided with this instrument, in this time frame, in this location, for these measurements. This can be accomplished by reviewing the process documentation, including all revisions that were in effect during the time frame in question, for the out-of-tolerance measurements identified in the first step. Were any of the out-of-tolerance functions, parameters, ranges, and test points used to make the measurements listed in the process documentation? If the answer is no, congratulations, your evaluation is almost complete; you just have to fully document the steps you have taken and your conclusion.
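Once the suspect span is known, the HOW step is a filter over the measurements listed in the process documentation. A minimal sketch, assuming each documented measurement can be reduced to a name and a nominal value in grams; the product names and the function name are hypothetical, and the 600 g and 1,000 g values mirror the products used in the cases below.

```python
from decimal import Decimal

def affected_measurements(process_points, suspect_low, suspect_high):
    """Documented measurements inside the suspect span (suspect_low, suspect_high].

    The lower bound is exclusive because that test point passed calibration.
    """
    return [(name, value) for name, value in process_points
            if suspect_low < value <= suspect_high]

# Hypothetical process documentation entries (nominal values in grams).
PROCESS_POINTS = [
    ("Product A net mass", Decimal("600")),
    ("Product B net mass", Decimal("1000")),
]

print(affected_measurements(PROCESS_POINTS, Decimal("800"), Decimal("1100")))
# [('Product B net mass', Decimal('1000'))]
```

An empty result is the "congratulations" outcome: nothing documented used the out-of-tolerance span, and only the write-up remains.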
If the process documentation indicates that measurements were taken using any of the out-of-tolerance functions or ranges, then you have to go further, quantify the severity of the impact on products or services, and determine whether a recall must be done. Now comes the most difficult part of our journey: quantifying the impact on products and services. To complete this analysis effectively, a thorough understanding of the affected process and a working understanding of tolerances and the application of uncertainties are required. Due to the wide variety of possible applications and situations, I will stick to the most probable findings, using a few sample cases to illustrate the analysis process. For our balance example, we will run through a few situations that might occur.
Case 1 - No Impact
Let’s say the process documentation states that the balance is used to measure a 0.600 kg product with a process tolerance of ± 5.0 g. Since our process measurement was not in the out-of-tolerance portion of the balance (>0.800 kg to 1.1000 kg), we can conclude with reasonable confidence that no product was affected.
Case 2 - Impact Evaluation Using Ratios
In this case, we will use accuracy ratios in our analysis. The process documentation states that the balance is used to measure a 1.000 kg product with a process tolerance of ± 5.0 g. Since our process measurement was in the out-of-tolerance portion of the balance (>0.800 kg to 1.1000 kg), the product might have been negatively impacted. We need to go a step further and compare our process tolerance to the magnitude of the out-of-tolerance data. The process tolerance, in this case, was ± 5.0 g, so our process limits are 0.9950 kg to 1.0050 kg. The specified accuracy of the balance was ± 0.50 g, which means the balance is 10 times more accurate than our process tolerance, giving us a Process Accuracy Ratio (5.0 g / 0.5 g) of 10:1. However, the calibration report stated that the balance was reading 1.0008 kg, which effectively means the balance accuracy was ± 0.8 g, dropping our Process Accuracy Ratio (5.0 g / 0.8 g) to 6.25:1. Is the risk of a reduced process ratio acceptable? A ratio analysis can help quantify the potential impact as a rough order of magnitude, but it may not be sufficient. For instance, a ratio change from 100:1 to 80:1 may be fairly insignificant, but a ratio change from 4:1 to 2:1 could have a significant impact on the end products. A ratio analysis may be a quick way to rule out potential recalls if the ratios involved are sufficiently high. If the ratios are low, additional evaluation becomes necessary. This method may also be the only option available if there isn’t any historical process measurement data to review.
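The ratio arithmetic in this case is simple enough to script across many products at once. A minimal sketch; the function name and the `minimum_ratio` default of 4 (the minimum target mentioned later in this paper) are assumptions for illustration.

```python
def ratio_report(process_tolerance, spec_accuracy, as_found_error, minimum_ratio=4.0):
    """Compare the process accuracy ratio before and after the As-Found error.

    All three inputs are half-widths in the same units (grams here).
    """
    nominal_ratio = process_tolerance / spec_accuracy
    as_found_ratio = process_tolerance / abs(as_found_error)
    return {
        "nominal_ratio": nominal_ratio,
        "as_found_ratio": as_found_ratio,
        "below_minimum": as_found_ratio < minimum_ratio,
    }

# Case 2 values: +/-5.0 g process tolerance, +/-0.5 g spec, 0.8 g As-Found error.
report = ratio_report(process_tolerance=5.0, spec_accuracy=0.5, as_found_error=0.8)
print(report)
```

As the text cautions, a ratio that stays above the minimum is a quick screen, not proof that no product was affected.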
Case 3 - Impact Evaluation Using As-Found Calibration Data
The process documentation states that the balance is used to measure a 1.000 kg product with a process tolerance of ± 5.0 g. Since our process measurement was in the out-of-tolerance portion of the balance (>0.800 kg to 1.1000 kg), the product might have been negatively impacted. We need to go a step further and compare our process tolerance to the magnitude of the out-of-tolerance data. The process tolerance, in this case, was ± 5.0 g, so our process limits are 0.9950 kg to 1.0050 kg. The out-of-tolerance data indicated that the balance was reading 1.0008 kg, which is out of specification (i.e., beyond the balance’s upper tolerance limit of 1.0005 kg) by +0.3 g. That is well below our ± 5.0 g process tolerance, so there wasn’t a problem. Or was there? You might want to jump to that conclusion, and you would be correct as long as your process stayed centered on 1.000 kg, but what if your process moved around, which it will? This is why we have a process tolerance to begin with! To figure out what is going on here, go back to the fact that the balance was reading high by +0.8 g; the balance has a +0.8 g bias or offset. In terms of its readings, the balance was actually delivering process limits of 0.9958 kg to 1.0058 kg: a product with a true mass of exactly 0.9950 kg would have read 0.9958 kg, and one of exactly 1.0050 kg would have read 1.0058 kg. This means any product accepted during the time frame in question with a reading from 0.9950 kg up to (but not including) 0.9958 kg actually had a true mass from 0.9942 kg up to 0.9950 kg, below the lower process limit. (Products reading from just above 1.0050 kg to 1.0058 kg were rejected even though they were actually within the process limits; that costs scrap or rework, but it does not ship nonconforming product.) With this information, you should review any historical process measurement data you have and identify any products with recorded readings in that 0.9950 kg to 0.9958 kg band. You have now identified the specific units that might have been impacted by the out-of-tolerance unit and may have to be recalled. But wait, there’s more! No measurement is perfect, so what about the metrology lab’s data; doesn’t that have some error in it too? Why yes, yes it does…
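The bias correction in Case 3 can be applied directly to historical process records. A minimal sketch, assuming the balance's error near 1 kg is a constant +0.8 g offset (reading = true mass + bias) and that the records are available as readings in kilograms; the readings and function names are hypothetical.

```python
from decimal import Decimal

LOWER_KG, UPPER_KG = Decimal("0.9950"), Decimal("1.0050")  # process limits
BIAS_KG = Decimal("0.0008")  # balance reads high by 0.8 g

def delivered_limits(lower, upper, bias):
    """Readings that correspond to the true process limits (reading = true + bias)."""
    return lower + bias, upper + bias

def accepted_but_nonconforming(readings, lower, upper, bias):
    """Readings the process accepted whose bias-corrected mass is out of limits."""
    suspect = []
    for reading in readings:
        true_mass = reading - bias
        accepted = lower <= reading <= upper
        conforming = lower <= true_mass <= upper
        if accepted and not conforming:
            suspect.append(reading)
    return suspect

# Hypothetical historical readings pulled from process records, in kg.
READINGS = [Decimal(s) for s in ("0.9951", "0.9957", "0.9958", "1.0000", "1.0049", "1.0055")]

print(delivered_limits(LOWER_KG, UPPER_KG, BIAS_KG))
print(accepted_but_nonconforming(READINGS, LOWER_KG, UPPER_KG, BIAS_KG))
```

Because the balance reads high, the at-risk units sit at the low end of the accepted range; the 1.0055 kg reading illustrates the opposite effect, a conforming unit that was rejected.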
Case 4 - Impact Evaluation Using As-Found Calibration Data and the Lab's Uncertainty
Continuing with the Case 3 information, let’s say the metrology lab reported its uncertainty for the measurement: 1.0008 kg ± 0.71 mg. That means the value of the balance reading lies somewhere between 1.000 799 29 kg and 1.000 800 71 kg. This additional uncertainty carries down to the process tolerance calculation. So the process limits the balance was actually delivering could lie anywhere from 0.995 799 29 kg to 0.995 800 71 kg at the low end and from 1.005 799 29 kg to 1.005 800 71 kg at the high end, which in our case is insignificant because the balance’s 0.1 g resolution is not sensitive enough to see such a small difference in mass. It is interesting to note that in our situation the metrology lab had an uncertainty of ± 0.71 mg for the calibration against the unit’s tolerance of ± 0.5 g, which provides a calibration Test Uncertainty Ratio (TUR) of 704:1 (500 mg / 0.71 mg). Here is where the value of that pesky Test Uncertainty Ratio that metrology folks are always talking about comes into play. Had the metrology laboratory’s uncertainty been ± 0.125 g, the measurement would have been 1.0008 kg ± 0.125 g, and the TUR would have been 4:1 (0.5 g / 0.125 g), meaning the delivered process limits could lie anywhere from 0.995 675 kg to 0.995 925 kg at the low end and from 1.005 675 kg to 1.005 925 kg at the high end. At the balance’s resolution, each limit is now uncertain by a count or two (0.9957 kg to 0.9959 kg, and 1.0057 kg to 1.0059 kg). These additional counts might not seem like a big deal, but accounting for them conservatively increases the size of the potential recall and the associated risk and cost.
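The effect of the lab's uncertainty in Case 4 can be checked the same way: widen the measured bias by ± the reported uncertainty, shift each process limit by both extremes, and round to the balance's 0.1 g display. A minimal sketch under the same constant-offset assumption as Case 3; the function names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

RESOLUTION_KG = Decimal("0.0001")  # 0.1 g display resolution, in kg

def limit_ranges(lower, upper, bias, lab_uncertainty):
    """Range each delivered limit could occupy when the bias is known only to +/- lab_uncertainty."""
    low_bias, high_bias = bias - lab_uncertainty, bias + lab_uncertainty
    return (lower + low_bias, lower + high_bias), (upper + low_bias, upper + high_bias)

def at_resolution(value, resolution=RESOLUTION_KG):
    """Round a value to what the balance can display."""
    return value.quantize(resolution, rounding=ROUND_HALF_UP)

def calc_tur(tolerance, lab_uncertainty):
    """TUR as used in the text: instrument tolerance / lab uncertainty (same units)."""
    return tolerance / lab_uncertainty

for u_kg in (Decimal("0.00000071"), Decimal("0.000125")):  # 0.71 mg and 0.125 g
    low, high = limit_ranges(Decimal("0.9950"), Decimal("1.0050"), Decimal("0.0008"), u_kg)
    tur = calc_tur(Decimal("0.0005"), u_kg)  # 0.5 g tolerance, in kg
    print(f"TUR {tur:.0f}:1  low {[str(at_resolution(v)) for v in low]}  "
          f"high {[str(at_resolution(v)) for v in high]}")
```

With the 0.71 mg uncertainty, both ends of each range round to the same displayed value (0.9958 kg and 1.0058 kg); with 0.125 g they spread over 0.9957 to 0.9959 kg and 1.0057 to 1.0059 kg.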
Again, here is where a complete calibration report with As-Found and As-Left data becomes very helpful. This is also where the Test Uncertainty Ratio (TUR) and the calibration laboratory’s uncertainty come into play, and why all calibrations should include uncertainties for every measurement (to maintain traceability). The uncertainty the laboratory reports for each measurement gives you the information to further refine your evaluation and subsequent analysis. Every bit of measurement information at your disposal allows you to make additional distinctions, observations, and calculations, and it improves the quality of, and confidence in, your conclusions and recommendations for further action. The cost of a single product recall will far exceed the additional cost of a complete calibration that includes As-Found and As-Left data with uncertainties.
As cases 2, 3, and 4 above illustrate, an out-of-tolerance instrument that could affect the end product or service can lead to a tremendous amount of work, because the analysis must be completed for each product or service identified. This could lead to hundreds or thousands of calculations! As you can imagine, any effort spent in the four steps of the evaluation process (what, when, where, and how) that removes products from further analysis is well worth the time. When faced with an As-Found: Out-Of-Tolerance condition, a systematic approach to identifying what the out-of-tolerance values were, when the OOT unit was used, where it was used, and how it was used will help concentrate your efforts and readily identify the potential problems that need further analysis. The objective is to filter out as many items as possible that do not need closer analysis, so you can get to the ones where detailed analysis is required to quantify the impact on the products or services provided.
If all this analysis seems like a tremendous amount of work, you are correct. However, it does not have to be this difficult. A well-thought-out electronic system linking instrumentation to processes and product traceability, as part of a measurement assurance program, can ease the burden of out-of-tolerance evaluations and analysis. A measurement assurance program is more than a calibration program; it is a thought process to link and relate measurements through the entire product lifecycle, from concept to end product. And it is part of the definition of traceability!
Elements of a measurement assurance program start with the research and development phase, where the end product’s specifications are identified. Every step in the manufacturing process supports the end result of a good product. The development of process documentation should include the process measurements and their acceptable tolerances. The instrumentation selected for the process needs to have the appropriate accuracy. In the metrology world, the minimum target is a measurement device that is at least 4 times more accurate than the process tolerance; the preferred ratio is 10 times more accurate. Tip: select single-parameter or single-range instrumentation over multi-function instrumentation to reduce evaluation time by eliminating the need to evaluate out-of-tolerance conditions on unused functions. For example, use a DC voltage panel meter instead of a full digital multimeter (DMM); if the DMM is found out-of-tolerance on a resistance or AC current range that is never used, an evaluation will still need to be completed. A robust program will also include Quality Control tools such as Statistical Process Control (SPC), process capability studies, control charts, and check standards and cross-checks in high-volume or critical processes.
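Check standards and control charts tie back to the WHEN step: a documented, in-control check-mass history is what lets you shorten the questionable period. A minimal sketch of individuals-chart limits, assuming center ± 3 sample standard deviations computed from a baseline of check-mass readings; the readings and function names are hypothetical, and many SPC references estimate sigma from the average moving range instead.

```python
import statistics

def control_limits(baseline):
    """(lower, center, upper) limits at center +/- 3 sample standard deviations."""
    center = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    return center - 3 * sigma, center, center + 3 * sigma

def out_of_control(readings, limits):
    """Readings that fall outside the control limits."""
    lower, _, upper = limits
    return [r for r in readings if not lower <= r <= upper]

# Hypothetical baseline: routine readings of a 1 kg check mass, in grams.
BASELINE_G = [1000.0, 1000.1, 999.9, 1000.0, 1000.1, 999.9, 1000.0, 1000.0]
LIMITS = control_limits(BASELINE_G)

print(out_of_control([1000.1, 1000.8], LIMITS))  # [1000.8]
```

A reading flagged this way between calibrations would start an out-of-tolerance evaluation early, while the questionable period is still short.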
Another important element is the implementation of a preventative maintenance program. There are many different tools and materials used in a manufacturing process that require some level of maintenance, cleaning, conditioning, or periodic replacement. Don’t forget to include the inspection, cleaning, and replacement of worn cables, contacts, switches, and adapters. One often overlooked maintenance function is the cleaning of electrical terminals on measurement equipment; over time, dirt and dust build up inside terminals and induce small thermal EMF errors. Technically, periodic calibration could be classified as a preventative maintenance function. Calibration should be done at regular intervals with full As-Found and As-Left data and detailed uncertainties.
As previously mentioned, it is important to document and maintain control of the location of all measurement instrumentation. One tip worth considering is to rack-mount instrumentation rather than leave instruments on bench tops, where they can easily be moved from one location to another. It takes a bit of time to remove a unit from a rack, and it is usually not worth the effort for someone who needs a unit for a “quick” measurement somewhere else.
Probably the greatest challenge, and the most important element, is linking the test equipment to the end product. There are several different approaches to maintaining this final, critical link in the chain of traceability. One approach is to create workstations with specific instruments identified and associated with each work center, where the instruments are seldom moved. Any measurements conducted at the workstation should record the measurement data, the workstation number, and the date of the measurements; in effect, this becomes the traceability number for the end product. It is very important to maintain a log of the instruments in the workstation and to document any changes to its configuration. Another approach is to list the individual instrument identification number or asset number with each measurement data set. This is typically the methodology used by metrology laboratories. Finally, the most robust system associates the instrument asset number with each and every measurement taken. In all cases, electronic systems based on paperless certificates, electronic data, and databases are the only realistic approach. With the wide range of electronic technology available, a system using barcodes or even RFID can be developed to readily link the workstation or instrument to a work order or traveler, which can in turn be associated with a specific end product or a batch identification system.
With the complete chain of traceability established, the filtering of relevant measurement information can be reduced to a few keystrokes. Depending upon the type of measurement data recorded (as in the third approach discussed in the previous paragraph), the entire analysis process could be fully automated to identify a potential end-product recall down to the individual unit with nothing but a mouse click. The age of handwritten calibration data reports and handwritten process measurement records needs to come to a close. With the computing power and technology at our disposal, and the speed with which industry and products move around the world, a file drawer filled with a mountain of paper is nothing less than antiquated.
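When every measurement record carries the asset number (the third approach above), the whole filter collapses into one query. A minimal sketch, assuming a hypothetical record schema held in memory; in practice this would be a database query. The work orders, readings, and the second asset number are hypothetical, and the suspect band passed in is whatever range the impact analysis produced (the values below are illustrative).

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal

@dataclass(frozen=True)
class MeasurementRecord:
    """One process measurement tagged with the asset that made it (hypothetical schema)."""
    asset_id: str
    work_order: str
    taken_on: date
    reading_kg: Decimal

def potential_recall(records, asset_id, start, end, band_low, band_high):
    """Work orders measured by `asset_id` in [start, end] with readings in [band_low, band_high)."""
    return sorted({
        r.work_order
        for r in records
        if r.asset_id == asset_id
        and start <= r.taken_on <= end
        and band_low <= r.reading_kg < band_high
    })

RECORDS = [
    MeasurementRecord("#123456", "WO-1001", date(2023, 11, 2), Decimal("0.9953")),
    MeasurementRecord("#123456", "WO-1002", date(2023, 11, 3), Decimal("1.0001")),
    MeasurementRecord("#123456", "WO-0990", date(2023, 9, 1), Decimal("0.9955")),
    MeasurementRecord("#654321", "WO-1003", date(2023, 12, 1), Decimal("0.9951")),
    MeasurementRecord("#123456", "WO-1004", date(2023, 12, 5), Decimal("0.9957")),
]

print(potential_recall(RECORDS, "#123456", date(2023, 10, 10), date(2024, 1, 10),
                       Decimal("0.9950"), Decimal("0.9958")))  # ['WO-1001', 'WO-1004']
```

Each condition in the query corresponds to one of the four steps: the asset (what), the date window (when), the asset-to-record link standing in for location (where), and the reading band (how).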
In this paper, we have discussed the general guidelines and approach to solving one of the most dreaded situations in the measurement world: the evaluation of an out-of-tolerance instrument and its potential impact. The four-step process of identifying what was out-of-tolerance and when, where, and how the instrument was used filters out unaffected products and provides a systematic method to identify the units that require in-depth analysis. We also discussed some of the elements of a robust measurement assurance program that use computers and technology to further ease the burden of detailed analysis.