Understanding Outliers and Off-Scale Measurements in Data Collection
Data collection is fundamental to statistical analysis and research. However, not every data point falls within the expected range or is measured accurately. One such case is a measurement that exceeds the scale of the measuring instrument. These are usually called 'off-scale' measurements and are sometimes loosely grouped with 'outliers'. Such points can significantly affect the accuracy and reliability of the data, so it is crucial to recognize and handle them appropriately.
What Are Outliers?
Outliers, in statistical terms, are data points that significantly differ from other observations in a dataset. They can occur due to variability in the data or may indicate experimental errors. In some contexts, particularly when values exceed the limits of the measuring instrument, these off-scale readings might also be referred to as 'saturated' or 'censored' data points.
Instrumentation and Range-Limited Data Acquisition
When collecting data, it is not uncommon for the quantity being measured to fall outside the range of the measuring instrument. For example, consider an electrical signal that ranges from -11 volts to 11 volts but is read by an Analog-to-Digital Converter (ADC) with a range of 0 to 10 volts. The ADC clips the signal: any value below 0 volts registers as 0, and any value above 10 volts registers as 10.
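The clipping described above can be sketched in a few lines of Python. This is a minimal illustration, not a model of any particular converter: it assumes an ideal ADC that simply clamps to its limits and ignores quantization, and the names adc_read, true_signal, and recorded are hypothetical.

```python
import math

ADC_MIN_V = 0.0   # lower limit of the ADC in the example
ADC_MAX_V = 10.0  # upper limit of the ADC in the example


def adc_read(voltage, lo=ADC_MIN_V, hi=ADC_MAX_V):
    """Return the value an ideal range-limited ADC would report (clamped)."""
    return max(lo, min(hi, voltage))


# Hypothetical test signal: one cycle of an 11 V amplitude sine wave, 12 samples.
true_signal = [11.0 * math.sin(2 * math.pi * k / 12) for k in range(12)]
recorded = [adc_read(v) for v in true_signal]

for v, r in zip(true_signal, recorded):
    print(f"true {v:7.2f} V -> recorded {r:5.2f} V")
```

Every negative sample comes back as 0 and every sample above 10 volts comes back as 10, so the recorded trace alone cannot tell a reader how far off-scale the signal went.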
Labeling and Discarding Off-Scale Data
When data points fall outside the range of the measuring instrument, the recorded value is the instrument's limit, not the true value. Such points should be labeled as off-scale and discarded before any statistical processing of the population that would otherwise include them, because treating them as valid readings can skew the results significantly. If the acquisition trace spends a considerable amount of time outside the range, it is also worth calculating and reporting the proportion of saturated points relative to the total number of points in the run, so that the amount of discarded data is visible.
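One way to label saturated samples and compute their proportion is sketched below. It assumes the 0 to 10 volt range from the ADC example and treats any reading at either limit as saturated, which is an assumption: a true reading of exactly 0 or 10 volts cannot be told apart from a clipped one. The function names and the sample run are hypothetical.

```python
def label_sample(value, lo=0.0, hi=10.0):
    """Label a recorded sample; a reading at either limit counts as saturated."""
    if value <= lo:
        return "low"
    if value >= hi:
        return "high"
    return "ok"


def saturation_summary(recorded, lo=0.0, hi=10.0):
    """Count saturated samples and keep only the in-range readings."""
    labels = [label_sample(v, lo, hi) for v in recorded]
    in_range = [v for v, lab in zip(recorded, labels) if lab == "ok"]
    n = len(recorded)
    return {
        "n_total": n,
        "n_low": labels.count("low"),
        "n_high": labels.count("high"),
        "fraction_saturated": (n - len(in_range)) / n if n else 0.0,
        "in_range": in_range,
    }


run = [0.0, 0.0, 2.5, 7.1, 10.0, 10.0, 10.0, 4.4]  # hypothetical recorded run
summary = saturation_summary(run)
print(summary["fraction_saturated"])  # 5 of 8 samples are at a limit -> 0.625
print(summary["in_range"])            # readings kept for further statistics
```

Reporting the low and high counts separately, rather than a single total, shows whether the instrument range was too narrow on one side or on both.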
Common Terms and Their Usage
There are various terms used in different contexts to describe off-scale and outlier data points. Some of the commonly used terms include:
LLOQ (Lower Limit of Quantification): This term is useful when the measured value, typically a concentration, is below the lowest tick mark on the measurement instrument. By analogy, if an ADC reads from 0 to 10 volts and the input signal goes below 0 volts, the measurement is recorded as 0 volts, which indicates only that the true value lies below the lowest quantifiable level.
Off-Scale Low (OSL) and Off-Scale High (OSH): In some scenarios, such as the mission control communications during the Columbia space shuttle accident, verbal reports of 'off-scale low' or 'off-scale high' were given to indicate that the measurement or signal had exceeded the range of the instrument. These terms are often used in monitoring and control systems to alert operators to out-of-range conditions.
Censoring: The Statistical Perspective
The concept of 'censoring' is used in statistics, engineering, economics, and medical research to describe a situation where the value of a measurement or observation is only partially known. For instance, in a mortality study, if an individual's exact age at death is not known but is known to be at least 75 years, the observation is censored.
Censoring due to out-of-range measurements is a specific type of censoring. For example, if a bathroom scale can only measure up to 300 pounds and a person weighs 350 pounds, the observer knows only that the person's weight is at least 300 pounds; the exact figure cannot be measured with that scale. This type of data is called right-censored because the true value is known only to lie at or above the threshold.
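The bathroom scale example can be written as a small Python sketch that stores each observation as a reading plus a censoring flag. The weights other than the 350 pound case are made up for illustration, and the function name weigh is hypothetical. A reading at the limit is flagged as censored, since the observer cannot distinguish exactly 300 pounds from anything heavier.

```python
SCALE_MAX_LB = 300.0  # upper limit of the bathroom scale in the example


def weigh(true_weight, limit=SCALE_MAX_LB):
    """Return (reading, censored) as the range-limited scale would report it."""
    if true_weight >= limit:
        return limit, True   # only "at least `limit`" is known
    return true_weight, False


# Hypothetical true weights; 350 lb is the case from the text.
true_weights = [150.0, 220.0, 350.0, 180.0]
observations = [weigh(w) for w in true_weights]

readings = [reading for reading, _ in observations]
naive_mean = sum(readings) / len(readings)          # treats 300 as exact
true_mean = sum(true_weights) / len(true_weights)   # unknowable in practice

print(observations[2])  # (300.0, True)
print(naive_mean)       # 212.5
print(true_mean)        # 225.0
```

Because each right-censored reading is no larger than the true value, a mean computed from the raw readings can only understate the true mean. Keeping the flag alongside the reading preserves that information for any later analysis.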
Understanding and managing outliers and off-scale measurements is crucial for ensuring the accuracy and reliability of statistical data. By recognizing these data points and appropriately handling them, researchers and data analysts can improve the quality of their analysis and avoid drawing incorrect conclusions from their data.