
Outlier/error detection in sensor data based on bad hubs

October 23rd, 2012

A lot of sensor data is being collected every minute and used for various sorts of prediction. Yet these measurements are not perfect, and the sensors sometimes break or malfunction. Detecting these anomalies is part of the data cleaning and preparation process.
There are many ways to do outlier and anomaly detection and there is a whole body of literature devoted to the problem.
Instead, we looked at one specific test scenario: whether the curse of dimensionality affects the time series enough that the hubs emerging in the data can be used as markers for such anomalous measurement records. It turns out that they can, and that high bad hubness of a measurement point clearly indicates that something is not right. What exactly is wrong is for experts to say in any particular test case. Here are some graphs from the tool we’ve developed and described in one of our papers.

What is the ‘bad’ hubness of the suspicious points here? They are frequent neighbors to points in other geographical regions, so the distance between measurements does not correspond well to the spatial distance. Of course, the properties of a region are not homogeneous, and it is certainly possible for correct, non-noisy sensors to produce such data. However, the number of such measurement points is usually small, and they are the prime candidates for a closer look. This makes for a good semi-automated anomaly detection system.
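
To make the idea concrete, here is a minimal sketch of how bad hubness could be counted. It is not the code of our tool. The function name `bad_hubness`, the `regions` label array, the neighborhood size `k`, and the use of Euclidean distance between measurement vectors are all assumptions made for illustration.

```python
import numpy as np

def bad_hubness(X, regions, k=5):
    """For each point, count how often it appears among the k nearest
    neighbors of points that belong to a different region.

    X       : (n, d) array, one measurement vector per sensor record
    regions : length-n array of region labels (hypothetical input)
    k       : neighborhood size (assumed; tune it for the data at hand)
    """
    X = np.asarray(X, dtype=float)
    regions = np.asarray(regions)
    n = X.shape[0]

    # Pairwise squared Euclidean distances; fine for small n,
    # since this needs O(n^2 * d) memory.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor

    # Indices of the k nearest neighbors of every point.
    knn = np.argsort(dist, axis=1)[:, :k]

    bad = np.zeros(n, dtype=int)
    for i in range(n):
        for j in knn[i]:
            # j is a neighbor of i; the occurrence is 'bad'
            # when the two points come from different regions.
            if regions[j] != regions[i]:
                bad[j] += 1
    return bad
```

Sorting the records by the returned counts in descending order gives a short list of candidates to hand over for manual inspection, which is the semi-automated part.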
