Detects local outliers in vector data with a Hampel filter based on the median absolute deviation (MAD), and replaces them with NA or the local median value.

Usage

replace_outliers(x, width, t0 = 3, return = c("NA", "median"))

Arguments

x

A numeric vector.

width

A numeric value giving the half-width of the window. The full window spans (2 × width + 1) samples centred on each point.

t0

A numeric value for the outlier threshold, default is 3 (Pearson's rule).

return

Indicates whether outliers should be replaced with NA (the default) or the local "median" value.

Value

A numeric vector of filtered data.

Details

The median absolute deviation is computed over the [-width...width] neighbourhood of each point that lies at least width steps away from either end of the series. Values within width steps of the lower and upper ends are left unchanged.
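The windowed computation described above can be sketched in base R. This is a minimal illustration, not the package's actual implementation: `hampel_sketch()` is a hypothetical helper, and the 1.4826 scale factor (the default `constant` in `stats::mad()`) is an assumption about how the MAD is scaled.

```r
# Hypothetical helper for illustration only; not part of the package.
hampel_sketch <- function(x, width, t0 = 3) {
  n <- length(x)
  y <- x
  # Series too short to hold a single full window: nothing to filter.
  if (n < 2 * width + 1) return(y)
  for (i in (width + 1):(n - width)) {
    window <- x[(i - width):(i + width)]
    local_median <- median(window)
    # stats::mad() returns 1.4826 * median(|window - median(window)|) by default
    # (assumed scaling; the package may differ).
    local_mad <- mad(window)
    if (abs(x[i] - local_median) > t0 * local_mad) {
      y[i] <- local_median
    }
  }
  # The first and last `width` values were never visited, so they are unchanged.
  y
}

x <- c(1, 2, 3, 100, 5, 6, 7)
hampel_sketch(x, width = 2)
# 1 2 3 5 5 6 7
```

Only the fourth value is replaced: in its window (2, 3, 100, 5, 6) the median is 5 and the scaled MAD is about 2.97, so a deviation of 95 far exceeds 3 times the MAD.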

A high threshold makes the filter more forgiving, whereas a low threshold declares more points to be outliers. t0 = 3 (the default) corresponds to Pearson's 3 sigma edit rule, and t0 = 0 corresponds to Tukey's median filter.
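To make the role of t0 concrete, the following sketch scores the centre point of a single window by hand. It assumes the MAD is scaled by 1.4826, as `stats::mad()` does by default; the package's own scaling may differ, and the variable names are purely illustrative.

```r
# One window of 2 * width + 1 = 5 samples; the centre point is 100.
window <- c(2, 3, 100, 5, 6)
centre <- window[3]

local_median <- median(window)   # 5
local_mad <- mad(window)         # 1.4826 * 2 (assumed scaling)

# Distance of the centre from the local median, in units of the scaled MAD.
score <- abs(centre - local_median) / local_mad

score > 3    # TRUE: flagged under the default threshold
score > 50   # FALSE: a very forgiving threshold lets the point through

# With t0 = 0, any point that differs from its local median is flagged.
abs(centre - local_median) > 0 * local_mad   # TRUE
```

With t0 = 0 and median replacement, every interior point that differs from its local median is replaced by it, which is why that setting behaves like a running median.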

NA values in x are removed before processing and restored at their original positions in the returned vector.
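The remove-then-restore pattern can be sketched as follows. `with_na_restored()` is a hypothetical helper written for this illustration, and the capping function stands in for the actual filter; neither is part of the package.

```r
# Hypothetical helper: apply a filter `f` to the non-missing values only,
# then put the results back at their original positions.
with_na_restored <- function(x, f) {
  keep <- !is.na(x)
  y <- rep(NA_real_, length(x))
  y[keep] <- f(x[keep])
  y
}

x <- c(1, NA, 3, 100, NA, 5, 6)

# Stand-in filter for illustration: cap values at 10.
with_na_restored(x, function(v) pmin(v, 10))
# 1 NA 3 10 NA 5 6
```

Under this reading, the filter sees a gap-free series, so a window near a run of NA values would draw its neighbours from both sides of the gap.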

return = "median" replaces outliers with the local median value, as in pracma::hampel(). The default, return = "NA", replaces outliers with NA so that they can be filled in later by a method of your choice (see replace_missing()).

Examples

set.seed(8421)
x <- numeric(1024)
z <- rnorm(1024)
x[1] <- z[1]
for (i in 2:1024) {
    x[i] <- 0.4*x[i-1] + 0.8*x[i-1]*z[i-1] + z[i]
}
x[150:200] <- NA ## insert a run of missing values
y <- replace_outliers(x, width = 20, return = "median")
ind <- which(x != y) ## indices where the value was replaced (which() drops NA comparisons)
outliers <- x[ind] ## original values at those indices

if (FALSE) { # \dontrun{
plot(1:1024, x, type = "l")
points(ind, outliers, pch = 21, col = "darkred")
lines(y, col = "blue")
} # }