R implementation of algorithms for detection of outliers based on frequent pattern mining.
If you would like to cite our work, please use:
@InProceedings{kuchar:2017:FPI,
title = {Spotlighting Anomalies using Frequent Patterns},
author = {Jaroslav Kuchař and Vojtěch Svátek},
booktitle = {Proceedings of the KDD 2017 Workshop on Anomaly Detection in Finance},
year = {2017},
volume = {71},
series = {Proceedings of Machine Learning Research},
address = {Halifax, Nova Scotia, Canada},
month = {14 Aug},
publisher = {PMLR},
issn = {1938-7228}
}
Available implementations:
Package installation from GitHub:
library("devtools")
devtools::install_github("jaroslav-kuchar/fpmoutliers")
library(fpmoutliers)
dataFrame <- read.csv(system.file("extdata", "fp-outlier-customer-data.csv", package = "fpmoutliers"))
model <- FPI(dataFrame, minSupport = 0.001)
# sort data by the anomaly score
dataFrame <- dataFrame[order(model$scores, decreasing = TRUE),]
print(dataFrame[1,]) # instance with the highest anomaly score
print(dataFrame[nrow(dataFrame),]) # instance with the lowest anomaly score
Visualization is currently not suitable for large datasets: the plot is limited by the number of rows and columns of the input data.
library("fpmoutliers")
dataFrame <- read.csv(
system.file("extdata", "fp-outlier-customer-data.csv", package = "fpmoutliers"))
model <- FPI(dataFrame, minSupport = 0.001)
# sort data by the anomaly score
dataFrame <- dataFrame[order(model$scores, decreasing = TRUE),]
visualizeInstance(dataFrame, 1) # instance with the highest anomaly score
visualizeInstance(dataFrame, nrow(dataFrame)) # instance with the lowest anomaly score
library("fpmoutliers")
dataFrame <- read.csv(
system.file("extdata", "fp-outlier-customer-data.csv", package = "fpmoutliers"))
model <- FPI(dataFrame, minSupport = 0.001)
# sort data by the anomaly score
dataFrame <- dataFrame[order(model$scores, decreasing = TRUE),]
# instance with the highest anomaly score
out <- describeInstance(dataFrame, model, 1)
# instance with the lowest anomaly score
out <- describeInstance(dataFrame, model, nrow(dataFrame))
library("fpmoutliers")
data("iris")
model <- fpmoutliers::build(iris)
library(fpmoutliers)
library(XML)
dataFrame <- read.csv(system.file("extdata", "fp-outlier-customer-data.csv", package = "fpmoutliers"))
model <- FPI(dataFrame, minSupport = 0.001)
# serialize the model as PMML and write it to an XML file
saveXML(generatePMML(model, dataFrame), "example_out.xml")
All implemented methods return a list with the following parameters:
- minSupport: minimum support setting for frequent itemsets mining
- maxlen: maximum length of frequent itemsets
- model: frequent itemset model represented as itemsets-class
- scores: outlier/anomaly scores for each observation/row of the input dataframe
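Because the scores are aligned with the rows of the input dataframe, the returned list can be used to pull out the top-ranked observations. The following is a minimal base-R sketch, not part of the package: `result` is a made-up stand-in for the list returned by a method such as `FPI()` (the `model` field is omitted), and `topOutliers` is a hypothetical helper introduced here only for illustration.

```r
# Hypothetical stand-in for the list returned by a method such as FPI();
# field names follow the description above, the values are made up.
result <- list(
  minSupport = 0.001,
  maxlen = 3,
  scores = c(1.2, 5.7, 0.3, 2.1)
)

# Toy input with one row per score (illustrative only).
dataFrame <- data.frame(
  id = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the n rows with the highest scores,
# with the score attached as an extra column.
topOutliers <- function(data, scores, n = 2) {
  stopifnot(nrow(data) == length(scores))
  idx <- head(order(scores, decreasing = TRUE), n)
  cbind(data[idx, , drop = FALSE], score = scores[idx])
}

top <- topOutliers(dataFrame, result$scores, n = 2)
print(top)
```

With a real model, the same helper would presumably be called as `topOutliers(dataFrame, model$scores, n = 10)` before the dataframe is reordered, since the scores follow the original row order.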
Apache License Version 2.0