Supervised, multivariate, and non-parametric discretization algorithm based on tree ensembles learning and moment matching optimization. This version of the algorithm relies on random forest algorithm to learn a large set of split points that conserves the relationship between attributes and the target class, and on moment matching optimization to transform this set into a reduced number of cut points matching as well as possible statistical properties of the initial set of split points. For each attribute to be discretized, the set S of its related split points extracted through random forest is mapped to a reduced set C of cut points of size k. This mapping relies on minimizing, for each continuous attribute to be discretized, the distance between the four first moments of S and the four first moments of C subject to some constraints. This non-linear optimization problem is performed using k values ranging from 2 to 'max_splits', and the best solution returned correspond to the value k which optimum solution is the lowest one over the different realizations. ForestDisc is a generalization of RFDisc discretization method initially proposed by Berrado and Runger (2009) <doi:10.1109/AICCSA.2009.5069327>, and improved by Berrado et al. in 2012 by adopting the idea of moment matching optimization related by Hoyland and Wallace (2001) <doi:10.1287/mnsc.47.2.295.9834>.
Version: | 0.1.0 |
Imports: | randomForest, nloptr, moments, stats |
Published: | 2020-03-19 |
Author: | Haddouchi Maïssae |
Maintainer: | Haddouchi Maïssae <maissaem7 at gmail.com> |
License: | GPL (≥ 3) |
NeedsCompilation: | no |
CRAN checks: | ForestDisc results |
Reference manual: | ForestDisc.pdf |
Package source: | ForestDisc_0.1.0.tar.gz |
Windows binaries: | r-devel: ForestDisc_0.1.0.zip, r-release: ForestDisc_0.1.0.zip, r-oldrel: ForestDisc_0.1.0.zip |
macOS binaries: | r-release: ForestDisc_0.1.0.tgz, r-oldrel: ForestDisc_0.1.0.tgz |
Please use the canonical form https://CRAN.R-project.org/package=ForestDisc to link to this page.