Clustering is a concise data model: given a set of data, we partition it into groups whose members are as similar as possible. A review of the clustering algorithms implemented in R shows a large number of packages that implement or improve an algorithm or a piece of functionality.
The Clustering package contains multiple implementations of algorithms such as gmm, kmeans-arma, kmeans-rcpp, fuzzy_cm, fuzzy_gg, fuzzy_gk, hclust, apclusterk, aggExcluster, clara, daisy, diana, fanny, gama, mona, pam, pvclust, and pvpick.
It can also use different similarity measures to calculate the distance between points, such as Euclidean, Manhattan, Jaccard, Gower, Mahalanobis, Correlation, and Minkowski.
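To make the idea of a similarity measure concrete, here is a minimal base-R sketch of three of the measures listed above. It uses stats::dist() on two made-up observations. How the Clustering package computes these distances internally is not shown here, so treat this only as an illustration of the measures themselves.

```r
# Two toy observations with three numeric features (illustrative values).
x <- rbind(a = c(1, 2, 3),
           b = c(4, 6, 3))

# stats::dist() ships with base R and supports several of the measures above.
d_euclidean <- dist(x, method = "euclidean")         # sqrt(3^2 + 4^2 + 0^2)
d_manhattan <- dist(x, method = "manhattan")         # 3 + 4 + 0
d_minkowski <- dist(x, method = "minkowski", p = 3)  # (3^3 + 4^3 + 0^3)^(1/3)

print(c(euclidean = as.numeric(d_euclidean),
        manhattan = as.numeric(d_manhattan),
        minkowski = as.numeric(d_minkowski)))
```

The Euclidean and Manhattan distances between the two rows are 5 and 7. The Minkowski distance with p = 3 falls below both, which shows how the choice of measure changes what counts as close.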
Furthermore, the package offers functions to:
clustering is the main method of the package. It runs a set of clustering algorithms. To get information about the parameters of the method, use ?function or help(function). Datasets can be loaded in two different ways:
Once the method has been executed, we obtain the results divided into four parts:
df <- Clustering::clustering(df = Clustering::basketball,
packages = c("clusterr"), min = 4, max = 6)
Here we have a data frame with the result of the execution. It lists every algorithm, the similarity measure used, the variables ranked in order of importance, the execution times of the algorithms, and the evaluation metrics.
Algorithm | Distance | Clusters | Dataset | Ranking | timeExternal | entropy | variation_information | precision | recall | f_measure | fowlkes_mallows_index | connectivity | dunn | silhouette | timeInternal |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
gmm | gmm_euclidean | 4 | dataframe | 1 | 0.0203 | 0.3161 | 4.762 | 0.1822 | 0.451 | 0.2595 | 0.2867 | 34.09 | 0.1646 | 0.23 | 0.007 |
gmm | gmm_euclidean | 4 | dataframe | 2 | 0.0257 | 0.3085 | 4.741 | 0.1113 | 0.4005 | 0.1742 | 0.2111 | 34.09 | 0.1646 | 0.23 | 0.0071 |
gmm | gmm_euclidean | 4 | dataframe | 3 | 0.2384 | 0.0064 | 4.72 | 0 | 0 | 0 | 0 | 34.09 | 0.1646 | 0.23 | 0.009 |
gmm | gmm_euclidean | 4 | dataframe | 4 | 0.239 | 0.0032 | 4.143 | 0 | 0 | 0 | 0 | 34.09 | 0.1646 | 0.23 | 0.0091 |
gmm | gmm_euclidean | 4 | dataframe | 5 | 0.3997 | 0 | 3.671 | 0 | 0 | 0 | 0 | 34.09 | 0.1646 | 0.23 | 0.0108 |
gmm | gmm_euclidean | 5 | dataframe | 1 | 0.0245 | 0.4175 | 4.363 | 0.1637 | 0.2865 | 0.2084 | 0.2165 | 42.08 | 0.1619 | 0.25 | 0.0065 |
gmm | gmm_euclidean | 5 | dataframe | 2 | 0.0271 | 0.3857 | 4.346 | 0.1109 | 0.2823 | 0.1592 | 0.1769 | 42.08 | 0.1619 | 0.25 | 0.0071 |
gmm | gmm_euclidean | 5 | dataframe | 3 | 0.1838 | 0.0064 | 4.342 | 0 | 0 | 0 | 0 | 42.08 | 0.1619 | 0.25 | 0.0078 |
gmm | gmm_euclidean | 5 | dataframe | 4 | 0.1863 | 0.0032 | 4.321 | 0 | 0 | 0 | 0 | 42.08 | 0.1619 | 0.25 | 0.0078 |
gmm | gmm_euclidean | 5 | dataframe | 5 | 0.2108 | 0 | 4.022 | 0 | 0 | 0 | 0 | 42.08 | 0.1619 | 0.25 | 0.0153 |
gmm | gmm_euclidean | 6 | dataframe | 1 | 0.0278 | 0.433 | 4.439 | 0.1744 | 0.2791 | 0.2147 | 0.2206 | 51.46 | 0.1619 | 0.23 | 0.0064 |
gmm | gmm_euclidean | 6 | dataframe | 2 | 0.0289 | 0.4209 | 4.179 | 0.1062 | 0.2473 | 0.1486 | 0.1621 | 51.46 | 0.1619 | 0.23 | 0.0069 |
gmm | gmm_euclidean | 6 | dataframe | 3 | 0.1654 | 0.0064 | 4.159 | 0 | 0 | 0 | 0 | 51.46 | 0.1619 | 0.23 | 0.0075 |
gmm | gmm_euclidean | 6 | dataframe | 4 | 0.1811 | 0.0032 | 4.138 | 0 | 0 | 0 | 0 | 51.46 | 0.1619 | 0.23 | 0.0079 |
gmm | gmm_euclidean | 6 | dataframe | 5 | 0.1918 | 0 | 3.954 | 0 | 0 | 0 | 0 | 51.46 | 0.1619 | 0.23 | 0.0154 |
gmm | gmm_manhattan | 4 | dataframe | 1 | 0.0149 | 0.3161 | 4.762 | 0.1822 | 0.451 | 0.2595 | 0.2867 | 35.59 | 0.1348 | 0.23 | 0.0065 |
gmm | gmm_manhattan | 4 | dataframe | 2 | 0.0186 | 0.3085 | 4.741 | 0.1113 | 0.4005 | 0.1742 | 0.2111 | 35.59 | 0.1348 | 0.23 | 0.0065 |
gmm | gmm_manhattan | 4 | dataframe | 3 | 0.1536 | 0.0064 | 4.72 | 0 | 0 | 0 | 0 | 35.59 | 0.1348 | 0.23 | 0.0065 |
gmm | gmm_manhattan | 4 | dataframe | 4 | 0.1605 | 0.0032 | 4.143 | 0 | 0 | 0 | 0 | 35.59 | 0.1348 | 0.23 | 0.0068 |
gmm | gmm_manhattan | 4 | dataframe | 5 | 0.1839 | 0 | 3.671 | 0 | 0 | 0 | 0 | 35.59 | 0.1348 | 0.23 | 0.0069 |
gmm | gmm_manhattan | 5 | dataframe | 1 | 0.0257 | 0.4258 | 4.35 | 0.167 | 0.2828 | 0.21 | 0.2173 | 46.83 | 0.1322 | 0.26 | 0.0072 |
gmm | gmm_manhattan | 5 | dataframe | 2 | 0.0446 | 0.3892 | 4.338 | 0.1114 | 0.2742 | 0.1584 | 0.1747 | 46.83 | 0.1322 | 0.26 | 0.0092 |
gmm | gmm_manhattan | 5 | dataframe | 3 | 0.1582 | 0.0064 | 4.317 | 0 | 0 | 0 | 0 | 46.83 | 0.1322 | 0.26 | 0.0116 |
gmm | gmm_manhattan | 5 | dataframe | 4 | 0.2633 | 0.0032 | 4.296 | 0 | 0 | 0 | 0 | 46.83 | 0.1322 | 0.26 | 0.012 |
gmm | gmm_manhattan | 5 | dataframe | 5 | 0.4502 | 0 | 4.059 | 0 | 0 | 0 | 0 | 46.83 | 0.1322 | 0.26 | 0.0301 |
gmm | gmm_manhattan | 6 | dataframe | 1 | 0.0421 | 0.4555 | 4.298 | 0.1669 | 0.2608 | 0.2035 | 0.2085 | 54.87 | 0.1467 | 0.25 | 0.0089 |
gmm | gmm_manhattan | 6 | dataframe | 2 | 0.0918 | 0.4052 | 4.161 | 0.1148 | 0.2606 | 0.1594 | 0.173 | 54.87 | 0.1467 | 0.25 | 0.0092 |
gmm | gmm_manhattan | 6 | dataframe | 3 | 0.2203 | 0.0064 | 4.14 | 0 | 0 | 0 | 0 | 54.87 | 0.1467 | 0.25 | 0.0112 |
gmm | gmm_manhattan | 6 | dataframe | 4 | 0.3068 | 0.0032 | 4.119 | 0 | 0 | 0 | 0 | 54.87 | 0.1467 | 0.25 | 0.0187 |
gmm | gmm_manhattan | 6 | dataframe | 5 | 0.3333 | 0 | 4.102 | 0 | 0 | 0 | 0 | 54.87 | 0.1467 | 0.25 | 0.0292 |
kmeans_arma | kmeans_arma | 4 | dataframe | 1 | 0.0009 | 0 | 0 | 0 | 0 | 0 | 0 | 44.21 | 0.1495 | 0.23 | 0.0088 |
kmeans_arma | kmeans_arma | 4 | dataframe | 2 | 0.001 | 0 | 0 | 0 | 0 | 0 | 0 | 44.21 | 0.1495 | 0.23 | 0.009 |
kmeans_arma | kmeans_arma | 4 | dataframe | 3 | 0.0013 | 0 | 0 | 0 | 0 | 0 | 0 | 44.21 | 0.1495 | 0.23 | 0.0099 |
kmeans_arma | kmeans_arma | 4 | dataframe | 4 | 0.0016 | 0 | 0 | 0 | 0 | 0 | 0 | 44.21 | 0.1495 | 0.23 | 0.0103 |
kmeans_arma | kmeans_arma | 4 | dataframe | 5 | 0.0023 | 0 | 0 | 0 | 0 | 0 | 0 | 44.21 | 0.1495 | 0.23 | 0.0128 |
kmeans_arma | kmeans_arma | 5 | dataframe | 1 | 0.0008 | 0 | 0 | 0 | 0 | 0 | 0 | 49.22 | 0.1538 | 0.26 | 0.0082 |
kmeans_arma | kmeans_arma | 5 | dataframe | 2 | 0.001 | 0 | 0 | 0 | 0 | 0 | 0 | 49.22 | 0.1538 | 0.26 | 0.0107 |
kmeans_arma | kmeans_arma | 5 | dataframe | 3 | 0.0011 | 0 | 0 | 0 | 0 | 0 | 0 | 49.22 | 0.1538 | 0.26 | 0.0111 |
kmeans_arma | kmeans_arma | 5 | dataframe | 4 | 0.0012 | 0 | 0 | 0 | 0 | 0 | 0 | 49.22 | 0.1538 | 0.26 | 0.0172 |
kmeans_arma | kmeans_arma | 5 | dataframe | 5 | 0.0017 | 0 | 0 | 0 | 0 | 0 | 0 | 49.22 | 0.1538 | 0.26 | 0.022 |
kmeans_arma | kmeans_arma | 6 | dataframe | 1 | 0.001 | 0 | 0 | 0 | 0 | 0 | 0 | 57.63 | 0.1619 | 0.24 | 0.0081 |
kmeans_arma | kmeans_arma | 6 | dataframe | 2 | 0.0011 | 0 | 0 | 0 | 0 | 0 | 0 | 57.63 | 0.1619 | 0.24 | 0.0087 |
kmeans_arma | kmeans_arma | 6 | dataframe | 3 | 0.0012 | 0 | 0 | 0 | 0 | 0 | 0 | 57.63 | 0.1619 | 0.24 | 0.0102 |
kmeans_arma | kmeans_arma | 6 | dataframe | 4 | 0.0013 | 0 | 0 | 0 | 0 | 0 | 0 | 57.63 | 0.1619 | 0.24 | 0.0103 |
kmeans_arma | kmeans_arma | 6 | dataframe | 5 | 0.0017 | 0 | 0 | 0 | 0 | 0 | 0 | 57.63 | 0.1619 | 0.24 | 0.0124 |
kmeans_rcpp | kmeans_rcpp | 4 | dataframe | 1 | 0.0173 | 0.3728 | 4.627 | 0.1697 | 0.5 | 0.23 | 0.2461 | 51.04 | 0.1741 | 0.23 | 0.0073 |
kmeans_rcpp | kmeans_rcpp | 4 | dataframe | 2 | 0.0266 | 0.3494 | 4.606 | 0.1003 | 0.3567 | 0.1511 | 0.1753 | 51.04 | 0.1741 | 0.23 | 0.0075 |
kmeans_rcpp | kmeans_rcpp | 4 | dataframe | 3 | 0.1904 | 0.0032 | 4.606 | 0.0009 | 0.3065 | 0.0018 | 0.021 | 51.04 | 0.1741 | 0.23 | 0.0078 |
kmeans_rcpp | kmeans_rcpp | 4 | dataframe | 4 | 0.1909 | 0.0032 | 4.531 | 0 | 0 | 0 | 0 | 51.04 | 0.1741 | 0.23 | 0.0079 |
kmeans_rcpp | kmeans_rcpp | 4 | dataframe | 5 | 0.2305 | 0 | 3.804 | 0 | 0 | 0 | 0 | 51.04 | 0.1741 | 0.23 | 0.0143 |
kmeans_rcpp | kmeans_rcpp | 5 | dataframe | 1 | 0.0218 | 0.4269 | 4.551 | 0.1663 | 0.5 | 0.2104 | 0.2183 | 66.85 | 0.152 | 0.19 | 0.007 |
kmeans_rcpp | kmeans_rcpp | 5 | dataframe | 2 | 0.0233 | 0.4135 | 4.329 | 0.1019 | 0.2865 | 0.1457 | 0.1613 | 66.85 | 0.152 | 0.19 | 0.0078 |
kmeans_rcpp | kmeans_rcpp | 5 | dataframe | 3 | 0.1641 | 0.0032 | 4.308 | 0.0011 | 0.2554 | 0.0022 | 0.0232 | 66.85 | 0.152 | 0.19 | 0.0079 |
kmeans_rcpp | kmeans_rcpp | 5 | dataframe | 4 | 0.1798 | 0.0032 | 4.308 | 0 | 0 | 0 | 0 | 66.85 | 0.152 | 0.19 | 0.0081 |
kmeans_rcpp | kmeans_rcpp | 5 | dataframe | 5 | 0.2077 | 0 | 4.079 | 0 | 0 | 0 | 0 | 66.85 | 0.152 | 0.19 | 0.0097 |
kmeans_rcpp | kmeans_rcpp | 6 | dataframe | 1 | 0.0284 | 0.4545 | 4.331 | 0.1703 | 0.2458 | 0.2012 | 0.2046 | 74.78 | 0.1522 | 0.19 | 0.007 |
kmeans_rcpp | kmeans_rcpp | 6 | dataframe | 2 | 0.0335 | 0.4169 | 4.104 | 0.1152 | 0.2419 | 0.1561 | 0.167 | 74.78 | 0.1522 | 0.19 | 0.0073 |
kmeans_rcpp | kmeans_rcpp | 6 | dataframe | 3 | 0.1752 | 0.0064 | 4.083 | 0 | 0 | 0 | 0 | 74.78 | 0.1522 | 0.19 | 0.0074 |
kmeans_rcpp | kmeans_rcpp | 6 | dataframe | 4 | 0.1962 | 0.0032 | 4.062 | 0 | 0 | 0 | 0 | 74.78 | 0.1522 | 0.19 | 0.0078 |
kmeans_rcpp | kmeans_rcpp | 6 | dataframe | 5 | 0.2077 | 0 | 4.037 | 0 | 0 | 0 | 0 | 74.78 | 0.1522 | 0.19 | 0.0208 |
mini_kmeans | mini_kmeans | 4 | dataframe | 1 | 0.001 | 0 | 0 | 0 | 0 | 0 | 0 | 50.35 | 0.1571 | 0.21 | 0.007 |
mini_kmeans | mini_kmeans | 4 | dataframe | 2 | 0.001 | 0 | 0 | 0 | 0 | 0 | 0 | 50.35 | 0.1571 | 0.21 | 0.0071 |
mini_kmeans | mini_kmeans | 4 | dataframe | 3 | 0.0011 | 0 | 0 | 0 | 0 | 0 | 0 | 50.35 | 0.1571 | 0.21 | 0.0072 |
mini_kmeans | mini_kmeans | 4 | dataframe | 4 | 0.0011 | 0 | 0 | 0 | 0 | 0 | 0 | 50.35 | 0.1571 | 0.21 | 0.0073 |
mini_kmeans | mini_kmeans | 4 | dataframe | 5 | 0.0016 | 0 | 0 | 0 | 0 | 0 | 0 | 50.35 | 0.1571 | 0.21 | 0.0077 |
mini_kmeans | mini_kmeans | 5 | dataframe | 1 | 0.0009 | 0 | 0 | 0 | 0 | 0 | 0 | 76.4 | 0.1216 | 0.17 | 0.0073 |
mini_kmeans | mini_kmeans | 5 | dataframe | 2 | 0.0009 | 0 | 0 | 0 | 0 | 0 | 0 | 76.4 | 0.1216 | 0.17 | 0.0074 |
mini_kmeans | mini_kmeans | 5 | dataframe | 3 | 0.0009 | 0 | 0 | 0 | 0 | 0 | 0 | 76.4 | 0.1216 | 0.17 | 0.0076 |
mini_kmeans | mini_kmeans | 5 | dataframe | 4 | 0.001 | 0 | 0 | 0 | 0 | 0 | 0 | 76.4 | 0.1216 | 0.17 | 0.0077 |
mini_kmeans | mini_kmeans | 5 | dataframe | 5 | 0.0014 | 0 | 0 | 0 | 0 | 0 | 0 | 76.4 | 0.1216 | 0.17 | 0.0087 |
mini_kmeans | mini_kmeans | 6 | dataframe | 1 | 0.0008 | 0 | 0 | 0 | 0 | 0 | 0 | 76.53 | 0.15 | 0.17 | 0.007 |
mini_kmeans | mini_kmeans | 6 | dataframe | 2 | 0.0008 | 0 | 0 | 0 | 0 | 0 | 0 | 76.53 | 0.15 | 0.17 | 0.0072 |
mini_kmeans | mini_kmeans | 6 | dataframe | 3 | 0.001 | 0 | 0 | 0 | 0 | 0 | 0 | 76.53 | 0.15 | 0.17 | 0.0076 |
mini_kmeans | mini_kmeans | 6 | dataframe | 4 | 0.001 | 0 | 0 | 0 | 0 | 0 | 0 | 76.53 | 0.15 | 0.17 | 0.008 |
mini_kmeans | mini_kmeans | 6 | dataframe | 5 | 0.0012 | 0 | 0 | 0 | 0 | 0 | 0 | 76.53 | 0.15 | 0.17 | 0.0106 |
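The external columns precision, recall, f_measure and fowlkes_mallows_index are related to one another. The sketch below recomputes the last two from the first gmm_euclidean row of the table above. It assumes the standard definitions (F-measure as the harmonic mean of precision and recall, Fowlkes-Mallows as their geometric mean), which are not quoted from the package source.

```r
# Precision and recall copied from the first gmm_euclidean row (4 clusters).
precision <- 0.1822
recall    <- 0.451

# Standard definitions (assumed, not taken from the package source).
f_measure       <- 2 * precision * recall / (precision + recall)
fowlkes_mallows <- sqrt(precision * recall)

print(round(c(f_measure = f_measure, fowlkes_mallows = fowlkes_mallows), 4))
```

Rounded to four decimals, these come out as 0.2595 and 0.2867, the values shown in that row. Each metric column appears to be ranked independently, so the same check need not hold on every row.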
This property tells us whether an internal evaluation of the groups was performed.
#> [1] TRUE
This property tells us whether an external evaluation of the groups was performed.
#> [1] TRUE
Algorithms executed
#> [1] "gmm" "kmeans_arma" "kmeans_rcpp" "mini_kmeans"
Similarity Metrics
#> [1] "gmm_euclidean" "gmm_manhattan" "kmeans_arma" "kmeans_rcpp"
#> [5] "mini_kmeans"
If we want to obtain the ranked variables instead of the metric values, we must set the variables parameter to TRUE.
df_variable <- Clustering::clustering(df = Clustering::basketball,
packages = c("clusterr"), min = 4, max = 6, variables = TRUE)
Algorithm | Distance | Clusters | Dataset | Ranking | timeExternal | entropy | variation_information | precision | recall | f_measure | fowlkes_mallows_index | connectivity | dunn | silhouette | timeInternal |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
gmm | gmm_euclidean | 4 | dataframe | 1 | 4 | 2 | 3 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 4 |
gmm | gmm_euclidean | 4 | dataframe | 2 | 2 | 4 | 5 | 4 | 4 | 4 | 4 | 2 | 2 | 2 | 5 |
gmm | gmm_euclidean | 4 | dataframe | 3 | 5 | 3 | 1 | 1 | 1 | 1 | 1 | 3 | 3 | 3 | 2 |
gmm | gmm_euclidean | 4 | dataframe | 4 | 1 | 5 | 4 | 3 | 3 | 3 | 3 | 4 | 4 | 4 | 1 |
gmm | gmm_euclidean | 4 | dataframe | 5 | 3 | 1 | 2 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 3 |
gmm | gmm_euclidean | 5 | dataframe | 1 | 3 | 2 | 3 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 2 |
gmm | gmm_euclidean | 5 | dataframe | 2 | 1 | 4 | 4 | 4 | 4 | 4 | 4 | 2 | 2 | 2 | 4 |
gmm | gmm_euclidean | 5 | dataframe | 3 | 4 | 3 | 5 | 1 | 1 | 1 | 1 | 3 | 3 | 3 | 1 |
gmm | gmm_euclidean | 5 | dataframe | 4 | 2 | 5 | 1 | 3 | 3 | 3 | 3 | 4 | 4 | 4 | 3 |
gmm | gmm_euclidean | 5 | dataframe | 5 | 5 | 1 | 2 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 |
gmm | gmm_euclidean | 6 | dataframe | 1 | 3 | 2 | 4 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 3 |
gmm | gmm_euclidean | 6 | dataframe | 2 | 1 | 4 | 3 | 4 | 4 | 4 | 4 | 2 | 2 | 2 | 5 |
gmm | gmm_euclidean | 6 | dataframe | 3 | 4 | 3 | 5 | 1 | 1 | 1 | 1 | 3 | 3 | 3 | 1 |
gmm | gmm_euclidean | 6 | dataframe | 4 | 2 | 5 | 1 | 3 | 3 | 3 | 3 | 4 | 4 | 4 | 2 |
gmm | gmm_euclidean | 6 | dataframe | 5 | 5 | 1 | 2 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 4 |
gmm | gmm_manhattan | 4 | dataframe | 1 | 5 | 2 | 3 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 2 |
gmm | gmm_manhattan | 4 | dataframe | 2 | 1 | 4 | 5 | 4 | 4 | 4 | 4 | 2 | 2 | 2 | 1 |
gmm | gmm_manhattan | 4 | dataframe | 3 | 3 | 3 | 1 | 1 | 1 | 1 | 1 | 3 | 3 | 3 | 4 |
gmm | gmm_manhattan | 4 | dataframe | 4 | 2 | 5 | 4 | 3 | 3 | 3 | 3 | 4 | 4 | 4 | 5 |
gmm | gmm_manhattan | 4 | dataframe | 5 | 4 | 1 | 2 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 3 |
gmm | gmm_manhattan | 5 | dataframe | 1 | 5 | 2 | 4 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 3 |
gmm | gmm_manhattan | 5 | dataframe | 2 | 1 | 4 | 3 | 4 | 4 | 4 | 4 | 2 | 2 | 2 | 2 |
gmm | gmm_manhattan | 5 | dataframe | 3 | 3 | 3 | 5 | 1 | 1 | 1 | 1 | 3 | 3 | 3 | 4 |
gmm | gmm_manhattan | 5 | dataframe | 4 | 2 | 5 | 1 | 3 | 3 | 3 | 3 | 4 | 4 | 4 | 5 |
gmm | gmm_manhattan | 5 | dataframe | 5 | 4 | 1 | 2 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 1 |
gmm | gmm_manhattan | 6 | dataframe | 1 | 4 | 2 | 4 | 2 | 4 | 2 | 2 | 1 | 1 | 1 | 2 |
gmm | gmm_manhattan | 6 | dataframe | 2 | 1 | 4 | 3 | 4 | 2 | 4 | 4 | 2 | 2 | 2 | 1 |
gmm | gmm_manhattan | 6 | dataframe | 3 | 5 | 3 | 5 | 1 | 1 | 1 | 1 | 3 | 3 | 3 | 3 |
gmm | gmm_manhattan | 6 | dataframe | 4 | 2 | 5 | 1 | 3 | 3 | 3 | 3 | 4 | 4 | 4 | 5 |
gmm | gmm_manhattan | 6 | dataframe | 5 | 3 | 1 | 2 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 4 |
kmeans_arma | kmeans_arma | 4 | dataframe | 1 | 5 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 5 |
kmeans_arma | kmeans_arma | 4 | dataframe | 2 | 3 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
kmeans_arma | kmeans_arma | 4 | dataframe | 3 | 4 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 4 |
kmeans_arma | kmeans_arma | 4 | dataframe | 4 | 2 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 3 |
kmeans_arma | kmeans_arma | 4 | dataframe | 5 | 1 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 1 |
kmeans_arma | kmeans_arma | 5 | dataframe | 1 | 5 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
kmeans_arma | kmeans_arma | 5 | dataframe | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 3 |
kmeans_arma | kmeans_arma | 5 | dataframe | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 4 |
kmeans_arma | kmeans_arma | 5 | dataframe | 4 | 1 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 5 |
kmeans_arma | kmeans_arma | 5 | dataframe | 5 | 4 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 2 |
kmeans_arma | kmeans_arma | 6 | dataframe | 1 | 3 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 5 |
kmeans_arma | kmeans_arma | 6 | dataframe | 2 | 1 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 3 |
kmeans_arma | kmeans_arma | 6 | dataframe | 3 | 5 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 4 |
kmeans_arma | kmeans_arma | 6 | dataframe | 4 | 2 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 1 |
kmeans_arma | kmeans_arma | 6 | dataframe | 5 | 4 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 2 |
kmeans_rcpp | kmeans_rcpp | 4 | dataframe | 1 | 3 | 4 | 5 | 2 | 3 | 2 | 2 | 1 | 1 | 1 | 4 |
kmeans_rcpp | kmeans_rcpp | 4 | dataframe | 2 | 1 | 2 | 1 | 4 | 2 | 4 | 4 | 2 | 2 | 2 | 5 |
kmeans_rcpp | kmeans_rcpp | 4 | dataframe | 3 | 4 | 3 | 3 | 3 | 4 | 3 | 3 | 3 | 3 | 3 | 3 |
kmeans_rcpp | kmeans_rcpp | 4 | dataframe | 4 | 2 | 5 | 4 | 1 | 1 | 1 | 1 | 4 | 4 | 4 | 1 |
kmeans_rcpp | kmeans_rcpp | 4 | dataframe | 5 | 5 | 1 | 2 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 2 |
kmeans_rcpp | kmeans_rcpp | 5 | dataframe | 1 | 3 | 2 | 4 | 2 | 3 | 2 | 2 | 1 | 1 | 1 | 4 |
kmeans_rcpp | kmeans_rcpp | 5 | dataframe | 2 | 1 | 4 | 5 | 4 | 2 | 4 | 4 | 2 | 2 | 2 | 5 |
kmeans_rcpp | kmeans_rcpp | 5 | dataframe | 3 | 4 | 3 | 1 | 3 | 4 | 3 | 3 | 3 | 3 | 3 | 3 |
kmeans_rcpp | kmeans_rcpp | 5 | dataframe | 4 | 2 | 5 | 3 | 1 | 1 | 1 | 1 | 4 | 4 | 4 | 2 |
kmeans_rcpp | kmeans_rcpp | 5 | dataframe | 5 | 5 | 1 | 2 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 1 |
kmeans_rcpp | kmeans_rcpp | 6 | dataframe | 1 | 4 | 2 | 4 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 2 |
kmeans_rcpp | kmeans_rcpp | 6 | dataframe | 2 | 1 | 4 | 3 | 4 | 4 | 4 | 4 | 2 | 2 | 2 | 5 |
kmeans_rcpp | kmeans_rcpp | 6 | dataframe | 3 | 3 | 3 | 5 | 1 | 1 | 1 | 1 | 3 | 3 | 3 | 1 |
kmeans_rcpp | kmeans_rcpp | 6 | dataframe | 4 | 2 | 5 | 1 | 3 | 3 | 3 | 3 | 4 | 4 | 4 | 4 |
kmeans_rcpp | kmeans_rcpp | 6 | dataframe | 5 | 5 | 1 | 2 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 3 |
mini_kmeans | mini_kmeans | 4 | dataframe | 1 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
mini_kmeans | mini_kmeans | 4 | dataframe | 2 | 1 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 4 |
mini_kmeans | mini_kmeans | 4 | dataframe | 3 | 5 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 2 |
mini_kmeans | mini_kmeans | 4 | dataframe | 4 | 3 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 5 |
mini_kmeans | mini_kmeans | 4 | dataframe | 5 | 4 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 3 |
mini_kmeans | mini_kmeans | 5 | dataframe | 1 | 4 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
mini_kmeans | mini_kmeans | 5 | dataframe | 2 | 3 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 3 |
mini_kmeans | mini_kmeans | 5 | dataframe | 3 | 2 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 2 |
mini_kmeans | mini_kmeans | 5 | dataframe | 4 | 5 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 5 |
mini_kmeans | mini_kmeans | 5 | dataframe | 5 | 1 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 4 |
mini_kmeans | mini_kmeans | 6 | dataframe | 1 | 3 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
mini_kmeans | mini_kmeans | 6 | dataframe | 2 | 5 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 5 |
mini_kmeans | mini_kmeans | 6 | dataframe | 3 | 4 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 2 |
mini_kmeans | mini_kmeans | 6 | dataframe | 4 | 1 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 3 |
mini_kmeans | mini_kmeans | 6 | dataframe | 5 | 2 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 4 |
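One way to read this second table is that each cell holds the index of the variable that achieved the corresponding rank for that metric. The sketch below illustrates that reading for the precision column of gmm_euclidean with 4 clusters. The assignment of values to variables 1 to 5 is inferred by comparing the two tables and is an assumption, as is the use of a plain descending sort.

```r
# Precision per variable (gmm_euclidean, 4 clusters). The values come from the
# first table; which variable owns which value is inferred, not documented.
precision_by_variable <- c(0, 0.1822, 0, 0.1113, 0)

# order() returns the variable indices sorted from best to worst value.
variable_ranking <- order(precision_by_variable, decreasing = TRUE)
print(variable_ranking)
```

Under these assumptions, variables 2 and 4 take the first two ranks, which agrees with the precision column of the corresponding rows. How the package breaks ties among the zero-valued variables is not stated in this document.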
If we only want the best-ranked variables or values for the external metrics, we run the following method:
We can also obtain the best-ranked values for the internal evaluation:
To obtain the best evaluation for each algorithm:
Algorithm | Distance | timeExternal | entropy | variation_information | precision | recall | f_measure | fowlkes_mallows_index |
---|---|---|---|---|---|---|---|---|
gmm | gmm_euclidean | 0.0278 | 0.433 | 4.762 | 0.1822 | 0.451 | 0.2595 | 0.2867 |
gmm | gmm_manhattan | 0.0421 | 0.4555 | 4.762 | 0.1822 | 0.451 | 0.2595 | 0.2867 |
kmeans_arma | kmeans_arma | 0.001 | 0 | 0 | 0 | 0 | 0 | 0 |
kmeans_rcpp | kmeans_rcpp | 0.0284 | 0.4545 | 4.627 | 0.1703 | 0.5 | 0.23 | 0.2461 |
mini_kmeans | mini_kmeans | 0.001 | 0 | 0 | 0 | 0 | 0 | 0 |
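Based on these results, gmm performs best on most external metrics: it has the highest precision (0.1822), F-measure (0.2595), and Fowlkes-Mallows index (0.2867), although kmeans_rcpp reaches a higher recall (0.5).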
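The per-algorithm summary can be approximated with base R. The sketch below copies the Ranking 1 rows for two distances from the first results table and keeps the highest precision and recall per distance with aggregate(). It only illustrates the idea of "best value per algorithm" and is not the package's own function. The data frame name top_rows is hypothetical.

```r
# Ranking-1 rows copied from the first results table (two metrics only).
top_rows <- data.frame(
  Distance  = rep(c("gmm_euclidean", "kmeans_rcpp"), each = 3),
  Clusters  = rep(4:6, times = 2),
  precision = c(0.1822, 0.1637, 0.1744, 0.1697, 0.1663, 0.1703),
  recall    = c(0.4510, 0.2865, 0.2791, 0.5000, 0.5000, 0.2458)
)

# Keep the highest value of each metric per distance.
best_by_distance <- aggregate(cbind(precision, recall) ~ Distance,
                              data = top_rows, FUN = max)
print(best_by_distance)
```

For these two distances the maxima agree with the precision and recall columns of the summary table, which suggests that "best" here means the largest value observed across the cluster counts.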
From the algorithm with the best rating we can select the most appropriate number of clusters.
Algorithm | Clusters | timeExternal | entropy | variation_information | precision | recall | f_measure | fowlkes_mallows_index |
---|---|---|---|---|---|---|---|---|
gmm | 4 | 0.0203 | 0.3161 | 4.762 | 0.1822 | 0.451 | 0.2595 | 0.2867 |
gmm | 5 | 0.0257 | 0.4258 | 4.363 | 0.167 | 0.2865 | 0.21 | 0.2173 |
gmm | 6 | 0.0421 | 0.4555 | 4.439 | 0.1744 | 0.2791 | 0.2147 | 0.2206 |
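Choosing the number of clusters from a table like the one above can be done with a one-line lookup. The sketch below picks the cluster count with the highest f_measure for gmm. Using f_measure as the deciding metric is an assumption made for illustration, and a different metric may point to a different count.

```r
# Per-cluster values for gmm, copied from the table above.
gmm_by_clusters <- data.frame(
  Clusters  = c(4, 5, 6),
  precision = c(0.1822, 0.1670, 0.1744),
  f_measure = c(0.2595, 0.2100, 0.2147)
)

# Pick the cluster count whose f_measure is largest (illustrative criterion).
best_k <- gmm_by_clusters$Clusters[which.max(gmm_by_clusters$f_measure)]
print(best_k)
```

With these values the lookup selects 4 clusters, which also has the highest precision in the table.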
The checks performed for the external evaluation metrics can also be performed for the internal evaluation metrics.
Algorithm | Distance | timeInternal | connectivity | dunn | silhouette |
---|---|---|---|---|---|
gmm | gmm_euclidean | 0.007 | 51.46 | 0.1646 | 0.25 |
gmm | gmm_manhattan | 0.0089 | 54.87 | 0.1467 | 0.26 |
kmeans_arma | kmeans_arma | 0.0088 | 57.63 | 0.1619 | 0.26 |
kmeans_rcpp | kmeans_rcpp | 0.0073 | 74.78 | 0.1741 | 0.23 |
mini_kmeans | mini_kmeans | 0.0073 | 76.53 | 0.1571 | 0.21 |
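In this case no single algorithm wins on every metric: kmeans_rcpp has the highest Dunn index (0.1741), while gmm_manhattan and kmeans_arma share the highest silhouette (0.26). The choice of algorithm therefore depends on the evaluation you want to make.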
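To give a feel for what an internal metric measures, here is a base-R sketch of the Dunn index under one common definition: the smallest distance between points in different clusters divided by the largest distance between points in the same cluster. The helper dunn_index is hypothetical, and the package may use a different variant of the index.

```r
# Hypothetical helper: Dunn index = min between-cluster distance /
#                                   max within-cluster distance.
dunn_index <- function(x, labels) {
  d <- as.matrix(dist(x))              # pairwise Euclidean distances
  same <- outer(labels, labels, "==")  # TRUE where two points share a cluster
  min(d[!same]) / max(d[same])
}

# Toy data: two tight, well-separated groups (illustrative values).
toy    <- rbind(c(0, 0), c(0, 1), c(10, 0), c(10, 1))
labels <- c(1, 1, 2, 2)

print(dunn_index(toy, labels))
```

Higher values indicate compact clusters that are far apart. The toy data are deliberately well separated, so the index is large compared with the values in the table above.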
If we want to plot any metric as a function of the number of clusters and the algorithm, we can do so in the following way, depending on whether the evaluation metric is internal or external.
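The package's own plotting call is not reproduced in this section. As a stand-in, the following base-graphics sketch draws the Ranking 1 precision of two distances against the number of clusters, using values copied from the first results table. The object names are hypothetical and the figure only approximates what such a plot could look like.

```r
# Ranking-1 precision per cluster count, copied from the first results table.
clusters  <- 4:6
precision <- cbind(
  gmm_euclidean = c(0.1822, 0.1637, 0.1744),
  kmeans_rcpp   = c(0.1697, 0.1663, 0.1703)
)

# One line per distance, one point per cluster count.
matplot(clusters, precision, type = "b", pch = 1:2, lty = 1, col = 1:2,
        xlab = "Number of clusters", ylab = "precision", xaxt = "n")
axis(1, at = clusters)
legend("topright", legend = colnames(precision),
       pch = 1:2, lty = 1, col = 1:2)
```

The same pattern works for an internal metric such as silhouette by swapping in the corresponding column values.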