Rcmdr Fuzzy Clustering Plugin Analysis

Achmad Fauzi bagus F

August 9th, 2016

This package provides a plug-in for fuzzy clustering analysis in R Commander (Rcmdr). Although it is a plug-in package, you can also run the analyses easily from the R command line/console.

This package includes Fuzzy C-Means and Gustafson-Kessel clustering. For more stable results, it offers an ensemble with a voting approach. It also helps you choose the optimal number of clusters through validation indices, tests differences between clusters with MANOVA using the Pillai statistic, and visualizes the results with a biplot and a radar plot.

Configuration for Rcmdr

Install this package first. Then type library(Rcmdr) to launch the R Commander application. From the Tools menu, choose “load plugin” and select RcmdrPlugin.FuzzyClust. R Commander will then restart.

Load your data, then run the analysis from Statistics -> Dimensional -> Clustering -> Fuzzy Clustering.

Fuzzy C-Means

fuzzy.CM() performs fuzzy c-means analysis. Its parameter settings, description, and return value are documented in ?fuzzy.CM.

library(RcmdrPlugin.FuzzyClust)
data(iris)
cl <- fuzzy.CM(X = iris[, 1:4], K = 3, m = 2, RandomNumber = 1234)
## Call:
## fuzzy.CM(X = iris[, 1:4], K = 3, m = 2, RandomNumber = 1234)
## 
## Objective Function: 60.50571
## fuzzifier: 2
## Centroid:
##      Sepal.Length Sepal.Width Petal.Length Petal.Width
## [1,]     5.888925    2.761067     4.363941   1.3973097
## [2,]     5.003966    3.414089     1.482815   0.2535461
## [3,]     6.775003    3.052380     5.646771   2.0535425
## 
## Cluster Label:
##   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18 
##   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2 
##  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36 
##   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2 
##  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54 
##   2   2   2   2   2   2   2   2   2   2   2   2   2   2   3   1   3   1 
##  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72 
##   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1 
##  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90 
##   1   1   1   1   1   3   1   1   1   1   1   1   1   1   1   1   1   1 
##  91  92  93  94  95  96  97  98  99 100 101 102 103 104 105 106 107 108 
##   1   1   1   1   1   1   1   1   1   1   3   1   3   3   3   3   1   3 
## 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 
##   3   3   3   3   3   1   3   3   3   3   3   1   3   1   3   1   3   3 
## 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 
##   1   1   3   3   3   3   3   1   3   3   3   3   1   3   3   3   1   3 
## 145 146 147 148 149 150 
##   3   3   1   3   3   1
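
To make the iteration behind fuzzy c-means concrete, here is a minimal base R sketch of the usual alternating update (centroids from memberships, then memberships from distances). It is not the code used by fuzzy.CM(); the function name fcm_sketch and its arguments are hypothetical, and details such as the stopping rule are assumptions.

```r
# Minimal fuzzy c-means sketch in base R (illustrative; not the package's code).
# fcm_sketch and all of its arguments are hypothetical names.
fcm_sketch <- function(X, K, m = 2, max_iter = 100, tol = 1e-6, seed = 1234) {
  X <- as.matrix(X)
  n <- nrow(X)
  set.seed(seed)
  # Random initial membership matrix whose rows sum to 1.
  U <- matrix(runif(n * K), n, K)
  U <- U / rowSums(U)
  for (iter in seq_len(max_iter)) {
    W <- U^m
    # Centroids: membership-weighted means of the observations (K x p).
    V <- crossprod(W, X) / colSums(W)
    # Squared Euclidean distance from every observation to every centroid (n x K).
    D2 <- sapply(seq_len(K), function(k) rowSums(sweep(X, 2, V[k, ])^2))
    D2 <- pmax(D2, .Machine$double.eps)  # guard against division by zero
    # Membership update: u_ik is proportional to d_ik^(-2 / (m - 1)).
    U_new <- D2^(-1 / (m - 1))
    U_new <- U_new / rowSums(U_new)
    converged <- max(abs(U_new - U)) < tol
    U <- U_new
    if (converged) break
  }
  list(membership = U, centroid = V,
       label = max.col(U, ties.method = "first"))
}

fit <- fcm_sketch(iris[, 1:4], K = 3)
table(fit$label)  # cluster sizes; cluster numbering depends on the random start
```

Because the starting memberships are random, the cluster numbering (and occasionally the solution) can change with the seed.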

Gustafson Kessel

fuzzy.GK() performs Gustafson-Kessel (GK) clustering. The main difference between this method and fuzzy c-means (FCM) is the distance function: GK uses a distance based on each cluster's covariance matrix, whereas FCM uses the Euclidean distance. This function implements the modified GK algorithm proposed by Babuska (2002). For details and parameters, see ?fuzzy.GK.

data(iris)
cl <- fuzzy.GK(X = iris[, 1:4], K = 3, m = 2, RandomNumber = 1234, gamma = 0)
## Call:
## fuzzy.GK(X = iris[, 1:4], K = 3, m = 2, RandomNumber = 1234, 
##     gamma = 0)
## 
## Objective Function: 31.52668
## fuzzifier: 2
## Centroid:
##      Sepal.Length Sepal.Width Petal.Length Petal.Width
## [1,]     6.127994    2.801893     4.510275    1.402073
## [2,]     5.014119    3.437941     1.465400    0.244071
## [3,]     6.397884    2.975170     5.304825    2.014691
## 
## Cluster Label:
##   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18 
##   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2 
##  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36 
##   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2 
##  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54 
##   2   2   2   2   2   2   2   2   2   2   2   2   2   2   1   1   1   1 
##  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72 
##   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   3   1 
##  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90 
##   1   1   1   1   1   1   1   1   1   1   1   3   3   1   1   1   1   1 
##  91  92  93  94  95  96  97  98  99 100 101 102 103 104 105 106 107 108 
##   1   1   1   1   1   1   1   1   1   1   3   3   3   3   3   1   3   1 
## 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 
##   1   3   3   3   3   3   3   3   3   1   1   1   3   3   1   3   3   1 
## 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 
##   3   3   3   1   1   1   3   1   3   3   3   3   3   3   3   3   3   3 
## 145 146 147 148 149 150 
##   3   3   3   3   3   3
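
The difference between the two distance functions can be illustrated with a small base R sketch. It uses the textbook GK distance, in which the cluster covariance matrix is rescaled so that the norm-inducing matrix has determinant 1. The helper gk_distance2 is a hypothetical name, and the ordinary sample covariance of the first 50 iris rows is only a stand-in for the fuzzy covariance matrix that the algorithm would estimate; this is not the code inside fuzzy.GK().

```r
# Sketch of the squared distance used in Gustafson-Kessel clustering, compared
# with the squared Euclidean distance of FCM (illustrative only).
# gk_distance2 is a hypothetical helper name.
gk_distance2 <- function(x, v, Fk) {
  p <- length(v)
  # Norm-inducing matrix A = det(F)^(1/p) * F^{-1}, so that det(A) = 1.
  A <- det(Fk)^(1 / p) * solve(Fk)
  d <- x - v
  as.numeric(t(d) %*% A %*% d)
}

X  <- as.matrix(iris[1:50, 1:4])   # stand-in for one cluster's members
v  <- colMeans(X)                  # stand-in for the cluster centroid
Fk <- cov(X)                       # stand-in for the fuzzy covariance matrix
x  <- unlist(iris[51, 1:4])        # an observation from outside this group

d2_gk     <- gk_distance2(x, v, Fk)  # covariance-adapted squared distance
d2_euclid <- sum((x - v)^2)          # squared Euclidean distance used by FCM
c(GK = d2_gk, Euclidean = d2_euclid)
```

The covariance-based distance lets each cluster take an ellipsoidal shape, while the Euclidean distance favors spherical clusters.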

Soft Vote Ensemble

GK and FCM initialize the membership matrix randomly, so the result can vary between runs. To stabilize the result, this package provides ensemble clustering with a sum-rule voting approach. For details, see ?soft.vote.ensemble.

Cl <- soft.vote.ensemble(iris[, 1:4], seed = 3, method = "FCM", K = 3, m = 2, core = 1)
## Call:
## soft.vote.ensemble(data = iris[, 1:4], seed = 3, method = "FCM", 
##     K = 3, m = 2, core = 1)
## 
## Objective Function: 60.50571
## fuzzifier: 2
## Centroid:
##         Sepal.Length Sepal.Width Petal.Length Petal.Width
## Clust 1     5.003966    3.414089     1.482815   0.2535461
## Clust 2     5.888927    2.761068     4.363945   1.3973114
## Clust 3     6.775005    3.052381     5.646774   2.0535438
## 
## Cluster Label:
##   [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [36] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 2 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
##  [71] 2 2 2 2 2 2 2 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 2 3 3 3
## [106] 3 2 3 3 3 3 3 3 2 3 3 3 3 3 2 3 2 3 2 3 3 2 2 3 3 3 3 3 2 3 3 3 3 2 3
## [141] 3 3 2 3 3 3 2 3 3 2
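
The idea of sum-rule voting can be sketched in a few lines of base R: add the membership matrices from several runs and assign each observation to the cluster with the largest total. The sketch assumes the cluster columns are already aligned across runs; a real ensemble first has to resolve label switching, and how soft.vote.ensemble() does that is not shown here. The function name sum_rule_vote and the three small membership matrices are hypothetical.

```r
# Sum-rule (soft) voting sketch. Assumes cluster columns are already aligned
# across runs. sum_rule_vote and the example matrices are hypothetical.
sum_rule_vote <- function(memberships) {
  total <- Reduce(`+`, memberships)  # element-wise sum of membership matrices
  list(membership = total / length(memberships),
       label = max.col(total, ties.method = "first"))
}

# Three made-up runs on four observations and two clusters.
runs <- list(
  matrix(c(0.9, 0.1,  0.6, 0.4,  0.4, 0.6,  0.2, 0.8), ncol = 2, byrow = TRUE),
  matrix(c(0.8, 0.2,  0.4, 0.6,  0.3, 0.7,  0.1, 0.9), ncol = 2, byrow = TRUE),
  matrix(c(0.7, 0.3,  0.7, 0.3,  0.6, 0.4,  0.3, 0.7), ncol = 2, byrow = TRUE)
)
vote <- sum_rule_vote(runs)
vote$label  # 1 1 2 2
```

Observations two and three are assigned differently by individual runs, but the summed memberships give a single, stable label for each.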

Validation

Validating the result is one of the hardest parts of cluster analysis. This package provides several indices that can be used to validate your result.

cl <- fuzzy.CM(X = iris[, 1:4], K = 3, m = 2, RandomNumber = 1234)
## Call:
## fuzzy.CM(X = iris[, 1:4], K = 3, m = 2, RandomNumber = 1234)
## 
## Objective Function: 60.50571
## fuzzifier: 2
## Centroid:
##      Sepal.Length Sepal.Width Petal.Length Petal.Width
## [1,]     5.888925    2.761067     4.363941   1.3973097
## [2,]     5.003966    3.414089     1.482815   0.2535461
## [3,]     6.775003    3.052380     5.646771   2.0535425
## 
## Cluster Label:
##   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18 
##   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2 
##  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36 
##   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2 
##  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54 
##   2   2   2   2   2   2   2   2   2   2   2   2   2   2   3   1   3   1 
##  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72 
##   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1 
##  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90 
##   1   1   1   1   1   3   1   1   1   1   1   1   1   1   1   1   1   1 
##  91  92  93  94  95  96  97  98  99 100 101 102 103 104 105 106 107 108 
##   1   1   1   1   1   1   1   1   1   1   3   1   3   3   3   3   1   3 
## 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 
##   3   3   3   3   3   1   3   3   3   3   3   1   3   1   3   1   3   3 
## 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 
##   1   1   3   3   3   3   3   1   3   3   3   3   1   3   3   3   1   3 
## 145 146 147 148 149 150 
##   3   3   1   3   3   1
validation.index(cl)
## Validation Index
## MPC Index    : 0.6750958
## CE Index : 0.3954918
## XB Index : 0.1369082
## S Index  : 0.1369082
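
As a rough guide to what two of these indices measure, the sketch below computes the modified partition coefficient (MPC) and the classification entropy (CE) from a membership matrix using their common textbook definitions. Whether validation.index() uses exactly these formulas (for example, the same logarithm base) is an assumption, and the function names mpc_index and ce_index are hypothetical.

```r
# Textbook MPC and CE from a membership matrix U (rows = observations,
# columns = clusters). mpc_index and ce_index are hypothetical names; the
# package's own formulas may differ in detail.
mpc_index <- function(U) {
  K  <- ncol(U)
  pc <- mean(rowSums(U^2))       # partition coefficient, between 1/K and 1
  1 - K / (K - 1) * (1 - pc)     # rescaled to the range 0..1
}

ce_index <- function(U) {
  terms <- ifelse(U > 0, U * log(U), 0)  # treat 0 * log(0) as 0
  -mean(rowSums(terms))                  # 0 for a crisp partition
}

U_crisp <- diag(3)                  # each observation fully in one cluster
U_fuzzy <- matrix(1 / 3, 3, 3)      # every membership equal: maximally fuzzy

c(MPC = mpc_index(U_crisp), CE = ce_index(U_crisp))
c(MPC = mpc_index(U_fuzzy), CE = ce_index(U_fuzzy))
```

Under these definitions, a larger MPC and a smaller CE both indicate a crisper, better separated partition.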

MANOVA Analysis

To test whether the clusters differ significantly, use MANOVA. The Pillai statistic is chosen because it is robust to violations of the MANOVA assumptions.

checkManova(cl)
##                  Df   Pillai approx F num Df den Df       Pr(>F)
## factor(Cluster)   2 1.272997 63.47448      8    290 2.592269e-59
## Residuals       147       NA       NA     NA     NA           NA
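
A comparable test can be run with base R alone, which may help when interpreting the table above. The sketch below is an assumption about the kind of model involved, not the code of checkManova(); it uses iris$Species as a stand-in for the cluster labels, so its numbers will differ from the output shown.

```r
# MANOVA with the Pillai statistic in base R (stats package).
# Species is only a stand-in for cluster labels from a clustering result.
cluster <- factor(iris$Species)
fit <- manova(as.matrix(iris[, 1:4]) ~ cluster)
pillai <- summary(fit, test = "Pillai")
pillai$stats  # Df, Pillai, approx F, num Df, den Df, Pr(>F)
```

A small p-value in the Pr(>F) column indicates that the cluster mean vectors are unlikely to be equal.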

Visualize your result

Visualize your result with a biplot and a radar plot to make the clusters easier to interpret.

biplotcluster <- biploting(cl)

radarplot <- radar.plotting(cl)
## Using Cluster as id variables
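
For readers who want to see what a cluster biplot conveys without the plug-in, here is a minimal base R sketch: observations are drawn on the first two principal components and colored by group, with arrows for the variable loadings. It is not the implementation of biploting(); iris$Species stands in for cluster labels, and the arrow scaling factor is arbitrary.

```r
# Simple PCA-based biplot sketch in base R (illustrative only).
pdf(NULL)  # null device so the sketch also runs non-interactively; remove to view the plot
pca <- prcomp(iris[, 1:4], scale. = TRUE)
labels <- as.integer(iris$Species)  # stand-in for cluster labels

# Observations on the first two principal components, colored by group.
plot(pca$x[, 1:2], col = labels, pch = 19, xlab = "PC1", ylab = "PC2")

# Variable loadings as arrows from the origin (scaled by 2 for visibility).
arrows(0, 0, pca$rotation[, 1] * 2, pca$rotation[, 2] * 2, length = 0.1)
text(pca$rotation[, 1:2] * 2.2, labels = rownames(pca$rotation))
dev.off()
```

Points that lie in the direction of an arrow tend to have high values on that variable, which is what makes the plot useful for describing each cluster.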