1 Abstract

The muHVT package is a collection of R functions for vector quantization and for constructing hierarchical Voronoi tessellations, a data visualization tool for displaying the quantized cells. The hierarchical cells are computed using hierarchical k-means, where a quantization threshold governs the levels in the hierarchy for a given \(k\) parameter (the maximum number of cells at each level). The package is particularly helpful for visualizing rich multivariate data.

This package additionally provides functions for computing the Sammon’s projection and plotting the heat map of the variables on the tiles of the tessellations.

3 Voronoi Tessellations

A Voronoi diagram is a way of dividing space into a number of regions. A set of points (called seeds, sites, or generators) is specified beforehand, and for each seed there is a corresponding region consisting of all points closer to that seed than to any other seed. These regions are called Voronoi cells. The Voronoi diagram is the dual of the Delaunay triangulation.
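To make the definition concrete, here is a minimal base-R sketch that assigns each query point to the Voronoi cell of its nearest seed under Euclidean distance. The helper name voronoi_cell() is hypothetical and not part of muHVT.

```r
# Assign each query point to the Voronoi cell of its nearest seed.
# voronoi_cell() is a hypothetical helper, not a muHVT function.
voronoi_cell <- function(points, seeds) {
  points <- as.matrix(points)
  seeds <- as.matrix(seeds)
  # Squared Euclidean distances: rows = points, columns = seeds
  d2 <- outer(rowSums(points^2), rowSums(seeds^2), "+") - 2 * points %*% t(seeds)
  # Index of the closest seed for each point (which.min keeps the first on ties)
  apply(d2, 1, which.min)
}

seeds  <- rbind(c(0, 0), c(4, 0), c(0, 4))
points <- rbind(c(1, 1), c(3, 0.5), c(0.5, 3), c(2.5, 1.5))
voronoi_cell(points, seeds)  # 1 2 3 2
```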

3.1 Sammon’s projection

Sammon’s projection is an algorithm that maps a high-dimensional space to a space of lower dimensionality while attempting to preserve the structure of inter-point distances in the projection. It is particularly suited for use in exploratory data analysis and is usually considered a non-linear approach since the mapping cannot be represented as a linear combination of the original variables. The centroids are plotted in 2D after performing Sammon’s projection at every level of the tessellation.

Denote the distance between the \(i^{th}\) and \(j^{th}\) objects in the original space by \(d_{ij}^*\), and the distance between their projections by \(d_{ij}\). Sammon's mapping aims to minimize the following error function, often referred to as Sammon's stress or Sammon's error:

\[E=\frac{1}{\sum_{i<j} d_{ij}^*}\sum_{i<j}\frac{(d_{ij}^*-d_{ij})^2}{d_{ij}^*}\]
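The stress formula can be evaluated directly in base R, since stats::dist() returns the pairwise Euclidean distances for all pairs \(i<j\). This is a minimal sketch; sammon_stress() is a hypothetical helper, and it assumes no two objects coincide in the original space (so every \(d_{ij}^*>0\)).

```r
# Sammon's stress E, following the formula above.
# sammon_stress() is hypothetical; it assumes all original distances are > 0.
sammon_stress <- function(x, y) {
  d_star <- as.vector(dist(x))  # original-space distances d*_ij, i < j
  d_proj <- as.vector(dist(y))  # projected distances d_ij, i < j
  sum((d_star - d_proj)^2 / d_star) / sum(d_star)
}

set.seed(1)
x <- matrix(rnorm(30), ncol = 3)  # 10 random points in 3-D
y <- x[, 1:2]                     # naive projection: drop the third coordinate

sammon_stress(x, x)  # 0, a perfect "projection"
sammon_stress(x, y)  # positive, since dropping a coordinate shrinks distances
```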

The minimization can be performed either by gradient descent, as originally proposed, or by other means, usually involving iterative methods. The number of iterations needs to be determined experimentally, and convergent solutions are not always guaranteed. Many implementations use the first principal components as a starting configuration.

3.2 Constructing Voronoi Tessellations

In this package, we use the sammon function from the MASS package to project higher-dimensional data to a 2D space. The function hvq, called from the HVT function, returns hierarchically quantized data, which is the input for constructing the tessellations. The data is then represented in 2D coordinates, and the tessellations are plotted using these coordinates as centroids. We use the deldir package for this purpose. The deldir package computes the Delaunay triangulation (and hence the Dirichlet or Voronoi tessellation) of a planar point set according to the second (iterative) algorithm of Lee and Schachter. For subsequent levels, the 2D coordinates are transformed so that all the points lie within their parent tile, and the tessellations are plotted using these transformed points as centroids. The lines in the tessellations are clipped so that they do not protrude outside the parent polygon. This is done for all subsequent levels.
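The projection step can be sketched with MASS::sammon(), starting from the first two principal components as described for Sammon's mapping. The 15 six-dimensional "centroids" here are random stand-ins rather than real HVT output, and we call the function with the MASS:: prefix instead of library(MASS) because attaching MASS masks dplyr::select(), which this vignette uses.

```r
# Project stand-in centroids to 2-D with MASS::sammon(),
# starting from the first two principal components.
set.seed(240)
centroids <- matrix(rnorm(15 * 6), nrow = 15)  # hypothetical 15 centroids in 6-D
d <- dist(centroids)                           # original-space distances d*_ij

init <- prcomp(centroids)$x[, 1:2]             # PCA starting configuration
proj <- MASS::sammon(d, y = init, k = 2, niter = 100, trace = FALSE)

head(proj$points)  # 2-D coordinates that could serve as tessellation seeds
proj$stress        # final Sammon's stress
```

Passing y explicitly only makes the PCA start visible; by default MASS::sammon() starts from classical multidimensional scaling via cmdscale().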

3.2.1 Example Usage 1

In this section, we will use the Prices of Personal Computers dataset. This dataset contains 6259 observations and 10 features. It records the prices of 486 personal computers in the US from 1993 to 1995. The variables include price, speed, ram, screen and cd. The dataset can be downloaded from the URL used in the code below.

In this example, we will compress this dataset using hierarchical VQ via k-means and visualize the Voronoi tessellation plots using Sammon's projection. Later on, we will overlay the price, speed and screen variables as heatmaps to generate further insights.

Here, we load the data and store it in a variable named computers.

set.seed(240)
# Load data from csv files
computers <- read.csv("https://raw.githubusercontent.com/SangeetM/dataset/master/Computers.csv")

Let’s have a look at some of the data

# Quick peek
Table(head(computers))

X	price	speed	hd	ram	screen	cd	multi	premium	ads	trend
1	1499	25	80	4	14	no	no	yes	94	1
2	1795	33	85	2	14	no	no	yes	94	1
3	1595	25	170	4	15	no	no	yes	94	1
4	1849	25	170	8	14	no	no	no	94	1
5	3295	33	340	16	14	no	no	yes	94	1
6	3695	66	340	16	14	no	no	yes	94	1

Now let us check the structure of the data

str(computers)
#> 'data.frame':    6259 obs. of  11 variables:
#>  $ X      : int  1 2 3 4 5 6 7 8 9 10 ...
#>  $ price  : int  1499 1795 1595 1849 3295 3695 1720 1995 2225 2575 ...
#>  $ speed  : int  25 33 25 25 33 66 25 50 50 50 ...
#>  $ hd     : int  80 85 170 170 340 340 170 85 210 210 ...
#>  $ ram    : int  4 2 4 8 16 16 4 2 8 4 ...
#>  $ screen : int  14 14 15 14 14 14 14 14 14 15 ...
#>  $ cd     : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 2 1 1 1 ...
#>  $ multi  : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
#>  $ premium: Factor w/ 2 levels "no","yes": 2 2 2 1 2 2 2 2 2 2 ...
#>  $ ads    : int  94 94 94 94 94 94 94 94 94 94 ...
#>  $ trend  : int  1 1 1 1 1 1 1 1 1 1 ...

Let’s get a summary of the data

summary(computers)
#>        X            price          speed              hd        
#>  Min.   :   1   Min.   : 949   Min.   : 25.00   Min.   :  80.0  
#>  1st Qu.:1566   1st Qu.:1794   1st Qu.: 33.00   1st Qu.: 214.0  
#>  Median :3130   Median :2144   Median : 50.00   Median : 340.0  
#>  Mean   :3130   Mean   :2220   Mean   : 52.01   Mean   : 416.6  
#>  3rd Qu.:4694   3rd Qu.:2595   3rd Qu.: 66.00   3rd Qu.: 528.0  
#>  Max.   :6259   Max.   :5399   Max.   :100.00   Max.   :2100.0  
#>       ram             screen        cd       multi      premium   
#>  Min.   : 2.000   Min.   :14.00   no :3351   no :5386   no : 612  
#>  1st Qu.: 4.000   1st Qu.:14.00   yes:2908   yes: 873   yes:5647  
#>  Median : 8.000   Median :14.00                                   
#>  Mean   : 8.287   Mean   :14.61                                   
#>  3rd Qu.: 8.000   3rd Qu.:15.00                                   
#>  Max.   :32.000   Max.   :17.00                                   
#>       ads            trend      
#>  Min.   : 39.0   Min.   : 1.00  
#>  1st Qu.:162.5   1st Qu.:10.00  
#>  Median :246.0   Median :16.00  
#>  Mean   :221.3   Mean   :15.93  
#>  3rd Qu.:275.0   3rd Qu.:21.50  
#>  Max.   :339.0   Max.   :35.00

Let us first split the data into train and test sets. We will use the first 80% of the rows as train and the remaining 20% as test.

noOfPoints <- dim(computers)[1]
trainLength <- as.integer(noOfPoints * 0.8)

trainComputers <- computers[1:trainLength,]
testComputers <- computers[(trainLength+1):noOfPoints,]
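The split above is sequential: as.integer(6259 * 0.8) truncates 5007.2 to 5007 training rows, leaving 1252 test rows. The sketch below wraps the same logic in a hypothetical helper, split_rows(), and adds an optional shuffled variant based on sample.int(); shuffling is our addition, not part of the original workflow. If the rows are ordered in time (the first rows have trend = 1), the sequential version holds out the most recent observations, while the shuffled one does not.

```r
# Sequential 80/20 split (as in the code above) with an optional shuffle.
# split_rows() is a hypothetical helper for illustration.
split_rows <- function(df, train_frac = 0.8, shuffle = FALSE) {
  n <- nrow(df)
  train_len <- as.integer(n * train_frac)          # truncates toward zero
  idx <- if (shuffle) sample.int(n) else seq_len(n)
  list(train = df[head(idx, train_len), , drop = FALSE],
       test  = df[tail(idx, n - train_len), , drop = FALSE])
}

toy <- data.frame(id = seq_len(6259))              # same row count as computers
parts <- split_rows(toy)
c(nrow(parts$train), nrow(parts$test))             # 5007 1252

set.seed(240)
shuffled <- split_rows(toy, shuffle = TRUE)        # random, non-overlapping split
```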

K-means is not suitable for factor variables, as their sample space is discrete and a Euclidean distance function on such a space isn't really meaningful. Hence, we will drop the factor variables (cd, multi and premium) from our dataset, along with the row index X and the trend variable.

Note that the price variable is retained, as we will later overlay it as a heatmap to generate some insights.

library(dplyr)  # provides the %>% pipe used below
trainComputers <-
  trainComputers %>% dplyr::select(-c(X, cd, multi, premium, trend))
testComputers <-
  testComputers %>% dplyr::select(-c(X, cd, multi, premium, trend))
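For readers who prefer not to depend on dplyr, a base-R alternative keeps only the numeric columns and then drops the row index X and trend explicitly. Filtering with is.numeric() also covers R 4.0 and later, where read.csv() returns text columns such as cd as character rather than factor. The helper name drop_for_kmeans() is hypothetical, and the toy rows reuse values from the first rows shown above.

```r
# Base-R alternative: keep numeric columns, then drop X and trend.
# drop_for_kmeans() is a hypothetical helper name.
drop_for_kmeans <- function(df, drop = c("X", "trend")) {
  numeric_cols <- vapply(df, is.numeric, logical(1))
  df <- df[, numeric_cols, drop = FALSE]  # removes factor/character columns
  df[, setdiff(names(df), drop), drop = FALSE]
}

toy <- data.frame(X = 1:3, price = c(1499, 1795, 1595), speed = c(25, 33, 25),
                  cd = factor(c("no", "no", "no")), trend = c(1, 1, 1))
names(drop_for_kmeans(toy))  # "price" "speed"
```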

Let us try to understand the HVT function first.

muHVT::HVT(
  dataset,
  nclust,
  depth,
  quant.err,
  projection.scale,
  normalize = T,
  distance_metric = c("L1_Norm", "L2_Norm"),
  error_metric = c("mean", "max")
)

Each of the parameters is explained below:

dataset - A data frame with numeric columns
nclust - An integer indicating the number of cells per hierarchy (level)
depth - An integer indicating the number of levels (1 = no hierarchy, 2 = 2 levels, etc.)
quant.err - A number indicating the quantization error threshold. A cell will only break down into further cells if its quantization error is above the defined threshold
projection.scale - A number indicating the scale factor for the tessellations, so that the sub-tessellations can be visualized effectively
normalize - A logical value indicating whether the columns in your dataset need to be normalized. Default value is TRUE. The algorithm supports Z-score normalization
distance_metric - The distance metric can be L1_Norm or L2_Norm. L1_Norm is selected by default. The distance metric is used to calculate the distance between an n-dimensional point and a centroid. The user can also pass a custom function to calculate this distance
error_metric - The error metric can be mean or max. max is selected by default. For a cell with m points, max returns the maximum of the m point-to-centroid distances and mean returns their mean. The user can also pass a custom function to calculate the error metric
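The interplay of distance_metric and error_metric can be illustrated with a small sketch: compute the distance from each of a cell's m points to its centroid, then aggregate with mean or max. The helpers l1_norm(), l2_norm() and cell_error() are hypothetical and mirror the documented options rather than muHVT's internal code; passing functions as arguments also shows how a custom metric could plug in.

```r
# Quantization error of one cell: distances from its m points to the centroid,
# aggregated with mean or max. Hypothetical helpers, not muHVT internals.
l1_norm <- function(points, centroid) rowSums(abs(sweep(points, 2, centroid)))
l2_norm <- function(points, centroid) sqrt(rowSums(sweep(points, 2, centroid)^2))

cell_error <- function(points, centroid, distance = l1_norm, error = mean) {
  error(distance(as.matrix(points), centroid))
}

pts <- rbind(c(0, 0), c(3, 4))  # m = 2 points in one cell
ctr <- c(0, 0)                  # the cell's centroid

cell_error(pts, ctr)                                   # mean L1: (0 + 7) / 2 = 3.5
cell_error(pts, ctr, distance = l2_norm, error = max)  # max L2: 5
```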

First we will perform hierarchical Vector Quantization at level 1 by setting the parameter depth to 1 and the number of cells to 15. Here, level 1 signifies no hierarchy.

set.seed(240)
hvt.results <- list()
hvt.results <- muHVT::HVT(trainComputers,
                          nclust = 15,
                          depth = 1,
                          quant.err = 0.2,
                          projection.scale = 10,
                          normalize = T,
                          distance_metric = "L1_Norm",
                          error_metric = "mean")
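For intuition about what the call above computes, here is a simplified base-R sketch of hierarchical vector quantization: k-means with nclust centers at level 1, then every cell whose mean L1 quantization error exceeds the threshold is split again with k-means, up to depth levels. This is an illustrative approximation built on stats::kmeans(), not muHVT's implementation; the function hier_vq_sketch() and its output columns are hypothetical, and cells with too few distinct points to hold nclust clusters are simply left unsplit.

```r
# Simplified hierarchical VQ with k-means (illustrative, not muHVT's code).
# Parent ids follow the summary-table convention: the child index one level up.
hier_vq_sketch <- function(x, nclust, depth, quant_err, normalize = TRUE) {
  x <- as.matrix(x)
  if (normalize) x <- scale(x)                   # Z-score each column
  rows <- list()

  split_cell <- function(pts, level, parent) {
    km <- stats::kmeans(pts, centers = nclust, iter.max = 100)
    for (j in seq_len(nclust)) {
      members <- pts[km$cluster == j, , drop = FALSE]
      # Mean L1 distance to the centroid (cf. "L1_Norm" with "mean")
      qe <- mean(rowSums(abs(sweep(members, 2, km$centers[j, ]))))
      rows[[length(rows) + 1]] <<- data.frame(
        level = level, parent = parent, child = j,
        n = nrow(members), quant_error = qe)
      # Split further only above threshold and if nclust clusters fit
      if (level < depth && qe > quant_err && nrow(unique(members)) > nclust) {
        split_cell(members, level + 1, j)
      }
    }
  }

  split_cell(x, level = 1, parent = 1)
  do.call(rbind, rows)
}

set.seed(240)
toy <- matrix(rnorm(600), ncol = 3)              # 200 synthetic points in 3-D
res <- hier_vq_sketch(toy, nclust = 4, depth = 2, quant_err = 0.5)
table(res$level)                                 # cells created per level
```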

Now let's try to understand the plotHVT function. The parameters are explained in detail below:

muHVT::plotHVT(hvt.results, line.width, color.vec, pch1 = 21, centroid.size = 3, title = NULL, maxDepth = 1)

hvt.results - A list containing the output of the HVT function, which has the details of the tessellations to be plotted
line.width - A vector indicating the line widths of the tessellation boundaries for each level
color.vec - A vector indicating the colors of the tessellation boundaries at each level
pch1 - Symbol type of the centroids of the tessellations (parent levels). See ?points (default = 21)
centroid.size - Size of the centroids of the first-level tessellations (default = 3)
title - Set a title for the plot (default = NULL)
maxDepth - An integer indicating the deepest level of the hierarchy to plot (default = 1)

Let's plot the Voronoi tessellation:

# Voronoi tessellation plot for level one
muHVT::plotHVT(hvt.results,
        line.width = c(1.2), 
        color.vec = c("#141B41"),
        maxDepth = 1)

Figure 2: The Voronoi Tessellation for level 1 shown for the 15 cells in the dataset ’computers’

As per the manual, hvt.results[[3]] gives detailed information about the hierarchically quantized data.

hvt.results[[3]][['summary']] returns a table containing the number of points, the quantization error and the codebook for each cell.

Now let us understand what each column in the summary table means

Segment Level - Level of the cell. In this case, we have performed vector quantization for depth 1, hence Segment Level is 1
Segment Parent - Parent segment of the cell
Segment Child - The children of a particular cell. In this case, the first level has 15 cells, hence we see Segment Child 1, 2, 3, ..., 15
n - Number of points in each cell
Quant.Error - Quantization error for each cell

All the columns after these contain the centroid of each cell. Together they form a codebook, which is the collection of all centroids (codewords).
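Because normalize = TRUE, the codebook values in the table are in Z-score units: a price of 0.69 means 0.69 standard deviations above the mean price. The sketch below assumes, as in k-means, that each codeword is the mean of its cell's normalized points, and shows how to map the codebook back to original units using the centering and scaling stored by scale(). The data and cell labels are synthetic stand-ins produced with stats::kmeans().

```r
# Codewords as per-cell means of Z-scored data, converted back to original units.
# Synthetic stand-in data; cell labels come from stats::kmeans().
set.seed(240)
raw <- matrix(c(rnorm(100, 2000, 500), rnorm(100, 50, 20)), ncol = 2,
              dimnames = list(NULL, c("price", "speed")))
z <- scale(raw)                                # Z-score normalization
km <- stats::kmeans(z, centers = 3)

counts <- as.vector(table(km$cluster))         # points per cell
codebook_z <- rowsum(z, km$cluster) / counts   # per-cell means in Z units

mu <- attr(z, "scaled:center")
sigma <- attr(z, "scaled:scale")
codebook_raw <- sweep(sweep(codebook_z, 2, sigma, "*"), 2, mu, "+")  # undo scaling
codebook_raw
```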

summaryTable(hvt.results[[3]][['summary']])

Segment.Level	Segment.Parent	Segment.Child	n	Quant.Error	price	speed	hd	ram	screen	ads
1	1	1	480	0.33	0.69	0.70	0.24	-0.02	0.06	0.57
1	1	2	390	0.49	0.83	0.21	0.05	0.10	2.88	0.10
1	1	3	145	0.35	0.27	2.67	0.17	-0.20	-0.17	0.71
1	1	4	505	0.26	-0.17	-0.80	0.24	-0.04	-0.31	0.42
1	1	5	241	0.28	-0.34	0.66	-0.73	-0.75	-0.40	-0.40
1	1	6	150	0.49	0.90	-0.55	2.71	2.32	0.29	-0.60
1	1	7	286	0.23	0.75	-0.71	0.79	1.61	-0.41	0.35
1	1	8	258	0.3	-0.39	0.76	0.71	0.00	-0.16	-0.54
1	1	9	324	0.25	-1.08	-0.79	-0.56	-0.69	-0.38	-0.76
1	1	10	401	0.29	-0.54	0.56	-0.62	-0.76	-0.32	0.76
1	1	11	288	0.34	1.19	1.24	0.74	1.61	0.13	0.38
1	1	12	917	0.22	-0.98	-0.91	-0.82	-0.77	-0.44	0.55
1	1	13	229	0.45	1.09	0.33	-0.16	0.33	-0.15	-1.94
1	1	14	97	0.57	2.01	1.24	3.36	2.46	0.20	0.01
1	1	15	296	0.29	-0.33	-0.53	-0.81	-0.51	-0.43	-2.16

Let's have a look at the Quant.Error variable in the above table. None of the cells has a quantization error below the threshold of 0.2.

Now let's check the compression summary. The table below shows, for each level, the number of cells, the number of cells with a quantization error below the threshold, and the fraction of cells with a quantization error below the threshold (for example, 0.88 means 88%).
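The compression summary can be derived from a per-cell summary table by counting, per level, the non-empty cells and those below the threshold. The function compression_summary(), its column names and the toy table are hypothetical; empty cells (n = 0, shown as NA rows in the summary tables) are excluded from the counts.

```r
# Per-level compression summary from a per-cell table (hypothetical helper).
compression_summary <- function(summary_df, quant_err) {
  cells <- summary_df[summary_df$n > 0, ]           # drop empty (NA) cells
  below <- cells$quant_error < quant_err
  out <- data.frame(
    level   = sort(unique(cells$level)),
    n_cells = as.vector(table(cells$level)),
    n_below = as.vector(tapply(below, cells$level, sum))
  )
  out$fraction_below <- round(out$n_below / out$n_cells, 2)
  out
}

toy <- data.frame(level = c(1, 1, 2, 2, 2, 2),
                  n = c(30, 20, 18, 12, 20, 0),
                  quant_error = c(0.33, 0.49, 0.17, 0.11, 0.32, NA))
cs <- compression_summary(toy, quant_err = 0.2)
cs  # level 1: 2 cells, 0 below; level 2: 3 cells, 2 below (0.67)
```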

compressionSummaryTable(hvt.results[[3]]$compression_summary)

segmentLevel	noOfCells	noOfCellsBelowQuantizationError	percentOfCellsBelowQuantizationErrorThreshold
1	15	0	0

As can be seen in the table above, 0% of the cells in level 1 have a quantization error below the threshold. Hence, we can go one level deeper and try to compress the data further.

We will now overlay the Quant.Error variable as a heatmap over the Voronoi tessellation plot to visualize the quantization error better.

Let's have a look at the function hvtHmap, which we will use to overlay a variable as a heatmap.

muHVT::hvtHmap(hvt.results, dataset, child.level, hmap.cols, color.vec ,line.width, palette.color = 6)

hvt.results - A list of hvt.results obtained from the HVT function
dataset - A dataframe containing the variables to overlay as a heatmap. The user can pass an external dataset or the dataset that was used to perform hierarchical vector quantization. The dataset should have the same number of points as the dataset used to perform hierarchical Vector Quantization in the HVT function
child.level - A number indicating the level for which the heat map is to be plotted
hmap.cols - The column number or column name from the dataset indicating the variables for which the heat map is to be plotted. To plot the quantization error as a heatmap, pass 'quant_error'. Similarly, to plot the number of points in each cell as a heatmap, pass 'no_of_points' as a parameter
color.vec - A color vector such that length(color.vec) = child.level (default = NULL)
line.width - A line width vector such that length(line.width) = child.level (default = NULL)
palette.color - A number indicating the heat map color palette. 1 - rainbow, 2 - heat.colors, 3 - terrain.colors, 4 - topo.colors, 5 - cm.colors, 6 - BlCyGrYlRd (Blue,Cyan,Green,Yellow,Red) color (default = 6)
show.points - A logical value indicating whether the centroids should be plotted on the tessellations (default = FALSE)
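Conceptually, overlaying a variable such as price means summarizing it per cell and mapping that summary to a color scale. The sketch below assumes each cell is colored by the mean of the variable over its points, using equal-width bins on a blue-cyan-green-yellow-red ramp in the spirit of palette.color = 6; muHVT's exact binning may differ. cell_colors() and the cell assignments are hypothetical; the prices come from the first rows of the dataset shown earlier.

```r
# Map a per-cell mean of a variable to a blue-cyan-green-yellow-red ramp.
# Assumption: equal-width bins over the cell means; not muHVT's exact scheme.
cell_colors <- function(values, cells, n_colors = 5) {
  cell_means <- tapply(values, cells, mean)       # one value per cell
  ramp <- grDevices::colorRampPalette(
    c("blue", "cyan", "green", "yellow", "red"))(n_colors)
  bins <- cut(as.vector(cell_means), breaks = n_colors, labels = FALSE)
  stats::setNames(ramp[bins], names(cell_means))
}

price <- c(1499, 1795, 1595, 1849, 3295, 3695)
cell  <- c(1, 1, 2, 2, 3, 3)                     # hypothetical cell assignments
cols <- cell_colors(price, cell)
cols  # "#0000FF" "#0000FF" "#FF0000" for cells 1, 2, 3
```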

Now let’s plot the quantization error for each cell at level one as a heatmap.

muHVT::hvtHmap(
  hvt.results,
  trainComputers,
  child.level = 1,
  hmap.cols = "Quant.Error",
  line.width = c(0.2),
  color.vec = c("#141B41"),
  palette.color = 6,
  centroid.size = 3,
  show.points = T,
  quant.error.hmap = 0.2,
  nclust.hmap = 15
)

Figure 3: The Voronoi Tessellation with the heat map overlaid for variable ’quant_error’ in the ’computers’ dataset

Now let’s go one level deeper and perform hierarchical vector quantization.

set.seed(240)
hvt.results2 <- list()
# depth=2 is used for level2 in the hierarchy
hvt.results2 <- muHVT::HVT(
  trainComputers,
  nclust = 15,
  depth = 2,
  quant.err = 0.2,
  projection.scale = 10,
  normalize = T,
  distance_metric = "L1_Norm",
  error_metric = "mean"
)

Let's plot the Voronoi tessellation for both levels.

# Voronoi tessellation plot for level two
muHVT::plotHVT(
  hvt.results2,
  line.width = c(1.2, 0.8),
  color.vec = c("#141B41", "#0582CA"),
  maxDepth = 2
)

Figure 4: The Voronoi Tessellation for level 2 shown for the 225 cells in the dataset ’computers’

In the table below, Segment Level signifies the depth.

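Level 1 has 15 cells

Level 2 has up to 225 cells, i.e. each cell in level 1 is divided into at most 15 cells; in this run, 134 of them contain points, and the empty ones appear in the table with n = 0 and NA values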

Let's analyze the summary table again for Quant.Error and check whether any of the cells in the 2nd level have a quantization error below the threshold. In the table below, the Quant.Error values of the cells that are below the threshold are shown in red. Here we show only the first 50 rows for the sake of brevity.

summaryTable(hvt.results2[[3]][['summary']],limit = 50)

Segment.Level	Segment.Parent	Segment.Child	n	Quant.Error	price	speed	hd	ram	screen	ads
1	1	1	480	0.33	0.69	0.70	0.24	-0.02	0.06	0.57
1	1	2	390	0.49	0.83	0.21	0.05	0.10	2.88	0.10
1	1	3	145	0.35	0.27	2.67	0.17	-0.20	-0.17	0.71
1	1	4	505	0.26	-0.17	-0.80	0.24	-0.04	-0.31	0.42
1	1	5	241	0.28	-0.34	0.66	-0.73	-0.75	-0.40	-0.40
1	1	6	150	0.49	0.90	-0.55	2.71	2.32	0.29	-0.60
1	1	7	286	0.23	0.75	-0.71	0.79	1.61	-0.41	0.35
1	1	8	258	0.3	-0.39	0.76	0.71	0.00	-0.16	-0.54
1	1	9	324	0.25	-1.08	-0.79	-0.56	-0.69	-0.38	-0.76
1	1	10	401	0.29	-0.54	0.56	-0.62	-0.76	-0.32	0.76
1	1	11	288	0.34	1.19	1.24	0.74	1.61	0.13	0.38
1	1	12	917	0.22	-0.98	-0.91	-0.82	-0.77	-0.44	0.55
1	1	13	229	0.45	1.09	0.33	-0.16	0.33	-0.15	-1.94
1	1	14	97	0.57	2.01	1.24	3.36	2.46	0.20	0.01
1	1	15	296	0.29	-0.33	-0.53	-0.81	-0.51	-0.43	-2.16
2	1	1	41	0.17	0.98	0.88	-0.31	-0.18	0.55	0.44
2	1	2	55	0.11	0.48	0.92	0.59	0.05	0.55	0.54
2	1	3	57	0.13	0.40	0.08	-0.02	0.01	0.55	0.51
2	1	4	22	0.19	1.95	0.77	0.68	-0.01	-0.61	0.58
2	1	5	53	0.11	0.78	0.92	0.31	0.03	-0.61	0.70
2	1	6	45	0.13	0.15	1.03	0.34	-0.01	-0.61	1.04
2	1	7	31	0.17	0.85	0.81	0.00	-0.16	-0.61	0.01
2	1	8	28	0.15	-0.01	0.70	0.42	0.04	0.55	1.37
2	1	9	50	0.12	0.51	0.09	0.19	0.00	-0.61	0.62
2	1	10	35	0.18	2.13	0.87	0.64	-0.16	0.55	0.48
2	1	11	39	0.1	-0.03	0.92	-0.06	0.04	0.55	0.50
2	1	12	24	0.2	1.10	0.64	0.33	0.00	0.55	-0.21
2	1	13	0	NA	NA	NA	NA	NA	NA	NA
2	1	14	0	NA	NA	NA	NA	NA	NA	NA
2	1	15	0	NA	NA	NA	NA	NA	NA	NA
2	2	1	24	0.19	2.32	0.92	0.32	-0.03	2.88	0.53
2	2	2	12	0.32	1.25	-0.49	1.29	1.37	2.88	-1.03
2	2	3	19	0.22	1.25	0.31	-0.31	-0.43	2.88	0.80
2	2	4	6	0.32	3.80	2.08	1.45	0.06	2.88	-0.25
2	2	5	16	0.18	0.73	-0.67	-0.52	-0.13	2.88	-1.75
2	2	6	19	0.31	0.76	0.96	1.19	1.22	2.88	-0.33
2	2	7	56	0.2	-0.52	-0.90	-0.50	-0.71	2.88	0.40
2	2	8	63	0.22	0.46	-0.80	-0.11	-0.23	2.88	0.51
2	2	9	23	0.18	1.21	0.52	-0.54	-0.14	2.88	-2.06
2	2	10	17	0.18	-0.04	0.19	-0.12	-0.30	2.88	0.46
2	2	11	43	0.24	2.40	0.72	0.26	1.63	2.88	0.29
2	2	12	20	0.21	-0.28	0.71	0.02	-0.33	2.88	-0.46
2	2	13	48	0.16	0.71	0.80	0.14	0.02	2.88	0.39
2	2	14	13	0.24	0.96	2.67	0.49	0.55	2.88	0.56
2	2	15	11	0.15	0.65	0.77	0.32	0.06	2.88	1.34
2	3	1	6	0.15	1.53	2.67	1.98	0.06	-0.61	-0.49
2	3	2	14	0.16	0.08	2.67	0.25	-0.27	-0.53	-0.41
2	3	3	38	0.2	-0.19	2.67	-0.78	-0.83	-0.37	0.99
2	3	4	14	0.09	0.37	2.67	0.59	0.06	-0.61	0.33
2	3	5	27	0.09	0.50	2.67	0.32	0.06	-0.61	1.33

Users can look at the compression summary for a quick overview of the compression, since the summary table above becomes cumbersome to read as we go deeper.

compressionSummaryTable(hvt.results2[[3]]$compression_summary)

segmentLevel	noOfCells	noOfCellsBelowQuantizationError	percentOfCellsBelowQuantizationErrorThreshold
1	15	0	0
2	134	118	0.88

As can be seen in the table above, 118 of the 134 non-empty cells (88%) in the 2nd level have a quantization error below the threshold. The remaining cells are still above it, so we can go another level deeper and try to compress the data further.

We will look at the heatmap for Quantization Error for level 2.

muHVT::hvtHmap(
  hvt.results2,
  trainComputers,
  child.level = 2,
  hmap.cols = "Quant.Error",
  line.width = c(0.8, 0.2),
  color.vec = c("#141B41", "#0582CA"),
  palette.color = 6,
  centroid.size = 2,
  show.points = T,
  quant.error.hmap = 0.2,
  nclust.hmap = 15
)

Figure 5: The Voronoi Tessellation with the heat map overlaid for variable ’quant_error’ in the ’computers’ dataset

As the quantization error criterion is not yet met for all cells, let's perform hierarchical vector quantization at level 3.

set.seed(240)
hvt.results3 <- list()
# depth=3 is used for level3 in the hierarchy
hvt.results3 <- muHVT::HVT(
  trainComputers,
  nclust = 15,
  depth = 3,
  quant.err = 0.2,
  projection.scale = 10,
  normalize = T,
  distance_metric = "L1_Norm",
  error_metric = "mean"
)

Let's plot the Voronoi tessellation for all 3 levels.

# Voronoi tessellation plot for level three
muHVT::plotHVT(
  hvt.results3,
  line.width = c(1.2, 0.8, 0.4),
  color.vec = c("#141B41", "#0582CA", "#8BA0B4"),
  centroid.size = 3,
  maxDepth = 3
)

Figure 6: The Voronoi Tessellation for level 3 shown for the 1905 cells in the dataset ’computers’

Only the level-2 cells whose quantization error is above the defined threshold are broken down further, each into up to 15 cells in level 3. The summary table nevertheless reserves 15 rows for each of the 225 level-2 slots, so level 3 accounts for 3375 rows and the table has 15 + 225 + 3375 = 3615 rows in total. We will only show the first 500 rows here.
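The row counts quoted above follow from reserving nclust child slots for every cell at every level, whether or not the slot ends up holding any points. A quick check of the arithmetic:

```r
# Rows reserved per level when every cell gets nclust child slots
# (empty slots still appear as rows with n = 0 and NA values).
nclust <- 15
rows_per_level <- nclust^(1:3)
rows_per_level       # 15 225 3375
sum(rows_per_level)  # 3615
```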

summaryTable(hvt.results3[[3]][['summary']],scroll = T,limit = 500)

Segment.Level	Segment.Parent	Segment.Child	n	Quant.Error	price	speed	hd	ram	screen	ads
1	1	1	480	0.33	0.69	0.70	0.24	-0.02	0.06	0.57
1	1	2	390	0.49	0.83	0.21	0.05	0.10	2.88	0.10
1	1	3	145	0.35	0.27	2.67	0.17	-0.20	-0.17	0.71
1	1	4	505	0.26	-0.17	-0.80	0.24	-0.04	-0.31	0.42
1	1	5	241	0.28	-0.34	0.66	-0.73	-0.75	-0.40	-0.40
1	1	6	150	0.49	0.90	-0.55	2.71	2.32	0.29	-0.60
1	1	7	286	0.23	0.75	-0.71	0.79	1.61	-0.41	0.35
1	1	8	258	0.3	-0.39	0.76	0.71	0.00	-0.16	-0.54
1	1	9	324	0.25	-1.08	-0.79	-0.56	-0.69	-0.38	-0.76
1	1	10	401	0.29	-0.54	0.56	-0.62	-0.76	-0.32	0.76
1	1	11	288	0.34	1.19	1.24	0.74	1.61	0.13	0.38
1	1	12	917	0.22	-0.98	-0.91	-0.82	-0.77	-0.44	0.55
1	1	13	229	0.45	1.09	0.33	-0.16	0.33	-0.15	-1.94
1	1	14	97	0.57	2.01	1.24	3.36	2.46	0.20	0.01
1	1	15	296	0.29	-0.33	-0.53	-0.81	-0.51	-0.43	-2.16
2	1	1	41	0.17	0.98	0.88	-0.31	-0.18	0.55	0.44
2	1	2	55	0.11	0.48	0.92	0.59	0.05	0.55	0.54
2	1	3	57	0.13	0.40	0.08	-0.02	0.01	0.55	0.51
2	1	4	22	0.19	1.95	0.77	0.68	-0.01	-0.61	0.58
2	1	5	53	0.11	0.78	0.92	0.31	0.03	-0.61	0.70
2	1	6	45	0.13	0.15	1.03	0.34	-0.01	-0.61	1.04
2	1	7	31	0.17	0.85	0.81	0.00	-0.16	-0.61	0.01
2	1	8	28	0.15	-0.01	0.70	0.42	0.04	0.55	1.37
2	1	9	50	0.12	0.51	0.09	0.19	0.00	-0.61	0.62
2	1	10	35	0.18	2.13	0.87	0.64	-0.16	0.55	0.48
2	1	11	39	0.1	-0.03	0.92	-0.06	0.04	0.55	0.50
2	1	12	24	0.2	1.10	0.64	0.33	0.00	0.55	-0.21
2	1	13	0	NA	NA	NA	NA	NA	NA	NA
2	1	14	0	NA	NA	NA	NA	NA	NA	NA
2	1	15	0	NA	NA	NA	NA	NA	NA	NA
2	2	1	24	0.19	2.32	0.92	0.32	-0.03	2.88	0.53
2	2	2	12	0.32	1.25	-0.49	1.29	1.37	2.88	-1.03
2	2	3	19	0.22	1.25	0.31	-0.31	-0.43	2.88	0.80
2	2	4	6	0.32	3.80	2.08	1.45	0.06	2.88	-0.25
2	2	5	16	0.18	0.73	-0.67	-0.52	-0.13	2.88	-1.75
2	2	6	19	0.31	0.76	0.96	1.19	1.22	2.88	-0.33
2	2	7	56	0.2	-0.52	-0.90	-0.50	-0.71	2.88	0.40
2	2	8	63	0.22	0.46	-0.80	-0.11	-0.23	2.88	0.51
2	2	9	23	0.18	1.21	0.52	-0.54	-0.14	2.88	-2.06
2	2	10	17	0.18	-0.04	0.19	-0.12	-0.30	2.88	0.46
2	2	11	43	0.24	2.40	0.72	0.26	1.63	2.88	0.29
2	2	12	20	0.21	-0.28	0.71	0.02	-0.33	2.88	-0.46
2	2	13	48	0.16	0.71	0.80	0.14	0.02	2.88	0.39
2	2	14	13	0.24	0.96	2.67	0.49	0.55	2.88	0.56
2	2	15	11	0.15	0.65	0.77	0.32	0.06	2.88	1.34
2	3	1	6	0.15	1.53	2.67	1.98	0.06	-0.61	-0.49
2	3	2	14	0.16	0.08	2.67	0.25	-0.27	-0.53	-0.41
2	3	3	38	0.2	-0.19	2.67	-0.78	-0.83	-0.37	0.99
2	3	4	14	0.09	0.37	2.67	0.59	0.06	-0.61	0.33
2	3	5	27	0.09	0.50	2.67	0.32	0.06	-0.61	1.33
2	3	6	46	0.15	0.39	2.67	0.47	0.06	0.55	0.72
2	3	7	0	NA	NA	NA	NA	NA	NA	NA
2	3	8	0	NA	NA	NA	NA	NA	NA	NA
2	3	9	0	NA	NA	NA	NA	NA	NA	NA
2	3	10	0	NA	NA	NA	NA	NA	NA	NA
2	3	11	0	NA	NA	NA	NA	NA	NA	NA
2	3	12	0	NA	NA	NA	NA	NA	NA	NA
2	3	13	0	NA	NA	NA	NA	NA	NA	NA
2	3	14	0	NA	NA	NA	NA	NA	NA	NA
2	3	15	0	NA	NA	NA	NA	NA	NA	NA
2	4	1	227	0.16	-0.20	-0.77	0.27	-0.05	-0.61	0.75
2	4	2	147	0.19	-0.02	-0.82	0.25	-0.01	-0.61	-0.10
2	4	3	131	0.19	-0.28	-0.82	0.18	-0.04	0.55	0.43
2	4	4	0	NA	NA	NA	NA	NA	NA	NA
2	4	5	0	NA	NA	NA	NA	NA	NA	NA
2	4	6	0	NA	NA	NA	NA	NA	NA	NA
2	4	7	0	NA	NA	NA	NA	NA	NA	NA
2	4	8	0	NA	NA	NA	NA	NA	NA	NA
2	4	9	0	NA	NA	NA	NA	NA	NA	NA
2	4	10	0	NA	NA	NA	NA	NA	NA	NA
2	4	11	0	NA	NA	NA	NA	NA	NA	NA
2	4	12	0	NA	NA	NA	NA	NA	NA	NA
2	4	13	0	NA	NA	NA	NA	NA	NA	NA
2	4	14	0	NA	NA	NA	NA	NA	NA	NA
2	4	15	0	NA	NA	NA	NA	NA	NA	NA
2	5	1	16	0.08	-1.06	0.92	-0.54	-0.72	-0.61	-0.65
2	5	2	19	0.17	-0.27	0.66	-0.78	-0.74	0.55	-0.08
2	5	3	22	0.08	-0.25	0.92	-1.16	-1.02	-0.61	-0.02
2	5	4	10	0.08	-0.21	0.09	-0.66	-0.64	-0.61	-1.10
2	5	5	9	0.08	-0.72	0.09	-0.66	-0.76	-0.61	-0.71
2	5	6	13	0.11	-0.79	0.92	-0.91	-0.78	-0.61	-1.27
2	5	7	16	0.12	-0.09	0.92	-0.56	-0.52	-0.61	-0.88
2	5	8	17	0.11	-0.19	0.92	-0.32	-0.58	-0.61	-0.02
2	5	9	21	0.06	-0.85	0.92	-0.80	-0.72	-0.61	0.06
2	5	10	6	0.07	-0.08	0.92	-0.89	-0.72	-0.61	-1.57
2	5	11	20	0.07	0.10	0.09	-0.64	-0.72	-0.61	-0.05
2	5	12	12	0.15	-0.87	0.57	-0.48	-0.75	0.55	-0.55
2	5	13	20	0.09	-0.57	0.09	-1.10	-1.01	-0.61	-0.03
2	5	14	26	0.09	0.40	0.92	-0.60	-0.72	-0.61	-0.14
2	5	15	14	0.2	-0.27	0.56	-0.77	-0.63	0.55	-1.18
2	6	1	5	0.01	0.51	-0.78	1.73	1.63	0.55	-1.30
2	6	2	5	0.01	0.19	-0.78	1.73	1.63	0.55	-1.30
2	6	3	5	0.14	1.24	-0.78	4.82	-0.09	-0.38	0.49
2	6	4	7	0.03	0.50	-0.78	1.73	1.63	0.55	-0.68
2	6	5	3	0.17	1.30	-0.78	3.06	1.63	-0.61	-0.03
2	6	6	20	0.14	0.55	0.38	1.82	1.63	0.55	-1.06
2	6	7	51	0.2	1.29	-0.54	3.06	3.19	0.82	-0.87
2	6	8	11	0.2	0.03	-0.78	3.16	0.06	-0.40	-0.36
2	6	9	32	0.08	1.19	-0.78	3.06	3.19	-0.54	0.10
2	6	10	6	0.01	0.12	-0.78	1.73	1.63	0.55	-0.84
2	6	11	5	0.01	0.19	-0.78	1.73	1.63	0.55	-0.62
2	6	12	0	NA	NA	NA	NA	NA	NA	NA
2	6	13	0	NA	NA	NA	NA	NA	NA	NA
2	6	14	0	NA	NA	NA	NA	NA	NA	NA
2	6	15	0	NA	NA	NA	NA	NA	NA	NA
2	7	1	45	0.15	0.62	-0.83	0.75	1.63	0.55	0.43
2	7	2	54	0.17	1.18	0.01	0.76	1.51	-0.61	0.54
2	7	3	149	0.13	0.65	-0.91	0.85	1.63	-0.61	0.53
2	7	4	38	0.19	0.66	-0.81	0.65	1.63	-0.46	-0.69
2	7	5	0	NA	NA	NA	NA	NA	NA	NA
2	7	6	0	NA	NA	NA	NA	NA	NA	NA
2	7	7	0	NA	NA	NA	NA	NA	NA	NA
2	7	8	0	NA	NA	NA	NA	NA	NA	NA
2	7	9	0	NA	NA	NA	NA	NA	NA	NA
2	7	10	0	NA	NA	NA	NA	NA	NA	NA
2	7	11	0	NA	NA	NA	NA	NA	NA	NA
2	7	12	0	NA	NA	NA	NA	NA	NA	NA
2	7	13	0	NA	NA	NA	NA	NA	NA	NA
2	7	14	0	NA	NA	NA	NA	NA	NA	NA
2	7	15	0	NA	NA	NA	NA	NA	NA	NA
2	8	1	37	0.2	-0.53	0.72	0.87	0.00	0.55	-1.02
2	8	2	57	0.16	-0.37	0.87	0.57	0.04	0.55	-0.15
2	8	3	64	0.12	-0.27	0.92	0.47	0.02	-0.61	-0.10
2	8	4	61	0.15	-0.61	0.90	0.47	-0.09	-0.61	-0.83
2	8	5	13	0.19	-0.11	0.54	3.06	0.06	-0.26	-0.81
2	8	6	26	0.17	-0.19	-0.08	0.76	0.06	-0.53	-0.95
2	8	7	0	NA	NA	NA	NA	NA	NA	NA
2	8	8	0	NA	NA	NA	NA	NA	NA	NA
2	8	9	0	NA	NA	NA	NA	NA	NA	NA
2	8	10	0	NA	NA	NA	NA	NA	NA	NA
2	8	11	0	NA	NA	NA	NA	NA	NA	NA
2	8	12	0	NA	NA	NA	NA	NA	NA	NA
2	8	13	0	NA	NA	NA	NA	NA	NA	NA
2	8	14	0	NA	NA	NA	NA	NA	NA	NA
2	8	15	0	NA	NA	NA	NA	NA	NA	NA
2	9	1	29	0.14	-1.33	-0.66	-0.07	-0.70	0.55	-0.54
2	9	2	37	0.18	-0.85	-0.82	-0.78	-0.68	0.55	-0.93
2	9	3	258	0.2	-1.08	-0.79	-0.58	-0.69	-0.61	-0.76
2	9	4	0	NA	NA	NA	NA	NA	NA	NA
2	9	5	0	NA	NA	NA	NA	NA	NA	NA
2	9	6	0	NA	NA	NA	NA	NA	NA	NA
2	9	7	0	NA	NA	NA	NA	NA	NA	NA
2	9	8	0	NA	NA	NA	NA	NA	NA	NA
2	9	9	0	NA	NA	NA	NA	NA	NA	NA
2	9	10	0	NA	NA	NA	NA	NA	NA	NA
2	9	11	0	NA	NA	NA	NA	NA	NA	NA
2	9	12	0	NA	NA	NA	NA	NA	NA	NA
2	9	13	0	NA	NA	NA	NA	NA	NA	NA
2	9	14	0	NA	NA	NA	NA	NA	NA	NA
2	9	15	0	NA	NA	NA	NA	NA	NA	NA
2	10	1	65	0.2	-0.62	0.38	0.07	-0.62	-0.61	0.72
2	10	2	75	0.16	-0.71	0.99	-0.98	-0.94	-0.61	0.92
2	10	3	45	0.15	-0.69	0.09	-0.52	-0.72	0.55	0.62
2	10	4	92	0.15	-0.70	0.09	-0.90	-0.82	-0.61	0.74
2	10	5	67	0.11	-0.02	0.89	-0.66	-0.68	-0.61	0.76
2	10	6	57	0.18	-0.49	0.95	-0.53	-0.68	0.55	0.76
2	10	7	0	NA	NA	NA	NA	NA	NA	NA
2	10	8	0	NA	NA	NA	NA	NA	NA	NA
2	10	9	0	NA	NA	NA	NA	NA	NA	NA
2	10	10	0	NA	NA	NA	NA	NA	NA	NA
2	10	11	0	NA	NA	NA	NA	NA	NA	NA
2	10	12	0	NA	NA	NA	NA	NA	NA	NA
2	10	13	0	NA	NA	NA	NA	NA	NA	NA
2	10	14	0	NA	NA	NA	NA	NA	NA	NA
2	10	15	0	NA	NA	NA	NA	NA	NA	NA
2	11	1	19	0.1	0.97	1.14	0.82	1.63	-0.61	1.27
2	11	2	32	0.12	1.51	0.79	0.68	1.63	0.55	0.62
2	11	3	21	0.13	0.82	0.98	0.63	1.63	0.55	1.25
2	11	4	6	0.14	1.39	0.92	0.20	1.63	0.16	-1.08
2	11	5	15	0.38	1.44	2.67	1.16	1.21	0.01	-0.46
2	11	6	22	0.13	1.33	0.80	0.56	1.63	0.55	-0.24
2	11	7	20	0.15	1.22	2.67	1.05	1.63	-0.61	0.72
2	11	8	18	0.17	1.31	2.67	0.79	1.63	0.94	1.12
2	11	9	9	0.11	0.40	0.97	0.71	1.63	0.55	-0.74
2	11	10	5	0.29	2.72	0.92	0.59	1.94	0.32	0.13
2	11	11	18	0.12	0.65	0.92	1.07	1.63	-0.61	0.04
2	11	12	15	0.07	0.62	0.95	0.77	1.63	0.55	0.37
2	11	13	11	0.07	0.72	0.92	1.74	1.63	0.55	-0.66
2	11	14	43	0.09	1.51	0.92	0.77	1.63	-0.61	0.38
2	11	15	34	0.09	1.21	0.92	0.09	1.63	0.55	0.44
2	12	1	140	0.18	-0.92	-0.90	-0.76	-0.80	0.55	0.59
2	12	2	364	0.14	-0.61	-0.88	-0.83	-0.74	-0.61	0.41
2	12	3	413	0.17	-1.32	-0.94	-0.83	-0.79	-0.61	0.67
2	12	4	0	NA	NA	NA	NA	NA	NA	NA
2	12	5	0	NA	NA	NA	NA	NA	NA	NA
2	12	6	0	NA	NA	NA	NA	NA	NA	NA
2	12	7	0	NA	NA	NA	NA	NA	NA	NA
2	12	8	0	NA	NA	NA	NA	NA	NA	NA
2	12	9	0	NA	NA	NA	NA	NA	NA	NA
2	12	10	0	NA	NA	NA	NA	NA	NA	NA
2	12	11	0	NA	NA	NA	NA	NA	NA	NA
2	12	12	0	NA	NA	NA	NA	NA	NA	NA
2	12	13	0	NA	NA	NA	NA	NA	NA	NA
2	12	14	0	NA	NA	NA	NA	NA	NA	NA
2	12	15	0	NA	NA	NA	NA	NA	NA	NA
2	13	1	11	0.16	0.99	0.84	-0.21	-0.01	-0.61	-1.29
2	13	2	14	0.16	1.34	0.50	0.08	1.63	-0.61	-2.10
2	13	3	14	0.12	0.57	0.39	-0.54	-0.05	0.55	-2.24
2	13	4	13	0.14	0.80	0.92	-0.36	-0.06	0.55	-1.35
2	13	5	11	0.23	2.40	0.92	0.63	-0.01	0.23	-1.60
2	13	6	15	0.24	1.09	0.13	-0.20	1.63	0.55	-1.93
2	13	7	18	0.14	0.49	0.04	-0.35	-0.02	0.55	-1.37
2	13	8	15	0.16	0.55	-0.20	0.27	0.06	-0.61	-1.32
2	13	9	39	0.12	0.45	0.58	-0.59	0.06	-0.61	-2.20
2	13	10	11	0.14	2.58	-0.31	0.40	0.06	-0.61	-2.20
2	13	11	15	0.11	2.68	0.92	0.52	0.06	-0.61	-2.16
2	13	12	18	0.14	1.30	0.92	-0.36	-0.28	0.55	-2.31
2	13	13	9	0.17	1.24	0.55	-0.75	-0.37	-0.61	-2.35
2	13	14	20	0.13	0.87	-0.91	0.06	1.63	-0.56	-2.12
2	13	15	6	0.14	0.69	-0.78	-0.37	0.06	0.36	-2.13
2	14	1	3	0.04	1.52	0.92	3.06	3.19	0.55	0.08
2	14	2	6	0.29	2.14	0.92	3.25	2.15	2.88	-0.39
2	14	3	2	0.06	2.26	0.92	8.30	1.63	0.55	0.27
2	14	4	3	0.04	2.01	2.67	3.06	1.63	0.55	1.18
2	14	5	7	0.05	2.13	0.80	3.06	3.19	0.55	-0.62
2	14	6	10	0.05	1.53	0.92	3.06	3.19	-0.61	0.27
2	14	7	8	0.18	3.51	0.18	3.42	1.63	-0.61	0.77
2	14	8	19	0.12	1.99	2.67	3.06	3.11	-0.43	0.06
2	14	9	5	0.02	1.52	0.92	3.06	3.19	0.55	-1.30
2	14	10	6	0.24	1.92	0.92	4.58	-0.07	-0.42	0.49
2	14	11	4	0.02	1.24	0.92	3.06	3.19	0.55	-0.84
2	14	12	3	0.02	5.26	0.92	4.01	4.76	2.88	0.46
2	14	13	14	0.22	1.24	0.92	3.25	1.40	0.47	0.25
2	14	14	2	0.04	2.89	0.92	3.06	1.63	-0.61	-1.37
2	14	15	5	0.02	1.57	0.92	3.06	3.19	-0.61	-0.30
2	15	1	23	0.08	-0.20	0.92	-1.07	-0.80	-0.61	-2.28
2	15	2	100	0.15	-0.84	-0.94	-0.99	-0.77	-0.57	-2.21
2	15	3	58	0.15	-0.21	0.09	-0.86	-0.58	-0.59	-2.16
2	15	4	74	0.16	0.02	-0.83	-0.55	-0.14	-0.61	-2.11
2	15	5	41	0.19	0.05	-0.72	-0.62	-0.28	0.55	-2.08
2	15	6	0	NA	NA	NA	NA	NA	NA	NA
2	15	7	0	NA	NA	NA	NA	NA	NA	NA
2	15	8	0	NA	NA	NA	NA	NA	NA	NA
2	15	9	0	NA	NA	NA	NA	NA	NA	NA
2	15	10	0	NA	NA	NA	NA	NA	NA	NA
2	15	11	0	NA	NA	NA	NA	NA	NA	NA
2	15	12	0	NA	NA	NA	NA	NA	NA	NA
2	15	13	0	NA	NA	NA	NA	NA	NA	NA
2	15	14	0	NA	NA	NA	NA	NA	NA	NA
2	15	15	0	NA	NA	NA	NA	NA	NA	NA
3	1	1	0	NA	NA	NA	NA	NA	NA	NA
3	1	2	0	NA	NA	NA	NA	NA	NA	NA
3	1	3	0	NA	NA	NA	NA	NA	NA	NA
3	1	4	0	NA	NA	NA	NA	NA	NA	NA
3	1	5	0	NA	NA	NA	NA	NA	NA	NA
3	1	6	0	NA	NA	NA	NA	NA	NA	NA
3	1	7	0	NA	NA	NA	NA	NA	NA	NA
3	1	8	0	NA	NA	NA	NA	NA	NA	NA
3	1	9	0	NA	NA	NA	NA	NA	NA	NA
3	1	10	0	NA	NA	NA	NA	NA	NA	NA
3	1	11	0	NA	NA	NA	NA	NA	NA	NA
3	1	12	0	NA	NA	NA	NA	NA	NA	NA
3	1	13	0	NA	NA	NA	NA	NA	NA	NA
3	1	14	0	NA	NA	NA	NA	NA	NA	NA
3	1	15	0	NA	NA	NA	NA	NA	NA	NA
3	2	1	0	NA	NA	NA	NA	NA	NA	NA
3	2	2	0	NA	NA	NA	NA	NA	NA	NA
3	2	3	0	NA	NA	NA	NA	NA	NA	NA
3	2	4	0	NA	NA	NA	NA	NA	NA	NA
3	2	5	0	NA	NA	NA	NA	NA	NA	NA
3	2	6	0	NA	NA	NA	NA	NA	NA	NA
3	2	7	0	NA	NA	NA	NA	NA	NA	NA
3	2	8	0	NA	NA	NA	NA	NA	NA	NA
3	2	9	0	NA	NA	NA	NA	NA	NA	NA
3	2	10	0	NA	NA	NA	NA	NA	NA	NA
3	2	11	0	NA	NA	NA	NA	NA	NA	NA
3	2	12	0	NA	NA	NA	NA	NA	NA	NA
3	2	13	0	NA	NA	NA	NA	NA	NA	NA
3	2	14	0	NA	NA	NA	NA	NA	NA	NA
3	2	15	0	NA	NA	NA	NA	NA	NA	NA
3	3	1	0	NA	NA	NA	NA	NA	NA	NA
3	3	2	0	NA	NA	NA	NA	NA	NA	NA
3	3	3	0	NA	NA	NA	NA	NA	NA	NA
3	3	4	0	NA	NA	NA	NA	NA	NA	NA
3	3	5	0	NA	NA	NA	NA	NA	NA	NA
3	3	6	0	NA	NA	NA	NA	NA	NA	NA
3	3	7	0	NA	NA	NA	NA	NA	NA	NA
3	3	8	0	NA	NA	NA	NA	NA	NA	NA
3	3	9	0	NA	NA	NA	NA	NA	NA	NA
3	3	10	0	NA	NA	NA	NA	NA	NA	NA
3	3	11	0	NA	NA	NA	NA	NA	NA	NA
3	3	12	0	NA	NA	NA	NA	NA	NA	NA
3	3	13	0	NA	NA	NA	NA	NA	NA	NA
3	3	14	0	NA	NA	NA	NA	NA	NA	NA
3	3	15	0	NA	NA	NA	NA	NA	NA	NA
3	4	1	0	NA	NA	NA	NA	NA	NA	NA
3	4	2	0	NA	NA	NA	NA	NA	NA	NA
3	4	3	0	NA	NA	NA	NA	NA	NA	NA
3	4	4	0	NA	NA	NA	NA	NA	NA	NA
3	4	5	0	NA	NA	NA	NA	NA	NA	NA
3	4	6	0	NA	NA	NA	NA	NA	NA	NA
3	4	7	0	NA	NA	NA	NA	NA	NA	NA
3	4	8	0	NA	NA	NA	NA	NA	NA	NA
3	4	9	0	NA	NA	NA	NA	NA	NA	NA
3	4	10	0	NA	NA	NA	NA	NA	NA	NA
3	4	11	0	NA	NA	NA	NA	NA	NA	NA
3	4	12	0	NA	NA	NA	NA	NA	NA	NA
3	4	13	0	NA	NA	NA	NA	NA	NA	NA
3	4	14	0	NA	NA	NA	NA	NA	NA	NA
3	4	15	0	NA	NA	NA	NA	NA	NA	NA
3	5	1	0	NA	NA	NA	NA	NA	NA	NA
3	5	2	0	NA	NA	NA	NA	NA	NA	NA
3	5	3	0	NA	NA	NA	NA	NA	NA	NA
3	5	4	0	NA	NA	NA	NA	NA	NA	NA
3	5	5	0	NA	NA	NA	NA	NA	NA	NA
3	5	6	0	NA	NA	NA	NA	NA	NA	NA
3	5	7	0	NA	NA	NA	NA	NA	NA	NA
3	5	8	0	NA	NA	NA	NA	NA	NA	NA
3	5	9	0	NA	NA	NA	NA	NA	NA	NA
3	5	10	0	NA	NA	NA	NA	NA	NA	NA
3	5	11	0	NA	NA	NA	NA	NA	NA	NA
3	5	12	0	NA	NA	NA	NA	NA	NA	NA
3	5	13	0	NA	NA	NA	NA	NA	NA	NA
3	5	14	0	NA	NA	NA	NA	NA	NA	NA
3	5	15	0	NA	NA	NA	NA	NA	NA	NA
3	6	1	0	NA	NA	NA	NA	NA	NA	NA
3	6	2	0	NA	NA	NA	NA	NA	NA	NA
3	6	3	0	NA	NA	NA	NA	NA	NA	NA
3	6	4	0	NA	NA	NA	NA	NA	NA	NA
3	6	5	0	NA	NA	NA	NA	NA	NA	NA
3	6	6	0	NA	NA	NA	NA	NA	NA	NA
3	6	7	0	NA	NA	NA	NA	NA	NA	NA
3	6	8	0	NA	NA	NA	NA	NA	NA	NA
3	6	9	0	NA	NA	NA	NA	NA	NA	NA
3	6	10	0	NA	NA	NA	NA	NA	NA	NA
3	6	11	0	NA	NA	NA	NA	NA	NA	NA
3	6	12	0	NA	NA	NA	NA	NA	NA	NA
3	6	13	0	NA	NA	NA	NA	NA	NA	NA
3	6	14	0	NA	NA	NA	NA	NA	NA	NA
3	6	15	0	NA	NA	NA	NA	NA	NA	NA
3	7	1	0	NA	NA	NA	NA	NA	NA	NA
3	7	2	0	NA	NA	NA	NA	NA	NA	NA
3	7	3	0	NA	NA	NA	NA	NA	NA	NA
3	7	4	0	NA	NA	NA	NA	NA	NA	NA
3	7	5	0	NA	NA	NA	NA	NA	NA	NA
3	7	6	0	NA	NA	NA	NA	NA	NA	NA
3	7	7	0	NA	NA	NA	NA	NA	NA	NA
3	7	8	0	NA	NA	NA	NA	NA	NA	NA
3	7	9	0	NA	NA	NA	NA	NA	NA	NA
3	7	10	0	NA	NA	NA	NA	NA	NA	NA
3	7	11	0	NA	NA	NA	NA	NA	NA	NA
3	7	12	0	NA	NA	NA	NA	NA	NA	NA
3	7	13	0	NA	NA	NA	NA	NA	NA	NA
3	7	14	0	NA	NA	NA	NA	NA	NA	NA
3	7	15	0	NA	NA	NA	NA	NA	NA	NA
3	8	1	0	NA	NA	NA	NA	NA	NA	NA
3	8	2	0	NA	NA	NA	NA	NA	NA	NA
3	8	3	0	NA	NA	NA	NA	NA	NA	NA
3	8	4	0	NA	NA	NA	NA	NA	NA	NA
3	8	5	0	NA	NA	NA	NA	NA	NA	NA
3	8	6	0	NA	NA	NA	NA	NA	NA	NA
3	8	7	0	NA	NA	NA	NA	NA	NA	NA
3	8	8	0	NA	NA	NA	NA	NA	NA	NA
3	8	9	0	NA	NA	NA	NA	NA	NA	NA
3	8	10	0	NA	NA	NA	NA	NA	NA	NA
3	8	11	0	NA	NA	NA	NA	NA	NA	NA
3	8	12	0	NA	NA	NA	NA	NA	NA	NA
3	8	13	0	NA	NA	NA	NA	NA	NA	NA
3	8	14	0	NA	NA	NA	NA	NA	NA	NA
3	8	15	0	NA	NA	NA	NA	NA	NA	NA
3	9	1	0	NA	NA	NA	NA	NA	NA	NA
3	9	2	0	NA	NA	NA	NA	NA	NA	NA
3	9	3	0	NA	NA	NA	NA	NA	NA	NA
3	9	4	0	NA	NA	NA	NA	NA	NA	NA
3	9	5	0	NA	NA	NA	NA	NA	NA	NA
3	9	6	0	NA	NA	NA	NA	NA	NA	NA
3	9	7	0	NA	NA	NA	NA	NA	NA	NA
3	9	8	0	NA	NA	NA	NA	NA	NA	NA
3	9	9	0	NA	NA	NA	NA	NA	NA	NA
3	9	10	0	NA	NA	NA	NA	NA	NA	NA
3	9	11	0	NA	NA	NA	NA	NA	NA	NA
3	9	12	0	NA	NA	NA	NA	NA	NA	NA
3	9	13	0	NA	NA	NA	NA	NA	NA	NA
3	9	14	0	NA	NA	NA	NA	NA	NA	NA
3	9	15	0	NA	NA	NA	NA	NA	NA	NA
3	10	1	0	NA	NA	NA	NA	NA	NA	NA
3	10	2	0	NA	NA	NA	NA	NA	NA	NA
3	10	3	0	NA	NA	NA	NA	NA	NA	NA
3	10	4	0	NA	NA	NA	NA	NA	NA	NA
3	10	5	0	NA	NA	NA	NA	NA	NA	NA
3	10	6	0	NA	NA	NA	NA	NA	NA	NA
3	10	7	0	NA	NA	NA	NA	NA	NA	NA
3	10	8	0	NA	NA	NA	NA	NA	NA	NA
3	10	9	0	NA	NA	NA	NA	NA	NA	NA
3	10	10	0	NA	NA	NA	NA	NA	NA	NA
3	10	11	0	NA	NA	NA	NA	NA	NA	NA
3	10	12	0	NA	NA	NA	NA	NA	NA	NA
3	10	13	0	NA	NA	NA	NA	NA	NA	NA
3	10	14	0	NA	NA	NA	NA	NA	NA	NA
3	10	15	0	NA	NA	NA	NA	NA	NA	NA
3	11	1	0	NA	NA	NA	NA	NA	NA	NA
3	11	2	0	NA	NA	NA	NA	NA	NA	NA
3	11	3	0	NA	NA	NA	NA	NA	NA	NA
3	11	4	0	NA	NA	NA	NA	NA	NA	NA
3	11	5	0	NA	NA	NA	NA	NA	NA	NA
3	11	6	0	NA	NA	NA	NA	NA	NA	NA
3	11	7	0	NA	NA	NA	NA	NA	NA	NA
3	11	8	0	NA	NA	NA	NA	NA	NA	NA
3	11	9	0	NA	NA	NA	NA	NA	NA	NA
3	11	10	0	NA	NA	NA	NA	NA	NA	NA
3	11	11	0	NA	NA	NA	NA	NA	NA	NA
3	11	12	0	NA	NA	NA	NA	NA	NA	NA
3	11	13	0	NA	NA	NA	NA	NA	NA	NA
3	11	14	0	NA	NA	NA	NA	NA	NA	NA
3	11	15	0	NA	NA	NA	NA	NA	NA	NA
3	12	1	0	NA	NA	NA	NA	NA	NA	NA
3	12	2	0	NA	NA	NA	NA	NA	NA	NA
3	12	3	0	NA	NA	NA	NA	NA	NA	NA
3	12	4	0	NA	NA	NA	NA	NA	NA	NA
3	12	5	0	NA	NA	NA	NA	NA	NA	NA
3	12	6	0	NA	NA	NA	NA	NA	NA	NA
3	12	7	0	NA	NA	NA	NA	NA	NA	NA
3	12	8	0	NA	NA	NA	NA	NA	NA	NA
3	12	9	0	NA	NA	NA	NA	NA	NA	NA
3	12	10	0	NA	NA	NA	NA	NA	NA	NA
3	12	11	0	NA	NA	NA	NA	NA	NA	NA
3	12	12	0	NA	NA	NA	NA	NA	NA	NA
3	12	13	0	NA	NA	NA	NA	NA	NA	NA
3	12	14	0	NA	NA	NA	NA	NA	NA	NA
3	12	15	0	NA	NA	NA	NA	NA	NA	NA
3	13	1	0	NA	NA	NA	NA	NA	NA	NA
3	13	2	0	NA	NA	NA	NA	NA	NA	NA
3	13	3	0	NA	NA	NA	NA	NA	NA	NA
3	13	4	0	NA	NA	NA	NA	NA	NA	NA
3	13	5	0	NA	NA	NA	NA	NA	NA	NA
3	13	6	0	NA	NA	NA	NA	NA	NA	NA
3	13	7	0	NA	NA	NA	NA	NA	NA	NA
3	13	8	0	NA	NA	NA	NA	NA	NA	NA
3	13	9	0	NA	NA	NA	NA	NA	NA	NA
3	13	10	0	NA	NA	NA	NA	NA	NA	NA
3	13	11	0	NA	NA	NA	NA	NA	NA	NA
3	13	12	0	NA	NA	NA	NA	NA	NA	NA
3	13	13	0	NA	NA	NA	NA	NA	NA	NA
3	13	14	0	NA	NA	NA	NA	NA	NA	NA
3	13	15	0	NA	NA	NA	NA	NA	NA	NA
3	14	1	0	NA	NA	NA	NA	NA	NA	NA
3	14	2	0	NA	NA	NA	NA	NA	NA	NA
3	14	3	0	NA	NA	NA	NA	NA	NA	NA
3	14	4	0	NA	NA	NA	NA	NA	NA	NA
3	14	5	0	NA	NA	NA	NA	NA	NA	NA
3	14	6	0	NA	NA	NA	NA	NA	NA	NA
3	14	7	0	NA	NA	NA	NA	NA	NA	NA
3	14	8	0	NA	NA	NA	NA	NA	NA	NA
3	14	9	0	NA	NA	NA	NA	NA	NA	NA
3	14	10	0	NA	NA	NA	NA	NA	NA	NA
3	14	11	0	NA	NA	NA	NA	NA	NA	NA
3	14	12	0	NA	NA	NA	NA	NA	NA	NA
3	14	13	0	NA	NA	NA	NA	NA	NA	NA
3	14	14	0	NA	NA	NA	NA	NA	NA	NA
3	14	15	0	NA	NA	NA	NA	NA	NA	NA
3	15	1	0	NA	NA	NA	NA	NA	NA	NA
3	15	2	0	NA	NA	NA	NA	NA	NA	NA
3	15	3	0	NA	NA	NA	NA	NA	NA	NA
3	15	4	0	NA	NA	NA	NA	NA	NA	NA
3	15	5	0	NA	NA	NA	NA	NA	NA	NA
3	15	6	0	NA	NA	NA	NA	NA	NA	NA
3	15	7	0	NA	NA	NA	NA	NA	NA	NA
3	15	8	0	NA	NA	NA	NA	NA	NA	NA
3	15	9	0	NA	NA	NA	NA	NA	NA	NA
3	15	10	0	NA	NA	NA	NA	NA	NA	NA
3	15	11	0	NA	NA	NA	NA	NA	NA	NA
3	15	12	0	NA	NA	NA	NA	NA	NA	NA
3	15	13	0	NA	NA	NA	NA	NA	NA	NA
3	15	14	0	NA	NA	NA	NA	NA	NA	NA
3	15	15	0	NA	NA	NA	NA	NA	NA	NA
3	16	1	0	NA	NA	NA	NA	NA	NA	NA
3	16	2	0	NA	NA	NA	NA	NA	NA	NA
3	16	3	0	NA	NA	NA	NA	NA	NA	NA
3	16	4	0	NA	NA	NA	NA	NA	NA	NA
3	16	5	0	NA	NA	NA	NA	NA	NA	NA
3	16	6	0	NA	NA	NA	NA	NA	NA	NA
3	16	7	0	NA	NA	NA	NA	NA	NA	NA
3	16	8	0	NA	NA	NA	NA	NA	NA	NA
3	16	9	0	NA	NA	NA	NA	NA	NA	NA
3	16	10	0	NA	NA	NA	NA	NA	NA	NA
3	16	11	0	NA	NA	NA	NA	NA	NA	NA
3	16	12	0	NA	NA	NA	NA	NA	NA	NA
3	16	13	0	NA	NA	NA	NA	NA	NA	NA
3	16	14	0	NA	NA	NA	NA	NA	NA	NA
3	16	15	0	NA	NA	NA	NA	NA	NA	NA
3	17	1	3	0.05	0.96	0.09	1.73	1.63	2.88	-0.92
3	17	2	3	0.03	0.76	-0.78	1.73	1.63	2.88	-0.69
3	17	3	2	0.02	0.83	-0.78	1.73	1.63	2.88	-1.30
3	17	4	2	0.07	1.76	-0.35	0.87	0.06	2.88	-1.08
3	17	5	2	0.06	2.32	-0.78	-0.06	1.63	2.88	-1.37
3	17	6	0	NA	NA	NA	NA	NA	NA	NA
3	17	7	0	NA	NA	NA	NA	NA	NA	NA
3	17	8	0	NA	NA	NA	NA	NA	NA	NA
3	17	9	0	NA	NA	NA	NA	NA	NA	NA
3	17	10	0	NA	NA	NA	NA	NA	NA	NA
3	17	11	0	NA	NA	NA	NA	NA	NA	NA
3	17	12	0	NA	NA	NA	NA	NA	NA	NA
3	17	13	0	NA	NA	NA	NA	NA	NA	NA
3	17	14	0	NA	NA	NA	NA	NA	NA	NA
3	17	15	0	NA	NA	NA	NA	NA	NA	NA
3	18	1	7	0.09	1.31	0.09	-0.08	0.06	2.88	0.85
3	18	2	5	0.08	1.09	0.92	-0.73	-0.72	2.88	0.83
3	18	3	7	0.11	1.31	0.09	-0.25	-0.72	2.88	0.73
3	18	4	0	NA	NA	NA	NA	NA	NA	NA
3	18	5	0	NA	NA	NA	NA	NA	NA	NA

Let’s check the compression summary to see how many cells at each level fall below the quantization error threshold.

compressionSummaryTable(hvt.results3[[3]]$compression_summary)

segmentLevel	noOfCells	noOfCellsBelowQuantizationError	percentOfCellsBelowQuantizationErrorThreshold
1	15	0	0
2	134	118	0.88
3	50	50	1

As the compression summary table above shows, all 50 cells at level 3 fall below the defined quantization threshold, compared with 88% of the cells at level 2. Hence, going down to level 3 successfully brings every cell within the threshold.
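To show how the columns of the compression summary relate to one another, the sketch below recomputes them from a per-cell table. It assumes a data frame with one row per cell and hypothetical columns `Segment.Level` and `Quant.Error`, an assumed threshold `quant_err`, and a strict "below" comparison. The per-cell values are made up, and this is not muHVT's own implementation of `compressionSummaryTable`.

```r
# Hypothetical per-cell results: one row per cell, with its level and quantization error
cells <- data.frame(
  Segment.Level = c(1, 1, 1, 2, 2, 2, 2),
  Quant.Error   = c(0.35, 0.28, 0.15, 0.12, 0.05, 0.22, 0.09)
)
quant_err <- 0.2  # assumed quantization error threshold

# For each level: number of cells, cells strictly below the threshold, and their share
summarise_compression <- function(cells, quant_err) {
  levels <- sort(unique(cells$Segment.Level))
  rows <- lapply(levels, function(lvl) {
    qe <- cells$Quant.Error[cells$Segment.Level == lvl]
    below <- sum(qe < quant_err)
    data.frame(
      segmentLevel = lvl,
      noOfCells = length(qe),
      noOfCellsBelowQuantizationError = below,
      percentOfCellsBelowQuantizationErrorThreshold = round(below / length(qe), 2)
    )
  })
  do.call(rbind, rows)
}

summary_tbl <- summarise_compression(cells, quant_err)
print(summary_tbl)
```

The last column is simply the ratio of the two count columns, which is how the 118 of 134 cells at level 2 above turn into 0.88.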

muHVT::hvtHmap(
  hvt.results3,
  trainComputers,
  child.level = 3,
  hmap.cols = "Quant.Error",
  line.width = c(1.2, 0.8, 0.4),
  color.vec = c("#141B41", "#6369D1", "#D8D2E1"),
  palette.color = 6,
  show.points = T,
  centroid.size = 1,
  quant.error.hmap = 0.2,
  nclust.hmap = 15
)

Figure 7: The Voronoi tessellation with the heat map overlaid for variable ’quant_error’ in the ’computers’ dataset

3.3 Overlay Heatmap

Now we will try to get more insight from the cells by overlaying a heatmap of the variable price at different levels.

Let’s start with level 1. In the plot below, a heatmap of the variable price is overlaid on the level 1 tessellation: we calculate the mean price for each cell and represent it as a heatmap.
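The per-cell value behind such a heatmap is the mean of the variable over the points assigned to each cell. A minimal sketch with base R `aggregate()`, assuming a hypothetical `cell_id` column that holds each point's cell assignment (muHVT tracks assignments internally; this is not its code, and the prices are made up):

```r
# Hypothetical points with their cell assignment and price
points <- data.frame(
  cell_id = c(1, 1, 2, 2, 2, 3),
  price   = c(1500, 1700, 2400, 2600, 2500, 3100)
)

# Mean price per cell: the value a heatmap would colour each cell by
cell_means <- aggregate(price ~ cell_id, data = points, FUN = mean)
print(cell_means)
```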

muHVT::hvtHmap(
  hvt.results,
  trainComputers,
  child.level = 1,
  hmap.cols = "price",
  line.width = c(0.8),
  color.vec = c("#141B41"),
  palette.color = 6,
  show.points = T,
  centroid.size = 3,
  quant.error.hmap = 0.2,
  nclust.hmap = 15
)

Figure 8: The Voronoi Tessellation with the heat map overlaid for variable ’price’ at level 1 from ’computers’ dataset

Now we will go one level deeper and overlay the heatmap for price at level 2, shown in the plot below. This should give us better insight into the price distribution across cells.

muHVT::hvtHmap(
  hvt.results2,
  trainComputers,
  child.level = 2,
  hmap.cols = "price",
  line.width = c(0.8, 0.2),
  color.vec = c("#141B41", "#0582CA"),
  palette.color = 6,
  show.points = T,
  centroid.size = 2,
  quant.error.hmap = 0.2,
  nclust.hmap = 15
)

Figure 9: The Voronoi tessellation with the heat map overlaid for the variable ’price’ at level 2 from the ’computers’ dataset

Let us go one level deeper and overlay the heatmap for price at level 3, shown in the plot below.

muHVT::hvtHmap(
  hvt.results3,
  trainComputers,
  child.level = 3,
  hmap.cols = "price",
  line.width = c(1.2, 0.8, 0.4),
  color.vec = c("#141B41", "#6369D1", "#D8D2E1"),
  palette.color = 6,
  show.points = T,
  centroid.size = 1,
  quant.error.hmap = 0.2,
  nclust.hmap = 15
)

Figure 10: The Voronoi tessellation with the heat map overlaid for the variable ’price’ at level 3 from the ’computers’ dataset

Let’s repeat the steps above for the speed variable. The heatmap of speed for the cells at level 1 can be seen in the plot below.

muHVT::hvtHmap(
  hvt.results,
  trainComputers,
  child.level = 1,
  hmap.cols = "speed",
  line.width = c(0.8),
  color.vec = c("#141B41"),
  palette.color = 6,
  show.points = T,
  centroid.size = 3,
  quant.error.hmap = 0.2,
  nclust.hmap = 15
)

Figure 11: The Voronoi Tessellation with the heat map overlaid for variable ’speed’ at level 1 from ’computers’ dataset

Now we will go one level deeper and overlay the heatmap for speed at level 2.

muHVT::hvtHmap(
  hvt.results2,
  trainComputers,
  child.level = 2,
  hmap.cols = "speed",
  line.width = c(0.8, 0.2),
  color.vec = c("#141B41", "#0582CA"),
  palette.color = 6,
  show.points = T,
  centroid.size = 2,
  quant.error.hmap = 0.2,
  nclust.hmap = 15
)

Figure 12: The Voronoi Tessellation with the heat map overlaid for the variable ’speed’ at level 2 from the ’computers’ dataset

Let us go one level deeper and overlay the heatmap for speed at level 3, shown in the plot below.

muHVT::hvtHmap(
  hvt.results3,
  trainComputers,
  child.level = 3,
  hmap.cols = "speed",
  line.width = c(1.2, 0.8, 0.4),
  color.vec = c("#141B41", "#6369D1", "#D8D2E1"),
  palette.color = 6,
  show.points = T,
  centroid.size = 1,
  quant.error.hmap = 0.2,
  nclust.hmap = 15
)

Figure 13: The Voronoi Tessellation with the heat map overlaid for the variable ’speed’ at level 3 from the ’computers’ dataset

3.4 Predict

Now that we have built the model, let us use the test dataset to predict which cell, and at which level, each point belongs to.

muHVT::predictHVT(data,
                  hvt.results,
                  hmap.cols = NULL,
                  child.level = 1,
                  ...)

The important parameters of the function predictHVT are:

data - A dataframe containing the test dataset. The dataframe should have at least one variable used for training. The variables from this dataset can also be overlaid as a heatmap
hvt.results - A list of hvt.results obtained from the HVT function while performing hierarchical vector quantization on training data
hmap.cols - The column number or column name from the dataset indicating the variables for which the heat map is to be plotted. A heatmap won’t be plotted if NULL is passed (Default = NULL)
child.level - A number indicating the level for which the heat map is to be plotted (Only used if hmap.cols is not NULL)
... - Additional arguments, such as color.vec and line.width, can be passed here

set.seed(240)
predictions <- muHVT::predictHVT(
  testComputers,
  hvt.results3,
  hmap.cols = "Quant.Error",
  child.level = 3,
  line.width = c(1.2, 0.8, 0.4),
  color.vec = c("#141B41", "#6369D1", "#D8D2E1"),
  quant.error.hmap = 0.2,
  nclust.hmap = 15
)

3.4.1 Prediction Algorithm

The prediction algorithm recursively calculates the distance between each point in the test dataset and the cell centroids at each level. The following steps explain the prediction method for a single point in the test dataset:

1. Calculate the distance between the point and the centroid of every cell in the first level.
2. Find the cell whose centroid is closest to the point.
3. Check whether the cell drills down further to form more cells.
4. If it doesn’t, return the path. Otherwise, repeat steps 1 to 4 on the child cells of that cell until we reach a level at which the cell doesn’t drill down further.
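The steps above can be sketched in R as a recursive descent over a tree of centroids. The nested-list structure used here (`centroid`, `children`) is a hypothetical stand-in for muHVT's internal representation, `predict_path` is an illustrative helper rather than a package function, and the L1 distance is assumed to mirror the `L1_Norm` metric used elsewhere in this vignette.

```r
# L1 (Manhattan) distance between a point and a centroid
l1_dist <- function(a, b) sum(abs(a - b))

# Hypothetical node layout: list(centroid = numeric vector, children = list of nodes)
predict_path <- function(point, nodes, dist_fun = l1_dist, path = integer(0)) {
  # Steps 1-2: distance to every centroid at this level, keep the nearest cell
  d <- vapply(nodes, function(n) dist_fun(point, n$centroid), numeric(1))
  best <- which.min(d)
  path <- c(path, best)
  children <- nodes[[best]]$children
  # Steps 3-4: stop if the cell does not drill down, otherwise recurse into its children
  if (length(children) == 0) {
    return(path)
  }
  predict_path(point, children, dist_fun, path)
}

# Toy two-level tree in 2D: cell 2 is split into two child cells
tree <- list(
  list(centroid = c(0, 0), children = list()),
  list(centroid = c(10, 10), children = list(
    list(centroid = c(8, 8), children = list()),
    list(centroid = c(12, 12), children = list())
  ))
)

path <- predict_path(c(11, 13), tree)
print(paste(path, collapse = "->"))  # "2->2"
```

The point (11, 13) is 4 units (L1) from the second level-1 centroid and only 2 units from its second child, so the descent stops there with the path 2->2.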

Let’s see which cell and level each point belongs to. For the sake of brevity, we will only show the first 10 rows.

Table(predictions$predictions,scroll = T,limit = 10)

price	speed	hd	ram	screen	ads	Cell_path	Segment.Level	Segment.Parent	Segment.Child
1540	33	214	4	15	191	2->9->2	2	9	2
3094	50	1000	24	15	191	2->6->7	2	6	7
1794	50	214	4	14	191	2->5->5	2	5	5
2408	100	270	4	14	191	2->3->2	2	3	2
2454	66	720	16	15	191	2->11->13	2	11	13
1969	66	1000	8	14	191	2->8->5	2	8	5
2904	50	1000	24	15	191	2->6->7	2	6	7
1545	66	340	8	14	191	2->8->4	2	8	4
1718	66	340	4	14	191	2->5->1	2	5	1
1604	33	214	4	14	191	2->9->3	2	9	3

We can see the predictions for some of the points in the table above. The variable Cell_path shows the level and the cell that each point is mapped to. The centroid of the cell that a point is mapped to is the codeword (predictor) for that cell.
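Judging from the rows above, Cell_path appears to join the values of Segment.Level, Segment.Parent and Segment.Child with `->`. A small base R sketch that splits a path back into those three fields (an illustration only, not a muHVT function; the paths are copied from the table):

```r
cell_paths <- c("2->9->2", "2->11->13", "2->8->5")

# Split each path on the literal "->" and convert the pieces to integers
parts <- strsplit(cell_paths, "->", fixed = TRUE)
parsed <- data.frame(
  Segment.Level  = vapply(parts, function(p) as.integer(p[1]), integer(1)),
  Segment.Parent = vapply(parts, function(p) as.integer(p[2]), integer(1)),
  Segment.Child  = vapply(parts, function(p) as.integer(p[3]), integer(1))
)
print(parsed)
```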

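The prediction algorithm will not work if any of the variables used to perform quantization are missing from the test dataset, so no features should be removed from it. The commented-out line in the code below shows the kind of column removal (dropping screen and ads) that would break prediction.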

set.seed(240)
# testComputers <- testComputers %>% dplyr::select(-c(screen,ads))
predictions <- muHVT::predictHVT(
  testComputers,
  hvt.results3,
  hmap.cols = "Quant.Error",
  child.level = 3,
  line.width = c(1.2, 0.8, 0.4),
  color.vec = c("#141B41", "#6369D1", "#D8D2E1"),
  quant.error.hmap = 0.2,
  nclust.hmap = 15
)

predictions[["predictPlot"]]


Table(predictions$predictions, scroll = T, limit = 10)

price	speed	hd	ram	screen	ads	Cell_path	Segment.Level	Segment.Parent	Segment.Child
1540	33	214	4	15	191	2->9->2	2	9	2
3094	50	1000	24	15	191	2->6->7	2	6	7
1794	50	214	4	14	191	2->5->5	2	5	5
2408	100	270	4	14	191	2->3->2	2	3	2
2454	66	720	16	15	191	2->11->13	2	11	13
1969	66	1000	8	14	191	2->8->5	2	8	5
2904	50	1000	24	15	191	2->6->7	2	6	7
1545	66	340	8	14	191	2->8->4	2	8	4
1718	66	340	4	14	191	2->5->1	2	5	1
1604	33	214	4	14	191	2->9->3	2	9	3
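Because prediction needs every variable used during quantization, it can help to check the test data before calling predictHVT. A minimal sketch: `check_test_columns` is a hypothetical helper, and the training column names are assumed from the prediction table above.

```r
# Hypothetical helper: stop early if the test data lacks any training variable
check_test_columns <- function(test_df, train_cols) {
  missing_cols <- setdiff(train_cols, names(test_df))
  if (length(missing_cols) > 0) {
    stop("Test data is missing training variables: ",
         paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

# Column names assumed from the prediction table above
train_cols <- c("price", "speed", "hd", "ram", "screen", "ads")
test_ok  <- data.frame(price = 1540, speed = 33, hd = 214, ram = 4, screen = 15, ads = 191)
# Same row with screen and ads dropped, as in the commented-out dplyr line
test_bad <- test_ok[, setdiff(names(test_ok), c("screen", "ads"))]

check_test_columns(test_ok, train_cols)
result <- tryCatch(check_test_columns(test_bad, train_cols), error = conditionMessage)
print(result)
```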

3.4.2 Example Usage 2

In this section, we will see how we can use the package to visualize multidimensional data by projecting it into two dimensions using Sammon’s projection.
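For readers who want to see Sammon’s projection in isolation, the recommended package MASS ships an implementation, `MASS::sammon()`. The sketch below projects a small random 3D point cloud to 2D; it is independent of how muHVT computes its projection internally, and the data are random.

```r
library(MASS)

set.seed(240)
# Small random 3D point cloud (continuous values, so no two points coincide)
pts <- matrix(rnorm(60 * 3), ncol = 3)

# Pairwise Euclidean distances in the original space
d <- dist(pts)

# Sammon's mapping to 2D; by default the starting configuration is classical MDS
proj <- sammon(d, k = 2, trace = FALSE)

print(dim(proj$points))
print(proj$stress)
```

`sammon()` stops with an error if any two objects are at zero distance, so duplicated rows would need to be removed first.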

3.4.2.1 Torus (Donut)

First, let us generate data for a torus. We use the geozoo library for this purpose. Geo Zoo (short for Geometric Zoo) is a compilation of geometric objects ranging from three to 10 dimensions. Geo Zoo contains regular or well-known objects, e.g. cube and sphere, and some abstract objects, e.g. Boy’s surface, torus and hyper-torus.

Here we will generate a 3D torus with 1000 points.

set.seed(240)
# p represents the dimension of the object
# n represents the number of points
torus <- geozoo::torus(p = 3, n = 1000)
torus_df <- data.frame(torus$points)
colnames(torus_df) <- c("x","y","z")
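To make the shape of the data concrete, the sketch below samples points on a 3D ring torus directly from its parametric equations. The radii (major radius 2, minor radius 1) are an assumption chosen to match the ranges in the summary further below (x and y within about ±3, z within ±1); geozoo’s own sampling scheme may differ, and uniformly drawn angles do not give a uniform density on the surface.

```r
set.seed(240)
n <- 1000
R <- 2  # assumed distance from the torus centre to the centre of the tube
r <- 1  # assumed tube radius

# Angles around the ring (u) and around the tube (v)
u <- runif(n, 0, 2 * pi)
v <- runif(n, 0, 2 * pi)

torus_sketch <- data.frame(
  x = (R + r * cos(v)) * cos(u),
  y = (R + r * cos(v)) * sin(u),
  z = r * sin(v)
)
summary(torus_sketch)
```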

Now let’s do some EDA on the data. First, let’s see what the data looks like.

Table(head(torus_df))

x	y	z
-2.628238	0.5655770	-0.7253285
-1.417917	-0.8902793	0.9454533
-1.030820	1.1066495	-0.8730506
1.884711	0.1894905	0.9943888
-1.950608	-2.2506838	0.2070521
-1.482371	0.9228529	0.9672467

Now let’s have a look at the structure and summary of the data.

str(torus_df)
#> 'data.frame':    1000 obs. of  3 variables:
#>  $ x: num  -2.63 -1.42 -1.03 1.88 -1.95 ...
#>  $ y: num  0.566 -0.89 1.107 0.189 -2.251 ...
#>  $ z: num  -0.725 0.945 -0.873 0.994 0.207 ...

summary(torus_df)
#>        x                   y                  z           
#>  Min.   :-2.987623   Min.   :-2.94860   Min.   :-0.99999  
#>  1st Qu.:-1.183928   1st Qu.:-1.07173   1st Qu.:-0.71059  
#>  Median :-0.058107   Median : 0.10131   Median : 0.05833  
#>  Mean   :-0.004517   Mean   : 0.05757   Mean   : 0.02782  
#>  3rd Qu.: 1.131752   3rd Qu.: 1.12652   3rd Qu.: 0.74747  
#>  Max.   : 2.996720   Max.   : 2.98584   Max.   : 1.00000

Now let’s try to visualize the object in a 3D Space.

#plot_torus <- plotly::plot_ly(torus_df, x= ~x, y= ~y, z = ~z, color = ~z) %>% add_markers()
#plot_torus
knitr::include_graphics('torus.png')

Figure 14: 3D Torus

Now let’s use the package to project the 3D object above into 2D space. We will start with 100 cells.

set.seed(240)
hvt.torus <- muHVT::HVT(
  torus_df,
  nclust = 100,
  depth = 1,
  quant.err = 0.1,
  projection.scale = 10,
  normalize = T,
  distance_metric = "L1_Norm",
  error_metric = "mean"
)

muHVT::plotHVT(
  hvt.torus,
  line.width = c(0.8),
  color.vec = c("#141B41"),
  centroid.size = 1,
  maxDepth = 1
)

Figure 15: The Voronoi tessellation for level 1 shown for the 100 cells in the dataset ’torus’

Let’s check out the compression summary for the torus.

compressionSummaryTable(hvt.torus[[3]]$compression_summary)

segmentLevel	noOfCells	noOfCellsBelowQuantizationError	percentOfCellsBelowQuantizationErrorThreshold
1	100	77	0.77

As the table above shows, 77 of the 100 cells (77%) fall below the quantization error threshold.
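As a rough sketch of what a per-cell quantization error could look like with `distance_metric = "L1_Norm"` and `error_metric = "mean"`: take the L1 distance from each point in the cell to the cell centroid, average those distances, and compare the result with `quant.err`. This reading is an assumption based on the parameter names, not muHVT’s source, and the points below are made up.

```r
# Made-up (already normalized) points assigned to a single cell
cell_points <- matrix(c(0.10, 0.20, 0.00,
                        0.05, 0.15, 0.10,
                        0.20, 0.10, 0.05), ncol = 3, byrow = TRUE)
centroid <- colMeans(cell_points)

# L1 distance from each point to the centroid, summarised by the mean
l1_to_centroid <- apply(cell_points, 1, function(p) sum(abs(p - centroid)))
cell_quant_error <- mean(l1_to_centroid)

quant_err <- 0.1  # same threshold as quant.err in the HVT call above
print(cell_quant_error)              # about 0.122
print(cell_quant_error < quant_err)  # FALSE: this cell is above the threshold
```

Under this reading, a cell counts towards noOfCellsBelowQuantizationError only when the printed comparison is TRUE.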

Let’s overlay the heatmap for quantization error for level 1.

muHVT::hvtHmap(
  hvt.torus,
  torus_df,
  child.level = 1,
  hmap.cols = "Quant.Error",
  line.width = c(0.8),
  color.vec = c("#141B41"),
  palette.color = 6,
  show.points = T,
  centroid.size = 2,
  quant.error.hmap = 0.2,
  nclust.hmap = 15
)

Figure 16: The Voronoi Tessellation for level 1 with the heat map overlaid for variable ’quant_error’ in the ’torus’ dataset

Now let’s double the number of cells to 200 and try again.

set.seed(240)
hvt.torus2 <- muHVT::HVT(
  torus_df,
  nclust = 200,
  depth = 1,
  quant.err = 0.1,
  projection.scale = 10,
  normalize = T,
  distance_metric = "L1_Norm",
  error_metric = "mean"
)
muHVT::plotHVT(
  hvt.torus2,
  line.width = c(0.8),
  color.vec = c("#141B41"),
  centroid.size = 1,
  maxDepth = 1
)

Figure 17: The Voronoi tessellation for level 1 shown for the 200 cells in the dataset ’torus’

Let’s check out the compression summary for the torus.

compressionSummaryTable(hvt.torus2[[3]]$compression_summary)

segmentLevel	noOfCells	noOfCellsBelowQuantizationError	percentOfCellsBelowQuantizationErrorThreshold
1	200	194	0.97

It can be observed from the table above that 194 of the 200 cells (97%) fall below the quantization error threshold.

Let’s understand this visually by overlaying the heatmap for quantization error at level 1.

muHVT::hvtHmap(
  hvt.torus2,
  torus_df,
  child.level = 1,
  hmap.cols = "Quant.Error",
  line.width = c(0.8),
  color.vec = c("#141B41"),
  palette.color = 6,
  centroid.size = 2,
  show.points = T,
  quant.error.hmap = 0.2,
  nclust.hmap = 15
)

Figure 18: The Voronoi tessellation for level 1 with the heat map overlaid for variable ’quant_error’ in the ’torus’ dataset

Let’s increase the number of cells to 500.

set.seed(240)
hvt.torus3 <- muHVT::HVT(
  torus_df,
  nclust = 500,
  depth = 1,
  quant.err = 0.1,
  projection.scale = 10,
  normalize = T,
  distance_metric = "L1_Norm",
  error_metric = "mean"
)

muHVT::plotHVT(
  hvt.torus3,
  line.width = c(0.8),
  color.vec = c("#141B41"),
  centroid.size = 1,
  maxDepth = 1
)

Figure 19: The Voronoi tessellation for level 1 shown for the 500 cells in the dataset ’torus’

Let’s check the compression summary for torus.

compressionSummaryTable(hvt.torus3[[3]]$compression_summary)

segmentLevel	noOfCells	noOfCellsBelowQuantizationError	percentOfCellsBelowQuantizationErrorThreshold
1	500	500	1

With the number of cells increased to 500, all 500 cells fall below the quantization error threshold.

Let’s check out the heatmap for quantization error.

muHVT::hvtHmap(
  hvt.torus3,
  torus_df,
  child.level = 1,
  hmap.cols = "Quant.Error",
  line.width = c(0.4),
  color.vec = c("#141B41"),
  palette.color = 6,
  show.points = T,
  centroid.size = 2,
  quant.error.hmap = 0.2,
  nclust.hmap = 15
)

Figure 20: The Voronoi tessellation with the heat map overlaid for variable ’quant_error’ in the ’torus’ dataset

Let’s use the hierarchical vector quantization technique and go one level deeper, keeping the number of cells at 200.

set.seed(240)
hvt.torus4 <- muHVT::HVT(
  torus_df,
  nclust = 200,
  depth = 2,
  quant.err = 0.1,
  projection.scale = 10,
  normalize = T,
  distance_metric = "L1_Norm",
  error_metric = "mean"
)

muHVT::plotHVT(
  hvt.torus4,
  line.width = c(0.8, 0.3),
  color.vec = c("#141B41", "#0582CA"),
  centroid.size = 2,
  maxDepth = 2
)

Figure 21: The Voronoi tessellation for levels 1 and 2 in the dataset ’torus’

Let’s check the compression summary for torus.

compressionSummaryTable(hvt.torus4[[3]]$compression_summary)

segmentLevel	noOfCells	noOfCellsBelowQuantizationError	percentOfCellsBelowQuantizationErrorThreshold
1	200	194	0.97
2	15	15	1

From the table above, we can observe that with hierarchical vector quantization, 97% of the cells at level 1 and all 15 cells at level 2 fall below the quantization error threshold.
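The idea of going one level deeper can be sketched with base R’s `stats::kmeans()`: quantize at level 1, then re-quantize only the cells whose quantization error exceeds the threshold. This is a simplified illustration under stated assumptions (mean L1 error per cell, the same number of child cells for every split, made-up 2D data), not muHVT’s implementation.

```r
set.seed(240)
# Made-up 2D data: three loose groups of 50 points each
x <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(100, mean = 3, sd = 0.3), ncol = 2),
  matrix(rnorm(100, mean = 6, sd = 1.0), ncol = 2)
)

# Assumed error metric: mean L1 distance of a cell's points to its centroid
cell_error <- function(pts, centroid) {
  mean(apply(pts, 1, function(p) sum(abs(p - centroid))))
}

quant_err <- 0.5
k <- 3

level1 <- kmeans(x, centers = k, nstart = 10)
level2 <- list()
for (cell in seq_len(k)) {
  pts <- x[level1$cluster == cell, , drop = FALSE]
  err <- cell_error(pts, level1$centers[cell, ])
  # Only cells above the threshold (and with more than k points) are split further
  if (err > quant_err && nrow(pts) > k) {
    level2[[as.character(cell)]] <- kmeans(pts, centers = k, nstart = 10)
  }
}

# Names of the level-1 cells that were split into level-2 cells
print(names(level2))
```

Cells that already meet the threshold are left alone, which is why a deeper level can contain far fewer cells than the level above it multiplied by k.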

Let’s also observe the Quantization Error heatmap.

muHVT::hvtHmap(
  hvt.torus4,
  torus_df,
  child.level = 2,
  hmap.cols = "Quant.Error",
  line.width = c(0.8, 0.4),
  color.vec = c("#141B41", "#6369D1"),
  palette.color = 6,
  show.points = T,
  centroid.size = 1.5,
  quant.error.hmap = 0.2,
  nclust.hmap = 15
)

Figure 22: The Voronoi tessellation with the heat map overlaid for variable ’quant_error’ in the ’torus’ dataset

A similar process can be followed for a sphere. The code can be found in the Appendix below.

muHVT: Collection of functions used for vector quantization and construction of hierarchical Voronoi Tessellations for data analysis in R

Sangeet Moy Das, Zubin Dowlaty, Meet Dave, Avinash Joshi

2020-08-04
