Life Data Analysis Part III - Mixture Models

Segmented Regression and EM-Algorithm

Tim-Gunnar Hensel

2019-01-29

In this vignette two methods for the separation of mixture models are presented. A mixture model can be assumed, if the points in a probability plot show one or more changes in slope, depict one or several saddle points or follow an S-shape. A mixed distribution often represents the combination of multiple failure modes and thus must be splitted in its components to get reasonable results in further analyses.

Segmented regression aims to detect breakpoints in the sample data from whom a split in subgroups can be made. The EM-Algorithm is a computation-intensive method that iteratively tries to maximize a likelihood function, which is weighted by the posterior probability, the conditional probability that an observation belongs to subgroup k.

In the following we will focus on the application of these methods and their visualizations using functions mixmod_regression(), mixmod_em(), plot_prob_mix() and plot_mod_mix(), which are implemented in weibulltools.

Data: Voltage Stress Test

To apply the introduced methods we will use a dataset where units were passed to a high voltage stress test. hours indicates the number of hours until a failure occurs, or the number of hours until a unit was taken out of the test and has not failed. state is a flag variable and describes the condition of a unit. If a unit failed the flag is 1 and 0 otherwise. Data was taken from Reliability Analysis by Failure Mode 1.

Probability Plot for Voltage Stress Test Data

To get an intuition whether we can assume the presence of a mixture model, we will construct a Weibull probability plot.

# Data: 
hours <- c(2, 28, 67, 119, 179, 236, 282, 317, 348, 387, 3, 31, 69, 135,
           191, 241, 284, 318, 348, 392, 5, 31, 76, 144, 203, 257, 286,
           320, 350, 412, 8, 52, 78, 157, 211, 261, 298, 327, 360, 446,
           13, 53, 104, 160, 221, 264, 303, 328, 369, 21, 64, 113, 168,
           226, 278, 314, 328, 377)

state <- c(1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1,
           1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0,
           1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1,
           0, 1, 1, 1, 1, 1, 1)

id <- 1:length(hours)

# Estimating failure probabilities: 
df_john <- johnson_method(id = id, x = hours, event = state)

# Probability plot: 
weibull_plot <- plot_prob(x = df_john$characteristic, y = df_john$prob, 
                          event = df_john$status, id = df_john$id, 
                          distribution = "weibull", 
                          title_main = "Weibull Probability Plot", 
                          title_x = "Time in Hours", 
                          title_y = "Probability of Failure in %",
                          title_trace = "Defect Items")
weibull_plot

Figure 1: Plotting positions in weibull grid.


Since there is an obvious slope change in the Weibull probability plot of Figure 1, the appearance of a mixture model is strengthened.

Segmented Regression with Package weibulltools

In package weibulltools the method of segmented regression is implemented in function mixmod_regression(). If a breakpoint was detected, the failure data is separated by that point. After breakpoint detection the function rank_regression() is called inside mixmod_regression() and is used to estimate the distribution parameters of the subgroups.
The visualization of the obtained results is done by functions plot_prob_mix() and plot_mod_mix().
The produced graph of plot_prob_mix() is pretty similar to the graph provided by plot_prob(), but the difference is, that the detected subgroups are colored differently.
plot_mod_mix() then is used to add the estimated regression line of every sub- distribution.
In the following the described procedure is expressed with code.

# Applying mixmod_regression(): 
mixreg_weib <- mixmod_regression(x = df_john$characteristic, y = df_john$prob, 
  event = df_john$status, distribution = "weibull")

# Using plot_prob_mix(). 
mix_reg_plot <- plot_prob_mix(x = hours, event = state, id = id, 
  distribution = "weibull", mix_output = mixreg_weib, 
  title_main = "Weibull Mixture Regression", title_x = "Time in Hours", 
  title_y = "Probability of Failure", title_trace = "Subgroup")
mix_reg_plot

Figure 2: Subgroup-specific plotting positions.

# Using plot_mod_mix() to visualize regression lines of subgroups: 
mix_reg_lines <- plot_mod_mix(mix_reg_plot, x = hours, event = state, 
  mix_output = mixreg_weib, distribution = "weibull", title_trace = "Fitted Line")
mix_reg_lines

Figure 3: Subgroup-specific regression lines.


Without specifying the number of mixed components (k) this method has splitted the data in two groups. This can bee seen in Figure 2 and Figure 3.
To sum up, an upside of this function is that one does not have to specify the number of mixing components, since segmentation happens in an automated fashion. Nevertheless the intention of this function is to give a hint for the existence of a mixture model. An in-depth analysis should be done afterwards.

EM-Algorithm with Package weibulltools

The EM-Algorithm can be applied through the usage of the function mixmod_em(). In comparison to mixmod_regression() one has to specify k, the number of subgroups.
The obtained results can be visualized by functions plot_prob_mix() and plot_mod_mix(), too.

# Applying mixmod_regression(): 
mixem_weib <- mixmod_em(x = hours, event = state, distribution = "weibull",
                        conf_level = 0.95, k = 2, method = "EM", n_iter = 150)

# Using plot_prob_mix(): 
mix_em_plot <- plot_prob_mix(x = hours, event = state, id = id, 
  distribution = "weibull", mix_output = mixem_weib, 
  title_main = "Weibull Mixture EM", title_x = "Time in Hours", 
  title_y = "Probability of Failure", title_trace = "Subgroup")
mix_em_plot

Figure 4: Subgroup-specific plotting positions.


# Using plot_mod_mix() to visualize regression lines of subgroups: 
mix_em_lines <- plot_mod_mix(mix_em_plot, x = hours, event = state, 
  mix_output = mixem_weib, distribution = "weibull", title_trace = "Fitted Line")
mix_em_lines

Figure 5: Subgroup-specific regression lines.


In comparison to mixmod_regression() the EM-Algorithm can also assign censored items to a specific subgroup. Hence, an individual analysis of the mixing components, depicted in Figure 4 and Figure 5, is possible.
In conclusion an analysis of a mixture model using mixmod_em() is statistically founded. A drawback of this function is, that the identification of the number of subgroups can not be determined automatically.


  1. Doganaksoy, N.; Hahn, G.; Meeker, W. Q.: Reliability Analysis by Failure Mode, Quality Progress, 35(6), 47-52, 2002