XLSTAT - Latent Class is a tool for latent class analysis. It is based on two modules from Latent GOLD® 5.0: Latent Class Cluster models and Latent Class Regression models. Both model families offer unique features compared to traditional clustering or regression approaches. XLSTAT - LG offers a wide variety of easily implementable options that give the user full control over the Latent Class models.

**XLSTAT - Base is needed to run XLSTAT - LatentClass!**

## XLSTAT-LatentClass - Latent Class Clustering and Regression

Latent class analysis involves the construction of latent classes, which are unobserved (latent) subgroups or segments of cases. The latent classes are constructed based on the observed (manifest) responses of the cases on a set of indicator variables. Cases within the same latent class are homogeneous with respect to their responses on these indicators, while cases in different latent classes differ in their response patterns. Formally, latent classes are represented by K distinct categories of a nominal latent variable X. Because the latent variable is categorical, Latent Class modeling differs from more traditional latent variable approaches such as factor analysis, structural equation models, and random-effects regression models, which are based on continuous latent variables.

### Latent Class Cluster Models

A Latent Class cluster model:

- Includes a nominal latent variable X with K categories, each category representing a cluster.
- Each cluster contains a homogeneous group of persons (cases) who share common interests, values, characteristics, and/or behavior (i.e., share common model parameters).
- These interests, values, characteristics, and/or behavior constitute the observed variables (indicators) Y upon which the latent clusters are derived.

XLSTAT-LatentClass can automatically launch computations for several models that differ in their number of classes. It is also possible to optimize Bayes constants, sets of random starting values, as well as iteration parameters for both the Expectation-Maximization and Newton-Raphson algorithms, which are used for model estimation.
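To make the estimation step more concrete, here is a minimal sketch of the Expectation-Maximization part for a latent class cluster model with binary indicators. It is an illustration in plain Python, not XLSTAT's or Latent GOLD's implementation: the function name `fit_latent_class`, the toy data, and the small smoothing constant (a crude stand-in for Bayes constants) are all assumptions, and the Newton-Raphson phase is not shown.

```python
import math
import random


def fit_latent_class(data, n_classes, n_iter=200, seed=0):
    """EM for a latent class cluster model with binary (0/1) indicators.

    Returns (class_sizes, item_probs, posteriors, log_likelihood), where
    the posteriors and log-likelihood come from the final E-step.
    """
    rng = random.Random(seed)
    n_items = len(data[0])
    # Random start: equal class sizes, response probabilities away from 0 and 1.
    sizes = [1.0 / n_classes] * n_classes
    probs = [[rng.uniform(0.25, 0.75) for _ in range(n_items)]
             for _ in range(n_classes)]
    for _ in range(n_iter):
        # E-step: posterior class membership probabilities via Bayes' rule,
        # assuming indicators are independent within a class.
        posteriors, log_lik = [], 0.0
        for row in data:
            joint = []
            for k in range(n_classes):
                p = sizes[k]
                for j, y in enumerate(row):
                    p *= probs[k][j] if y == 1 else 1.0 - probs[k][j]
                joint.append(p)
            total = sum(joint)
            log_lik += math.log(total)
            posteriors.append([p / total for p in joint])
        # M-step: re-estimate class sizes and conditional response probabilities.
        for k in range(n_classes):
            weight = sum(post[k] for post in posteriors)
            sizes[k] = weight / len(data)
            for j in range(n_items):
                hits = sum(post[k] for post, row in zip(posteriors, data)
                           if row[j] == 1)
                # Hypothetical smoothing constant keeps estimates off the boundary.
                probs[k][j] = (hits + 0.01) / (weight + 0.02)
    return sizes, probs, posteriors, log_lik


# Toy data: four binary indicators, two visibly different response patterns.
data = ([[1, 1, 1, 0]] * 30 + [[1, 1, 1, 1]] * 20
        + [[0, 0, 0, 1]] * 30 + [[0, 0, 0, 0]] * 20)
sizes, probs, posteriors, log_lik = fit_latent_class(data, n_classes=2)
```

Running the function for several values of `n_classes` and several seeds mirrors, in a very reduced form, the idea of comparing models with different numbers of classes and multiple sets of starting values.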

### Further Information

**XLSTAT-LatentClass on the Addinsoft website**

## System requirements for the software XLSTAT

| | Windows® | Mac |
|---|---|---|
| Further Requirements | Microsoft Excel 2003, 2007, 2010, 2013, 2016 (requires an XLSTAT-Solution!) | Microsoft Excel 2011 (requires an XLSTAT-Solution!) |
| Operating System | Windows XP, Vista, 7, 8, 10 (32-/64-Bit) | OS X |
| Minimum CPU | 800 MHz | 800 MHz |
| Min. RAM | 128 MB | 128 MB |
| Disk Space | 150 MB | 150 MB |

## Functions in XLSTAT - LG

XLSTAT - LG provides one section per model (each model being represented by a specific number of classes):

**Model Summary Statistics:** Number of cases used in model estimation, number of distinct parameters estimated, the seed, and the best seed, which can reproduce the current model more quickly when the number of starting sets is set to 0.

**Estimation Summary:** for each of the Expectation-Maximization and Newton-Raphson algorithms, XLSTAT reports the number of iterations used, the log-posterior value, the likelihood-ratio goodness-of-fit value, as well as the final convergence value.

### Chi-Square Statistics:

- Likelihood-ratio goodness-of-fit value (L²) for the current model and the associated bootstrap p-value
- X² and Cressie-Read. These are alternatives to L² that should yield a similar p-value according to large sample theory if the model specified is valid and the data is not sparse
- BIC, AIC, AIC3, CAIC and SABIC (based on L²). These statistics (information criteria) weight fit and parsimony by adjusting the LL to account for the number of parameters in the model. The lower the value, the better the model
- Dissimilarity index: A descriptive measure indicating how much the observed and estimated cell frequencies differ from one another. It indicates the proportion of the sample that needs to be moved to another cell to get a perfect fit
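As an illustration of how these fit measures relate observed to estimated cell frequencies, the sketch below computes L², X², the Cressie-Read statistic, and the dissimilarity index using their usual textbook definitions. The function name and the frequencies are made up for the example, the Cressie-Read power parameter of 2/3 is an assumption (it is the commonly recommended value), and the bootstrap p-value is not shown.

```python
import math


def fit_statistics(observed, expected, cr_lambda=2.0 / 3.0):
    """Textbook goodness-of-fit measures from observed and estimated frequencies."""
    n = sum(observed)
    pairs = list(zip(observed, expected))
    # Likelihood-ratio chi-squared: empty observed cells contribute zero.
    l2 = 2.0 * sum(o * math.log(o / e) for o, e in pairs if o > 0)
    # Pearson chi-squared.
    x2 = sum((o - e) ** 2 / e for o, e in pairs)
    # Cressie-Read power divergence statistic.
    cr = (2.0 / (cr_lambda * (cr_lambda + 1.0))) * sum(
        o * ((o / e) ** cr_lambda - 1.0) for o, e in pairs if o > 0)
    # Proportion of cases that would have to move cells for a perfect fit.
    dissimilarity = sum(abs(o - e) for o, e in pairs) / (2.0 * n)
    return l2, x2, cr, dissimilarity


# Hypothetical observed and model-estimated frequencies for four cells.
observed = [40, 10, 20, 30]
expected = [35.0, 15.0, 25.0, 25.0]
l2, x2, cr, di = fit_statistics(observed, expected)
```

In this toy case the dissimilarity index is 0.1, meaning 10% of the cases would need to be moved to another cell for the observed and estimated tables to match.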

### Log-likelihood Statistics:

- Log-likelihood (LL), log-prior (associated with Bayes constants) as well as the log-posterior
- BIC, AIC, AIC3, CAIC and SABIC (based on LL). These statistics (information criteria) weight fit and parsimony by adjusting the LL to account for the number of parameters in the model. The lower the value, the better the model
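The LL-based information criteria can be sketched as penalized versions of -2·LL. The formulas below are the standard definitions; that XLSTAT uses exactly these penalties is an assumption, and the function name and the log-likelihood values are hypothetical.

```python
import math


def information_criteria(log_likelihood, n_params, n_cases):
    """Standard LL-based information criteria (lower is better)."""
    deviance = -2.0 * log_likelihood
    return {
        "AIC": deviance + 2 * n_params,
        "AIC3": deviance + 3 * n_params,
        "BIC": deviance + math.log(n_cases) * n_params,
        "CAIC": deviance + (math.log(n_cases) + 1) * n_params,
        # Sample-size adjusted BIC replaces N by (N + 2) / 24.
        "SABIC": deviance + math.log((n_cases + 2) / 24) * n_params,
    }


# Hypothetical comparison of a 2-class and a 3-class model on 500 cases.
two_class = information_criteria(log_likelihood=-1000.0, n_params=10, n_cases=500)
three_class = information_criteria(log_likelihood=-990.0, n_params=15, n_cases=500)
preferred_by_bic = "2 classes" if two_class["BIC"] < three_class["BIC"] else "3 classes"
```

Here the 3-class model fits better in terms of LL, but the extra five parameters cost more than the gain, so BIC prefers the 2-class model.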

### Classification Statistics:

- Classification errors (based on modal assignment)
- Reduction of errors (Lambda), entropy R², standard R². These pseudo R-squared statistics indicate how well one can predict class memberships based on the observed variables (indicators and covariates). The closer these values are to 1 the better the predictions
- Classification log-likelihood under the assumption that the true class membership is known
- AWE (similar to BIC, but also takes into account classification performance)
- Entropy
- CLC
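A small sketch may help show how some of these classification statistics follow from the posterior class membership probabilities. It uses common definitions of the classification error, Lambda, and entropy R²; the exact formulas used by XLSTAT are not given in this text, so treat the function and the example posteriors as assumptions.

```python
import math


def classification_statistics(posteriors):
    """Classification error, Lambda and entropy R-squared from posteriors."""
    n = len(posteriors)
    n_classes = len(posteriors[0])
    class_sizes = [sum(p[k] for p in posteriors) / n for k in range(n_classes)]

    def entropy(probs):
        return -sum(q * math.log(q) for q in probs if q > 0)

    # Expected proportion of misclassified cases under modal assignment.
    error = sum(1.0 - max(p) for p in posteriors) / n
    # Error made when every case is assigned to the largest class.
    baseline_error = 1.0 - max(class_sizes)
    lam = 1.0 - error / baseline_error
    # Entropy R-squared: relative reduction in uncertainty about class membership.
    mean_posterior_entropy = sum(entropy(p) for p in posteriors) / n
    entropy_r2 = 1.0 - mean_posterior_entropy / entropy(class_sizes)
    return error, lam, entropy_r2


# Hypothetical posteriors for four cases and two classes.
example_posteriors = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
error, lam, entropy_r2 = classification_statistics(example_posteriors)
```

For these four cases the expected classification error is 0.15 and Lambda is 0.7, since assigning everyone to one class would misclassify half of the cases.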

### Classification Table:

- Modal table: Cross-tabulates modal class assignments
- Proportional table: Cross-tabulates probabilistic class assignments
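The difference between the two tables can be sketched as follows. The layout chosen here (rows are latent classes, columns are assigned classes) and the function name are assumptions made for illustration; the source does not specify the orientation.

```python
def classification_tables(posteriors):
    """Cross-tabulate latent class (rows) by assigned class (columns)."""
    k = len(posteriors[0])
    modal = [[0.0] * k for _ in range(k)]
    proportional = [[0.0] * k for _ in range(k)]
    for p in posteriors:
        assigned = p.index(max(p))  # modal assignment: most likely class
        for latent in range(k):
            # Modal table: the whole case goes to its most likely class.
            modal[latent][assigned] += p[latent]
            # Proportional table: the case is spread over classes by its posteriors.
            for other in range(k):
                proportional[latent][other] += p[latent] * p[other]
    return modal, proportional


# Hypothetical posteriors for four cases and two classes.
table_posteriors = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
modal_table, proportional_table = classification_tables(table_posteriors)
```

In both tables the entries sum to the number of cases, and larger off-diagonal entries indicate more overlap between classes.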

### Prediction statistics table:

The columns in this table correspond to:

- Baseline: the prediction error of the baseline model (also referred to as null-model)
- Model: the prediction error of the estimated model
- R²: the proportional reduction of errors in the estimated model compared to the baseline model

The rows in this table correspond to:

- Squared Error: Average prediction error based on squared error
- Minus Log-likelihood: Average prediction error based on minus the log-likelihood
- Absolute Error: Average prediction error based on absolute error
- Prediction error: Average prediction error based on proportion of prediction errors (for categorical variables only)
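The relation between the three columns can be illustrated with a short sketch for two of the rows. The function name, the data, and the choice of the overall mean as the baseline prediction are assumptions for the example; only the squared and absolute error rows are shown.

```python
def prediction_statistics(observed, baseline, model):
    """Average prediction errors and their proportional reduction (R-squared)."""
    n = len(observed)
    losses = {
        "Squared Error": lambda o, p: (o - p) ** 2,
        "Absolute Error": lambda o, p: abs(o - p),
    }
    table = {}
    for name, loss in losses.items():
        baseline_error = sum(loss(o, p) for o, p in zip(observed, baseline)) / n
        model_error = sum(loss(o, p) for o, p in zip(observed, model)) / n
        table[name] = {
            "Baseline": baseline_error,
            "Model": model_error,
            # Proportional reduction of errors relative to the baseline model.
            "R2": 1.0 - model_error / baseline_error,
        }
    return table


# Hypothetical observed values, baseline predictions (the mean) and model predictions.
observed_values = [1.0, 2.0, 3.0, 4.0]
baseline_predictions = [2.5, 2.5, 2.5, 2.5]
model_predictions = [1.5, 2.0, 3.0, 3.5]
stats = prediction_statistics(observed_values, baseline_predictions, model_predictions)
```

With these numbers the model reduces the average squared error from 1.25 to 0.125, which gives an R² of 0.9 in the Squared Error row.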

**Prediction Table:** For nominal and ordinal dependent variables, a prediction table that cross-classifies observed against estimated values is provided.

### Parameters table:

- R²: class-specific and overall R² values. The overall R² indicates how well the dependent variable is overall predicted by the model (same measure as appearing in Prediction Statistics). For ordinal, continuous, and (binomial) counts, these are standard R² measures. For nominal dependent variables, these can be seen as weighted averages of separate R² measures for each category treated as a separate dichotomous response variable.
- Intercept: intercept of the linear regression equation
- s.e.: standard errors of the parameters
- z-value: z-test statistics corresponding to the parameter tests
- Wald: Wald statistics are provided in the output to assess the statistical significance of the set of parameter estimates associated with a given variable. Specifically, for each variable, the Wald statistic tests the restriction that each of the parameter estimates in that set equals zero (for variables specified as Nominal, the set includes parameters for each category of the variable). For Regression models, by default, two Wald statistics (Wald, Wald(=)) are provided in the table when more than 1 class has been estimated. For each set of parameter estimates, the Wald(=) statistic considers the subset associated with each class and tests the restriction that each parameter in that subset equals the corresponding parameter in the subsets associated with each of the other classes. That is, the Wald(=) statistic tests the equality of each set of regression effects across classes.
- p-value: measures of significance for the estimates
- Mean: means for the regression coefficients
- Std.Dev: standard deviations for the regression coefficients
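For a single parameter, the z-value, Wald statistic, and p-value listed above are linked in a simple way, sketched below. This covers only the one-parameter case, where the Wald statistic equals the squared z-value with one degree of freedom; Wald tests on sets of parameters and the Wald(=) test need the parameter covariance matrix and are not shown. The function name and the numbers are hypothetical.

```python
import math


def z_and_wald(estimate, std_error):
    """z-value, one-parameter Wald statistic and two-sided p-value."""
    z = estimate / std_error
    wald = z * z  # chi-squared distributed with 1 degree of freedom under H0
    # Two-sided normal p-value, identical to the chi-squared(1) p-value of wald.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, wald, p_value


# Hypothetical regression coefficient and its standard error.
z, wald, p_value = z_and_wald(estimate=0.98, std_error=0.5)
```

With an estimate of 0.98 and a standard error of 0.5, the z-value is 1.96 and the p-value is very close to 0.05.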

**Classification:** Outputs for each observation the posterior class memberships and the modal assignment based on the current model.

### Profile table, which includes:

- Number of clusters
- Indicators: The body of the table contains (marginal) conditional probabilities that show how the clusters are related to the Nominal or Ordinal indicator variables. These probabilities sum to 1 within each cluster. For indicators specified as Continuous, the body of the table contains means instead of probabilities. For indicators specified as Ordinal, means are displayed in addition to the conditional probabilities within each cluster (column).
- Standard errors for the (marginal) conditional probabilities

Probabilities and means that appear in the Profile output are displayed graphically in a Profile Plot.

### Frequencies / Residuals:

Table of observed vs. estimated expected frequencies (and residuals). Note: Residuals having magnitude greater than 2 are statistically significant. This output is not reported in the case of 1 or more continuous indicators.

**Bivariate Residuals:** a table containing the bivariate residuals (BVRs) for a model. Large BVRs suggest violation of the local independence assumption.

**Scoring equation:** regression coefficients associated with the multinomial logit model.
