Latent Class vs Individual Choice Estimation

by Marco Hoogerbrugge – Research Director SKIM Analytical

1. Latent Class – cluster analysis based methodology

In addition to identifying clusters of respondents, Latent Class can also be used to estimate individual utilities. Take the following example: 16 respondents are displayed with their actual but unknown individual utility values. Latent Class has classified 14 respondents into 2 clusters and has calculated the mean utility values of each cluster.


cluster graphic A

Respondent A lies exactly midway between the two clusters. Latent Class reports this as well: it says that respondent A belongs to cluster 1 with a probability of 50% and to cluster 2 with a probability of 50%. This makes it possible to calculate the individual utilities of respondent A as 50% of the utility values of cluster 1 plus 50% of the utility values of cluster 2. As the figure shows, that result is correct.

This calculation is very simple, but it does not always work that well. Consider respondents B and C. Their actual individual utility values differ (B is more extreme in choice behaviour). However, Latent Class will assign both respondents to cluster 2 with a probability of 100%, simply because both are far away from the other cluster centre. With the calculation method above, both respondents receive the same individual utility values, namely the cluster 2 means, which are not their true utility values.
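The probability-weighted calculation described above can be sketched in a few lines of Python. This is only an illustration: the cluster mean utilities below are hypothetical values (the figure's actual coordinates are not given in the text), and the function name `individual_utilities` is invented for this example. Only the membership probabilities for respondents A, B and C come from the text.

```python
# Hypothetical mean utilities of the two clusters on two attributes.
# These numbers are assumptions for illustration, not values from the figure.
cluster_means = [
    [-0.5, -0.5],  # cluster 1
    [0.5, 0.5],    # cluster 2
]

# Membership probabilities as described in the text.
membership = {
    "A": [0.5, 0.5],  # midway between the clusters
    "B": [0.0, 1.0],  # assigned fully to cluster 2
    "C": [0.0, 1.0],  # assigned fully to cluster 2
}


def individual_utilities(probs, means):
    """Return the probability-weighted average of the cluster mean utilities."""
    n_attributes = len(means[0])
    return [
        sum(p * mean[j] for p, mean in zip(probs, means))
        for j in range(n_attributes)
    ]


utilities = {
    name: individual_utilities(probs, cluster_means)
    for name, probs in membership.items()
}

for name, values in utilities.items():
    print(name, values)
```

Under these assumed cluster means, respondent A ends up halfway between the two cluster centres, while B and C receive identical utilities because their membership probabilities are identical. Any difference between their true positions is lost.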

2. Individual Choice Estimation – factor analysis based methodology

The coordinates of the respondents (their true utility values) in the next figure are the same as in the previous figure.


cluster graphic B

ICE is a factor analysis based approach, i.e. every respondent has a certain score (or weight) on a number of vectors. ICE calculates the vectors in such a way that they capture the main differences between respondents. In the example here only one vector is calculated (the diagonal with coordinates [1,1]). Assuming that the scores range from –1 (lower left corner) to +1 (upper right corner), we can check that respondent A has a score (or weight, as it is called in ICE) of –0.2, respondent C +0.4 and respondent B +0.6. So the information provided by ICE does indicate that respondents B and C are close to each other relative to respondent A (as Latent Class does as well), but ICE also indicates that respondent B is more extreme than respondent C.

The final utility values are calculated by multiplying the individual weights by each vector. In the example this results in utility values of [-0.2, -0.2] for respondent A, etc.
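The weight-times-vector calculation can be sketched as follows. The weights and the single vector [1, 1] are taken from the example in the text. The function name `ice_utilities` is hypothetical, and summing the contributions when there is more than one vector is an assumption of this sketch, not something the text states.

```python
# The single vector from the example: the diagonal with coordinates [1, 1].
vectors = [
    [1.0, 1.0],
]

# Individual weights on that vector, as read from the example.
weights = {
    "A": [-0.2],
    "C": [0.4],
    "B": [0.6],
}


def ice_utilities(respondent_weights, vectors):
    """Multiply each weight by its vector and add up the results.

    Summing over several vectors is an assumption of this sketch;
    with one vector it reduces to weight * vector.
    """
    n_attributes = len(vectors[0])
    return [
        sum(w * vec[j] for w, vec in zip(respondent_weights, vectors))
        for j in range(n_attributes)
    ]


utilities = {name: ice_utilities(w, vectors) for name, w in weights.items()}

for name, values in utilities.items():
    print(name, values)
```

Unlike the Latent Class calculation, respondents B and C keep distinct utilities here, because each has its own weight on the vector.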

3. Final remarks

Both methods are simplifications of reality: in Latent Class, different respondents may be treated as identical, while in ICE the respondents are forced to lie on certain vectors. Neither is an exact representation of reality. That follows directly from the fact that CBC gathers too little information per respondent to make exact calculations.

The differences between Latent Class and ICE are in many respects the same as those between ordinary cluster analysis and ordinary factor analysis: N factors (or dimensions, as they are called in ICE) provide about the same information as 2N clusters. So, if you would usually try 3-10 clusters, that is similar to extracting 2-4 dimensions in ICE. More dimensions may result in overfitting, especially when there is little information available per respondent (10 or fewer choice tasks).

 

