What sort of data can I use in cluster analysis?


Care needs to be taken with the selection of consumer data that you intend to use in the cluster analysis process to form market segments. There are two aspects to this:

  1. firstly the structure/scale of the data and
  2. secondly its relevance to consumers and their behavior.

Scale of the data for cluster analysis

As it is just a statistical process, cluster analysis groups the data it is given on the basis of the Euclidean (straight-line) distance between the points. Through its calculations it tries to find segments that minimize the total squared distance between each respondent and the centre of their segment (the within-segment sum of squared errors, or SSE). Therefore, it is important that the data provided has some logical order to it.
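To make the distance and SSE idea concrete, here is a minimal Python sketch. It assumes respondents are scored on a couple of numeric scales and measures how well a given split into segments fits, using the within-segment SSE that k-means-style clustering tries to minimize. The function names and the respondent scores are hypothetical, for illustration only.

```python
import math

def squared_distance(a, b):
    """Sum of squared differences between two respondents' scores."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def euclidean(a, b):
    """Straight-line (Euclidean) distance between two respondents."""
    return math.sqrt(squared_distance(a, b))

def centroid(points):
    """Average score on each variable across the members of a segment."""
    return [sum(col) / len(points) for col in zip(*points)]

def sse(segments):
    """Within-segment SSE: the squared distance from each respondent to
    their own segment's centroid, summed over every segment."""
    total = 0.0
    for points in segments:
        centre = centroid(points)
        for p in points:
            total += squared_distance(p, centre)
    return total

# Hypothetical respondents on two 1-9 scales: (satisfaction, importance of price)
tight = [[(8, 2), (9, 1)], [(2, 8), (1, 9)]]   # similar respondents grouped together
loose = [[(8, 2), (2, 8)], [(9, 1), (1, 9)]]   # dissimilar respondents grouped together

print(sse(tight))  # 2.0
print(sse(loose))  # 100.0
```

The "tight" split puts similar respondents together and has a far lower SSE than the "loose" split; this is the kind of solution the clustering process searches for.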

It is very important to note that you cannot use a nominal data scale. An example of a nominal scale is gender, where male = 1 and female = 2. These numbers are only labels, so cluster analysis cannot make sense of the distance between them. They cannot be averaged: we cannot have a market segment that is 1.4 male, for instance.
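The short Python sketch below shows the problem with the gender example. The segment membership is hypothetical. Averaging the codes, which is effectively what a distance-based method does when it calculates a segment centre, produces a meaningless value, and simply swapping the arbitrary codes changes the answer.

```python
# Gender on a nominal scale: the codes are labels, not quantities.
MALE, FEMALE = 1, 2
segment = [MALE, MALE, MALE, FEMALE, FEMALE]  # hypothetical segment: 3 men, 2 women

# Averaging the codes gives a segment "centre" that is not a gender at all.
original_mean = sum(segment) / len(segment)
print(original_mean)  # 1.4

# Swap the arbitrary coding (female = 1, male = 2) and the "average" changes,
# which shows the number carries no real meaning.
recoded = [3 - code for code in segment]
recoded_mean = sum(recoded) / len(recoded)
print(recoded_mean)  # 1.6
```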

You need to use data that is in some form of order, usually an ordinal scale or an interval scale. An example of an ordinal scale could be age group, where 18-24 = 1, 25-34 = 2, and so on, perhaps up to 75+ = 7. When we use an ordered scale, the codes make more sense to the cluster analysis process: we know that 7 is the oldest age group, and that 5 is a little older than 4 and a lot older than 1. An example of an interval scale would be a 1-9 scale measuring customer satisfaction, where the distances between adjacent points on the scale are perceived to be equal.
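As a sketch of how raw ages could be converted into the ordinal age-group codes described above, the Python function below uses the standard `bisect` module. The 18-24, 25-34 and 75+ bands come from the text; the intermediate bands (35-44, 45-54, 55-64, 65-74) are an assumption, and `age_group_code` is a hypothetical helper name.

```python
import bisect

# Lower bound of each age band. 18-24 = 1, 25-34 = 2 and 75+ = 7 follow the
# text; the bands in between (35-44, 45-54, 55-64, 65-74) are assumed.
AGE_BAND_LOWER_BOUNDS = [18, 25, 35, 45, 55, 65, 75]

def age_group_code(age):
    """Return the ordinal age-group code (1-7) for an age in whole years."""
    if age < AGE_BAND_LOWER_BOUNDS[0]:
        raise ValueError("respondent is younger than the lowest band (18)")
    # bisect_right counts how many band lower bounds are <= age,
    # which is exactly the 1-based band number.
    return bisect.bisect_right(AGE_BAND_LOWER_BOUNDS, age)

print([age_group_code(a) for a in (19, 25, 52, 81)])  # [1, 2, 4, 7]
```

Because the codes rise with age, a distance calculation now treats a 52-year-old (code 4) as closer to a 45-year-old (code 4) than to a 19-year-old (code 1), which is the ordering the text relies on.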

I would recommend against using raw continuous values in your calculations. An example here could be income levels. Rather than using a respondent's actual income, simply group respondents into an ordinal scale of income bands (like the age group example above). Raw values with a large numeric range, such as income in currency units, would otherwise dominate the distance calculation and swamp variables measured on short scales such as 1-9.
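The Python sketch below illustrates both the banding and why it matters for distance. The income band edges and the two respondents are hypothetical; choose bands that suit your own market. With raw income, two respondents at opposite ends of the satisfaction scale look far apart mainly because of a modest income gap; once income is banded, the satisfaction difference drives the distance.

```python
import bisect
import math

# Hypothetical annual-income bands; pick edges that suit your own market.
# Band 1 = under 25,000, band 2 = 25,000-49,999, ..., band 6 = 150,000+.
INCOME_BAND_LOWER_BOUNDS = [0, 25_000, 50_000, 75_000, 100_000, 150_000]

def income_band(income):
    """Map a raw income to an ordinal band code (1-6)."""
    if income < 0:
        raise ValueError("income cannot be negative")
    return bisect.bisect_right(INCOME_BAND_LOWER_BOUNDS, income)

# Two hypothetical respondents: (satisfaction on a 1-9 scale, annual income).
a = (1, 55_000)   # very dissatisfied
b = (9, 60_000)   # very satisfied, similar income

# Raw income: the 5,000 income gap swamps the 8-point satisfaction gap.
raw_distance = math.dist(a, b)

# Banded income: both fall in band 3, so satisfaction drives the distance.
banded_distance = math.dist((a[0], income_band(a[1])), (b[0], income_band(b[1])))

print(round(raw_distance, 3), banded_distance)
```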

Therefore, as a general rule, you should use data measured on some form of scale, such as 1-5, 1-9 or something similar.


Type of consumer data to use

Keep in mind that the end goal is to form market segments that are useful in developing marketing strategy. Therefore, psychographic, attitudinal and behavioral descriptions of consumers would probably yield the best results, because we want a good understanding of how consumers think and act. We are less interested in their demographics at this stage, although demographics should still be included in the final segment profiles.

Marketing variables that are likely to yield interesting market segments include:

  • Level of customer satisfaction
  • Brand awareness levels
  • Loyalty – switching behavior
  • Various attitudes to brand/s (you could incorporate results from an image survey)
  • Various attitudes (agree/disagree) to general life issues (measuring their values)
  • Degree of high/low purchase involvement in decisions
  • Heavy – light usage levels
  • Recency and frequency of purchase
  • Importance of price in the purchase decision
  • Importance of advertising to the consumer (degree of influence)
  • Level of media consumption
  • Use of opinion leaders and word-of-mouth for information

Remember to ensure that these marketing variables are measured on at least an ordinal scale, as discussed above. Combined in a cluster analysis, these factors should yield some interesting market segment possibilities.
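The text does not name a specific clustering algorithm, so the sketch below assumes k-means, a common choice that minimizes the within-segment SSE described earlier. It is a minimal, hypothetical Python implementation (Lloyd's algorithm) run on made-up survey answers for three of the variables listed above; in practice you would more likely use a library routine such as scikit-learn's `KMeans`.

```python
import math
import random

def kmeans(rows, k, iterations=100, seed=0):
    """Minimal k-means (Lloyd's algorithm). Returns a segment label per row
    and the final segment centroids."""
    rng = random.Random(seed)
    # Start from k distinct respondents chosen at random.
    centroids = [list(r) for r in rng.sample(rows, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each respondent joins the nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(row, centroids[j]))
            for row in rows
        ]
        if new_labels == labels:
            break  # no respondent changed segment, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:  # keep the previous centroid if a segment is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical survey: (satisfaction 1-9, importance of price 1-9, usage level 1-5)
respondents = [
    (8, 2, 5), (9, 1, 4), (8, 3, 5),   # satisfied, price-insensitive, heavy users
    (2, 9, 1), (3, 8, 2), (1, 9, 1),   # dissatisfied, price-driven, light users
]
labels, centres = kmeans(respondents, k=2)
print(labels)
```

The segment numbers themselves are arbitrary; what matters is that the first three respondents share one label and the last three share the other. The centroids then become the starting point for each segment's profile.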

More information on consumer descriptors and segmentation bases.

For further information, below is an excerpt from the Market Segmentation Study Guide that outlines the main choice of segmentation bases.

Choice of Consumer Segmentation Bases