August 24, 2009 Posted by: Roy Marsten
I want to apply the discussion of entropy to the features of a configurable product. But first we have to introduce the important concept of a “take rate”. In some industries this is called an “attach rate” or a “penetration rate”. The idea is very simple: the take rate of an option is the fraction of units sold that include that option.
The take rate of option x is the number of units sold with option x, divided by the total number of units sold. So if 70% of our cars are sold with cloth seats and 30% with leather seats, then cloth has a take rate of 0.7 and leather has a take rate of 0.3.
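As a minimal sketch of this definition, the take rates can be computed from a list of units sold. The function name take_rates and the ten-car sales log are made up for illustration; only the 70/30 split comes from the example above.

```python
from collections import Counter


def take_rates(units):
    """Map each option to the fraction of units sold with that option."""
    counts = Counter(units)
    total = len(units)
    return {option: n / total for option, n in counts.items()}


# Hypothetical sales log: 7 cars with cloth seats, 3 with leather.
seats = ["cloth"] * 7 + ["leather"] * 3
rates = take_rates(seats)
print(rates)  # {'cloth': 0.7, 'leather': 0.3}
```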
In the case of a feature with two options, like cloth and leather, this looks just like a coin toss with two options, tails and heads. Recall that coins may not be fair. If I send you a message about a customer’s choice of seat, the entropy of that message is the same as for the outcome of one toss of a suitably biased (.3 to .7) coin. So take rates can be interpreted as probabilities.
Some features have more than two options. For example a backhoe feature called Feet has four different options: none, Flip, Flip Guard, and Street Guard. Each of these options has a take rate, and as long as we include the “none” option, these take rates have to add up to 1.0. So perhaps 30% of customers do not order Feet, 40% order Flip, 20% order Flip Guard, and 10% order Street Guard. The take rates are 0.3, 0.4, 0.2, and 0.1, respectively, which add up to 1.0.
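The "add up to 1.0" rule is easy to check mechanically. Here is a small sketch using the Feet percentages above; the helper name is_valid_feature is an assumption, and math.isclose is used because sums of decimal fractions are not always exact in floating point.

```python
import math


def is_valid_feature(rates):
    """True if every take rate lies in [0, 1] and the rates add up to 1."""
    in_range = all(0.0 <= r <= 1.0 for r in rates.values())
    return in_range and math.isclose(sum(rates.values()), 1.0)


# Take rates for the Feet feature, with the "none" option included.
feet = {"none": 0.3, "Flip": 0.4, "Flip Guard": 0.2, "Street Guard": 0.1}
print(is_valid_feature(feet))  # True

# Leaving out "none" breaks the rule: the remaining rates sum to 0.7.
without_none = {k: v for k, v in feet.items() if k != "none"}
print(is_valid_feature(without_none))  # False
```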
With four options we lose the connection to coin tosses. We could use a loaded die to talk about features with six options, but an all-purpose metaphor is the roulette wheel. Think of a spinning roulette wheel, or a stationary wheel with a spinning arrow as in many children’s games.
The wheel represents a feature, and there is a pie-slice for each option. The size of the pie-slice is proportional to the take rate. An example is shown above for the Feet feature of our backhoe. We can simulate a customer’s choice by spinning this wheel (or spinning an arrow). With this metaphor we can have any number of options, with any take rates. If customers can decline the feature, the “none” choice must be included to get a full pie; if every unit must have one of the options, there is no “none” choice.
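The wheel can be simulated in a few lines. This is only a sketch: the function name spin, the seed, and the number of spins are arbitrary choices, and random.choices stands in for the physical wheel by picking an option with probability proportional to its take rate.

```python
import random
from collections import Counter

# Take rates for the Feet feature (the pie-slices of the wheel).
feet = {"none": 0.3, "Flip": 0.4, "Flip Guard": 0.2, "Street Guard": 0.1}


def spin(rates, rng):
    """Simulate one customer's choice: one spin of the wheel."""
    options = list(rates)
    weights = list(rates.values())
    return rng.choices(options, weights=weights, k=1)[0]


rng = random.Random(2009)  # fixed seed so the run is repeatable
n_spins = 10_000
spins = Counter(spin(feet, rng) for _ in range(n_spins))

# The observed fractions should land close to the take rates.
for option in feet:
    print(option, spins[option] / n_spins)
```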
To summarize, a product is a collection of features. Each feature has some mutually exclusive options, each of which has a take rate. These take rates add to one.
August 21, 2009 Posted by: Roy Marsten
A product is a collection of features, and each feature has mutually exclusive options. If a feature has only two options, then the choice is like a coin toss. The information contained in that choice is measured by entropy.
Entropy is a concept from classical thermodynamics that deals with the amount of disorder in a physical system (see http://en.wikipedia.org/wiki/Entropy). It was extended to information theory by Claude Shannon (see http://en.wikipedia.org/wiki/Entropy_(information_theory)). Shannon used entropy as a measure of the amount of information in a message. The simplest example is a coin toss. If we toss a fair coin, there is a 50% chance of getting tails, and a 50% chance of getting heads. Shannon defined the outcome of this experiment as having an entropy, or information content, of one bit. If I send a message (say 0 or 1) to tell you the result (tails or heads), that message contains one bit of information.
Things start to get interesting when the coin is not fair. Consider a two-headed coin. The tossing experiment always results in heads, and the message will always be 1. According to Shannon, the information content of this message is zero.
If the coin is weighted so that the probability of tails is 25% and the probability of heads is 75%, then Shannon assigns an entropy of 0.811278. There is some information in knowing the outcome of the coin toss, but not as much as for a fair coin, because we already know that it will probably be heads. The graph below shows the entropy as a function of the probability of getting heads. When this probability is zero or one, the entropy is zero. The entropy reaches its maximum of one when the coin is fair (50%).
Where did the 0.811278 come from? How is the entropy actually computed?

We can’t answer this without introducing logarithms to the base two. In English, two to the third power is eight, so three is the logarithm of eight to the base two. We can write “blog” to mean log to the base 2, or binary log. If p denotes the probability of heads, then entropy is computed by the formula:
Entropy = -p*blog(p) - (1-p)*blog(1-p).
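To see where 0.811278 comes from, here is a short sketch of the formula. The function name binary_entropy is an assumption, math.log2 plays the role of “blog”, and the cases p = 0 and p = 1 are handled separately because the log of zero is undefined; they return zero, matching the two-headed coin.

```python
import math


def binary_entropy(p):
    """Entropy, in bits, of a coin whose probability of heads is p."""
    if p in (0.0, 1.0):
        # The outcome is certain, so the message carries no information.
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


print(round(binary_entropy(0.75), 6))  # 0.811278 (the weighted coin)
print(binary_entropy(0.5))             # 1.0 (the fair coin)
print(binary_entropy(1.0))             # 0.0 (the two-headed coin)
```

Because the formula is symmetric in p and 1-p, a coin with a 25% chance of heads has the same entropy as one with a 75% chance.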
Logarithms to the base 2 arise naturally because one toss of a fair coin (2 outcomes) has entropy one, two tosses (4 outcomes) have entropy two, three tosses (8 outcomes) have entropy three, and so forth.
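The pattern is quick to confirm, assuming a fair coin so that all outcomes are equally likely. The helper name bits_for_outcomes is made up for this sketch.

```python
import math


def bits_for_outcomes(outcomes):
    """Binary log of the number of equally likely outcomes."""
    return math.log2(outcomes)


for tosses in (1, 2, 3):
    outcomes = 2 ** tosses  # each extra toss doubles the outcomes
    print(tosses, outcomes, bits_for_outcomes(outcomes))
# 1 2 1.0
# 2 4 2.0
# 3 8 3.0
```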