Truncated thoughts: Benford's Law and Package Sizes

Saturday, November 21, 2015

Benford's Law and Package Sizes

I'm putting this up quickly because Andrew Gelman has a nice blog post about Benford's Law.

My intent has been to write this up more formally, but that's been my intention for a while. I generated this data in 2008. So it's obviously not tops on my bucket list.

The question here is whether the distribution of package sizes (for packages you'd find in a grocery store) follows Benford's Law.

What is Benford's Law?

Wikipedia has a nice entry that I'll quote from to provide a bit of background:

Benford's law, also called the First-Digit Law, is a phenomenological law about the frequency distribution of leading digits in many (but not all) real-life sets of numerical data. That law states that in many naturally occurring collections of numbers the small digits occur disproportionately often as leading significant digits. For example, in sets which obey the law the number 1 would appear as the most significant digit about 30% of the time, while larger digits would occur in that position less frequently: 9 would appear less than 5% of the time. If all digits were distributed uniformly, they would each occur about 11.1% of the time. Benford's law also concerns the expected distribution for digits beyond the first, which approach a uniform distribution.
It has been shown that this result applies to a wide variety of data sets, including electricity bills, street addresses, stock prices, population numbers, death rates, lengths of rivers, physical and mathematical constants,

and a nice graph of the expected distribution:
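The expected frequencies behind that graph follow from the standard Benford formula, P(d) = log10(1 + 1/d) for a leading digit d from 1 to 9. A minimal Python sketch that tabulates them (the function name `benford_expected` is just for illustration, and this is not part of the original 2008 analysis):

```python
import math


def benford_expected(digit):
    """Expected share of numbers whose leading digit is `digit` (1 through 9)."""
    return math.log10(1 + 1 / digit)


# Expected first-digit distribution under Benford's Law.
expected = {d: benford_expected(d) for d in range(1, 10)}

for d, p in expected.items():
    print(f"{d}: {p:.1%}")
```

This reproduces the figures in the quote: about 30% for a leading 1 and under 5% for a leading 9, with the nine shares summing to 1.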

Does this fit products you'd find in a grocery store?

Products come in different sizes, and often product sizes are standardized for either historical or logistical reasons. For example, you'd typically find bottles of beer in a 6 pack, 12 pack, or 24 pack. No 7 packs!

This makes it unlikely that individual categories follow Benford's Law, and in fact the individual categories do not show much of a fit (the Benford's Law expectation is the thick blue line).

But when we combine the results across categories, we do get a pretty good fit. The Unweighted Average counts each product category equally (so the fact that there are a lot more beers than razors doesn't affect the results). The Grand Total just looks across products, so beer counts a lot more than razors. The total number of products tabulated is 134,484 (387 for razors, 10,417 for beer).
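To make the difference between the two summaries concrete, here is a small Python sketch. The category names and counts are made up for illustration (they are not from the IRI data), and the helper names are hypothetical:

```python
# Hypothetical first-digit counts (digits 1 through 9) for two made-up
# categories. Illustrative numbers only, not taken from the IRI data.
counts = {
    "big_category": [300, 200, 100, 100, 100, 50, 50, 50, 50],
    "small_category": [10, 40, 10, 10, 10, 5, 5, 5, 5],
}


def shares(digit_counts):
    """Convert raw counts per digit into shares that sum to 1."""
    total = sum(digit_counts)
    return [c / total for c in digit_counts]


def unweighted_average(counts_by_category):
    """Average the per-category shares, so every category counts equally."""
    per_category = [shares(c) for c in counts_by_category.values()]
    n = len(per_category)
    return [sum(s[i] for s in per_category) / n for i in range(9)]


def grand_total(counts_by_category):
    """Pool all products first, so large categories dominate the result."""
    pooled = [sum(c[i] for c in counts_by_category.values()) for i in range(9)]
    return shares(pooled)


for digit, (u, g) in enumerate(
    zip(unweighted_average(counts), grand_total(counts)), start=1
):
    print(f"{digit}: unweighted {u:.1%}, grand total {g:.1%}")
```

With these made-up numbers the small category's spike at 2 pulls the unweighted average for that digit up to 30%, while the grand total stays close to the big category's distribution.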

The fit is clearly not perfect: there are too many 5s and too few 2s, for example. The 5s are easy to explain: where the volume equivalent is 1 pound, an 8 ounce or 9 ounce package will have a first digit of 5. A 1, 10, or 11 ounce package will have a first digit of 6.

How this was done:

To look at this issue, we used the product data from the IRI Marketing Data Set [1]. This contains data from 31 large consumer product goods categories. IRI keeps standardized size data, called a volume equivalent for the category. This might be 1 pound, 288 ounces, or some other unit appropriate to the category. For example, the volume equivalent for beer is 288 ounces, so a 6 pack of 12 ounce bottles (72 ounces) would have a volume equivalent of 0.25, and a first digit of 2.
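The conversion from package size to first digit can be sketched in a few lines of Python. This is a reconstruction for illustration, not the code used in 2008; the function names are hypothetical, and exact fractions are used to avoid floating-point surprises when picking off the leading digit:

```python
from fractions import Fraction


def first_digit(value):
    """Return the first significant digit of a positive number."""
    x = Fraction(value)
    if x <= 0:
        raise ValueError("value must be positive")
    # Shift the decimal point until the value lies in [1, 10).
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)


def volume_equivalent(size, base):
    """Package size expressed as a fraction of the category's base unit."""
    return Fraction(size, base)


# Beer: the base unit is 288 ounces, so a 6 pack of 12 ounce bottles is 0.25.
six_pack = volume_equivalent(6 * 12, 288)
print(float(six_pack), first_digit(six_pack))

# Categories measured in pounds: the base unit is 16 ounces.
for ounces in (1, 8, 9, 10, 11):
    ve = volume_equivalent(ounces, 16)
    print(ounces, float(ve), first_digit(ve))
```

The beer six pack comes out with a first digit of 2, and the pound-based sizes give 5 for 8 and 9 ounce packages and 6 for 1, 10, and 11 ounce packages.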

This may be different from the various measurements that might be found on the package, which might be 18 ounces, 1 pound 2 ounces, 4 packages, 8 servings, 510.3 grams, or all of the above.

We excluded deodorant and toothbrushes because the volume equivalent is 1 package, so 100% show up as 1. We excluded cigarettes for similar reasons (the volume equivalent is 1 carton, so 1 pack is 0.1 of a carton; 93% of the items are either cartons or packs).

We are not weighting by sales volume, so unusual sizes that may have been sold only briefly in a few stores are weighted just as heavily as the standard items in the category.

In the next post on Benford's Law, we will look at whether the first digit of sales (in units and dollars) follows Benford's Law.

[1] Bronnenberg, B. J., Kruger, M. W., & Mela, C. F. (2008). Database Paper—The IRI Marketing Data Set. Marketing Science, 27(4), 745–748. http://doi.org/10.1287/mksc.1080.0450
