42  Generalisability: context

Recall the jam experiment we discussed previously.

On two Saturdays in a California supermarket, Sheena Iyengar and Mark Lepper (2000) set up tasting displays of either six or 24 jars of jam. Consumers could taste as many jams as they wished, and if they approached the tasting table they received a $1 discount coupon to buy the jam.

For attracting initial interest, the large display of 24 jams did a better job, with 60 per cent of people who passed the display stopping. Forty per cent of people who passed the six jam display stopped. But only three per cent of those who stopped at the 24 jam display purchased any of the jam, compared with almost 30 per cent of those who stopped at the six jam display.

This experiment has gained fame as showing the paradox of choice. But how much weight should we place on this one experiment? In his book Uncontrolled, Jim Manzi (2012) writes:

First, note that all of the inference is built on the purchase of a grand total of thirty-five jars of jam. Second, note that if the results of the jam experiment were valid and applicable with the kind of generality required to be relevant as the basis for economic or social policy, it would imply that many stores could eliminate 75 percent of their products and cause sales to increase by 900 percent. That would be a fairly astounding result—and indicates that there may be a problem with the measurement.

Measurement problems could easily arise because the experiment was done for a total of ten hours in only one store, and shoppers were grouped in hourly chunks. There could be all kinds of reasons that those people who happened to show up during the five hours of limited assortment could have systematically different propensity to respond to $1 off a specific line of jams than those who arrived in the other five-hour period: a soccer game finished at some specific time, and several of the parents who share similar propensities versus the average shopper came in nearly together; a bad traffic jam in one part of town with non-average propensity to respond to the coupon dissuaded several people from going to the store at one time versus another; etc. This is one reason retail experiments for such in-store promotional tactics are typically executed for twenty or thirty randomly assigned stores for a period of weeks.
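
Manzi's figures of 75 percent and 900 percent are not derived in the passage, so it may help to reconstruct them. The sketch below is one plausible reading of his arithmetic, using only the percentages reported above and treating "almost 30 per cent" as exactly 0.30. The per-passer-by calculation at the end is an addition for illustration and does not appear in Manzi's text.

```python
# Figures quoted in the text (Iyengar and Lepper, 2000).
# "Almost 30 per cent" is rounded to 0.30 here, which is an assumption.
small_display, large_display = 6, 24
stop_rate = {"small": 0.40, "large": 0.60}  # share of passers-by who stopped
buy_rate = {"small": 0.30, "large": 0.03}   # share of stoppers who bought

# Cutting the range from 24 jams to 6 removes 75 per cent of the products.
range_cut = 1 - small_display / large_display

# Moving from 3% to 30% of stoppers buying is a tenfold rise,
# which is a 900 per cent increase.
increase_among_stoppers = buy_rate["small"] / buy_rate["large"] - 1

# Illustrative extra step (not in Manzi): purchases per passer-by combine
# both stages, stopping at the display and then buying.
per_passerby = {k: stop_rate[k] * buy_rate[k] for k in stop_rate}
increase_per_passerby = per_passerby["small"] / per_passerby["large"] - 1

print(f"Products removed: {range_cut:.0%}")
print(f"Increase in purchase rate among stoppers: {increase_among_stoppers:.0%}")
print(f"Increase in purchase rate per passer-by: {increase_per_passerby:.0%}")
```

On this reading, the 900 percent figure comes from the purchase rate among those who stopped. Because the larger display drew more people in, the implied increase per passer-by is smaller, although still implausibly large as a general rule for retailers.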

Benjamin Scheibehenne and friends (2010) surveyed the broader literature on the choice overload hypothesis. In some cases, choice increased purchases. In others it reduced them. Scheibehenne and friends determined that the mean effect size of changing the number of choices across the studies was effectively zero.
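
To make "mean effect size" concrete, here is a minimal sketch of how a meta-analysis pools results across studies. It uses a fixed-effect, inverse-variance weighted average, which is a simplification and not necessarily the model Scheibehenne and friends used. The effect sizes and standard errors are hypothetical numbers made up for illustration. They are not taken from the studies in the meta-analysis.

```python
import math

# HYPOTHETICAL (effect size, standard error) pairs, for illustration only.
# Positive values mean more choice increased purchases; negative values
# mean more choice reduced them.
studies = [(0.80, 0.30), (-0.40, 0.25), (0.10, 0.15), (-0.20, 0.20), (0.05, 0.10)]

def pooled_effect(studies):
    """Inverse-variance weighted mean effect and its standard error."""
    # More precise studies (smaller standard error) get more weight.
    weights = [1 / se**2 for _, se in studies]
    total = sum(weights)
    mean = sum(w * d for w, (d, _) in zip(weights, studies)) / total
    return mean, math.sqrt(1 / total)

mean, se = pooled_effect(studies)
low, high = mean - 1.96 * se, mean + 1.96 * se
print(f"Pooled effect: {mean:.2f} (95% interval {low:.2f} to {high:.2f})")
```

With inputs like these, individual studies can show sizeable effects in either direction while the pooled estimate sits close to zero.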

From this result, Manzi continues:

First, individual experiments need to encompass as much variation in background conditions as is feasible. It is almost impossible to run an experiment in one store that can produce valid conclusions. Social science RFTs that are executed across several school districts, court systems, welfare offices, or whatever are a much more reliable guide to action than single-site experiments. Experiments also need to run long enough to encompass changing background conditions over time. The combination of more sites and more time creates many more observations, and therefore reliability.

Second, the ultimate test of the validity of causal conclusions derived from an experiment is the ability to predict the results of future tests. We need to build the kind of distribution of multiple experiments that were summarized for the impact of breadth of choice on sales and satisfaction in Scheibehenne’s meta-analysis. Such a distribution allows us to measure the scope (if any) of reliable prediction based on some sequence of experiments. In the case of the jam experiment, the researchers in the original experiment themselves were careful about their explicit claims of generalizability, and significant effort has been devoted to the exact question of finding conditions under which choice overload occurs consistently, but popularizers telescoped the conclusions derived from one coupon-plus-display promotion in one store on two Saturdays, up through assertions about the impact of product selection for jam for this store, to the impact of product selection for jam for all grocery stores in America, to claims about the impact of product selection for all retail products of any kind in every store, ultimately to fairly grandiose claims about the benefits of choice to society.