Consumer Demand for Fair Trade: Evidence from a Multistore Field Experiment

Abstract We provide new evidence on consumer demand for ethical products from experiments conducted in a U.S. grocery store chain. We find that sales of the two most popular coffees rose by almost 10% when they carried a Fair Trade label as compared to a generic placebo label. Demand for the higher-priced coffee remained steady when its price was raised by 8%, but demand for the lower-priced coffee was elastic: a 9% price increase led to a 30% decline in sales. While consumers attach value to ethical sourcing, there is significant heterogeneity in willingness to pay for it.


I. Introduction
ETHICAL product labels and marketing messages are increasingly common in retail settings, calling attention to particular aspects of the way goods have been made (e.g., labor practices, environmental standards, the treatment of animals) and to particular causes that stand to benefit when the goods are purchased (e.g., research on HIV/AIDS, the provision of clean drinking water). The Fair Trade label, which aims to guarantee a "better deal" for poor farmers in developing countries, is perhaps the best-known ethical label. Fair Trade coffee, tea, and chocolate are now marketed not just on college campuses and in fashionable cafés, but also in many major supermarket chains across the United States and in Europe (Walmart, Target, Safeway, Giant, Tesco, and Sainsbury's, among others), and global sales of Fair Trade products have risen by around 30% annually over the past decade (Fair Trade Labeling Organizations International, 2012). This is a new form of politicized consumption in which citizen-consumers vote with their shopping dollar to influence firm behavior and bring about political and social change. Its potential long-term impact in terms of the size of the market and the associated effects on firm behavior is difficult to assess. Skeptics dismiss Fair Trade and other ethically labeled products as cheap public relations ploys by companies and highlight the fact that such products account for a tiny share of retail sales. Supporters argue that if it continues to grow at the current rate, politicized consumption could have a large impact on firm behavior. Much attention has been devoted to survey evidence showing that a majority of consumers say they would prefer, and would be willing to pay extra for, any products they could identify as being made in ethical ways (Elliott & Freeman, 2003).
As yet, however, there is no clear evidence that consumers will actually behave this way when they are shopping, thus giving firms strong incentives to change their behavior and invest in ethical labeling (Devinney, Auger, & Eckhart, 2010).
This paper reports new evidence on the impact of ethical labels on consumers' willingness to pay from a field experiment conducted among actual consumers in 26 stores of a major U.S. grocery store chain. The tests reveal that the Fair Trade label has a substantial positive effect on sales. Sales of the two most popular bulk coffees sold in the stores rose by almost 10% when the coffees carried a Fair Trade label as compared to a generic placebo label. Yet consumers also reveal different levels of price sensitivity when informed of the ethical product attribute. Demand for the higher-priced coffee was less elastic: sales of the Fair Trade-labeled coffee remained fairly steady when its price was raised by 8%. Demand for the lower-priced coffee was more elastic: a 9% increase in its price led to a 30% decline in sales as buyers switched to low-priced unlabeled alternatives. Overall, the findings suggest that consumers value ethical labeling as an important product attribute in the absence of any price differential relative to similar unlabeled products. However, in the presence of a price premium, we observe significant heterogeneity in the weight that different consumers place on ethical sourcing when making their purchasing decisions. Such behavioral responses to ethical labels might be driven by several factors, including differences in social preferences or in levels of information about the importance of ethical sourcing.
This study makes several contributions. First, our results have implications for an extensive literature in industrial organization and applied microeconomics that attempts to understand consumer behavior, how firms respond to consumer preferences, and how this interaction affects firm profits, market structure, and consumer welfare (Spence, 1976; Carlton, 1978; Hausman, Leonard, & Zona, 1994; Berry, Levinsohn, & Pakes, 1995; Nevo, 2010). The proliferation of ethical branding is based on the assumption that this is an effective means of product differentiation given altruistic consumers. Our results provide evidence of consumer heterogeneity in the valuation of the Fair Trade label, suggesting that firm-level marketing strategies can be designed to optimally account for market segmentation based on the complex interaction of price, ethical labels, and other product attributes. Second, to the best of our knowledge, this is the first paper to report results from a field experiment in which the researchers simultaneously manipulate product attributes like prices and labels to estimate demand effects across multiple retail stores. 1 Previous empirical research in the industrial organization literature has relied almost exclusively on estimating models of demand using observational data with a variety of techniques (and restrictions) applied to account for the endogeneity of pricing and marketing. Our tests highlight the advantages of the field experimental approach applied to a multistore setting. Third, our findings add new empirical evidence to complement a growing theoretical literature on the extent and implications of social preferences (Fehr & Schmidt, 1999; Andreoni, 2006; Benabou & Tirole, 2006).

II. Fair Trade and Consumer Demand for Ethically Certified Products
The Fair Trade certification and labeling program was developed by a group of humanitarian organizations aiming to alleviate poverty and promote sustainable development in developing countries by establishing more direct relationships between producers in those countries and sympathetic consumers in developed economies. Fair Trade-certified farmers receive a guaranteed minimum price for their crops and a price premium above the minimum or the current market price for the commodity, whichever is higher. 2 In addition, Fair Trade-certified importers must agree to long-term (minimum of one year) contracts with farmers and make available preharvest credit (up to 60% of the contract value). Fair Trade certification prohibits forced and child labor on farms along with ethnic and other forms of discrimination, and it restricts the use of potentially hazardous chemicals. Certification is generally restricted to small, family-owned farms and requires that farmers organize into cooperatives that decide democratically how to distribute or invest the Fair Trade premium paid on each contract. 3 As with other types of third-party certification and labeling, the Fair Trade program can be seen as a way to remove a market inefficiency that exists due to incomplete information on the part of consumers about the manner in which goods are produced. Removing this information asymmetry can facilitate product differentiation that increases consumer welfare by introducing additional product variety (Elliott & Freeman, 2003; Becchetti & Solferino, 2005) and enabling the fulfilment of social preferences (Camerer, 2002; Sobel, 2002). In the simplest type of models, lack of information about the ethical quality of goods available to consumers can lead to welfare losses, as consumers who prefer goods with high ethical quality cannot identify (and thus adequately reward) high-quality producers, and the latter are driven from the market by low-quality producers, which face lower costs (Bonroy & Constantatos, 2003, 2008).
1 While Hilger, Rafaert, and Villas-Boas (2011) conduct an experiment that manipulates the labels of different types of wines in a retail setting, there is no price variation associated with the label.
2 For example, the minimum price for coffee (Arabica, unwashed) is currently $1.35 per pound, and the premium over the current market price is 20 cents per pound.
3 The program is administered by a collection of nonprofit Fairtrade Labelling Organizations (FLO) that oversees certification and licenses the use of the Fair Trade trademark in each national market (in the United States, certification and licensing is organized by Fair Trade USA, formerly known as TransfairUSA and a member of FLO until 2011, when it began operating independently). It has developed standards for production and trade for a range of agricultural products, including coffee, tea, cocoa, bananas, sugar, rice, and cotton (see http://www.fairtrade.net/generic_standards.html). It conducts inspections of producers in developing countries, examines contracts, and monitors the chain of custody by which the certified goods are supplied to traders and retailers who are licensed to use the Fair Trade label and logo only when all the standards have been met. In 2012 the program included over 1.3 million farmers in seventy nations in Africa, Asia, and Latin America, with annual global sales of certified products exceeding $6.6 billion in 2011. FLO estimates that approximately $103 million in premium payments was distributed to communities in 2012 for use in community development (FLO, 2012).
To a large degree, the success of the Fair Trade model hinges on the depth and strength of support for ethically labeled goods among consumers. 4 At present there is a great deal of uncertainty about whether the Fair Trade market can become large enough to have a substantial impact across a range of producers in the developing world. Total sales of Fair Trade goods in the United States in 2011 amounted to roughly $1.4 billion (FLO, 2012). This represents only about one-fortieth of the U.S. market for certified organic products and less than $5 per person annually. But the average annual rate of growth in U.S. sales of Fair Trade-certified goods was close to 40% between 1999 and 2008. By way of comparison, U.S. sales of certified organic products grew by around 20% annually between 1990, when certification began, and 2002 (Dimitri & Green, 2002). Fair Trade coffee, the largest-selling certified product, accounts for over 3% of the total retail market for coffee and for close to 20% of the market for specialty coffees, the fastest-growing segment of the U.S. coffee market (TransFair USA, 2009a, 2009b). 5 Survey data suggest that a majority of consumers prefer, and are willing to pay substantially more for, products they can identify as being made in an ethical way. 6 Several of these studies have focused specifically on Fair Trade coffee and report that consumers are willing to pay a sizable premium for Fair Trade certification (De Pelsmacker, Driesen, & Rayp, 2005). Hertel, Scruggs, and Heidkamp (2009) found that over 75% of surveyed coffee buyers in the United States in 2006 said they would be willing to pay at least 50 cents more per pound for Fair Trade coffee versus noncertified coffee (a premium of roughly 16% over the average price of coffee at the time), and more than half said they would pay a premium of a dollar or more. But survey findings most likely reflect some degree of social desirability bias. What is required is direct evidence on how consumers actually behave when they encounter Fair Trade labels while shopping and deciding how to spend their own money.
4 A second necessary condition for the sustainability of this model is that producers in the developing world actually benefit from participating in the Fair Trade system. Research to date has provided only crude assessments of the impact of Fair Trade certification among developing country producers in the form of case studies of certified farmers that do not provide general measures of impact (Ronchi, 2002; Murray, Raynolds, & Taylor, 2006) and surveys of certified and noncertified producers that do not account for the nonrandom selection of farmers into certification (Arnould, Plastina, & Ball, 2006; Becchetti & Constantino, 2008; Bacon et al., 2008).
5 Fair Trade coffee is available in major coffee and food retailers, such as Starbucks Coffee, Peet's Coffee and Tea, Seattle's Best Coffee, Einstein Bros Bagels, Dunkin' Donuts, and McDonald's, as well as in many large supermarket chains, including Walmart, Target, Safeway, Giant, Costco, Trader Joe's, and Whole Foods Market.
6 For example, a survey administered in 1999 by the Program on International Policy Attitudes found that 76% of respondents indicated they were willing to pay $25 for a $20 garment that was certified as not being made in a sweatshop (Program on International Policy Attitudes, 2000). A poll conducted in the same year by the National Bureau of Economic Research found that roughly 80% of surveyed individuals said they were willing to pay more for an item if assured it was made under good working conditions (Elliott & Freeman, 2003). A growing number of survey studies have provided additional evidence of consumers' stated willingness to pay for ethical qualities of products and for the ethical behavior of firms (Auger et al., 2003, 2008; Dickson, 2001; Mohr & Webb, 2005).

THE REVIEW OF ECONOMICS AND STATISTICS
A small number of empirical studies have examined relationships between observed sales or prices of goods and their ethical characteristics. For instance, Teisl, Roe, and Hicks (2002) examined scanner data on U.S. retail sales of canned tuna and found that market share (relative to other canned seafood and meat) rose substantially after the introduction of the dolphin-safe label in April 1990. Elfenbein and McManus (2010) found a price premium for items sold in eBay's Giving Works program (in which sellers direct a portion of the sale price to charity) compared with prices for similar items sold on eBay, and the premium was increasing in the amount donated to charity. On the Fair Trade label, and coffee specifically, Galarraga and Markandya (2004) gathered data on retail prices of coffee sold in major supermarkets in Britain and estimated that an average premium of around 11% was charged for coffee with a "green" label (they combined Fair Trade, organic, and shade-grown labels in this category). While such studies are suggestive of consumer support for ethically labeled products, because the observed outcomes reflect pricing and distribution decisions by sellers as well as consumer behavior, it is difficult for this type of approach to provide clear inferences about the real impact of ethical labels on consumer choices.
To date, very limited evidence is available from field experiments indicating whether and how consumers might alter their spending behavior when given the opportunity to distinguish Fair Trade or other ethically labeled products from alternatives. Kimeldorf et al. (2004) placed two identical groups of athletic socks in a department store, labeled one group as being made under "good working conditions" and altered the price of the labeled socks over several months. Hiscox and Smyth (2006) introduced a "fair and square" label describing ethical labor standards in facilities that manufactured brands of towels and candles sold in a retail store in New York City and then compared sales of labeled brands with sales of alternative brands. Arnot, Boxall, and Cash (2006) conducted tests with a university campus coffee vendor, adjusting the prices of two freshly brewed coffees, a Fair Trade-certified coffee from Nicaragua and a similar quality Colombian coffee, over the course of several days. In each of these field experiments, weaknesses in design made it impossible for the researchers to isolate the effects of the ethical labels from potential time-varying or product-specific confounding factors, or to compare the effects of ethical product labels with the effects of alternative types of marketing labels. 7 The experiments we report were designed specifically to overcome these problems and to gather new, direct evidence on how shoppers behave when encountering Fair Trade labels and making spending decisions in a multistore retail setting.

A. Model of Consumer Behavior
We employ a standard model of consumer behavior in which individuals may derive utility from a variety of characteristics of goods (Lancaster, 1971; Gorman, 1980). We assume consumers maximize their utility when choosing from a set of alternative products (e.g., types of coffee) available in a particular market. Each consumer's utility from buying a particular good depends on the observed product characteristics, which may include Fair Trade certification, as well as price.
Consumers may differ in how they evaluate the different product characteristics. Our tests are designed to measure average responses among consumers when certain key product characteristics, namely Fair Trade certification and price, are manipulated experimentally for specific products. We allow consumers to place different values on Fair Trade certification and to be more, or less, sensitive to prices charged for Fair Trade goods than they are to prices of unlabeled goods. We do not make specific assumptions about the motives of these consumers. The simplest type of assumption is that these consumers derive a "warm glow" satisfaction from supporting a program that is helping poor coffee farmers. This type of assumption is adopted in existing models of markets for ethically labeled goods (Richardson & Stähler, 2007; Baron, 2009). There are other motives that could generate a preference for purchasing ethically labeled products, some of them much less altruistic than others; however, our tests are not designed to assess the relative importance of alternative motivations among consumers favoring ethically labeled goods.
In general, the standards under which a good is made can be classified as "credence" attributes and are distinct from other types of product characteristics in that they cannot be directly assessed by the consumer through examining or using the item. Other product characteristics, such as price, size, and color, can be evaluated by consumers before they purchase the good; these are sometimes called "search" attributes. Characteristics such as quality, durability, and taste can be assessed by consumers after they have purchased the good and are known as "experience" attributes. 8 Although these experience attributes are often not known to consumers at the point of purchase, firms can use a variety of methods to send credible signals about them, including guarantees, warranties, advertising, and investments in brand reputations. The information asymmetry problem for experience attributes is also partly alleviated by the fact that consumers can punish firms for poor quality by making no further purchases of their products (Akerlof, 1970; Shapiro, 1983; Palfrey & Romer, 1983). In the case of credence attributes, however, which are never directly observed by consumers before or after purchasing the product, firms find it much harder to make credible assurances. Firms that have incurred higher costs to produce goods with these characteristics can make claims about them to consumers, but competing firms can incur no additional costs and make similar claims. Certification and labeling of specific credence attributes of goods (e.g., Fair Trade standards) by an independent third party (e.g., FLO) can mitigate this problem, effectively transforming the credence attributes into search attributes (Caswell & Mojduszka, 1996). 9
7 Comparing sales for products with and without the Fair Trade label cannot rule out the possibility of a pure label effect (i.e., one that is irrespective of the informational content of the label). The ideal design would compare the product with the Fair Trade label versus a noninformative placebo label.

B. The Setting
We investigated consumer demand for the Fair Trade label by conducting two experiments in 26 stores of a major U.S. grocery store chain, located in Connecticut, Massachusetts, Maine, and Rhode Island. The experiments took place in 2008 and 2009. The first test, the label experiment, examined the impact of the Fair Trade label on sales of goods at existing prices. The second test, the price experiment, investigated the price elasticity of demand for Fair Trade-labeled goods. The experiments focused on the two biggest-selling Fair Trade coffees sold in the stores, the French Roast (FR) Regular and a coffee blend (CB). Consumers could purchase coffee in the stores either from self-service bulk bins containing roasted coffee beans or in a separate section of shelves of packaged (whole and ground) coffee beans and instant coffees. All bulk coffee was supplied by the same company and during our experiments our test coffees-FR Regular and CB-were the only Fair Trade bulk coffees available. Sales of bulk coffee beans were about twice as large as sales of packaged coffee beans in the stores, and they accounted for over half the total coffee market in the stores (including sales of instant coffees).
8 Besides Fair Trade standards for farmers, other familiar examples of credence attributes include organic standards for production of food and fiber, exclusion of genetically modified organisms from foods, dolphin-safe methods for catching tuna, humane treatment of animals on farms, and various forms of environmental management standards adopted by firms to help to sustain forests and fisheries.
9 The value of the Fair Trade label to firms and consumers will depend in part on the degree to which consumers regard the particular third-party certifier as trustworthy. It is worth noting that our tests were not designed to assess the importance of third-party certification per se or the trustworthiness of FLO in the eyes of consumers (relative to the trustworthiness of the grocery store partner).

Figure 1.-Treatment and Control Condition for the Label Experiment
The labels were 2 inches by 2 inches.

C. The Label Experiment
In the label experiment the intervention consisted of attaching a 2-by-2-inch Fair Trade label to the bulk coffee bins containing the FR Regular and CB coffees in all stores assigned to the treatment condition. In stores assigned to the control condition, we attached a same-size generic placebo label to the bins containing these same coffees. The generic label was designed to be identical to the Fair Trade label in all the relevant dimensions from a marketing point of view, such as the size of the label and its color. The only difference was in the meaning of the label: the treatment label indicated the Fair Trade sourcing of the product, while the control label carried no specific information about Fair Trade and simply highlighted the name of the brand. We used the generic label for the control condition to allow for a generic label effect, unrelated to the specific informational content associated with Fair Trade, as past research has suggested that even seemingly meaningless forms of differentiation in marketing messages can affect consumer choices (Carpenter, Glazer, & Nakamoto, 1994). 10 Figure 1 shows the treatment and control labels that were displayed on the coffee bins. In the control label, we replaced the word coffee with the coffee supplier's brand name. The store sourced exclusively from this supplier, and this information was already available to consumers (the brand name was included on the standard display card on each bin that gave the price and the detailed description of each coffee type). Each coffee bin displayed the experimental label (treatment or control), the standard display card with the price and the description of the coffee type, and a sticker that indicated the time of the last roasting.
For the duration of the label experiment, we removed all other references to Fair Trade from the product descriptions of both Fair Trade bulk coffees so that the bin label was the only reference to Fair Trade in the bulk coffee sections of the stores. Prior to our experiment, the FR Regular coffee bins had displayed a small (half-inch square), black-and-white Fair Trade logo beneath the coffee description on the standard display card, which was almost indiscernible to the casual shopper. These small logos were removed in all stores several weeks before the start of the first test. Packages of the FR Regular coffee, sold in the separate section of the store, carried a small Fair Trade logo on their reverse side before and during the experiment. It is possible that some perceptive repeat customers did not react to the bin labels because they had already identified the FR Regular coffee as Fair Trade certified: they may have recalled the earlier logo on the bulk bins, or they may have closely examined packages of the FR Regular coffee in the packaged coffee section before shopping in the bulk section. Such customers would have treated the coffee as Fair Trade certified even in the control condition of our label test. As a consequence, the results we report can be interpreted as a lower bound of the true effect of the Fair Trade label.

D. The Price Experiment
In the price experiment, the intervention consisted of raising the prices for the Fair Trade-labeled FR Regular and CB coffees. In the treatment condition, prices were raised by $1.00 for both types of coffee. 11 Given the base price of $11.99 per pound for the FR Regular and $10.99 for the CB coffee, this represents price increases of about 8% and 9%, respectively. 12 In the control condition, prices remained at their usual levels. Note that raising the prices changed the price ranking of the test coffees among the set of bulk coffees available at the stores. Almost all the other bulk coffees were sold at $11.99 per pound, so in the treatment condition, FR Regular moved from being an average-priced coffee to being one of the most expensive coffees, at $12.99. Only two other specialty bulk coffees were sold at this higher price, and these accounted for a lower sales volume than the FR Regular (see the summary statistics in section IV for details of the consumers' choice set). The CB coffee was one of only two bulk coffees usually sold at $10.99 (the other was Colombian Supremo). So during the treatment period, this coffee moved from being one of the cheapest bulk coffees on offer to being an average-priced coffee, at $11.99. As we discuss, this had potentially important implications in terms of substitution effects.
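The percentage increases quoted above follow directly from the base prices and the uniform $1.00 increase. The short sketch below simply reproduces that arithmetic; the function and variable names are illustrative and do not come from the paper.

```python
# Base prices per pound and the uniform $1.00 increase, as stated in the text.
BASE_PRICES = {"FR Regular": 11.99, "CB": 10.99}
INCREASE = 1.00


def percent_increase(base: float, delta: float) -> float:
    """Return the price change as a percentage of the base price."""
    return 100.0 * delta / base


for coffee, base in BASE_PRICES.items():
    pct = percent_increase(base, INCREASE)
    print(f"{coffee}: ${base:.2f} -> ${base + INCREASE:.2f} (+{pct:.1f}%)")
```

Because the dollar increase is the same for both coffees, the lower-priced CB coffee receives the slightly larger percentage increase.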
In addition to the price increase, the stores in the treatment condition displayed a prominent 3-by-3-inch Fair Trade label on the bulk bins containing the FR Regular and CB coffees that carried a message aimed at inducing consumers to connect the higher price specifically with Fair Trade certification. The label read: "A Fair Price to Support Fair Trade!" Stores in the control condition, where prices were not altered, displayed a Fair Trade label with the message: "Support Fair Trade!" The two labels are shown in figure 2. By explicitly directing consumers to associate the price premium with Fair Trade certification in the treatment condition, the test provides an assessment of their willingness to pay extra for this specific ethical product attribute. In the absence of such a message, it is possible that some customers would associate higher prices with some other type of unobserved product characteristic (an experience attribute, such as quality or flavor), thus making it more difficult to interpret the results (Bagwell & Riordan, 1991). 13 To provide a benchmark for examining consumer responses to price changes in the absence of any message prompting them to associate these changes with ethical sourcing, we examined historical data on sales of all the bulk coffees in the stores at different prices during a two-year period prior to the tests. The historical sales data allow us to estimate the price elasticities of demand for all bulk coffees in the stores in the absence of the test labels. This helps us benchmark the results from the price experiment.
Figure 2 note: The labels were 3 inches by 3 inches.

E. Crossover Design
In both experiments, we relied on a two-group, two-phase crossover design (Jones & Kenward, 2003) whereby stores were randomly assigned to a sequence of treatment-control or control-treatment. In each store, the treatment or control condition was in place for an initial phase of four weeks, after which stores switched to the opposite condition for another four weeks. Thus, both experiments lasted eight weeks in total. 14 The crossover design provides higher efficiency than a simple parallel group design because we can exploit within-store variation for each store (assuming no carryover). The no-carryover assumption may be violated if perceptive repeat customers remember that the test coffees are Fair Trade certified and therefore disregard the label changes during the experimental period (in particular, in the stores in which the treatment labels are assigned in the first phase and replaced by the control labels in the second phase). Presumably this should result in an attenuation bias for the label effect, since customers who value Fair Trade certification would simply continue to purchase the test coffees even under the control condition. In section V, we report various robustness checks that support the no-carryover assumption. In particular, we find that the effects are similar when we consider only the first phase of the experiment (where no carryover is present) and when we replicate the crossover analysis while restricting the sample to sales during the last two weeks of each experimental phase, when carryover effects are less likely to occur. For the randomization, all 26 stores in our sample were initially matched into pairs on important covariates such as their history of average coffee sales, total sales, sales growth, and location characteristics. Within each pair, one store was then randomly assigned to the treatment-control and the other to the control-treatment condition, leading to a fully balanced design.
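The matched-pair crossover assignment can be illustrated with a minimal sketch. It assumes the 13 pairs have already been formed (the matching step on covariates is not shown), and the store identifiers, seed, and function names are hypothetical rather than taken from the paper.

```python
import random


def assign_sequences(pairs, seed=0):
    """Randomly assign one store per matched pair to each crossover sequence.

    pairs: list of (store_a, store_b) tuples, already matched on covariates.
    Returns a dict mapping store -> "treatment-control" or "control-treatment".
    """
    rng = random.Random(seed)
    assignment = {}
    for store_a, store_b in pairs:
        # A fair coin flip decides which member of the pair is treated first.
        if rng.random() < 0.5:
            first, second = store_a, store_b
        else:
            first, second = store_b, store_a
        assignment[first] = "treatment-control"
        assignment[second] = "control-treatment"
    return assignment


def condition(sequence, week):
    """Condition in place during a given week (1-8): four weeks per phase."""
    first_phase, second_phase = sequence.split("-")
    return first_phase if week <= 4 else second_phase


# Hypothetical store identifiers: 26 stores matched into 13 pairs.
pairs = [(f"store_{2 * k + 1}", f"store_{2 * k + 2}") for k in range(13)]
assignment = assign_sequences(pairs, seed=42)
```

Under this scheme exactly 13 stores are in the treatment condition in every week, which is the sense in which the design is fully balanced.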

F. Data and Monitoring
To conduct the initial matching of stores, we combined store-level information on sales with socioeconomic data for the five-digit postal code areas for each store drawn from the 2000 U.S. Census. To analyze the results of the experiments, we relied on weekly register data on coffee sales in each store.
All stores received detailed instructions on how to attach the labels and change prices during the experiments. To ensure compliance with the experimental protocol of each experiment, we had our own monitors visit each of the participating stores during the first two days following the beginning of each treatment and control phase and once a week after that. Observers checked the label displays, prices, and whether there were any product stock-outs that might affect sales. At no time during the experiments were the FR Regular and CB coffees included in any promotional events or sales at the stores. 15 Store managers and coffee department personnel at the stores were extensively briefed on the experiments. Overall, compliance was high: in only a few cases were the labels switched a few days behind schedule.
15 During the experiment, one of the other bulk coffees was placed on sale. These weeklong sales promotions were routinely administered in all stores simultaneously nationwide and should therefore not lead us to reject our unconfoundedness assumption, since they affected both treatment and control stores equally.

G. Randomization Checks
To verify whether the randomization successfully orthogonalized the treatment with respect to confounding factors, tables 1 and 2 display the covariate balance for a range of pretreatment characteristics. We report the mean covariate values in the treatment and in the control group as well as p-values from a two-sample t-test (with unequal variances) and a bootstrapped two-sample Kolmogorov-Smirnov test (Abadie, 2002). The pretreatment characteristics include total store sales and total sales growth, as well as average dollar sales not only for each of the test coffees, but also for all bulk coffee, all packaged coffee beans, and all instant coffee available in the stores. The averages are provided for both a 4-week and a 52-week period prior to the tests. The balance tables also include a range of socioeconomic characteristics for the five-digit postal code areas in which the stores are located. For both experiments, we obtain very good balance on observed characteristics: variable means are close, and none of the p-values indicates a significant difference at conventional levels.
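The two balance statistics named above can be sketched in a few lines. This is only an illustration of the test statistics: it omits the p-values and the bootstrap procedure reported in the paper, and the sales figures are invented for the example.

```python
from statistics import mean, variance


def welch_t(x, y):
    """Welch's unequal-variance t statistic and Welch-Satterthwaite df."""
    nx, ny = len(x), len(y)
    vx, vy = variance(x) / nx, variance(y) / ny  # squared standard errors
    t = (mean(x) - mean(y)) / (vx + vy) ** 0.5
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df


def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between ECDFs."""
    def ecdf(sample, v):
        return sum(s <= v for s in sample) / len(sample)

    points = sorted(set(x) | set(y))
    return max(abs(ecdf(x, v) - ecdf(y, v)) for v in points)


# Invented pretreatment weekly coffee sales (dollars) for two groups of stores.
treatment_first = [410.0, 385.5, 402.3, 398.1, 421.7, 390.2]
control_first = [405.2, 392.8, 399.9, 401.4, 415.0, 388.6]

t_stat, dof = welch_t(treatment_first, control_first)
d_stat = ks_statistic(treatment_first, control_first)
print(f"Welch t = {t_stat:.3f} (df = {dof:.1f}), KS D = {d_stat:.3f}")
```

A t statistic near zero and a small KS distance are what good covariate balance looks like; converting them to p-values requires the t distribution and, for the KS test as used here, a bootstrap.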

A. Statistical Model
For the estimation, we follow a standard framework in the discrete-choice literature (Ackerberg et al., 2007; Nevo, 2010). Let there be $i = 1, \dots, \infty$ consumers who maximize their utility by choosing one of $j = 0, 1, \dots, J$ goods (i.e., various bulk coffees and an outside good) in $t = 1, \dots, T$ markets. Markets are defined as store-weeks, and for both experiments, $n = 1, \dots, 26$ stores are observed over $w = 1, \dots, 8$ weeks; each store is observed for four weeks under the treatment condition and four weeks under the control condition. Consumer $i$'s utility from buying the $j$th good in market $t$ is given by
$$U_{ijt} = U(x_{jt}, \xi_{jt}, \nu_{it}; \theta),$$
where $x_{jt}$ is a vector of observed product characteristics (which may include the price $p_{jt}$), $\xi_{jt}$ indicates product characteristics that are unobserved by the researchers (these can also be thought of as demand shocks), $\nu_{it}$ are unobserved differences in consumer tastes, and $\theta$ is a vector of model parameters (to be estimated) that includes how sensitive consumers are to each of the observed product characteristics. 16 For identification we normalize the utility of the outside good, $j = 0$, to 0 and proceed with a simple logit specification where $U_{ijt} = \delta_{jt} + \varepsilon_{ijt}$ with mean utility levels
$$\delta_{jt} = x_{jt}\beta + \xi_{jt},$$
and the error term for idiosyncratic tastes is assumed to be $\varepsilon_{ijt} \overset{iid}{\sim}$ extreme value type I. Aggregate market shares are thus given by
$$s_{jt}(x, \beta, \xi) = \frac{\exp(x_{jt}\beta + \xi_{jt})}{1 + \sum_{k=1}^{J} \exp(x_{kt}\beta + \xi_{kt})},$$
and following Berry (1994), we can solve for the mean utility as a function of observed market shares using $\delta_{jt} = \log(s_{jt}) - \log(s_{0t})$, and estimate the model by regression.
Note to tables 1 and 2: Columns 2 and 3 show means of covariates in the treatment and control group of stores. Column 4 shows the p-value from a two-sample t-test assuming unequal variances. Column 5 shows the p-value from a bootstrapped Kolmogorov-Smirnov test. Census covariates refer to the postal code areas in which the stores are located (based on the five-digit postal code tabulation areas from the 2000 Census).
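The logit share equation and the Berry (1994) inversion can be checked numerically. The sketch below is only an illustration under the normalization stated in the text (outside good with utility 0); the mean utilities are hypothetical values, not estimates from the experiments.

```python
import math

def logit_shares(deltas):
    """Inside-good market shares implied by mean utilities (outside utility = 0)."""
    denom = 1.0 + sum(math.exp(d) for d in deltas)
    return [math.exp(d) / denom for d in deltas]

def berry_inversion(shares):
    """Recover mean utilities: delta_j = log(s_j) - log(s_0)."""
    s0 = 1.0 - sum(shares)  # share of the outside good
    if s0 <= 0:
        raise ValueError("inside shares must sum to less than 1")
    return [math.log(s) - math.log(s0) for s in shares]

# Hypothetical mean utilities for three bulk coffees in one store-week.
deltas = [-3.0, -3.5, -4.2]
shares = logit_shares(deltas)
recovered = berry_inversion(shares)

for d, s, r in zip(deltas, shares, recovered):
    print(f"delta = {d:+.2f}  share = {s:.4f}  recovered delta = {r:+.2f}")
```

Because the inversion is exact, the recovered mean utilities match the inputs, which is what allows the model to be estimated by linear regression on $\log(s_{jt}) - \log(s_{0t})$.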

THE REVIEW OF ECONOMICS AND STATISTICS
Our quantities of interest are the effects of the experimentally manipulated product characteristics (i.e., the Fair Trade label and the test price) on sales of the test coffees and on sales of the main alternative coffees that may be affected by substitution. We estimate the following model:
$$\delta_{jt} = M\beta + \xi_{jn} + \xi_{w} + \Delta\xi_{jt},$$
where $M$ is a $(J \cdot T \times J)$ matrix that contains one indicator variable for each of the inside goods, $M = [m_{j=1}, \dots, m_{j=J}]$.
For each inside good, the indicator variable is coded as 1 for store-weeks in which the treatment condition was assigned to the test coffees (i.e., the Fair Trade label or the test price) and 0 for store-weeks in which the control condition was assigned to the test coffees (i.e., the control label or the regular price). Accordingly, $\beta = \{\beta_1, \dots, \beta_J\}$ is a $(J \times 1)$ vector of coefficients that measures the effect of the various product characteristics on product sales. The sales effects of the experimentally manipulated product characteristics are allowed to vary across the $J$ coffees. The $\xi_{jn}$ provide a full set of product and store fixed effects so that the identifying variation for the treatment effects is across time based on deviations from product and store-specific means. We also include a set of week fixed effects, $\xi_w$, to account for weekly demand shocks that are common to all stores. 17 The key identifying assumption, $E[\Delta\xi_{jt} \mid M] = 0$, is supported given that the randomization orthogonalizes our treatments (i.e., the Fair Trade label and the price) with respect to all other observed or unobserved product characteristics of the test coffees and of all the competitor coffees. Unlike in almost all other studies involving demand estimation, endogeneity of product characteristics or pricing is not a concern here. For the label experiment, we include the product prices $p_{jt}$ as a regressor, although excluding the prices does not affect the point estimates of the label effect, as expected given the randomization. For the price experiment, we omit prices because our treatment indicators measure the contrast between the test price and the control price. We use the coefficient estimates to compute own- and cross-price elasticities. We cluster standard errors at the store level in order to allow for potential within-store correlation across time. For each experiment, we restrict the estimation window to the weeks when the experiment was underway. 18
17 Notice that common shocks are also directly accounted for via the balanced experimental design (i.e., at each point in time, half of the stores are assigned to treatment and half to control); the treatment effect coefficients are therefore unaffected by the inclusion of week fixed effects.
18 As indicated above, for the price experiment, we discard two hybrid weeks following the switch of the two conditions so that we also have eight weeks for each store, four weeks under each condition.
We include among the inside products the two test coffees, the FR Regular and CB coffees, as well as the five main alternative bulk coffees that were available across all stores: French Roast (FR) Extra Dark, Breakfast Blend, Regional Blend, Colombian Supremo, and Mexican. 19 We compute market shares by converting volume sales to pounds and dividing by the total potential number of pounds of coffee in a given market. The potential coffee market is assumed to be equal to one cup of coffee per customer per day in a given store-week. 20 Table 3 reports summary statistics for the test coffees and the alternative bulk coffees in 2009. The bulk coffees are all regularly priced at $11.99 per pound with the exception of the CB and the Colombian Supremo, which are cheaper, at $10.99 per pound. The FR Regular coffee is the best-selling bulk coffee with an average sales share of about 11% among the bulk coffees and average weekly sales of about 17.2 pounds (or $199.00) per store. The CB coffee has a sales share among the bulk coffees of about 5.7% and average weekly sales of about 9 pounds (or $99) per store. The alternative bulk coffees, none of them Fair Trade certified, all have weekly sales of about 11 pounds (or $115.00 to $126.00) per store, except for the Mexican coffee, for which sales are somewhat lower. It is worth noting that prior to our price experiment, the store did not impose a price premium for Fair Trade-certified coffee: the certified FR Regular and CB coffees were priced the same as similar noncertified coffees. This appears to be common practice among U.S. coffee retailers. 21
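The market share construction described above can be sketched in a few lines. This is an illustrative reading of the text, not the authors' code: it assumes that one cup corresponds to 0.40 ounces of coffee per customer per day (the consumption figure cited in the paper), that the weekly potential is seven times the daily potential, and that the number of customers equals total store sales divided by the average basket size. The store figures used below are hypothetical.

```python
OUNCES_PER_POUND = 16.0
OUNCES_PER_CUSTOMER_DAY = 0.40  # consumption figure cited in the text (about one cup)
DAYS_PER_WEEK = 7               # assumption: weekly potential = 7 x daily potential

def potential_market_lb(total_store_sales, avg_basket_size):
    """Potential pounds of coffee in one store-week."""
    customers = total_store_sales / avg_basket_size
    return customers * DAYS_PER_WEEK * OUNCES_PER_CUSTOMER_DAY / OUNCES_PER_POUND

def market_share(pounds_sold, total_store_sales, avg_basket_size):
    """Share of the potential market captured by one coffee in one store-week."""
    return pounds_sold / potential_market_lb(total_store_sales, avg_basket_size)

# Hypothetical store-week: $400,000 in total sales and a $40 average basket.
potential = potential_market_lb(400_000.0, 40.0)
share_fr = market_share(17.2, 400_000.0, 40.0)
print(f"potential market: {potential:.0f} lb, share of a 17.2 lb coffee: {share_fr:.4f}")
```

Under this definition the inside shares are small and the outside good absorbs most of the potential market, which is typical when market potential is approximated from population and average consumption.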

C. Results
The label experiment. Before presenting the results from the discrete choice model described above, it is useful to examine the impact of the Fair Trade label in a simple reduced-form specification. To do so, we regress the log of weekly dollar sales of the test coffees, FR Regular and CB, in each store on a binary treatment indicator equaling 1 if the test coffee displays the Fair Trade label and 0 if it displays the generic placebo label. We also add a full set of store and week fixed effects, so that the average treatment effect of the Fair Trade label is identified based on deviations from store and week means. Standard errors are clustered by store. Table 4 presents the results from this reduced-form analysis. In column 1 we report that the Fair Trade label increased weekly sales of both FR and CB coffees by approximately 8% (p < .03). Columns 2 and 3 show the effect of the Fair Trade label on each of the test coffees separately: we find that the increase in demand is estimated at 7% (p < .10) for the FR Regular and at 12% (p < .03) for the CB coffee.
19 Notice that we discard about 6% of the cases (i.e., product-store-weeks) where sales are unavailable because of occasional stock-outs or bulk bin rotations. The missing observations mostly involve the less popular coffees such as the Mexican and the Colombian. There are almost no missing observations for the two test coffees.
20 The International Coffee Organization (2008) estimates that in 2007, the average coffee consumption in the United States was 0.40 ounces per person per day, or roughly one cup. The total number of customers per store-week is based on total store sales divided by the average basket size. Our approach here follows previous studies that similarly approximate market potential based on population and average consumption in the relevant markets (Berry et al., 1995; Nevo, 2001).
21 Reinstein and Song (2012) report that Fair Trade coffee is typically priced the same as comparable coffees in most of the largest U.S. retailers such as Starbucks, Peet's Coffee and Tea, and Tully's. Presumably the companies are absorbing the additional costs themselves rather than passing them on directly to consumers or relying on cross-subsidization. FLO (2010b) points out that the additional costs of Fair Trade certification for the final product can be so small (a small percentage of the farm gate price of the raw commodity, which is itself often only a small percentage of the total cost of the retail item after shipping, processing, packaging, and marketing) that it is possible for firms to absorb them entirely.
Note to table 4: Models 1-3 display regression coefficients with robust clustered standard errors in parentheses. The unit of analysis is a store-week. The dependent variable in the regressions is the logged weekly dollar sales of the test coffees, FR Regular and CB. Model 1 refers to the combined sales of both test coffees; Models 2 and 3 refer to sales of the FR Regular and the CB, respectively. The independent variable is a treatment indicator coded as 1 for the four weeks in which the Fair Trade label was placed on the test coffees and 0 for the four weeks when the generic label was placed on the test coffees. The design is a fully balanced crossover experiment so each of the 26 stores is observed for four weeks under each condition (eight weeks in total). All models include a full set of store and week fixed effects.
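To see why a balanced crossover design identifies the label effect from within-store variation, consider the following simplified sketch. It is a stand-in for the fixed-effects regression, not the authors' estimation code: the data are simulated without noise, the 13/13 split of stores across the two treatment sequences is an assumption, and the store levels, weekly shocks, and the 0.08 log-point effect are hypothetical.

```python
import math

N_STORES, N_WEEKS = 26, 8
TRUE_EFFECT = 0.08  # hypothetical log-point effect of the Fair Trade label

def treated(store, week):
    """Assumed assignment: stores 0-12 labeled in weeks 0-3, stores 13-25 in weeks 4-7."""
    first_phase = week < N_WEEKS // 2
    return first_phase if store < N_STORES // 2 else not first_phase

def simulate():
    """Log sales = store level + common weekly shock + label effect (no noise)."""
    rows = []
    for s in range(N_STORES):
        store_fe = 5.0 + 0.03 * s          # hypothetical store-specific level
        for w in range(N_WEEKS):
            week_fe = 0.05 * math.sin(w)   # hypothetical shock common to all stores
            d = treated(s, w)
            rows.append((s, w, d, store_fe + week_fe + TRUE_EFFECT * d))
    return rows

def crossover_estimate(rows):
    """Average over stores of (mean treated log sales - mean control log sales)."""
    diffs = []
    for s in range(N_STORES):
        t = [y for (st, _, d, y) in rows if st == s and d]
        c = [y for (st, _, d, y) in rows if st == s and not d]
        diffs.append(sum(t) / len(t) - sum(c) / len(c))
    return sum(diffs) / len(diffs)

estimate = crossover_estimate(simulate())
print(f"estimated label effect: {estimate:.4f} (true: {TRUE_EFFECT})")
```

Store levels drop out of each within-store difference, and the common weekly shocks enter the two sequences with opposite signs, so they cancel when the two equally sized groups are averaged.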

In table 5 we present the results from the discrete choice analysis, where we have imposed more structure in terms of the choice sets of the consumers and we can examine substitution effects. The findings reported in the first two columns confirm the results from the reduced-form analysis: the Fair Trade label has a positive and significant effect on sales of both the FR Regular and CB coffees. The first column examines combined sales for both labeled coffees: sales increased by about 10% with the Fair Trade label (p < .01). The second column considers the effect of the Fair Trade label treatment on sales for each of the test coffees and all other inside bulk coffees. We find that the application of the label increased sales of CB coffee by about 13% (p < .03) and increased sales of FR Regular coffee by about 8% (p < .09).
The additional rows in column 2 in table 5 examine possible substitution effects. We find that sales of four out of the five alternative (unlabeled) bulk coffees decreased, although individually the effect is significant at conventional levels only for the FR Extra Dark, which is presumably the closest substitute to the labeled FR Regular coffee in terms of flavor. Sales for the less popular FR Extra Dark coffee decreased by about 9% (p < .05) as a result of placing the Fair Trade label on the FR Regular counterpart. Notice that total sales of all bulk coffees increased by about 1.6% for stores in the treatment condition, although this increase is not statistically significant at conventional levels. 22 Taken together, the results provide strong evidence that consumers reacted positively to the Fair Trade label by increasing demand for labeled coffees. In the absence of customer-level data, we are unable to firmly establish the extent to which this represents new demand, as opposed to a substitution effect away from unlabeled coffees. 23 However, the relatively small elasticities of substitution suggest that part of the observed increase in demand for FR Regular and CB coffees represents new demand and not only switching by customers between Fair Trade labeled and unlabeled coffees. 24

D. The Price Experiment
Columns 3 and 4 in table 5 present the results for the price experiment. The third column examines combined sales for the FR Regular and CB coffees: sales decreased by around 17% as a result of the $1.00 price increase applied to these two test coffees, but this aggregate result masks important heterogeneity of treatment effects. In column 4, we allow the price effect to vary across the different coffees. We find that for the FR Regular coffee, the 8% increase in price did not reduce sales: sales were actually 2% higher at the test price of $12.99 compared to the regular control price of $11.99. As shown in the right panel of the table, this corresponds to an own-price elasticity of 0.28 with a 90% confidence interval of (−1.54; 2.12), suggesting a relatively less elastic demand for this coffee when the price increase is explicitly linked to Fair Trade certification. In contrast, the 9% increase in the price of the CB bulk coffee from $10.99 to $11.99 resulted in sales falling by more than 30%, suggesting that demand for this less expensive bulk coffee is quite elastic despite the Fair Trade label with the message associating the higher price with the ethical certification. As shown in the right panel, the decline in sales of the bulk CB coffee corresponds to an estimated own-price elasticity of −3.32 (−4.26; −2.38).
22 Since the total sales increase for FR Regular is still significantly higher than the reduction in sales of FR Extra Dark, these results do not fully support the hypothesis of a "reputation stealing externality" imposed by the Fair Trade label on unlabeled competitor products (Benabou & Tirole, 2006).
23 The grocery store partner did not collect customer-level data, so we are unable to identify the average size of each customer's purchases or distinguish between repeat and new customers.
24 As an additional check, we also tested whether the Fair Trade labels on the FR Regular and CB coffees in the bulk coffee section of the stores had any impact on sales of packaged versions of these coffees sold separately. We found no significant impact on sales of packaged coffees.
Note to table 5: Models 1-4 display regression coefficients with robust clustered standard errors in parentheses. The dependent variables in the regressions are the normalized mean utility levels $\delta_{jt} = \log(s_{jt}) - \log(s_{0t})$. The independent variables include treatment indicators (for the Fair Trade label and the test price accordingly) for each coffee. The estimation is restricted to eight weeks in which the experiment was underway (excluding a two-week washout period for the price experiment). All models include a full set of product and store fixed effects and week fixed effects. Models 1 and 2 also include product prices (coefficients not shown). The last three columns refer to the own-price elasticities computed based on model 4 for the test coffees, where the regular unit prices are used as base prices. The experiment raised unit prices by about 8% and 9% for the FR Regular and the CB coffee, respectively. PE: point estimate; LB and UB: lower and upper bound of the 90% confidence intervals.
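As a rough plausibility check on the magnitudes reported for the price experiment, one can divide the percentage change in sales by the percentage change in price. This back-of-the-envelope ratio is not the model-based calculation behind table 5, so it only approximates the reported elasticities; the sales changes of +2% and −30% are the rounded figures from the text.

```python
def pct_change(new, old):
    """Proportional change from old to new."""
    return (new - old) / old

def simple_elasticity(sales_change, price_old, price_new):
    """Ratio of proportional sales change to proportional price change."""
    return sales_change / pct_change(price_new, price_old)

# Price changes applied in the experiment (dollars per pound).
fr_price_change = pct_change(12.99, 11.99)   # FR Regular
cb_price_change = pct_change(11.99, 10.99)   # CB

# Rounded sales responses taken from the text.
fr_elasticity = simple_elasticity(0.02, 11.99, 12.99)
cb_elasticity = simple_elasticity(-0.30, 10.99, 11.99)

print(f"FR Regular: price +{fr_price_change:.1%}, rough elasticity {fr_elasticity:.2f}")
print(f"CB:         price +{cb_price_change:.1%}, rough elasticity {cb_elasticity:.2f}")
```

The ratios land close to the reported point estimates (0.28 for FR Regular and −3.32 for CB), with the remaining gap attributable to rounding in the quoted sales changes and to the structure of the discrete choice model.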
The additional rows in column 4 examine the effect that the price increases for the FR Regular and CB coffees have on sales of the alternative bulk coffees in the stores. Most notably, the decline in sales for CB is matched by a strong substitution effect that increased sales of Colombian Supremo, the only other bulk coffee offered at the lowest price of $10.99 per pound. Sales for the Colombian Supremo coffee increased by almost 16%, corresponding to a substantial cross-price elasticity of 1.90 (−1.38; 5.17). There seem to be no strong substitution effects for the other competitor coffees. In particular, there was no substitution toward FR Extra Dark, the coffee that is closest in type to the FR Regular coffee and that was now being sold at a lower price relative to the FR Regular test coffee.
The results suggest that different customers react to the price increases for the Fair Trade-labeled coffees in different ways. For customers buying the more expensive and more popular FR Regular coffee, demand for the labeled coffee was significantly less elastic: this segment was willing to pay a sizable premium (8%) for Fair Trade-labeled coffee. Customers buying the cheaper CB coffee, on the other hand, appeared to switch to the less expensive alternative coffee in response to a price rise, indicating that they were not willing to pay a premium for Fair Trade. 25 In the absence of detailed consumer-level data, we are unable to fully specify the preferences of these two types of consumers and examine segmentation. Nonetheless, it is worth recalling from the summary statistics that the FR Regular coffee accounts for about 11% of total bulk coffee sales, while the CB coffee accounts for only 5.7% of sales, suggesting that the group of customers for which demand was less elastic was substantially larger. Also, note that total sales of all bulk coffees increased by about 1.8% under the test prices, although this increase is not statistically significant at conventional levels.

E. Benchmark Elasticities of Demand for Unlabeled Coffee
In order to better interpret our results from the price experiment, we investigated how shoppers responded in the past to changes in the prices of bulk coffees in the absence of labeling that associated pricing with Fair Trade certification. We computed the own-price elasticities for all inside bulk coffees based on historical sales data. The identifying variation in prices is based on price changes that resulted from routine sales promotions. These sales promotions typically involve lowering the retail price of a single bulk coffee by $1.00 per pound for one week and are administered in all stores simultaneously nationwide. During the promotions, prices of all the other bulk coffees are held at their regular levels. Which bulk coffee is chosen for a promotion at any given time depends on a rotational schedule that is drawn up by the national sales team well in advance of implementation (there is usually a lead time of three or four months). Given the way these sales promotions are scheduled and managed at the national level, we believe that pricing endogeneity is not a significant concern when estimating the elasticity of demand for each coffee type during sales periods (and particularly not beyond the product-store mean level). 26 While these nonexperimental estimates are less ideal than a separate set of experimental results that would match the results presented above, they should still provide reliable benchmarks of own-price elasticities for the same bulk coffees in the absence of messages linking prices with Fair Trade.
25 To test the relationship between price elasticities and income levels among customers, we broke down our store sample into higher- and lower-income areas based on median household income data for each postal code area obtained from the Census. The results were inconclusive. In higher-income areas, we observe a marginally lower price elasticity for the FR coffee but a higher price elasticity for the CB coffee, and confidence intervals were overlapping between higher- and lower-income areas.
26 Store managers do not have authority to implement sales promotions autonomously based on local conditions. As a result there is almost no between-store variation that can be exploited. This renders the use of Hausman instruments of average prices in other markets infeasible. Notice also that wholesale prices, which are sometimes used as instruments in this context, are not available. Wholesale prices do not vary between stores, or over time during the period under study.
Figure 3. Estimated Own Price Elasticities for Test Coffees and Competitor Coffees
Plots show point estimates and 90% confidence intervals for the own-price elasticity of different bulk coffees (estimated from our discrete choice model). The top two estimates refer to the own-price elasticity measured for the two test coffees, FR Regular and CB, during the price experiment when the price increase was linked to Fair Trade certification. The estimates below refer to own-price elasticities for the two test coffees and competitor bulk coffees estimated from sales promotions using historical sales data for the 2007-2009 period.
In order to estimate these benchmark elasticities, we use weekly sales and price data for all stores from 2007 to 2009, discarding the weeks during which our experiments took place. We estimate elasticities using a logit specification where the normalized mean utility level $\delta_{jt}$ is regressed on the product prices $p_{jt}$, a full set of store and product fixed effects, and a quadratic time trend. Standard errors are again clustered by store. Elasticities can then be estimated from the price coefficient and product shares. 27 Figure 3 shows the estimated own-price elasticities with their 90% confidence intervals based on the historical sales data, alongside the own-price elasticities for the labeled test coffees previously estimated from the price experiment (the coefficient estimates are reported in appendix A in the online supplement). Not surprisingly, the elasticities from the historical sales data are more precisely estimated than those from the price experiment given the longer time span in the historical sales data. We find that the own-price elasticities of the unlabeled bulk coffees all tend to cluster around −4, indicating highly elastic demand. This is true also for the two test coffees, the FR Regular and the CB coffee, outside the weeks of the experiment and without the label inducing consumers to connect the price change specifically with Fair Trade certification. 28 These estimates are consistent with previous findings. While aggregate demand for coffee as a commodity is widely regarded as being inelastic (Larson, 2003), several studies have indicated that demand for specific types or brands of coffee is highly elastic with average elasticities of −7 (Krishnamurthi & Raj, 1991; Bell, Chiang, & Padmanabhan, 1999).
27 In a given market, the elasticity of demand for product $j$ with regard to a price change in product $l$ is given by $\eta_{jl} = \alpha p_j (1 - s_j)$ if $j = l$ and $\eta_{jl} = -\alpha p_l s_l$ otherwise, where $\alpha$ denotes the coefficient on price in the mean utility.
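The step from a price coefficient and product shares to elasticities can be illustrated with the textbook logit formulas. The sketch below assumes the standard simple-logit expressions (own-price elasticity $\alpha p_j (1 - s_j)$, cross-price elasticity $-\alpha p_l s_l$, with $\alpha$ the coefficient on price in the mean utility); the coefficient and share used are hypothetical and are not the estimates reported in the online appendix.

```python
def own_price_elasticity(alpha, price, share):
    """Simple-logit own-price elasticity: alpha * p_j * (1 - s_j)."""
    return alpha * price * (1.0 - share)

def cross_price_elasticity(alpha, price_l, share_l):
    """Simple-logit elasticity of any other product with respect to the price of l."""
    return -alpha * price_l * share_l

# Hypothetical inputs: price coefficient, regular price per pound, market share.
alpha = -0.35
price = 11.99
share = 0.01

own = own_price_elasticity(alpha, price, share)
cross = cross_price_elasticity(alpha, price, share)
print(f"own-price elasticity: {own:.2f}, cross-price elasticity: {cross:.3f}")
```

With shares this small, the own-price elasticity is driven almost entirely by the product of the price coefficient and the price, while the implied cross-price elasticities are close to zero.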
Most important, and what stands out in figure 3, is that the estimates of price elasticities for unlabeled coffees are markedly higher (in absolute terms) than the estimated price elasticity of demand for the FR Regular coffee during the price experiment when we attached the label linking the price premium to Fair Trade certification. For the CB coffee, the price elasticity measured when the coffee was sold with the Fair Trade label was actually very similar to the estimated price elasticity at other times when it was sold without the label. Customers who buy the lower-priced CB coffee are sensitive to price, and this sensitivity is not affected by information linking price to Fair Trade certification. But for the higher-priced FR Regular coffee, customers are far less sensitive to price when the price premium is associated with Fair Trade certification than when the same coffee is sold without the Fair Trade label. The price elasticity of demand for FR Regular coffee when sold without the Fair Trade label is roughly the same as the elasticities of the other unlabeled coffees; demand for this coffee is less sensitive to price only when the price increase is associated directly with Fair Trade certification. This suggests that customers buying the FR Regular coffee responded directly to the Fair Trade label applied in the price experiment. 29

V. Discussion
In this paper we provide original data on the impact of an ethical product label on consumer behavior based on a field experiment conducted in partnership with a major U.S. grocery store chain. The first key finding is that consumers value the ethical label. Holding all other product attributes constant, the Fair Trade label by itself had a positive and significant effect on sales. Sales of the two most popular items, the FR Regular and CB bulk coffees, rose by almost 10% when the coffees carried a Fair Trade label compared with a generic placebo label. Second, we find that consumers exhibit differential levels of price sensitivity when considering the Fair Trade label. Consumers buying the lower-priced CB coffee were price sensitive and were unwilling to pay a premium of 9% to support Fair Trade. Consumers buying the higher-priced FR coffee were much less price sensitive when the coffee was labeled Fair Trade. They were willing to pay a sizable premium (8% in the experiment) when the price premium was directly associated with support for Fair Trade certification.
28 As a robustness check, we replicate the elasticities using a nonlinear almost ideal demand system (Deaton & Muellbauer, 1980) and the results, reported in the online appendix, are very similar to those from the logit model.
29 An important caveat to bear in mind when interpreting the results is that these benchmark consumer elasticities are estimated based on price decreases (promotions), while in the price experiment, elasticities are calculated based on a price increase. One concern would be that the higher benchmark elasticities derived from promotions were driven by a stocking-up effect, whereby consumers stocked up on the product during promotion periods. However, the high frequency of coffee promotions in our stores, the relatively high unit cost of coffee, and the fairly rapid loss of coffee flavor during storage mitigate this concern (Gupta, 1988; Krishnamurthi & Raj, 1991).
A potential concern with our label experiment is that the generic placebo label might have had a negative impact on demand for the test products in the stores under the control condition. While we are unable to completely dismiss this possibility, we argue that it is highly unlikely. To test for this possibility, we exploit pretreatment sales data from the weeks prior to the start of the label experiment (the results from this test are reported in the online appendix). First, we focus on the sample of stores assigned to the control condition in the first phase of the experiment and compare sales of the test coffees in the period before the experiment and during the first phase of the experiment when the generic placebo label was placed on the coffees for four weeks. Combined sales of the test coffees remained stable in these stores when they entered the control condition and displayed the generic placebo label indicating that the label had no negative effect on sales (the effect estimate is 0.4%; p < .96). Next, we conduct the same comparison for the stores that were assigned the treatment condition (the Fair Trade label) in the first phase of the experiment. There we find that sales of the test coffees increased substantially once they displayed the Fair Trade label (the effect estimate is 15%, p < .03). These results strongly suggest that the treatment effect uncovered in the full crossover experiment is driven by the Fair Trade label increasing sales as opposed to the generic placebo label lowering sales.
Note that these findings also provide a robustness check that helps address concerns about potential carryover effects in the crossover design from switching from the treatment to the control conditions (or vice versa). Comparing the changes in sales from the preexperimental period to the first four weeks under the Fair Trade label and the generic placebo label yields an experimentally identified difference-in-differences estimate that implies that the Fair Trade label raised sales by 15% (p < .13) over the generic placebo label during the first phase of the experiment. The fact that this first phase effect is similar to the effect estimated from the full crossover experiment reported above is consistent with a no-carryover assumption since the first phase is not affected by carryover from switching from the treatment to the control condition or vice versa. As another robustness check, we also replicated the crossover analysis while restricting the sample to sales during the last two weeks of each experimental phase, when possible carryover effects are less likely to occur because we allow for a two-week washout period. The results are again similar to the ones for the full crossover period. The positive label effect is, if anything, slightly larger in magnitude (13%, p < .001), which is consistent with the idea that in our context, carryover primarily acts to attenuate treatment effects since perceptive repeat customers who value Fair Trade continue to buy the test coffees even when the label is switched to the generic placebo label.
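The first-phase difference-in-differences comparison can be written out explicitly. The sketch below is illustrative only: the log sales levels are hypothetical and were chosen so that the within-group changes match the rounded effects quoted in the text (about 15% for stores that started with the Fair Trade label and 0.4% for stores that started with the placebo label).

```python
def diff_in_diff(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Change in the treatment group minus change in the control group."""
    return (treat_post - treat_pre) - (ctrl_post - ctrl_pre)

# Hypothetical mean log weekly sales before and during the first phase.
treat_pre, treat_post = 5.300, 5.450   # stores starting with the Fair Trade label
ctrl_pre, ctrl_post = 5.280, 5.284     # stores starting with the placebo label

did = diff_in_diff(treat_pre, treat_post, ctrl_pre, ctrl_post)
print(f"difference-in-differences estimate: {did:.3f} log points")
```

Because the first phase precedes any switch of labels, an estimate built this way cannot be contaminated by carryover from a previous condition.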
Potential concerns for our price experiment are the possibilities that consumers of the FR Regular coffee (but not the CB coffee) might have perceived the higher price as signaling higher product quality, independent of the Fair Trade label, or that consumers of the FR Regular coffee (but not the CB coffee) might have strong taste-based preferences and hence inelastic demand. While our design prevents us from definitively ruling out either possibility, we again argue that both are highly unlikely. Our analysis of historical sales data shows that demand for all coffee types (including the FR Regular and the CB coffee) exhibited similarly high (and negative) price elasticities, indicating that customers typically substitute among the various coffee types in response to changes in prices. This is consistent with findings from previous empirical studies of elasticities of demand for coffee types and brands, as we noted. Moreover, our analysis suggests that there was nothing distinctive or exceptional about demand for the FR Regular coffee prior to the experiment: the price elasticity for the FR Regular coffee was similar to price elasticities of all the other coffee types, including those typically sold at the same price and those sold at slightly lower prices (such as the CB coffee). This suggests that consumers of the FR Regular were not distinctive in either the way they interpreted signals about quality or the strength of their preferences.
Overall our findings suggest substantial consumer support for Fair Trade, although some price-sensitive shoppers, accounting for a smaller volume of sales relative to the Fair Trade supporters in our sample, will not pay a large premium for the Fair Trade label. The suggested heterogeneity in consumer willingness to pay for ethical labels highlights the importance of having a clearer understanding of how different consumers assess different product attributes. How generalizable are our findings? We conducted the experiments in partnership with a grocery retailer that is associated with, among other things, relatively high prices compared with other grocery chains and stronger support for organic farming and social and environmental causes. Shoppers in our stores may thus tend to have higher incomes and more interest in social and environmental causes than the average consumer. 30 It is difficult to generalize from our results to other settings and other sets of consumers, and we do not claim that our shoppers are representative of the universe of shoppers in terms of their preferences and sensitivity to prices. The overall direction of the potential bias is, however, not obvious. Individuals with higher incomes may be more likely than others to donate money to help people in need, since they have additional resources and less anxiety about their own economic circumstances. On the other hand, evidence suggests that lower-income individuals give proportionally more of their incomes to charity than do higher-income counterparts (Frank, 1996; Andreoni, 2001). Survey studies typically find no clear connection between willingness to pay for Fair Trade and other ethically labeled products and income levels of respondents (Dickson, 2001; De Pelsmacker, Driesen, & Rayp, 2005). As a result, it is not readily apparent whether findings from a study of a relatively high-income sample of consumers would tend to overestimate or underestimate the strength of demand for ethically labeled goods among the broader population.
30 Data from the 2000 U.S. Census indicate that the median household income for postal codes in which our stores were located in the Northeast region was $60,111, compared with a median income of $54,140 for the Northeast Census region as a whole.
It is also unclear whether the same tests conducted in a retail environment in which appeals to social preferences are less common would reveal a larger or smaller impact of the Fair Trade label. On the one hand, the label may be more salient for consumers in an environment in which there is less competition in terms of cause-related marketing. On the other hand, the shoppers in our experiments are likely to have been better informed about what the Fair Trade label represents than is the case in other retail environments. 31 Note that from this perspective, our findings are more likely to reflect the true preferences for ethical labels among (fully informed) consumers.
In addition, we conducted the experiments during one of the worst recessions in postwar history, raising prices of goods when retailers everywhere were cutting prices. It seems likely that consumer sensitivity to price increases in this period may have been particularly high. This should bias against finding higher willingness to pay a premium for the Fair Trade label. Ultimately, questions about generalizing the results would be best addressed by replicating the tests with different retailers, different products, and in different phases of the business cycle.
Overall, we suggest that in identifying significant support for ethical product labeling in a large-scale, multiple-store field experiment in the United States, our results could help motivate future research on consumer behavior and social preferences. An important future challenge is to provide a better understanding of the exact motivations of the consumers who respond to ethical product labeling. Intrinsic forms of motivation to purchase Fair Trade products may stem from pure altruism, when consumers derive private satisfaction from contributing to the well-being of others or from reducing global inequality (Fehr & Schmidt, 1999; Becchetti & Rosati, 2007), or "impure" forms of altruism, when consumers derive "warm glow" types of satisfaction simply from feeling better for giving to a cause (Andreoni, 1989, 1990; Baron, 2009). 32 Alternatively, consumers may be extrinsically motivated by the anticipated impact that purchasing Fair Trade may have on their social status (Hollaender, 1990; Freeman, 1997; Cialdini, 2003; Goldstein, Cialdini, & Griskevicius, 2008), their self-image (Batson, 1998; Benabou & Tirole, 2006), or their reputation (Glazer & Konrad, 1996; Harbaugh, 1998; Fehr & Fischbacher, 2003; Benabou & Tirole, 2006). Finally, an additional extrinsic motivation for purchasing Fair Trade products could be the perception of higher product quality. Consumers could interpret ethical production standards, along with support for ethical causes and corporate social responsibility initiatives more generally, as a signal that the producing firm is an honest and reliable type that will not skimp on quality (Fisman, Heal, & Nair, 2006; Siegel & Vitaliano, 2007; Elfenbein, Fisman, & McManus, 2012).
Besides addressing these specific types of potential motivations among consumers, future research could also identify the market for ethically labeled products according to sociodemographic segments such as age, gender, education, and income by combining experimental designs with individual-level data on purchasing behavior and consumer characteristics. 33 This research could then potentially clarify the conditions under which firms can boost sales and increase market share by offering Fair Trade-certified goods, either targeted to particular segments and priced at a premium, or marketed more generally at regular prices.
32 Empirical research on these specific types of motivations is limited. However, one set of findings consistent with pure altruism is from a survey experiment examining consumers' stated willingness to pay for Fair Trade (Hicks, 2007), which showed that the amount individuals were prepared to pay rose when they were provided with information about the positive impact of the program (specifically, information about the percentage of farmers participating and their revenues from Fair Trade sales).
33 Existing research based mostly on survey data reveals mixed or inconclusive results as to whether support for ethically labeled products is associated with key sociodemographic characteristics (e.g., De Pelsmacker et al., 2005). More recently, Cesarini et al. (2009) suggested that genetic differences can explain a significant portion of individual-level variation in preferences for giving.