Subjectivity Predicts Adjective Ordering Preferences

From English to Hungarian to Mokilese, speakers exhibit strong ordering preferences in multi-adjective strings: “the big blue box” sounds far more natural than “the blue big box.” We show that an adjective’s distance from the modified noun is predicted not by a rigid syntax, but by the adjective’s meaning: less subjective adjectives occur closer to the nouns they modify. This finding provides an example of a broad linguistic universal—adjective ordering preferences—emerging from general properties of cognition.


INTRODUCTION
Regularities in the behavior of speakers and speech communities provide a window onto the psychology of language. Here we take up one such regularity: adjective ordering. Speakers and listeners exhibit strong ordering preferences when two or more adjectives are used to modify a noun, as in "the big blue box" or "the good smooth purple plastic chair." Deviate from the preferred order, and the construction becomes odd. Something feels particularly unwieldy about "the blue big box," even more so with "the plastic good purple smooth chair." Why do most strings of adjectives have tightly constrained order? We investigate the role of adjective meaning, specifically the subjectivity of the properties that the adjectives name, in predicting ordering preferences.
Adjective ordering preferences stand as a particularly striking case of regularity in language. More remarkable than their robustness in English is their cross-linguistic systematicity: we continually find the same preferences across the world's languages. Hungarian (Uralic), Telugu (Dravidian), Mandarin Chinese, and Dutch are just a handful of languages with pre-nominal adjectives (i.e., languages where adjectives precede nouns) reported to have the same ordering preferences as English (Dixon, 1982; Hetzron, 1978; LaPolla & Huang, 2004; Martin, 1969b; Sproat & Shih, 1991). In languages like Selepet (Papuan) and Mokilese (Micronesian) with post-nominal adjectives (i.e., where adjectives follow nouns), these preferences are preserved in the reverse (Dixon, 1982; Hetzron, 1978; Sproat & Shih, 1991): stable preferences determine the linear distance of an adjective from the noun it modifies.
There have been two general approaches to the investigation of adjective ordering preferences. As part of a larger project mapping the syntax and semantics of adjectives, the linguistics literature advances a universal hierarchy of semantic classes of adjectives. Leading the charge, Dixon (1982) set out to uncover language-internal structure by which to organize ordering preferences. The preferences were assumed to be hard-coded in the grammar; the researcher's job was simply to uncover them. Building on the ordering of semantic classes proposed by Dixon, Cinque (1994) advanced a fully syntactic account of the conventionalization of ordering preferences under which different classes of adjectives populate dedicated syntactic categories which inhabit specialized projections in the syntactic tree. For example, color adjectives project a Color Phrase, shape adjectives project a Shape Phrase. The Shape Phrase syntactically dominates the Color Phrase; with left-branching structure, hierarchical dominance results in linear precedence. The ultimate source of this rigid structure was immaterial; at issue was a comprehensive and deterministic account of the facts (see Scott, 2002, and Laenzlinger, 2005, for similar proposals).
Before the grammatical approaches, which map, as it were, the terrain of adjective structure, psychological approaches advanced the idea that aspects of adjectives' meaning explain their relative order. The trouble lies in deciding precisely which aspects of meaning are relevant. Kicking off the enterprise in 1898, Sweet proposed that adjectives which are more closely connected with the noun in meaning occur closer to the noun, and that adjectives with a more specialized meaning occur closer to the noun. Similarly, Whorf (1945) proposed that adjectives describing more "inherent" properties occur closer to the noun. Ziff (1960) proposed that adjectives with less context-dependent meaning occur closer to the noun, and that adjectives that felicitously describe a narrower set of nouns occur closer to the noun. Recent compositional approaches have argued that the fundamental factor in predicting adjective ordering is whether or not an adjective forms a new concept with the noun it modifies (McNally & Boleda, 2004; Svenonius, 2008): first you form the concepts (e.g., "wild rice" or "bad apple"), then you modify them (e.g., "Minnesotan wild rice"). Similarly, Truswell (2009) argues that the type of composition an adjective invokes (i.e., intersective vs. subsective) determines its relative order (cf. the "absoluteness" proposal from Sproat & Shih, 1991). These proposals and others like them circle around similar aspects of adjective meaning in their account of ordering preferences; unfortunately, operationalizing metrics like meaning distance, specificity, inherence, and context-dependence is not a trivial task (but see the attempt in Martin, 1969a, as well as "Comparing subjectivity with alternative accounts of adjective order" in our Supplemental Materials; Scontras, Degen, & Goodman, 2017).
We revisit the idea that ordering preferences emerge from aspects of adjective meaning, attempting to provide more thorough empirical grounding to these notions; from the grammatical approach we adopt the strategy of using semantic classes of adjectives to structure our investigation and smooth our data. Distilling the psychological proposals that precede us into a single feature, we advance the hypothesis that it is the subjectivity of the property named that determines ordering preferences, such that less subjective adjectives occur linearly closer to the nouns they modify (Hetzron, 1978; F. Hill, 2012; Quirk, Greenbaum, Leech, & Svartvik, 1985). In "the big blue box," judgments about bigness are likely less consistent than judgments about blueness; "blue" is less subjective than "big," and so, according to this theory, it occurs closer to the noun "box." We believe that subjectivity synthesizes, rather than supplants, many of the previous psychological approaches, incorporating notions like "inherentness" and "context dependence" into an intuitive psychological construct that readily operationalizes as a behavioral measure. To test the hypothesis that adjective subjectivity predicts ordering preferences, we created and validated empirical measures of the ordering preferences themselves and of an adjective's subjectivity. With reliable estimates of both, we then evaluated the predictive power of subjectivity in adjective ordering preferences. To evaluate the relative success of our subjectivity hypothesis, in "Comparing subjectivity with alternative accounts of adjective order" in our Supplemental Materials (Scontras et al., 2017), we operationalized three of the previous accounts (inherentness, intersective vs. subsective modification, and complex concept formation) and compared their predictions with those of subjectivity.

Ordering preferences
We began by measuring preferences in adjective ordering. We selected a sample of 26 relatively frequent, imageable adjectives from seven different semantic classes (age, color, dimension, material, physical, shape, value). We then elicited naturalness judgments on adjective-adjective-noun object descriptions.

Participants
We recruited 50 participants through Amazon.com's Mechanical Turk crowdsourcing service. Participants were compensated for their participation.

Design and methods
Participants were asked to indicate which of two descriptions of an object sounded more natural. Each description featured a noun modified by two adjectives, for example "the red small chair" or "the small red chair." Descriptions were random combinations of two adjectives and a noun from the list in Table 1, with the constraint that no description contained adjectives from the same semantic class. Description pairs contained the same words, with relative adjective order reversed. On each trial, participants indicated their choice by adjusting a slider with endpoints labeled with the competing descriptions; an example trial appears in Figure 1. Participants completed 26 trials. On each trial, we measured the distance of the slider from each endpoint; values ranged between 0 and 1. Only native speakers of English were included in the analyses; we analyzed data from 45 participants.

Results
For each adjective, we computed its mean naturalness score by averaging ratings of configurations in which it appeared in first position, farthest from the noun. Figure 2 (naturalness) plots these mean naturalness scores by adjective class; greater values signal that a class's adjectives are preferred in first position, farther from the noun. This preferred distance measure closely tracks class-level ordering hierarchies reported in the literature (Dixon, 1982; Sproat & Shih, 1991).
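The scoring step described above can be sketched as follows. This is only an illustration under stated assumptions: the trial format, the function name `mean_naturalness`, and the example ratings are hypothetical, and we assume that a slider rating for one order implies the complementary rating (1 minus the value) for the reversed order.

```python
from collections import defaultdict

def mean_naturalness(trials):
    """Average, per adjective, the ratings of orders in which it comes first.

    Each trial is (adj_a, adj_b, rating), where rating in [0, 1] is the
    preference for the order "adj_a adj_b noun". Assumption: the reversed
    order "adj_b adj_a noun" implicitly receives 1 - rating.
    """
    ratings = defaultdict(list)
    for adj_a, adj_b, rating in trials:
        ratings[adj_a].append(rating)        # adj_a in first position
        ratings[adj_b].append(1.0 - rating)  # adj_b in first position
    return {adj: sum(r) / len(r) for adj, r in ratings.items()}

# Hypothetical ratings for illustration only (not data from the study).
trials = [
    ("small", "red", 0.9),
    ("small", "wooden", 0.8),
    ("red", "wooden", 0.7),
]
scores = mean_naturalness(trials)
for adjective, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{adjective}: {score:.2f}")
```

A higher score means the adjective is preferred in first position, farther from the noun; class-level scores would then be averages over the adjectives in each class.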

Corpus Validation
To validate our behavioral measure of ordering preferences, we conducted a corpus study on the same 26 adjectives and measured their mean distance from the noun in phrases with two adjectives. We used TGrep2 (Rohde, 2005) and the TGrep2 Database Tools (Degen & Jaeger, 2011) to extract all "A A N" NPs that contained one of the 26 adjectives in Table 1 from the Penn Treebank subset of the Switchboard corpus of telephone dialogues (Godfrey, Holliman, & McDaniel, 1992), as well as from the spoken and the written portions of the British National Corpus (BNC, see http://www.natcorp.ox.ac.uk/). For these cases, we computed the distance of each occurrence of our 26 target adjectives from the modified noun, yielding results for a total of 38,418 adjective tokens. For each adjective, mean distance from the noun was computed (where the position directly preceding the noun was coded as 0, and the position preceding that was coded as 1).
Mean distance from the noun for each adjective class is shown in Figure 2 (corpus). The corpus measure closely tracks the qualitative pattern we measured in our naturalness experiment; quantitatively, the two measures are highly correlated (r² = .83, 95% CI [.63, .90]), in spite of the fact that the corpus measure includes cases from a superset of the nouns tested in our naturalness experiment. Our naturalness ratings thus operationalize both immediate ordering preferences and speakers' preferences in natural usage.
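As a rough illustration of the corpus measure and its comparison with the behavioral measure, the sketch below codes positions as in the text (0 directly before the noun, 1 before that) and computes a squared Pearson correlation. The phrase list, the naturalness values, and the function names are hypothetical; the actual extraction relied on TGrep2 over parsed corpora, which is not reproduced here.

```python
from collections import defaultdict

def mean_distance(phrases, targets):
    """Mean distance from the noun for each target adjective.

    Each phrase is (adj1, adj2, noun). The adjective directly before the
    noun is coded 0 and the one before that is coded 1.
    """
    distances = defaultdict(list)
    for adj1, adj2, _noun in phrases:
        if adj1 in targets:
            distances[adj1].append(1)
        if adj2 in targets:
            distances[adj2].append(0)
    return {adj: sum(d) / len(d) for adj, d in distances.items()}

def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

# Hypothetical "A A N" phrases and ratings, for illustration only.
phrases = [
    ("big", "blue", "box"),
    ("big", "old", "house"),
    ("old", "blue", "car"),
    ("blue", "big", "box"),
]
corpus = mean_distance(phrases, {"big", "blue", "old"})
naturalness = {"big": 0.8, "old": 0.6, "blue": 0.3}

adjectives = sorted(corpus)
r2 = r_squared([corpus[a] for a in adjectives],
               [naturalness[a] for a in adjectives])
print(corpus, round(r2, 3))
```

The confidence intervals reported in the text are not computed by this sketch.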

Subjectivity
With clear estimates of ordering preferences, we then measured the subjectivity of the adjectives that were tested in the ordering preferences experiment. We started with a direct measure of "subjectivity."

Participants
We recruited 30 participants through Amazon.com's Mechanical Turk crowdsourcing service. Participants were compensated for their participation.

Design and methods
Participants were shown a series of adjectives and asked to indicate how "subjective" each one was on a sliding scale with endpoints labeled as "completely objective" (coded as 0) and "completely subjective" (coded as 1; Figure 3). Participants completed a total of 26 trials, one for each adjective in Table 1. The order was randomized for each participant. Only native English speakers were included in the analyses; we analyzed data from 28 participants.

Results
We averaged the subjectivity scores for each adjective; greater values indicate greater subjectivity. These averages were used in the analyses reported below. Figure 2 (subjectivity) shows these scores by adjective class.

Faultless Disagreement Validation
Because subjectivity may be an ambiguous, or even subjective, property, we explored a second measure that may have greater ecological validity. We operationalized subjectivity as the potential for faultless disagreement between two speakers, which captures potential uncertainty about assessment criteria and assessment outcomes (Barker, 2013; Kennedy, 2013; Kölbel, 2004). We had participants (n = 40) evaluate whether two speakers could both be right while the speakers produced conflicting object descriptions. For example, an experimental trial would have Mary assert, "That apple is old," then have Bob counter with "That apple is not old"; participants rated whether both Mary and Bob could be right, or whether one of them must be wrong. This measure, the faultless disagreement potential for the adjective at issue, serves as an empirical estimate of adjective subjectivity. Figure 2 (faultless) plots these scores by adjective class, where a value of 1 signals that a class's adjectives are always amenable to faultless disagreement (i.e., maximally subjective). The results of this method were highly correlated with our direct "subjectivity" scores (r² = .91, 95% CI [.86, .94]), suggesting that they measure a common underlying value: adjective subjectivity.
One might worry that conducting our analysis at the level of individual adjectives obscures information about the specific adjective-adjective configurations that participants rated in our naturalness experiment. We therefore computed a subjectivity difference score for each adjective class configuration (i.e., an ordered pairing of two adjective classes, CLASS1-CLASS2) by subtracting the mean subjectivity score for CLASS2 from the mean subjectivity score for CLASS1. Higher difference scores indicate that the adjective class closer to the noun is less subjective than the class farther away. Figure 5 plots mean naturalness ratings for adjective class configurations against these subjectivity difference scores; the two measures are highly correlated (r² = .80, 95% CI [.68, .88]). We also see that as the difference in subjectivity approaches zero, the naturalness ratings approach 0.5 (i.e., chance): ordering preferences weaken for adjectives of similar subjectivity (e.g., "yellow square" or "fresh soft").
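The difference score for class configurations can be illustrated with a short sketch. The class means and the function name `difference_scores` are hypothetical; following the description above, CLASS1 is taken to be the class farther from the noun and CLASS2 the class closer to it.

```python
from itertools import permutations

def difference_scores(class_subjectivity):
    """Subjectivity difference for every ordered pair CLASS1-CLASS2.

    CLASS1 is the class farther from the noun and CLASS2 the class closer
    to it; the score is subjectivity(CLASS1) - subjectivity(CLASS2).
    """
    return {
        (c1, c2): class_subjectivity[c1] - class_subjectivity[c2]
        for c1, c2 in permutations(class_subjectivity, 2)
    }

# Hypothetical class-level mean subjectivity scores (illustration only).
class_subjectivity = {"value": 0.8, "dimension": 0.6, "color": 0.3}
diffs = difference_scores(class_subjectivity)

for (c1, c2), d in sorted(diffs.items(), key=lambda kv: -kv[1]):
    print(f"{c1}-{c2}: {d:+.2f}")
```

A positive score marks a configuration in which the less subjective class sits closer to the noun, which the hypothesis predicts to be rated as more natural; scores near zero correspond to the weak preferences noted above.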

Discussion
We found that adjective subjectivity scores account for almost all of the variance in naturalness ratings, for several different analyses, strongly supporting our hypothesis that less subjective adjectives occur closer to the noun. In "Comparing Subjectivity With Alternative Accounts of Adjective Order" in our Supplemental Materials (Scontras et al., 2017), we compare these results with the predictions made by other accounts. We found that subjectivity vastly outperforms adjective inherentness (r² = .00, 95% CI [.00, .02]) and concept-formability (r² = .00, 95% CI [.00, .00]) in accounting for ordering preferences. Indeed, we failed to find any evidence that ordering preferences depend on the modified noun. For subsective versus intersective modification, we found that subjectivity explains independent variance in the observed preferences within the different modification classes.
One might worry that the observed success of subjectivity in predicting ordering preferences is an artifact of the set of 26 adjectives we tested, and might not generalize to a broader set of adjectives. Therefore, we next consider a much larger set of adjectives.

EXPERIMENT 2: GENERALIZING OUR FINDINGS
To test the generalizability of the findings from Expt. 1, we aimed to construct a set of adjectives that are attested in multi-adjective constructions and that span both semantic classes and a broad spectrum of frequencies and lengths. The set of 78 adjectives we ultimately used includes many adjectives that are traditionally overlooked in investigations of ordering preferences.

Participants
We recruited 495 participants through Amazon.com's Mechanical Turk. Participants were compensated for their participation.

Materials
Starting with naturally occurring examples of double adjective modification from the Switchboard corpus, we chose 78 unique adjectives (from 13 different classes; Table 2) and 166 unique nouns. Details of our selection process can be found in "Materials Selection for Expt. 2" in our Supplemental Materials (Scontras et al., 2017).

Design and methods
The design was similar to our previous naturalness rating experiment (Expt. 1: Ordering preferences): participants indicated which of two object descriptions sounded more natural, choosing between adjective-adjective-noun permutations that varied the relative order of the adjectives. Adjectives were chosen at random from the set in Table 2, with the constraint that adjectives from the same class were not paired together. Participants completed 30 trials. On each trial, participants indicated their choice by adjusting the slider between endpoints labeled with the competing descriptions. Additionally, participants were able to indicate if a particular description did not make sense by checking a box labeled "Neither option makes sense." Only native speakers of English were included in the analyses; we analyzed data from 473 participants.

Results
For each adjective, we computed its mean naturalness score by averaging ratings of configurations in which it appeared in first position, farthest from the noun. Participants demonstrated little preference for adjective order when the descriptions were nonsense. For this reason, we excluded responses to nonsensical descriptions from the analyses of subjectivity below; this exclusion process removed 2,295 observations (16% of the total 14,190).

Subjectivity
Next, we evaluated the subjectivity of our new set of adjectives using the direct "subjectivity" task from Expt. 1: Subjectivity.

Participants
We recruited 198 participants from Amazon.com's Mechanical Turk. Participants were compensated for their participation.

Design and methods
The design was identical to our previous direct "subjectivity" experiment. Participants completed a total of 30 trials. On each trial an adjective was chosen at random from the set of 78 in Table 2. Only native speakers of English were included in the analyses; we analyzed data from 189 participants.

Results
We averaged the subjectivity scores for each adjective; greater values indicate greater subjectivity. To evaluate the power of subjectivity in predicting adjective ordering preferences, we compared subjectivity scores with the naturalness ratings (Figure 6). Adjective subjectivity scores account for 51% of the variance in the naturalness ratings (r² = .51, 95% CI [.32, .66]). Four observations clearly stood out in Figure 6, corresponding to the superlatives best, biggest, closest, and last. Indeed, superlatives have been observed to eschew adjective ordering preferences, occurring farthest from the modified noun regardless of class or subjectivity (Dixon, 1982); our naturalness ratings reflect this fact. Removing superlatives, subjectivity scores perform markedly better, accounting for 61% of the variance (r² = .61, 95% CI [.47, .71]). At the level of adjective class configurations, subjectivity difference scores account for 74% of the variance in the configuration ratings (r² = .74, 95% CI [.66, .79]; Figure 7). A post-hoc look at our data revealed a small number of outlier adjectives (in addition to the four superlatives). To systematically detect these outlier adjectives, we fit a linear regression predicting naturalness ratings by subjectivity scores, then calculated the absolute difference between the actual naturalness ratings and the model's predicted values. Setting the cutoff for this difference score at 3 × standard deviation, four adjectives stood apart as outliers: entrepreneurial, solid, current, and daily (labeled in blue in Figure 6). Without the four outlier adjectives (and the four superlatives), adjective subjectivity scores account for 70% of the variance in the naturalness ratings (r² = .70, 95% CI [.58, .78]).
We also looked at the contribution of frequency and length in predicting ordering preferences. Treating subjectivity, frequency, and length as predictors in a linear regression predicting naturalness ratings (excluding superlatives), the model accounts for 70% of the variance (r² = .70). Nested model comparison reveals that the subjectivity predictor explains significant variance in the extended model, F(1, 70) = 141.38, p < .001; the frequency and length predictors also explain significant variance, frequency: F(1, 70) = 7.71, p < .01; length: F(1, 70) = 9.73, p < .01. If we remove outlier adjectives that fall more than three standard deviations away from the predicted value of the extended model (there were six: mini, frozen, solid, current, daily, designated), the model performs better, accounting for 76% of the variance (r² = .76).
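For readers unfamiliar with nested model comparison, the sketch below shows one standard way to obtain such an F statistic: fit the full model and a reduced model lacking the predictor of interest, then compare residual sums of squares. It is a plain-Python illustration with hypothetical data and function names, not the analysis code behind the reported values. With 74 adjectives (78 minus the four superlatives), three predictors, and an intercept, the denominator degrees of freedom come to 70, matching the F(1, 70) reported above.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def rss(columns, y):
    """Residual sum of squares of an OLS fit with an intercept."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))

def nested_f(full_cols, reduced_cols, y):
    """F statistic comparing a full model with a nested reduced model."""
    rss_full, rss_reduced = rss(full_cols, y), rss(reduced_cols, y)
    df_num = len(full_cols) - len(reduced_cols)
    df_den = len(y) - (len(full_cols) + 1)  # +1 for the intercept
    f = ((rss_reduced - rss_full) / df_num) / (rss_full / df_den)
    return f, df_num, df_den

# Hypothetical values for eight adjectives (illustration only).
subjectivity = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
frequency = [3.0, 1.0, 4.0, 1.5, 5.0, 2.0, 2.5, 3.5]
length = [3, 5, 4, 6, 3, 7, 5, 4]
naturalness = [0.30, 0.33, 0.41, 0.42, 0.55, 0.52, 0.61, 0.68]

f_stat, df_num, df_den = nested_f(
    [subjectivity, frequency, length], [frequency, length], naturalness)
print(f"F({df_num}, {df_den}) = {f_stat:.2f}")
```

Dropping a different column from the reduced model gives the corresponding test for frequency or length; p values would additionally require the F distribution, which the standard library does not provide.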

Discussion
The results of the current experiment demonstrate that subjectivity predicts ordering preferences in a much larger set of materials drawn from naturally occurring examples. At worst, subjectivity accounts for more than half of the variance in the naturalness ratings for our set of 78 adjectives. Once we exclude superlatives, whose semantics likely dictates their position in strings of nominal modifiers, as well as four outlier adjectives, subjectivity accounts for 70% of the variance in this set of 70 adjectives. While adjective frequency and length contribute to the observed preferences, we saw that subjectivity alone accounts for the vast majority of the variance in our data.
There remains the question of precisely why the four outlier adjectives (entrepreneurial, solid, current, and daily) performed so poorly with respect to the predictions of subjectivity. Perhaps the most notable feature of this set of adjectives is its heterogeneity: we fail to find clear groupings by semantic class, relative frequency, or length. However, length likely does factor into the observed behavior of entrepreneurial, the longest adjective tested, which was the only outlier underpredicted by its subjectivity: participants preferred entrepreneurial closer to the noun than its subjectivity alone would predict. Indeed, relative length has long been known to affect the order of constituents, even in the domain of adjective ordering (Wulff, 2003): longer constituents appear later. Once we factor length into the equation predicting ordering preferences, entrepreneurial no longer stands out.

GENERAL DISCUSSION
Adjective ordering preferences have received considerable attention throughout the history of generative grammar and cognitive psychology, owing to their remarkable stability within and across languages. Something so robust, the reasoning goes, must evidence a deep principle of the cognitive architecture that shapes language. Yet while descriptions of the phenomenon abound, an explanation has proven elusive. Grammatical theories that posit a rigid syntax of adjective classes offer little more than a codification of the facts, and psychological approaches stumble when it comes to operationalizing the specific aspects of adjective meaning at play.
In our investigation, we established two empirical constructs: the preferences themselves, which we measured using naturalness ratings and validated with corpus statistics; and adjective subjectivity, which we measured directly and corroborated with potential for faultless disagreement. An adjective's semantics predicts its distance from the modified noun, such that less subjective adjectives occur linearly closer to nouns they modify. In our Supplemental Materials (Scontras et al., 2017), we investigated the predictions of three other hypotheses from the literature: adjective inherentness (i.e., how essential an adjective's meaning is to the noun it modifies; Sweet, 1898; Whorf, 1945), intersective versus subsective modification (i.e., the mode by which an adjective composes semantically with the noun it modifies; Truswell, 2009), and concept formability (i.e., whether an adjective composes with a noun to form a complex, idiomatic concept; Bouchard, 2005; McNally & Boleda, 2004; Svenonius, 2008). In each case, we found that subjectivity has greater predictive power.
It bears noting that the preference to place less subjective adjectives closer to nouns is not deterministic; nonpreferred orderings of adjectives can serve a communicative purpose, for example to establish contrastiveness in discourse (A. A. Hill, 1958; Martin, 1969a, 1970; Vendler, 1963). This contrastiveness follows straightforwardly from a manner implicature (Levinson, 2000): marked forms (i.e., nonpreferred orderings of adjectives) yield marked interpretations (i.e., atypical modification constituency). The work lies in determining the preferred orderings from which contrastive uses depart. Indeed, many other situational factors are likely to influence ordering (e.g., phonological shape, noun semantics, word and bigram frequencies; cf. Wulff, 2003, and the results of Expt. 2); it is the more general tendencies we are concerned with here.
Adjectives are just one of many elements that may occur in complex nominal constructions. Other classes of elements include demonstratives (e.g., this and that) and numerals. In his Universal 20, Greenberg observes that the relative order of these higher-order classes is also stable cross-linguistically (Culbertson & Adger, 2014; Greenberg, 1963), suggesting that subjectivity interacts with additional constraints from semantic composition in the determination of word order. Indeed, we saw hints of such interactions in Expt. 2, where superlatives stood apart from run-of-the-mill adjectives. Beyond nominals, adverbs (e.g., honestly, probably, carefully) are reported to exhibit regular orderings cross-linguistically (Cinque, 1999; Ernst, 2002). Understanding these orderings would likely benefit from a systematic empirical treatment similar to the one we have advanced here.
While subjectivity accounts for the regularities we observe in adjective ordering, the deeper explanation for how subjectivity determines the relative order of adjectives remains unsettled. Our results suggest that ordering preferences likely emerge, at least partially, from a desire to place less subjective content closer to the substantive head of a nominal construction (i.e., closer to the modified noun). For now we can only speculate about the ultimate source of this desire. Subjective content allows for miscommunication to arise if speakers and listeners arrive at different judgments about a property description. Hence, less subjective content is more useful at communicating about the world. An explanation along these lines, based on pressures to facilitate successful reference resolution, would have to depend on the hierarchical, not linear, ordering of adjectives: noun phrases are built semantically outward from the noun, and more useful, less subjective content enters earlier in this process (cf. the mirroring of preferences in pre- vs. postnominal languages). A full explanation must examine not only why we observe the preferences that we do, but also how and to what extent these preferences get conventionalized via the diachronic processes that shape language, a promising direction for future research.
Whatever its source, the success of subjectivity in predicting adjective ordering preferences provides a compelling case where linguistic universals, the regularities we observe in adjective ordering, emerge from cognitive universals, the subjectivity of the properties that the adjectives name.

Figure 1. Example trial from Expt. 1: Ordering preferences. Participants indicated the more natural of two adjective-adjective-noun descriptions on a sliding scale.

Figure 4. Mean naturalness ratings plotted against mean subjectivity scores for each of the 26 adjectives tested in Expt. 1.

Figure 5. Mean configuration naturalness ratings plotted against subjectivity difference scores for each pair of adjective classes tested in Expt. 1.

Figure 6. Mean naturalness ratings plotted against mean subjectivity scores for each of the 78 adjectives tested in Expt. 2. Superlatives are labeled in green; outlier adjectives are labeled in blue.

Figure 7. Mean configuration naturalness ratings plotted against subjectivity difference scores for each pair of adjective classes tested in Expt. 2.