High local mutual information drives the response in the human language network

Syntactic operations have traditionally been viewed as the core computational machinery that sets human language apart from other animal communication systems. Here, we tested an alternative hypothesis: the primary driver of the response in the language-selective regions of the brain is semantic composition. Using formal machinery from information theory, we estimated the likelihood of semantic composition via mutual information among words in a local linguistic context. Across two fMRI experiments, we examined the processing of veridical sentences as well as syntactically degraded sentences, including sentences where the local context does not support semantic composition. Consistent with behavioral/computational modeling work, syntactic degradedness did not lead to lower responses in the fronto-temporal language-selective network, except when mutual information among words was low. These results challenge the primacy of syntax in the human language architecture, instead supporting the idea that successful semantic composition is what drives the language network in the brain.

A sample item from the critical experiment; colors are used to illustrate the increasing degradedness (i.e., the color spectrum becomes progressively more discontinuous with more swaps). a. The schematic of the procedure used to create the scrambled-sentence conditions in Experiment 1. b. A sample stimulus from the LowPMI condition in Experiment 2. The parcels used to define the language-responsive areas. In each participant, the top 10% of most localizer-responsive voxels within each parcel were taken as that participant's region of interest. Replicating prior work [1], the localizer effect (estimated using across-runs cross-validation to ensure independence) was highly robust in both experiments (ps < 0.0001). c-d. Neural responses (in % BOLD signal change relative to fixation) to the conditions of the language localizer and Experiments 1 (n=16) and 2 (n=32). e. The formula for computing pointwise mutual information (PMI) (see Materials and Methods for details), and average positive PMI values for the materials in Experiments 1 and 2. (N.B.: Slightly different scramblings of the materials for the Scr1, Scr3, and Scr5 conditions were used in the two experiments; hence two bars (left = Experiment 1) for each of these conditions.)

[...] originally proximal content words (Figure 1; see Materials and Methods). The manipulation was effective, leading to a significant drop in local mutual information (Figure 2e). According to the local mutual information hypothesis, the neural response should be substantially lower for this condition relative to the other degraded conditions.

In Experiment 1, replicating much prior work [30, 1, 6], well-formed sentences elicited significantly stronger responses than the word-list and nonword-list conditions (Figure 2c, Table 1). Strikingly, however, degrading the sentences by introducing word swaps did not decrease the magnitude of the language network's response: even stimuli with seven word swaps (e.g., their last on they overwhelmed were day farewell by messages and gifts; Figure 1) elicited as strong a response as fully grammatical sentences (e.g., on their last day they were overwhelmed by farewell messages and gifts; Figure 2c, Table 1). This pattern of similarly strong neural responses for the well-formed and degraded sentences is in line with the local mutual information hypothesis: [...]

In Experiment 2, we replicated the pattern observed in Experiment 1 for the intact sentences and sentences with one, three, or five local word swaps, all of which elicited similarly strong responses, all reliably higher than the control word-list condition (Table 1). Critically, in line with the local mutual information hypothesis, the LowPMI condition elicited a neural response that was as low as that elicited by the list of unconnected words (Figure 2b, Table 1).

In spite of eliciting as strong a neural response as veridical sentences, the scrambled sentences are rated as less acceptable behaviorally (SI), suggesting there has to be a cost to the processing of this kind of degraded linguistic input. Indeed, some brain regions outside the fronto-temporal language network (specifically, within the domain-general fronto-parietal multiple demand network [31, 32]) were sensitive to the scrambling manipulation, with stronger responses to more degraded stimuli (SI).

Discussion

Across two fMRI experiments, we found that sufficiently high local mutual information appears to be necessary and sufficient for eliciting the maximal response in the language system, where maximal is defined as the response to the preferred stimulus: well-formed and meaningful sentences.

[...] > Nonwords contrast targets brain regions sensitive to high-level linguistic processing [1]. We have previously established the robustness of this contrast to materials, modality of presentation, language, and task [1, 43, 7].

Each trial started with 100 ms pre-trial fixation, followed by a 12-word-long sentence or a list of 12 nonwords presented on the screen one word/nonword at a time at the rate of 450 ms per word/nonword. Then, a line drawing of a hand pressing a button appeared for 400 ms, and participants were instructed to press a button whenever they saw this icon. Finally, a blank screen was shown for 100 ms, for a total trial duration of [...]

[...] conditions. In particular, a word was chosen at random and switched with one of its immediate neighbors. This process was repeated a specified number of times. Because one random swap can directly undo a previous swap, we ensured that the manipulation was successful by calculating the edit distance.

([...] words/nonwords presented one at a time with no punctuation in the center of the screen, for 500 ms each, in black font using capital letters on a white background), followed by a blank screen for 300 ms, followed by a memory probe presented in blue font for 1,200 ms, followed again by a blank screen for 500 ms.

The [...] create condition orderings and to distribute fixation among the trials so as to optimize our ability to deconvolve neural responses to the different conditions. Condition order varied across runs and participants.
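The adjacent-swap scrambling procedure described earlier in this section can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that the edit distance is the minimum number of adjacent swaps needed to restore the intact order (equivalently, the number of inversions in the permutation), and that a scrambling is simply resampled until that distance equals the requested number of swaps. The function names, the seed, and the retry limit are hypothetical.

```python
import random


def swap_distance(order):
    """Minimum number of adjacent swaps needed to sort `order` (inversion count)."""
    n = len(order)
    return sum(1 for a in range(n) for b in range(a + 1, n) if order[a] > order[b])


def random_adjacent_swaps(n_items, n_swaps, rng):
    """Apply `n_swaps` random swaps between a position and one of its neighbors."""
    order = list(range(n_items))
    for _ in range(n_swaps):
        i = rng.randrange(n_items)
        # Edge positions have a single neighbor; interior positions pick a side at random.
        if i == 0:
            j = 1
        elif i == n_items - 1:
            j = i - 1
        else:
            j = i + rng.choice((-1, 1))
        order[i], order[j] = order[j], order[i]
    return order


def scramble(words, n_swaps, seed=0, max_tries=10_000):
    """Return (scrambled words, index order) with swap distance equal to `n_swaps`."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        order = random_adjacent_swaps(len(words), n_swaps, rng)
        # A later swap can undo an earlier one, so verify the distance and retry if needed.
        if swap_distance(order) == n_swaps:
            return [words[k] for k in order], order
    raise RuntimeError("no scrambling with the requested distance was found")


intact = "on their last day they were overwhelmed by farewell messages and gifts".split()
scrambled_words, order = scramble(intact, n_swaps=7)
```

Under this assumed definition, the seven-swap example quoted in the Results (their last on they overwhelmed were day farewell by messages and gifts) has a swap distance of exactly 7 from its intact version.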

Critical task in Experiment 2

Design and materials. As in Experiment 1, participants read sentences with correct word order (Int) and sentences with progressively more scrambled word orders (Scr 1, 3, and 5). The latter three conditions [...]

The materials were distributed across five experimental lists; any given participant saw the materials from just one list, and each list was seen by 5-7 participants.

As in Experiment 1, at the end of each trial, participants were presented with a word and asked to decide whether this word appeared in the preceding trial (see SI for behavioral performance).

Procedure. The procedure was identical to that in Experiment 1 except that the memory probe was [...]

[...] across the five word-list conditions, leaving a total of six conditions. In all the critical analyses, we consider the language network as a whole (treating regions as random effects; see below), given the abundant evidence that the regions of this network form an anatomically [e.g., 50] and functionally integrated system, as evidenced by strong inter-regional correlations during rest and language comprehension [e.g., 51]; but see Figure S1 and Table S3 for the six language fROIs' profiles and associated statistics.

Computing mutual information values. We used a sliding four-word window to extract local word pairs from each 12-word string. This is equivalent to collecting the bigrams, 1-skip-grams and 2-skip-grams from each string. For each word pair, we calculated PMI as follows: [...]

We declare no competing interests.
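To make the window-based pair extraction concrete, the sketch below collects all word pairs that fall within a sliding four-word window and scores them with the standard definition of pointwise mutual information, PMI(w1, w2) = log [p(w1, w2) / (p(w1) p(w2))], clipped at zero to give positive PMI. The source formula is not reproduced in this text, so the base-2 logarithm, the count-based probability estimates, and all function and parameter names here are assumptions; the counts in the usage lines are toy values, not corpus statistics.

```python
import math


def local_pairs(words, window=4):
    """All ordered word pairs that co-occur within a sliding `window`-word span.

    With window=4 this yields the bigrams, 1-skip-grams and 2-skip-grams.
    """
    pairs = []
    for i in range(len(words)):
        for j in range(i + 1, min(i + window, len(words))):
            pairs.append((words[i], words[j]))
    return pairs


def pmi(pair_count, count_1, count_2, total_pairs, total_words):
    """Standard PMI in bits, estimated from raw counts (assumed estimator)."""
    p_joint = pair_count / total_pairs
    p_1 = count_1 / total_words
    p_2 = count_2 / total_words
    return math.log2(p_joint / (p_1 * p_2))


def positive_pmi(pair_count, count_1, count_2, total_pairs, total_words):
    """PMI clipped at zero; pairs never observed together are scored 0."""
    if pair_count == 0:
        return 0.0
    return max(0.0, pmi(pair_count, count_1, count_2, total_pairs, total_words))


sentence = "on their last day they were overwhelmed by farewell messages and gifts".split()
pairs = local_pairs(sentence)
# Toy counts (hypothetical): the pair occurs once among 4 pairs; each word once among 4 words.
toy_score = pmi(pair_count=1, count_1=1, count_2=1, total_pairs=4, total_words=4)
```

For a 12-word string, a four-word window yields 30 pairs: 11 bigrams, 10 1-skip-grams, and 9 2-skip-grams.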
[...]wisher. New method for fMRI investigations of language: Defining ROIs functionally in individual subjects.

[...] natural to 7=natural). The ratings were analyzed using a mixed-effects linear regression model with a fixed effect and random slopes for Condition, and random effects for Participant and Item. To demonstrate the effectiveness of the manipulation at every level, Condition was backward difference coded. As can be seen in Table S1, every increase in degradation was rated as significantly less natural than the previous amount of degradation, although with diminishing returns. Thus, participants were robustly sensitive to the degradation manipulation. Code and data are available at OSF: https://osf.io/y28fz/.

Behavioral (memory probe task) data in Experiments 1 and 2.

To ensure that participants were attentive across conditions, we included a memory probe task in both Experiments 1 and 2. After each stimulus (sentence, word list, or nonword list), participants saw a probe word/nonword and were asked to press one of two buttons to indicate whether this word/nonword appeared [...]

Individual language regions' responses to the conditions of Experiments 1 and 2.

In the main analysis (Figure 2c-d, Table 1), we reported a model that examined the response in the language network as a whole (treating the six regions as random effects). As shown in Figure S1 and Table S3, the pattern observed across the network was present, both qualitatively and statistically, in each of the six language fROIs individually.

Brain regions sensitive to the sentence-scrambling manipulation.

A critical result observed in both Experiments 1 and 2 is the lack of neural response reduction in the language regions for sentences where word order has been permuted yet the level of PMI among nearby words remained high (which, we hypothesize, allowed for semantic composition). Behaviorally (see above), we found that participants are highly sensitive to the scrambling manipulation, as evidenced by progressively lower acceptability ratings for more scrambled sentences. Here, we asked whether any parts of our brain work harder when we process more scrambled sentences.

Neural responses (in % BOLD signal change relative to fixation) to the conditions of Experiments 1 (top) and 2 (bottom), as well as the language localizer and spatial WM task.
Discovery of scrambling-responsive regions. To search for brain regions sensitive to scrambling, we performed a group-constrained subject-specific (GSS) whole-brain analysis [1]. This analysis searches for spatially consistent (across individuals) patterns of activation while taking into account inter-individual variability in the precise loci of activations, which increases sensitivity relative to traditional random-effects analyses that assume voxel-wise correspondence across people [56]. We chose a contrast between the most scrambled condition that was shared between the two experiments (i.e., Scr5) and the intact condition.

Pooling data across experiments (n=47), we took individual whole-brain activation maps for the Scr5>Int contrast and binarized them so that voxels that show a reliable effect (significant at p < 0.05, uncorrected at the whole-brain level) were turned into 1's and all other voxels were turned into 0's. ([...] threshold for the individual activation maps to maximize our chances of detecting regions of interest; as explained below, however, the resulting regions were subsequently evaluated using statistically conservative criteria.) We overlaid these maps to create a probabilistic activation overlap map, thresholded this map to only include voxels where at least 4 of the 47 participants showed activation, and divided it into "parcels" using a watershed image parcellation algorithm (see [1] for details). Finally, we identified parcels that, when intersected with the individual activation maps, contained supra-threshold (i.e., significant for our contrast of interest at p < 0.05, uncorrected) voxels in at least half of the individual participants. Four parcels satisfied this criterion, located in the middle frontal gyrus bilaterally and in the SMA (Figure S2a).

[...] in Experiment 1, the response stays high or continues to increase for the Scr7 condition, but in Experiment 2, the response appears to fall off for the LowPMI condition. To quantify this non-monotonic pattern, we collapsed across experiments and conducted a mixed-effects linear regression with first- and second-order terms for Edit Distance¹ (i.e., the number of swaps required to reconstruct the original intact sentence) as fixed effects and random slopes, and random effects for Participant and Region of Interest. We find a significant initial increase in response, which decreases slowly as stimuli become more degraded (Table S4).
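The overlap-map steps of the GSS analysis described above (binarize each individual map, sum across participants, keep voxels active in enough participants, then retain parcels with supra-threshold voxels in at least half of the participants) can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the analysis code: each participant's map is represented as a flat list of voxel-wise p-values, the watershed parcellation step is omitted (a parcel is just a hypothetical list of voxel indices), and the function names are invented for this example.

```python
def binarize(p_map, alpha=0.05):
    """1 where a voxel is significant at p < alpha (uncorrected), else 0."""
    return [1 if p < alpha else 0 for p in p_map]


def overlap_map(binary_maps):
    """Per-voxel count of participants showing a supra-threshold effect."""
    return [sum(voxel) for voxel in zip(*binary_maps)]


def threshold_overlap(overlap, min_participants=4):
    """Indices of voxels where at least `min_participants` show activation."""
    return [v for v, count in enumerate(overlap) if count >= min_participants]


def parcel_is_consistent(parcel_voxels, binary_maps, min_fraction=0.5):
    """True if at least `min_fraction` of participants have a supra-threshold voxel in the parcel."""
    hits = sum(1 for m in binary_maps if any(m[v] for v in parcel_voxels))
    return hits >= min_fraction * len(binary_maps)


# Toy data (hypothetical): 6 participants, 5 voxels of p-values each.
p_maps = [
    [0.01, 0.20, 0.03, 0.50, 0.04],
    [0.02, 0.30, 0.04, 0.60, 0.30],
    [0.03, 0.01, 0.20, 0.70, 0.40],
    [0.04, 0.40, 0.01, 0.80, 0.50],
    [0.30, 0.50, 0.02, 0.90, 0.60],
    [0.60, 0.60, 0.30, 0.95, 0.70],
]
binary_maps = [binarize(m) for m in p_maps]
overlap = overlap_map(binary_maps)
kept_voxels = threshold_overlap(overlap, min_participants=4)
```

In the toy data, only voxels 0 and 2 are active in at least four participants, and a parcel made of those two voxels passes the at-least-half criterion, whereas a parcel made of voxels 1 and 4 does not.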

With respect to the conditions of the language localizer and the spatial WM experiment, we found that none of the four fROIs showed a stronger response to sentences than nonword sequences (in fact, three of the four regions showed a reliably stronger response to nonword sequences than sentences); and that all four fROIs showed a stronger response to the Hard than Easy WM condition. These results suggest that the [...]

[...] remains high, which we argue allows for complex meaning construction.

¹ Switching from a categorical (Condition) to continuous coding (Edit Distance) permits testing non-monotonicity and increases the sensitivity to detect an effect. To ensure that our analysis of the language network is robust to this switch, we conducted the same analyses looking for either a linear (i.e., first order) or non-linear (i.e., second order) effect of Edit Distance. Consistent with our categorical analysis, there was no effect of Edit Distance on change in response in the language regions.

Figure S3: Neural responses (in % BOLD signal change relative to fixation) to the conditions of Experiments 1 (top) and 2 (bottom), as well as the language localizer and spatial WM task in each of the four scrambling-responsive fROIs.