The network architecture of value learning

Value guides behavior. With knowledge of stimulus values and action consequences, behaviors that maximize expected reward can be selected. Prior work has identified several brain structures critical for representing both stimuli and their values. Yet, it remains unclear how these structures interact with one another and with other regions of the brain to support the dynamic acquisition of value-related knowledge. Here, we use a network neuroscience approach to examine how BOLD functional networks change as 20 healthy human subjects learn the values of novel visual stimuli over the course of four consecutive days. We show that connections between regions of the visual, frontal, and cingulate cortices become progressively stronger as learning progresses, and that these changes are primarily confined to the temporal core of the network. These results demonstrate that functional networks dynamically track behavioral improvement in value judgments, and that interactions between network communities form predictive biomarkers of learning.

(a) Individual learning curves for each of the 20 participants. Participants with task performance below 75% at the last run were excluded from the analyses in the main text.
(b) Average relative motion within scan runs. Participants with average relative motion larger than three standard deviations away from the mean in two or more scans were excluded from the analyses in the main text.
(c) Average absolute motion within scan runs. Participants with average relative motion larger than three standard deviations away from the mean in two or more scans were excluded from the analyses in the main text.

(a) Average accuracy on each day in selecting the more valuable shape from the pair. Learning progressed as expected, with pairs of stimuli with similar value having worse task accuracy. Notice that all pairs of stimuli could be sufficiently discriminated, as evidenced by accuracy values over 0.75 and by the absence of any pair with lower accuracy.
(b) Multidimensional scaling analysis. The two dimensions accounting for the most variance (61% when combined) are displayed. In this plot, stimulus similarity can be inferred from proximity in the reconstructed space. These results show that stimuli were roughly organized in a circular fashion, with proximity clearly reflecting value similarity. The circular arrangement reflects the fact that accuracy is at ceiling for all pairs of stimuli whose value distance is beyond a certain threshold (approximately $5).
(c) Individual learning curves for each pair of stimuli. We fit each curve with an exponential function with two free parameters: the exponential parameter and the asymptotic accuracy. All fits were constrained to start at chance level. We observed that accuracy for all but a few pairs increased steadily over the course of the experiment towards a maximum level of 1.0. The only major exception was the pair ($5, $6), whose accuracy converged to 0.74 and whose stimuli, as seen in Fig. 1a, are easily distinguishable based on visual similarity alone.

Figure S4: Related to Figure 3. Community-level interactions related to value learning.
(a) Correlation between average edge weight within/between communities and task accuracy for the absolute feedback group.
(b) Correlation between average edge weight within/between communities and task accuracy for the relative feedback group.

(data from both groups), but forming predictive networks after the removal of the average network strength at each scan. Cell colors and numbers represent classification accuracy in labeling held-out data as coming from scans early or late in the learning process. Data from the left-out participant were classified significantly above chance (50%) when the predictive network was composed of edges connecting: (i) visual and fronto-parietal modules (accuracy: 93.75%; one-tailed binomial test, adjusted P-value: P = 0.0054); (ii) visual and somato-motor modules (accuracy: 87.50%; one-tailed binomial test, adjusted P-value: P = 0.044); and (iii) visual and cingulo-opercular modules (accuracy: 100%; one-tailed binomial test, adjusted P-value: P = 0.00032).

Figure S6: Related to Figure 6. Prediction of feedback-type from functional networks.
(a) We used a support-vector machine with leave-two-out cross-validation to classify feedback type based on the entire set of edge weights. On each cross-validation fold, the w-map represents how useful each feature (edge weight) is at discriminating between conditions. The figure displays the average z-scored w-map, limited to connections with z > 1.96 (green: absolute feedback; purple: relative feedback). Notice the similarity with Fig. 5a.
(b) To gain insight into the specific modules that enable classification of feedback-type, we conducted the analyses in the main text separately for each pair of communities, selecting the top 10% of edges at each cross-validation fold. Cell colors and numbers represent classification accuracy in labeling held-out data as coming from participants in the absolute vs. relative feedback group. Communities whose interactions classified held-out data significantly (permutation tests, Bonferroni corrected at α = 0.05) are highlighted. Community order is displayed in the bottom-right. We observed that interactions involving somato-motor, fronto-temporal, and caudate modules were modulated by feedback type.
(c) Related to Fig. 5b. Average functional connectivity on DAY 4 between Somato-motor and Visual, and between Somato-motor and GP/NAcc modules displayed separately for each feedback group. Neither the differences nor the interaction were significant (two-way ANOVA interaction: F(1, 28) = 2.13, P = 0.88; two-sample t-tests: t(14) = 0.86, P = 0.40, t(14) = 1.82, P = 0.09).
(d) Related to Fig. 5b. Average functional connectivity on DAY 1 (left) and DAY 4 (right) between Somato-motor and Visual, and between Somato-motor and GP/NAcc modules displayed separately for each feedback group, after subtracting the mean connectivity from each adjacency matrix. In line with our hypotheses, we observed a significant interaction (two-way ANOVA interaction: F(1, 28) = 10.8, P = 0.0027), with connectivity between somato-motor and visual modules being stronger for the absolute feedback group (two-sample t-test on Fisher normalized correlation values: t(14) = 3.02, P = 0.0092), and connectivity between the somato-motor and basal ganglia modules being stronger (though not significantly) for the relative feedback group (two-sample t-test on
Fig. 5b. Average functional connectivity on DAY 1 (left) and DAY 4 (right) between fronto-parietal and somato-motor, and between fronto-parietal and GP/NAcc modules displayed separately for each feedback group, after subtracting the mean connectivity from each adjacency matrix. We observed a significant interaction (two-way ANOVA interaction: F(1, 28) = 4.51, P = 0.043), with connectivity between fronto-parietal and somato-motor modules being stronger (though not significantly) for the absolute feedback group (two-sample t-test on Fisher normalized correlation values: t(14) = 1.10, P = 0.29), and connectivity between the fronto-parietal and basal ganglia modules being stronger (though not significantly) for the relative feedback group (two-sample t-test on Fisher normalized correlation values: t(14) = 1.93, P = 0.074). The interaction and differences were not significant on day 4 (two-sample t-tests: t(14) = 0.78, P = 0.45, t(14) = 0.079, P = 0.94; two-way ANOVA interaction: F(1, 28) = 0.16, P = 0.69).
Figure S9: Correlation between weight changes on specific network edges and task accuracy. We examined the heterogeneity of effects across regions of the visual cortex in order to determine the level of granularity across regions involved in object perception and nearby regions. We considered five visual areas (in both hemispheres): Lateral Occipital Cortex (LO), a region known to be involved in object perception, and the region immediately anterior and ventral to it, Inferior Temporal Gyrus (ITG); a second region involved in object perception, Posterior Fusiform Gyrus (pFus), its anterior counterpart in the Temporal Occipital Fusiform Cortex (aFus), and the region immediately medial to it, the Lingual Gyrus (LG). We also considered two regions in the value network (in both hemispheres): the ventral-medial Prefrontal Cortex (vmPFC) and the Anterior Cingulate Cortex (ACC). We then calculated, for each subject, the Pearson correlation between edge weight and learning rate. Our results show an overall trend for positive correlation values, indicating that links between regions of the visual and value networks tend to grow stronger as learning progresses. Yet, this pattern was not expressed with equal significance in all regions of the visual cortex. In particular, edges connecting ITG with vmPFC, left-ITG with ACC, and LG with left-vmPFC were not significantly correlated with learning rate. These results suggest that learning requires changes in network edges that are relatively spatially specific.

Insights into fundamental constraints on dynamic network architecture
In the main text, we describe specific network components at various levels that change in concert with the learning of value. Our initial results demonstrate that functional networks, in general, change gradually over time as subjects learn stimulus values (Fig. 2). In subsequent sections, we then examined the role of specific components of the network at the scales of nodes and edges (Fig. 3; Fig. 4; Fig. 5). In the present section, we investigate the question of how these components relate to the mesoscale architecture of the network, in order to gain a greater insight into the properties that may relate to the ability of a network component to modulate (with) learning. In particular, we focus on the analysis of node flexibility, which has been used successfully in the past to describe the reconfiguration patterns of network modules throughout learning. Specifically, network flexibility examines the degree to which each brain region changes its allegiances to network modules over time [1]. By grouping regions according to whether they are more or less flexible than a suitable statistical null-model, the temporal core and periphery of the network can be reliably identified.
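
As an illustration of the flexibility measure described above, the following minimal sketch computes, for each node, the fraction of consecutive time windows in which its community label changes. It assumes that community labels have already been obtained by some multilayer community-detection step and are consistent across windows; the function name `node_flexibility`, the array layout, and the toy data are hypothetical and are not taken from the original analysis.

```python
import numpy as np

def node_flexibility(assignments):
    """Fraction of consecutive-window transitions in which each node changes community.

    assignments: integer array of shape (n_windows, n_nodes); entry [t, i] is the
    community label of node i in time window t. Labels are assumed (hypothetically)
    to be consistent across windows.
    """
    assignments = np.asarray(assignments)
    if assignments.ndim != 2 or assignments.shape[0] < 2:
        raise ValueError("need a 2-D array with at least two time windows")
    # True wherever a node's label differs from its label in the previous window.
    changes = assignments[1:] != assignments[:-1]
    # Average over the (n_windows - 1) transitions for each node.
    return changes.mean(axis=0)

# Toy data (illustrative only): 4 windows, 3 nodes.
toy = np.array([
    [0, 0, 1],
    [0, 1, 1],
    [0, 0, 1],
    [0, 1, 1],
])
flex = node_flexibility(toy)
print(flex)
```

In this toy example the first and third nodes never change community, while the second node switches at every transition, so it is the most flexible of the three.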

In the context of a motor-skill learning task, the core-periphery organization of a network has been used to understand how putative functional modules are linked. Specifically, the core is composed of regions whose connectivity changes little over time (sensorimotor and visual regions in the case of a motor-skill learning task), while the periphery is composed of regions whose connectivity changes frequently (primarily multimodal association regions), and the separation between these two large groups is predictive of learning rate in a motor task [1]. In a subsequent study, regions of the network core were also observed to have

To test these hypotheses, we first examined the temporal variability of community structure by com-
calculating the amount of variance (R-squared) in task accuracy explained by each region: that is, the average R-squared over all edges departing from that region. We observed a strong negative relationship between node flexibility and variance explained: regions with low flexibility explained larger amounts of variance in task accuracy than regions with high flexibility (Pearson's r = −0.49, t(15) = −5.37, P < 0.0001; Fig. S10a).

We wished to test whether this relationship could be accounted for by a higher flexibility in nodes with lower signal-to-noise ratio. To that end, we calculated, for each node, the temporal SNR (tSNR), defined as the mean signal of the fMRI time series divided by its standard deviation across time [5]. We
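
The two node-level quantities defined above can be sketched as follows. This is a minimal illustration under stated assumptions, not the original analysis code: the function names, the (time points × nodes) array layout, the use of the sample standard deviation (`ddof=1`), and the toy values are all hypothetical choices.

```python
import numpy as np

def temporal_snr(timeseries):
    """tSNR per node: mean of the time series divided by its standard deviation over time.

    timeseries: array of shape (n_timepoints, n_nodes). Using the sample standard
    deviation (ddof=1) is an assumption; nodes with zero variance return NaN.
    """
    ts = np.asarray(timeseries, dtype=float)
    mean = ts.mean(axis=0)
    std = ts.std(axis=0, ddof=1)
    return np.divide(mean, std, out=np.full_like(mean, np.nan), where=std > 0)

def node_variance_explained(edge_r2):
    """Average R-squared over all edges departing from each node.

    edge_r2: symmetric (n_nodes, n_nodes) array of per-edge R-squared values;
    the diagonal (self-connections) is ignored.
    """
    r2 = np.asarray(edge_r2, dtype=float)
    n = r2.shape[0]
    off_diagonal = ~np.eye(n, dtype=bool)
    return np.where(off_diagonal, r2, 0.0).sum(axis=1) / (n - 1)

# Toy data (illustrative only).
toy_ts = np.array([[100.0, 50.0], [102.0, 50.0], [98.0, 50.0]])
tsnr = temporal_snr(toy_ts)

toy_r2 = np.array([[0.0, 0.2, 0.4],
                   [0.2, 0.0, 0.6],
                   [0.4, 0.6, 0.0]])
node_r2 = node_variance_explained(toy_r2)
print(tsnr, node_r2)
```

Either per-node vector can then be correlated with node flexibility across regions, which is the kind of comparison the text describes.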

These results suggest that the regions whose functional connectivity tracks task accuracy are those in a temporal core of relatively rigid areas whose affiliation with functional modules remains steady throughout task practice [1]. To determine whether this is indeed the case, we next categorized brain regions into temporal core and temporal periphery by assessing whether a region's flexibility was less than or greater than expected in a nodal null model, respectively [1]. Using this approach, we observed that the network core  (Fig. S10b). Moreover, the communities that we previously observed to be related to learning were not only the most rigid ones (Fig. S10a,b), but were also less flexible than expected by their size alone (Fig. S10c).
involved in learning, we note that the rigidity corresponding to low flexibility simply indicates that the region does not change module affiliation very frequently: indeed, a region may certainly change module affiliation, for example, over the course of days, and still be classified as part of the core. Relatedly, a region that changes its module affiliation very frequently is unlikely to be involved in the gradual learning that occurs over the course of many days, and is instead more likely to reflect domain-general processes, related to or not related to the task. These points support our theoretical reasoning that core regions, which have stable partners during task execution, are also the ones more likely to change with learning.

Figure S10: Regions in the network core are more associated with learning than regions in the network periphery.
(a) The variance in task accuracy explained by each of the 112 nodes (calculated as the average variance explained across all edges departing from a node) is negatively correlated with node flexibility (Pearson's r = −0.49, P < 0.001). Each circle corresponds to a brain region and is colored with the color of its corresponding community (Fig. 3d). The shaded area corresponds to the 95% confidence interval of the nodal null model described in (b), which separates nodes into a temporal core and a temporal periphery.
(b) A nodal null model was constructed by rewiring the ends of the multilayer network's interlayer edges uniformly at random 100 times. The temporal "core" was then defined as the set of regions whose mean nodal flexibility was below the 2.5% confidence bound of the null-model distribution, and, similarly, the temporal "periphery" was defined as the set of regions whose mean nodal flexibility was above the 97.5% confidence bound of the null-model distribution. The temporal core consists of regions of the visual, frontal, and (right) motor cortices. The temporal periphery consists of subcortical regions and regions of the anterior temporal lobe.
(c) Average flexibility within each network community controlling for community size. Two communities exhibited flexibility significantly below that expected given their size: (i) fronto-parietal (f = −0.019, P < 0.001), and (ii) visual (f = −0.015, P < 0.001); and two communities exhibited flexibility significantly above that expected given their size: (i) fronto-temporal (f = 0.032, P < 0.001), and (ii) GP/NAcc (f = 0.011, P = 0.019).
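
The core/periphery assignment described in panel (b) can be sketched as a simple thresholding step. The example below assumes that the null-model flexibility values have already been computed (the rewiring of interlayer edges and the community detection are not shown) and that they are pooled into a single distribution before taking the 2.5% and 97.5% bounds; the pooling, the function name, the "bulk" label for nodes inside the bounds, and the toy numbers are hypothetical.

```python
import numpy as np

def classify_core_periphery(observed_flex, null_flex, lower_pct=2.5, upper_pct=97.5):
    """Label nodes as 'core', 'periphery', or 'bulk' relative to a null distribution.

    observed_flex: mean flexibility per node, shape (n_nodes,).
    null_flex: flexibility values from the null model, any shape; pooled here
               into one distribution (an assumption made for this sketch).
    """
    observed = np.asarray(observed_flex, dtype=float)
    null = np.asarray(null_flex, dtype=float).ravel()
    lower, upper = np.percentile(null, [lower_pct, upper_pct])
    labels = np.full(observed.shape, "bulk", dtype=object)
    labels[observed < lower] = "core"        # less flexible than expected
    labels[observed > upper] = "periphery"   # more flexible than expected
    return labels, (lower, upper)

# Toy data (illustrative only): a null distribution spanning 0.10 to 0.20.
toy_null = np.linspace(0.10, 0.20, 101)
toy_observed = np.array([0.05, 0.15, 0.30])
labels, bounds = classify_core_periphery(toy_observed, toy_null)
print(labels, bounds)
```

Nodes falling between the two bounds are left unassigned to either group here, which is one possible convention for regions that are neither clearly rigid nor clearly flexible.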