Effects of Individual Differences on Knowledge and Wisdom of Society: a Social Modeling Approach

Categorically organized knowledge is the main vehicle in high-level cognitive processes. Previous empirical and theoretical studies on categorization paid almost exclusive attention to how individuals learn categorical knowledge. In the real world, however, people acquire knowledge not only through individual learning, but also through interacting with others. In the present study, using computational modeling, we explored how social interactions would produce unique dynamics of knowledge acquisition that cannot be examined by studies on micro-level processes. The results of simulation studies showed that when there were several clusters of individuals in a society where individuals held different beliefs about what constitutes "good" knowledge, the society as a whole formed Pareto-optimal knowledge. That is, in a mature society there was no cluster of knowledge that was simultaneously worse in two important aspects of knowledge (i.e., accuracy and simplicity) as compared with those of other clusters.


Introduction
Categorically organized knowledge is the main vehicle in high-level cognitive processes, such as reasoning and communication (e.g., Murphy, 2002). Categorical knowledge, which is often referred to as concepts, allows us to achieve very complex cognitive tasks by effectively compressing overwhelmingly abundant information into manageable and meaningful chunks. Because of its importance, cognitive processes associated with categorization have been widely studied in Cognitive Science, both empirically with behavioral experiments and theoretically with computational modeling techniques. In the empirical studies, researchers usually create experimental settings where individual participants learn categories by corrective feedback, providing empirical evidence about how people acquire knowledge through individual learning (e.g., Cohen & Lefebvre, 2005). Building on the results of these empirical studies, computational studies also pay almost exclusive attention to how individuals learn categorical knowledge.
However, in the real world, people acquire categorical knowledge not only through individual learning, but also through interacting with others. Pentland (2007) argued that influences of social structures and activities need to be considered in order to better understand true human cognitive behaviors. Likewise, Goldstone and Janssen (2005) emphasized the importance of research on collective behavior. For example, they pointed out that "interacting ants create colony architectures that no single ant intends," indicating that social interactions can produce unique dynamics of knowledge acquisition that cannot be clarified by studies on individuals' micro-level processes in knowledge acquisition.
In the present paper, we examine how a society as a whole would acquire categorical knowledge when some degree of individual difference exists in the society.

Computational Models
In the present paper, we used ALCOVE (Kruschke, 1992) as the model of individuals' categorization processes, and an optimization method based on evolutionary computation techniques as the model of social learning processes.

Individuals' Categorization Algorithm: ALCOVE
ALCOVE is a computational model of category learning that assumes that humans store many previously seen or experienced exemplars in their memory, and categorize input stimuli on the basis of psychological similarities between the inputs and the memorized exemplars. Psychological distances between an input stimulus and those memorized exemplars activate exemplar nodes in ALCOVE. Exemplars that are "psychologically" similar to an input are more highly activated than exemplars that are "psychologically" dissimilar. Specifically, the jth exemplar's activation ($h_j$, Eq. 1) in ALCOVE is based on the inverse distance between an input, $x$, and a stored exemplar, $\psi_j$, in a multi-dimensional representational space where each dimension ($i$) is scaled by a non-negative selective attention weight, $a_i$. Here β is called specificity and determines an overall similarity gradient, and superscript $m$ indicates a categorization strategy or knowledge held by a particular individual $m$. Because our learning algorithm is built on a stochastic optimization technique, each dimensional attention weight is expressed as a function of a pseudo-attention weight, $D_i$, to attain stability in the model's behaviors; it is $D_i$ (not $a_i$) that is updated in learning.
The exemplar activations are then fed forward to the k-th output node (e.g., output for category k), $O_k$, weighted by $w_{kj}$, which determines the strength of association between exemplar j and output node k:
$$O_k^{(m)} = \sum_j w_{kj}^{(m)} h_j^{(m)}$$
The probability of categorizing input instance $x$ to category $C$ is based on the activation of output node $C$ relative to the activations of all output nodes:
$$P(C \mid x) = \frac{\exp\left(\phi\, O_C^{(m)}\right)}{\sum_k \exp\left(\phi\, O_k^{(m)}\right)}$$
where φ controls decisiveness of the classification response. Higher φ values cause more extreme decisions.
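As a minimal sketch of the forward computation described above, the following Python function returns category probabilities for a single stimulus. It assumes a common ALCOVE parameterization (exponential decay over an attention-weighted city-block distance), which may differ in detail from the form used in the simulations; the function and argument names are illustrative only.

```python
import math

def alcove_forward(x, exemplars, attention, assoc, beta=2.0, phi=3.0):
    """Return category probabilities for stimulus x.

    exemplars: list of stored exemplar coordinates (psi_j)
    attention: non-negative attention weight per dimension (a_i)
    assoc:     assoc[k][j] is the association weight w_kj
    """
    # Exemplar activation (ASSUMED form): exp(-beta * attention-weighted
    # city-block distance between the stimulus and each stored exemplar).
    h = [math.exp(-beta * sum(a * abs(p - xi)
                              for a, p, xi in zip(attention, psi, x)))
         for psi in exemplars]
    # Output activation: O_k = sum_j w_kj * h_j
    outputs = [sum(w * hj for w, hj in zip(row, h)) for row in assoc]
    # Choice probability: exponentiated outputs scaled by phi, normalized.
    # Subtracting the maximum only improves numerical stability.
    top = max(outputs)
    expo = [math.exp(phi * (o - top)) for o in outputs]
    total = sum(expo)
    return [e / total for e in expo]
```

With two exemplars, each associated with its own category, a stimulus that coincides with the first exemplar receives a higher probability for the first category, and raising `phi` makes that preference more extreme.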

Overview of Learning Algorithm
In the present research we assumed that quite simple learning processes take place in a society. In particular, we assumed that people communicate and exchange their knowledge with others, with each individual combining his or her knowledge with that of another individual. We refer to this process as "Knowledge Combination." After combining knowledge, each individual is assumed to modify his or her own knowledge by randomly altering it. We refer to this process as "Knowledge Modification." Knowledge Combination and Modification together may be interpreted as the formation of new hypotheses. Finally, we also assumed that each individual has his or her own belief about what constitutes "good" knowledge, and that knowledge believed to be good will be kept by individuals and therefore by the society. We refer to this process as "Knowledge Selection." To model the abovementioned learning strategies that take place within a society, we incorporated a type of Evolution Strategy (ES) technique in the present research. An ES is a type of evolutionary computation method that is typically used for continuous parameter optimization. Knowledge Combination is achieved by what is called crossover in the evolutionary computation literature, in which two randomly selected individuals exchange their knowledge (i.e., parameters or coefficients in ES). Knowledge Modification is achieved by a process called mutation, in which a small random value drawn from the Normal distribution is added to each element of knowledge (i.e., parameter). After new knowledge is formed through Knowledge Combination and Modification, each individual assesses his or her own knowledge on the basis of self-defined knowledge utility. Knowledge with high utility values will be kept by individuals and the society, while that with low utility values will be discarded.
Social structure. There are a few assumptions about how a society is organized. We assumed that people interact with a limited number of individuals, forming clusters of individuals. In other words, our model of a society has a highly locally clustered structure like a small world network (Watts & Strogatz, 1998). Previous studies have shown that many real world networks have a structure analogous to a small world network. For example, collaboration networks of film actors (Watts & Strogatz, 1998), networks of scientific collaboration (Newman, 2001), and ownership links among German firms (Kogut & Gordon, 2001) have been shown to be structured as small world networks.
We further assumed that the principle of homophily exists in a society such that people who have similar beliefs (about what constitutes "good" knowledge) would have close relationships with each other and that those who have close relationships would learn from each other. This assumption has reasonable face validity as, for example, right-wing conservatives often disregard what is being stated by left-wing liberals or vice versa. For the sake of simplicity we assumed that people exchange information only with people from the same cluster, meaning that there are several independent or segregated clusters in a society (thus, although there are several local clusters within a society as in a small world network, our model of a society is not organized as a small world network, because individuals from different clusters are not connected). People within the same cluster have similar beliefs about what constitutes good knowledge, while different clusters of individuals possess different beliefs. Knowledge Combination and Knowledge Selection take place within clusters (Knowledge Modification takes place within individuals).

Knowledge Combinations
In Knowledge Combination, randomly selected pairs of individuals within a cluster exchange information to form new knowledge. For the sake of simplicity, we use the following notation: $\{w^{(m)}, D^{(m)}\} \in \theta^{(m)}$. The model utilizes discrete recombination for knowledge parameters (i.e., θs). Thus, for a pair of individuals $m$ and $m'$,
$$\theta_l^{(\mathrm{new})} = \begin{cases} \theta_l^{(m)} & \text{if } UNI \le 0.5 \\ \theta_l^{(m')} & \text{otherwise} \end{cases}$$
where UNI is a random number drawn from the Uniform distribution. For self-adapting strategy parameters (i.e., σs), intermediary recombination (simple arithmetic average) is used, thus
$$\sigma_l^{(\mathrm{new})} = 0.5\left(\sigma_l^{(m)} + \sigma_l^{(m')}\right)$$
The parameters for self-adaptation are the parameters that define search widths (i.e., learning rates) for the parameters for knowledge (i.e., w, D). A unique search width is allocated to each association and attention weight within individuals so that sensitivity to the objective hypersurface is individually tailored to meet his or her learning objectives.
This combination process continues until the number of new knowledge sets produced reaches the memory capacity of the model.
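The combination step can be sketched in Python as follows. This is an illustration under stated assumptions rather than the original implementation: the dictionary layout (`theta`, `sigma`), the function names, the equal-probability choice between parents, and the `capacity` argument standing in for memory capacity are all hypothetical.

```python
import random

def combine(parent_a, parent_b, rng=random):
    """Create one new knowledge set from two members of the same cluster.

    Each parent is a dict with 'theta' (knowledge parameters) and
    'sigma' (search widths), both lists of equal length.
    """
    # Discrete recombination: each knowledge parameter is copied from one
    # parent, chosen with equal probability (ASSUMED threshold of 0.5).
    theta = [ta if rng.random() < 0.5 else tb
             for ta, tb in zip(parent_a["theta"], parent_b["theta"])]
    # Intermediary recombination: search widths are arithmetic averages.
    sigma = [0.5 * (sa + sb)
             for sa, sb in zip(parent_a["sigma"], parent_b["sigma"])]
    return {"theta": theta, "sigma": sigma}

def combine_cluster(cluster, capacity, rng=random):
    """Pair random cluster members until `capacity` new sets exist."""
    children = []
    while len(children) < capacity:
        a, b = rng.sample(cluster, 2)  # two distinct individuals
        children.append(combine(a, b, rng))
    return children
```

Because pairs are sampled only from the list passed in, calling `combine_cluster` once per cluster keeps the clusters segregated, as assumed in the model.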

Knowledge Modifications
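After Knowledge Combination, each individual randomly modifies his or her knowledge, using a self-adapting strategy. Thus,
$$\sigma_l^{(m)}(t+1) = \sigma_l^{(m)}(t) \cdot \exp\left(N(0, \gamma)\right)$$
$$\theta_l^{(m)}(t+1) = \theta_l^{(m)}(t) + N\left(0, \sigma_l^{(m)}(t+1)\right)$$
where $t$ indicates time, $l$ indicates parameters, γ defines search width (via σ's), and $N(0, \sigma)$ is a random number drawn from the Normal distribution with the corresponding parameters.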
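A minimal Python sketch of this modification step is given below. It assumes the usual self-adaptive ES scheme, in which each search width is first perturbed log-normally and then used to perturb its knowledge parameter, and it treats the second argument of N(0, ·) as a standard deviation; both points, along with the names used, are assumptions for illustration.

```python
import math
import random

def modify(individual, gamma=0.1, rng=random):
    """Return a randomly modified copy of one individual's knowledge.

    individual: dict with 'theta' (knowledge parameters) and
                'sigma' (one search width per parameter).
    """
    # Self-adaptation (ASSUMED log-normal update): widths stay positive.
    sigma = [s * math.exp(rng.gauss(0.0, gamma))
             for s in individual["sigma"]]
    # Each parameter is perturbed using its own updated search width.
    theta = [t + rng.gauss(0.0, s)
             for t, s in zip(individual["theta"], sigma)]
    return {"theta": theta, "sigma": sigma}
```

Setting `gamma=0.0` freezes the search widths, which reduces the step to ordinary Gaussian mutation with fixed widths.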

Knowledge Selection
We assumed that there are two "universally" important elements in determining the utility of knowledge on categorization. One is accuracy and the other is simplicity. Everyone, regardless of his or her belief about what constitutes good knowledge, evaluates his or her knowledge on the basis of those two elements. However, individuals from different clusters weight the importance of those two elements differently. In the present research we operationally define different beliefs by different sets of weight vectors.

Accuracy (inaccuracy)
In the model, inaccuracy (and thus accuracy) of a particular set of parameters (knowledge) is estimated based on the set of all unique exemplars in a training set (i.e., errors in batch learning). Knowledge inaccuracy aggregates, over all training pairs and categories, the discrepancy between the desired output value $d_k^{(n)}$ and the predicted probability $P(k \mid x^{(n)})$, where superscript $n$ indicates a particular input-output pair, $N$ is the number of unique training pairs, $d_k$ is the desired output value for category $k$ (1 if $k$ is the correct category, and 0 otherwise), and $P(k \mid x^{(n)})$ is the probability that input $x^{(n)}$ is categorized as $k$. The desired output values are assumed to be obtained individually and thus Knowledge Inaccuracy is individually estimated.
For modeling individual learning processes, a batch learning method may not be psychologically valid (e.g., Matsuka, Sakamoto, Chouchourelou, & Nickerson, 2008). In order to model individuals' learning processes more precisely, this inaccuracy function can be easily extended to include a retrospective verification process (e.g., Matsuka et al., 2008) that simultaneously accounts for laws of learning and forgetting (Anderson & Schooler, 1991).
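The batch estimate of Knowledge Inaccuracy can be illustrated with the short Python sketch below. It assumes a sum of squared differences between desired outputs and predicted probabilities, which is one common choice and not necessarily the exact function used here; `predict` is a hypothetical callable standing in for an individual's categorization model.

```python
def knowledge_inaccuracy(predict, training_pairs):
    """Batch error over all unique training pairs.

    predict:        callable mapping a stimulus to a list of
                    category probabilities P(k | x)
    training_pairs: list of (stimulus, correct_category_index)
    """
    error = 0.0
    for x, correct in training_pairs:
        for k, p in enumerate(predict(x)):
            d = 1.0 if k == correct else 0.0  # desired output d_k
            error += (d - p) ** 2             # ASSUMED squared error
    return error
```

An individual who guesses at chance on two categories scores 0.5 per training pair under this measure, whereas a perfectly accurate individual scores 0.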

Simplicity (complexity)
There are two separate elements that define knowledge simplicity (complexity), one based on association weights and the other based on attention weights. The complexity measure based on association weights simply signifies the absolute magnitudes of the association weights. Thus, when exemplar nodes and category nodes are weakly associated in general, this measure tends to be small. This measure does not directly take into account the number of exemplars memorized and utilized. On the other hand, the complexity measure for attention weights takes into account the number of feature dimensions being attended.
This measure tends to be small when a smaller number of feature dimensions is selectively attended. Note that this measure is estimated based on the selective attention weights ($a_i$), not the pseudo-selective attention weights ($D_i$).
The overall knowledge complexity is the sum of the two complexity measures.

Individual Differences in Learning Objectives
Although we assumed that all individuals take both accuracy and simplicity into account in learning, there are some individual differences in weighting those two elements. We consider differences in weights to correspond to differences in beliefs. We define $v^{\kappa}_{E}$ as a scalar weighting for the relative importance of Knowledge Inaccuracy, and $v^{\kappa}_{comp} = 1 - v^{\kappa}_{E}$ for Knowledge Complexity. Using these weights and the Knowledge Inaccuracy ($E$) and Complexity ($Comp$) measures, we let
$$F^{\kappa}\left(\theta^{(m)}\right) = v^{\kappa}_{E} \cdot E\left(\theta^{(m)}\right) + v^{\kappa}_{comp} \cdot Comp\left(\theta^{(m)}\right)$$
be the overall fitness value of knowledge for a given belief (a particular Inaccuracy-Complexity weighting vector).
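The belief-dependent evaluation and the subsequent Knowledge Selection can be sketched in Python as below. The sketch assumes that the overall value is a weighted sum to be minimized (so lower values correspond to higher utility) and that a fixed number of knowledge sets is retained; the names `fitness`, `select`, and `keep` are illustrative.

```python
def fitness(inaccuracy, complexity, v_e):
    """Weighted sum of inaccuracy and complexity for one belief.

    v_e is the weight on inaccuracy; complexity gets 1 - v_e.
    """
    return v_e * inaccuracy + (1.0 - v_e) * complexity

def select(candidates, v_e, keep):
    """Keep the `keep` candidates with the lowest weighted sum.

    candidates: list of dicts with 'inaccuracy' and 'complexity'.
    ASSUMPTION: lower weighted sum means higher utility.
    """
    ranked = sorted(
        candidates,
        key=lambda c: fitness(c["inaccuracy"], c["complexity"], v_e))
    return ranked[:keep]
```

The same pair of candidates can be ranked in opposite orders by clusters holding different beliefs: a cluster with `v_e` near 1 retains the accurate but complex candidate, while a cluster with `v_e` near 0 retains the simple but inaccurate one.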

Simulation
In order to explore how social interactions would produce unique dynamics of knowledge acquisition, two simulation studies were conducted. In both simulation studies, the model (i.e., a society) was given simple categories to learn.
In Simulation 1, we examined characteristics of knowledge acquired by the society as a whole, using a stimulus set from a classical study (Medin & Schaffer, 1978). Simulation 2 was conducted to confirm the results of Simulation 1 and to propose a new way to analyze the properties of category structures.

Simulation 1
Method. Table 1 shows a schematic representation of the stimulus set, which was adapted from Medin & Schaffer (1978).
The model was run in a simulated training procedure with 500 trial blocks (generations), where each block consisted of a presentation of each of the nine unique training exemplars (see Table 1) exactly once in random order, in order to learn the categories. The model parameters were arbitrarily selected: β = 2.0, φ = 3.0, γ = 0.1. There were 50 clusters, each containing 10 simulated individuals; thus there were a total of 500 individuals in Simulation 1. The scalar weights that define the relative importance of Knowledge Inaccuracy (i.e., $v^{\kappa}_{E}$) were evenly spread between 0 and 1 across the 50 clusters (i.e., 0.000, 0.0204, 0.0408, ..., 1). Note that the weight for Knowledge Complexity was 1 minus the weight for Knowledge Inaccuracy ($v^{\kappa}_{comp} = 1 - v^{\kappa}_{E}$).
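The cluster layout described above can be set up as in the following Python sketch, which generates 50 evenly spaced belief weights and assigns 10 individuals to each. The function names and the dictionary keys are hypothetical; only the counts and the spacing come from the text.

```python
def belief_weights(n_clusters=50):
    """Evenly spaced inaccuracy weights v_E from 0 to 1 inclusive."""
    return [k / (n_clusters - 1) for k in range(n_clusters)]

def build_society(n_clusters=50, cluster_size=10):
    """Return a list of clusters; members of a cluster share one belief."""
    society = []
    for v_e in belief_weights(n_clusters):
        cluster = [{"v_e": v_e, "v_comp": 1.0 - v_e}
                   for _ in range(cluster_size)]
        society.append(cluster)
    return society
```

Dividing by `n_clusters - 1` rather than `n_clusters` is what makes both endpoints (0 and 1) belong to some cluster, giving a step of 1/49, or about 0.0204.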

Results and Discussion
Figure 1 shows characteristics of knowledge acquired by individuals in a society, where each dot represents knowledge acquired by one individual and the knowledge characteristics of every individual are plotted. The vertical axis represents error (i.e., Knowledge Inaccuracy), while the horizontal axis represents Knowledge Complexity. The figure shows that there was a great degree of individual difference in acquired knowledge. Some individuals acquired very accurate knowledge at the cost of complexity, while others acquired very simple knowledge at the cost of accuracy. It also shows that the society as a whole formed Pareto-optimal knowledge. That is, no individual acquired knowledge that was worse in both Knowledge Inaccuracy and Knowledge Complexity as compared with those of other individuals, and no individual acquired knowledge that was better in both Knowledge Accuracy and Knowledge Simplicity as compared with others. The results can be interpreted as showing that a society would acquire clusters of knowledge that excel in at least one important aspect of knowledge when there are individual differences in beliefs and values and when individuals learn from others who share similar beliefs and values. This result was not surprising, because our model resembles one of the multi-objective evolutionary optimization methods, called the vector evaluated approach (Deb, 2001). The resemblance may indicate that the principle of homophily (i.e., people who have similar beliefs tend to have close relationships with each other) and individual differences together can lead a society to acquire and hold Pareto-optimal knowledge.
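Whether a set of acquired knowledge is Pareto-optimal in this sense can be checked with a short non-dominance test, sketched below in Python. Each point is an (inaccuracy, complexity) pair with both objectives minimized; the function names and the example values are illustrative and are not simulation results.

```python
def dominates(a, b):
    """True if a is no worse than b on both objectives and strictly
    better on at least one (both objectives are minimized)."""
    return (a[0] <= b[0] and a[1] <= b[1]
            and (a[0] < b[0] or a[1] < b[1]))

def pareto_front(points):
    """Return the points that no other point dominates."""
    return [p for p in points
            if not any(dominates(q, p) for q in points)]

# Hypothetical (inaccuracy, complexity) pairs for four individuals.
example = [(0.1, 5.0), (0.5, 1.0), (0.6, 2.0), (0.9, 0.2)]
front = pareto_front(example)
```

In this made-up example the third individual is dominated by the second, which is both more accurate and simpler, so it is excluded from the front; a society whose plotted dots all lie on the front has no such individual.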
Another interesting result was that there were some individuals who did not have any clue about categories (i.e., individuals whose knowledge accuracies were at the chance level). Although it may sound a bit odd that some individuals did not learn this type of simple categories, the result is very much expected because those individuals did not care about how accurate their knowledge was (i.e., $v^{\kappa}_{E} = 0$) as long as their knowledge was at a minimum complexity ($v^{\kappa}_{comp} = 1$). This type of individual is not uncommon in a society: some people are ignorant about certain things. Using a social simulation approach, we were able to reproduce a wide variety of individuals with different types of knowledge about categories.

Simulation 2
Simulation 2 serves two purposes. One is to confirm the results of Simulation 1. The other is to propose a new way to analyze the properties of category structures. Figure 2 shows characteristics of knowledge acquired by individuals in a society, where each dot represents knowledge acquired by one individual. As in Simulation 1, some individuals acquired very accurate knowledge at the cost of complexity, while others acquired very simple knowledge at the cost of accuracy, resulting in Pareto-optimal knowledge acquisition by a society in Simulation 2. This confirms that the principle of homophily and individual differences together can lead to acquisition of Pareto-optimal knowledge by a society.
Four separate Pareto-front lines resulted from learning four categories. Given that the simulated learning processes were minimization problems (minimizing Knowledge Inaccuracy and Complexity), a line that is closer to 0 in both objectives represents a category that is easier to learn, where easiness is defined by complexity relative to inaccuracy or vice versa. Thus, Simulation 2 replicated the order of difficulties for those categories suggested by empirical (Nosofsky et al., 1994; Shepard et al., 1961) and theoretical studies (e.g., Feldman, 2003). This implies that our simulation method can be used as a tool to analyze characteristics of category structures and/or to evaluate the psychological validity of models of categorization or category learning. In fact, when a typical prototype model, which assumes that people hold one prototype for each category and categorize an input on the basis of psychological similarities between the input and the prototypes, was used for simulations, T3 was found to be "easier" to learn than T2, which is inconsistent with empirical findings. This result suggests that a typical prototype model of categorization is inadequate for describing real human cognitive behaviors.
What is notable about our approach is that, unlike traditional theoretical approaches that are built on normative accounts (i.e., how humans should think or behave), it can incorporate even subjective beliefs and attitudes into objectives of learning as long as they are consistent with real human cognitive behaviors. In other words, a social category learning simulation paradigm that incorporates the principle of homophily and individual differences is an effective exploratory tool for examining the nature of our bounded cognitive rationality and the cognitive demands required by realistic contexts and situations.

Conclusion and Future Directions
Categorically organized knowledge is inarguably the main vehicle in high-level cognition. Unlike previous studies, which primarily focus on individual learning processes, we examined learning processes that take place in a society. In so doing we assumed that the principle of homophily (i.e., people who have similar beliefs tend to have close relationships with each other) and individual differences exist in a society. In two simulation studies that incorporated those two characteristics, we found that the society would acquire Pareto-optimal knowledge, such that no cluster of knowledge was worse in two important aspects of knowledge (i.e., accuracy and simplicity) as compared with those of other clusters.
In addition, our social category learning simulation was found to be an effective exploratory tool in examining the nature of our bounded cognitive rationality and cognitive demands required by realistic contexts and situations.
A natural extension of the present study is to examine other types of social structure, including small world networks (Watts & Strogatz, 1998) and scale free networks (Barabási & Albert, 1999). The principle of homophily and individual differences are not uncommon in a society, but the presence of clearly segregated clusters may not be realistic. Honda and Matsuka (2011) showed that when a network consists of several clusters (i.e., a small world network) a society as a whole can maintain diverse knowledge. Although their simulation studies paid more attention to the structure of networks and incorporated rather simple learning algorithms, we expect somewhat similar findings when we use small world networks in our simulation paradigm. Additional simulation studies are needed to confirm this speculation and to see the dynamics of knowledge acquisition in a scale free network.
In the present study, we showed that the principle of homophily and individual differences are robust characteristics of a society that lead to acquisition of Pareto-optimal knowledge.

Figure 1: Results of Simulation 1. This figure shows characteristics of knowledge acquired by a society, where each dot represents knowledge acquired by one individual. Some individuals acquired very accurate knowledge at the cost of complexity, while others acquired very simple knowledge at the cost of accuracy. The results show that the society as a whole formed Pareto-optimal knowledge.

Figure 2: Results of Simulation 2. As in Simulation 1, some individuals acquired very accurate knowledge at the cost of complexity, while others acquired very simple knowledge at the cost of accuracy. There were four separate Pareto-front lines for four types of categories. The simulation replicated the order of difficulties suggested by empirical and theoretical studies.

Table 1: Schematic representation of stimulus set used in Simulation 1.