The Multidimensional Research Agendas Inventory—Revised (MDRAI-R): Factors shaping researchers’ research agendas in all fields of knowledge

Abstract This study creates a novel inventory that characterizes factors influencing the research agendas of researchers in all fields of knowledge: the Multi-dimensional Research Agendas Inventory-Revised (MDRAI-R). The MDRAI-R optimizes an initial inventory designed for the social sciences (the MDRAI) by reducing the number of items per dimension, improving the inventory’s psychometric properties, and including new dimensions (“Academia Driven” and “Society Driven”) that reflect the greater influence of social and organizational structures on knowledge production and demands for research impact. This inventory enhances our ability to measure research activities at a time when researchers’ choices matter more than ever, and it will be of interest to researchers, policy makers, research funding agencies, and university and research organizations.


INTRODUCTION
With research playing an increasingly central role in driving knowledge creation in fast-paced, globalized, connected, uncertain, and technology-driven contemporary societies, it is critical to better understand the factors that influence researchers' research agendas, particularly those based in academic settings. This is important for not only researchers but also those looking to create added value from the available research, such as policy makers, research funding agency managers, and university and research laboratory administrators (Ciarli & Ràfols, 2018;Franzoni & Rossi-Lamastra, 2017;Wallace & Ràfols, 2018). Understanding the factors that influence researchers' research agendas is ultimately relevant to the development of science itself at a time when researchers are facing global, multifaceted, and increasingly complex challenges, and more and more research output is being produced without necessarily leading to breakthroughs (Young, 2015). Today, a key premise in science is that researchers' strategic research choices matter, because these choices (which are to some extent personal in nature) shape the knowledge produced and the general orientation of the broader research efforts and future research directions (Polanyi, 2012). Although researchers' choices of research agendas have been examined in seminal works in the sociology of science (Zuckerman, 1978), the area remains underexplored and has mostly been analyzed from a qualitative perspective (Luukkonen & Thomas, 2016;McGrath, 1981;Shwed & Bearman, 2010). The literature shows that the cultures, traditions, and dispositions of fields of knowledge have a fundamental influence on researchers' choices of research (Becher & Trowler, 2001). Disciplinary cultures become embedded in the habitus of researchers, as they feel that they belong to and identify with specific knowledge-based research communities and abide by these communities' values, norms, and attitudes (Bourdieu, 1975). 
This occurs as part of a path-dependent process that begins with the researchers' socialization through their doctoral studies to become independent researchers (Jung, 2018; Mantai, 2017). During this time, the researchers learn how to conduct research while accumulating expertise and developing, under supervisory guidance, research interests that are likely to resonate with and influence their current and future research choices (Åkerlind & McAlpine, 2017; Brew, Boud, & Malfroy, 2017). Research agendas can be influenced by students' mentors during their doctoral studies and in the years after completion. Collaboration with peers and other stakeholders can also influence the design of research agendas, as collaborations bring novel information, expertise, and perspectives and the possibility of serendipitous opportunities to engage in innovative, disciplinary, and multidisciplinary research (Kingdon, 2013; Shi, Foster, & Evans, 2015).
The patterns of collaboration are increasingly likely to influence the research agendas of researchers at a time when their career trajectories are increasingly nonlinear (Hancock & Walsh, 2016). Nonetheless, prestige and recognition by peers in the field continue to be critical signals of important contributions to the pool of knowledge and tend to drive successful careers (Kim & Kim, 2017). In the "publish or perish" research environment, where performativity has become central to career survival and progression, researchers might well be encouraged to engage in research agendas that promise prolific research output (i.e., publications) with high levels of visibility and recognition (i.e., citations) and possibilities of funding. According to the Mertonian rationales of science (e.g., the Matthew effect and cumulative advantage in science), such output can lead to further publications, visibility, funding, and collaboration, including invitations to collaborate in others' research agendas (Allison, Long, & Krauze, 1982;Merton, 1968). These activities and dynamics define and are defined by the research agendas of researchers through interactive processes, as researchers position themselves (and their interests) within their research communities (Whitley, 2000).
A few recent studies add to our understanding of researchers' choices of research agendas from a quantitative perspective (Foster, Rzhetsky, & Evans, 2015; Horta & Santos, 2016; Santos & Horta, 2018; Ying, Venkatramanan, & Chiu, 2015). These studies mainly focus on a single field of knowledge or disciplinary area, such as biomedicine or higher education (Santos & Horta, 2018). Interestingly, these studies examine the tensions between the two main research strategies identified by Kuhn (2012): that is, between the conservative research strategies that are part of "normal science," and are characterized as safe and representing incremental contributions over time, and riskier strategies that tend to be more innovative and disruptive in searching for new paradigms. Only one of these quantitative studies offers an inventory for identifying the factors that influence the research agendas of researchers (Horta & Santos, 2016). Although, to the best of our knowledge, this inventory, the Multidimensional Research Agendas Inventory (MDRAI), is the first of its kind, it was designed with social science researchers in mind. Our study aims to extend the MDRAI. Using a data set on over 12,000 researchers located all over the world and from all fields of knowledge who provided key information about their research agendas in an online survey carried out in 2017 and 2018, we develop a novel instrument that identifies the key factors influencing the research agendas of researchers in all fields of knowledge. Our MDRAI-R optimizes the initial MDRAI developed by Horta and Santos (2016) by reducing the number of items in each dimension of the original inventory and including new dimensions relevant to fields of knowledge not considered in the original instrument. Moreover, our revised MDRAI-R is valid for all fields of knowledge.
This study largely focuses on the methodological development of the MDRAI-R. To a lesser extent, it also stresses, wherever applicable, the substantive insights that underline its evaluative applicability in current knowledge producing settings. The methodological development of the MDRAI-R is based on a pilot study and a comprehensive psychometric evaluation that includes exploratory factor analysis (EFA), confirmatory factor analysis (CFA), validity, reliability, and sensibility evaluations, and tests of measurement invariance.

FROM MDRAI TO MDRAI-R
The MDRAI is based on the classical tenets of the sociology of science and focuses on researchers' personal and environmentally influenced motivations. It is also based on the literature on academic research and work and the changing world of science, research, and academia that underlines the increasing importance of networking, competitiveness, and resources (Horta & Santos, 2016). The MDRAI covers eight dimensions, four of which have sub-dimensions. The first dimension is Scientific Ambition, which refers to the desire for recognition by peers, as most researchers strive to have their contributions to knowledge acknowledged by their peers and gain prestige by doing so (Latour & Woolgar, 2013). This dimension has two sub-dimensions: Prestige, which represents the researcher's desire for recognition, and Drive to Publish, which is associated with the need to produce concrete evidence of the creation of new knowledge through the channels that the knowledge community recognizes as appropriate for disseminating knowledge and increasing its credibility and visibility. The second dimension, Convergence, relates to the researcher's preference for research agendas that have a clear disciplinary focus. This dimension refers to a researcher's decision to build a position of authority in a single disciplinary field. Although this usually takes a substantial amount of time (Allison et al., 1982), it can be part of a specialization strategy linked to higher research productivity gains because it avoids the transaction costs of disciplinary mobility (Leahey, 2007). Convergence has two sub-dimensions: Mastery, representing the expertise of a researcher in a given field, and Stability, the investment of time and effort in a specific discipline to become an expert in the field. The third dimension, Divergence, stands in opposition to the second dimension, as it represents the researcher's preference for research agendas that integrate or make use of more than one discipline.
This dimension also has two sub-dimensions: Branching out, which refers to expanding the research agenda towards other fields of knowledge (including the use and application of theories and methods from one field to another), and Multidisciplinarity, which is associated with the inclination to engage in multidisciplinary projects (Schut, van Paassen, Leeuwis, & Klerkx, 2014).
Discovery and Conservative, the fourth and fifth dimensions of the MDRAI, are also in opposition to each other, although these dimensions do not have sub-dimensions. Discovery refers to a researcher's preference for a research agenda that is riskier but has the potential to create new knowledge in a disruptive way, possibly creating new paradigms (Kuhn, 2012). Conservative measures the preference for pursuing a research agenda that is focused on well-established themes and a more incremental knowledge creation perspective. This preference is deemed to be safer and within the bounds of normal science, according to Kuhn (2012), and thus entails less risk of encountering research dead ends or a lack of acceptance by the research community. The sixth dimension, Tolerance of Low Funding, measures a researcher's willingness to pursue a research agenda even when little or no funding is available to support it. This dimension is relevant because it is associated with the competitive drive for research funding that universities and other institutions exhibit even when their researchers do not necessarily need such funding to do their research (Roumbanis, 2019). However, this dimension also illuminates how researchers can engage in research agendas without having access to resources at a time when the distribution of resources is characterized by inequality and increasing concentration (Hicks & Katz, 2011). The seventh dimension, Collaboration, plays an increasingly key role in the contemporary research dynamics (Kwiek, 2018) and refers to the preference for engaging in collaborative research agendas. This dimension also has two sub-dimensions, which represent how engagement in collaborative research can occur: Willingness to Collaborate, which indicates the propensity to collaborate, and Invitations to Collaborate, which refers to the collaborative opportunities provided by others (i.e., research projects started by others). 
The final dimension of the MDRAI is Mentor Influence, which measures the extent to which researchers are influenced by their mentors when designing their research agendas. The influence of a mentor on an individual's research agenda is to some extent a proxy for scientific independence but can also attest to good professional relationships forged during a researcher's PhD study, even though the mentor's influence is expected to wane over time (Ooms, Werker, & Hopp, 2018).
The MDRAI covers these critical dimensions and can be complemented by additional dimensions that are likely to shape the way that research is thought about and considered. Based on the literature, three dimensions are considered. First, the research agendas of researchers in the fields of science, technology, engineering, and mathematics (STEM) are known to be more influenced by their field communities, in which consensus on the significant questions that should be addressed tends to be reached collectively and holistically. This consensus is expected to influence a researcher's choices in those fields when defining a research agenda (Becher & Trowler, 2001). However, the research preferences of researchers in the social sciences and humanities tend to relate more strongly to personal interests. Although these personal interests are linked with issues significant to the researchers' field communities, the field communities are not expected to influence individual researchers to the same extent that they do in STEM fields (Collins, 1994). Second, with the rise of performativity, managerialism, and metrics associated with world university rankings and competitive national funding schemes, universities and other institutions are playing an ever greater role in influencing the research agendas of researchers (Kenny, 2018). These organizationally determined metrics establish the goals and targets related to research careers and influence decisions on salary increases and tenure and promotion (Acker & Webber, 2017). The recent literature shows that the increasing institutional pressure is influencing academic work and the way that researchers use these institutional constraints and incentives to orient their intellectual interests and career trajectories (Brew, Boud, Crawford, & Lucas, 2018).
Third, as research funding agencies and other institutional bodies (including universities, through policies related to research exchange) are increasingly highlighting the impact and social relevance of research, it is becoming increasingly likely that forms of research practice such as "action research communities" or "participatory research" are chosen. In these practices, researchers work collaboratively or consult lay communities about the challenges that they may face, and they structure their research from this perspective (Mendes, Plaza, & Wallerstein, 2016;Wooltorton et al., 2015). As a result, researchers may increasingly seek the opinions of nonexperts about social and technical problems and build research agendas that deal with "real problems" and are likely to have a strong societal impact.

METHOD
This section provides information relevant to the various analyses presented later in this study, such as the methods of determining validity.

Structural Equation Modeling
This subsection provides a brief introduction to structural equation modeling (SEM) to enable readers unfamiliar with this methodology to better understand the remainder of the study. Readers already familiar with SEM may wish to skip this subsection.
In the pilot and main studies, SEM was implemented using AMOS 24, with the goal of conducting CFA as a follow-up to a previous EFA. The AMOS software package was developed by IBM as a companion to the better-known SPSS, focusing on SEM. Although there are other software packages dedicated to SEM, AMOS has the distinct advantage of being largely graphics-based and is thus easier to use. SEM has the capacity to include latent variables to account for factors that cannot be directly observed (Bentler & Weeks, 1980) and also provides linear modeling procedures, such as analysis of variance and linear regression (Marôco, 2010). It also has the advantage of providing significantly more fit indicators than those available for general and generalized linear modeling, which can be used to re-estimate the model to achieve optimal fit, such as by allowing for covariance between the error terms (Bollen, 2014; Marôco, 2007; Marôco, 2010).
SEM typically contains two components: the measurement model and the structural model. The measurement model examines the trajectories from the manifest variables to the latent variables, with the dependent or endogenous variables being represented as follows (Bollen, 2014; Marôco, 2007; Marôco, 2010):
$$y = \Lambda_y \eta + \varepsilon$$
where $y$ is the vector for the manifest variables, $\Lambda_y$ is the matrix for the factorial weights of $\eta$ in $y$, $\eta$ is the vector for the latent variables, and $\varepsilon$ is the error term for $y$. The independent or exogenous variables are given by
$$x = \Lambda_x \xi + \delta$$
where $x$ is the vector for the manifest variables, $\Lambda_x$ is the matrix for the factorial weights of $\xi$ in $x$, $\xi$ is the vector for the latent variables, and $\delta$ is the error term for $x$.
The second component in SEM, the structural model, defines the relations between the various latent variables, and is given by the following (Bollen, 2014; Marôco, 2010):
$$\eta = \mathrm{B}\eta + \Gamma\xi + \zeta$$
where $\mathrm{B}$ is the coefficient matrix for the latent endogenous variables in the structural model, $\Gamma$ is the coefficient matrix for the latent exogenous variables in the structural model, and $\zeta$ is the vector for the disturbance terms in the structural model.
CFA is a specific type of SEM that is largely centered around the measurement model, because the structural section, if it exists, is largely reserved for second-order constructs. CFA is frequently used as a follow-up analysis to EFA. In EFA, the variables are allowed to freely load onto any extracted factors (Marôco, 2003), whereas CFA requires that the researcher specify the structure to be tested (Brown, 2015). Thus, EFA can provide initial insights into how to specify the model, and this specification can subsequently be tested through CFA.
Rather than relying on ordinary least squares, various methods can be used to estimate the parameters in SEM. The de facto standard in SEM estimation is maximum-likelihood (ML) estimation. ML estimation was used in all of the SEM analyses in this study because it is robust to deviations from multivariate normality and is generally considered to be the most useful estimation method (Arbuckle, 2007; Jöreskog & Sörbom, 1989; Marôco, 2010).

Considerations When Using SEM With a Large Sample
The main study used a much larger sample than is typically encountered in studies or referred to in statistical textbooks. Although this increases statistical power, it also creates issues in SEM due to the method's reliance on the χ² statistic. The χ² statistic is a mathematical function of the sample size and is generally inflated by large samples (Hair, Black, Babin, Anderson, & Tatham, 2007). This makes the underlying test almost always significant, and other indicators that are dependent on this statistic are likewise influenced. In other words, the χ² statistic reflects the sample size rather than the model fit (Browne & Cudeck, 1993). As Iacobucci (2010) states, "as N increases, χ² blows up," with quasi-exponential gains in the χ² statistic reached for sample sizes as low as 500. As a result, fit evaluation was conducted using a suite of alternative fit indices (AFIs) (Barrett, 2007; Browne & Cudeck, 1993; Kline, 2016; Putnick & Bornstein, 2016), which are detailed in the following section. There was also an issue with the modification indices (MIs), which are also based on the χ² statistic (Whittaker, 2012). Due to the sample-related inflation of the statistic, trivial changes were signaled as highly significant by the MIs, thus rendering the usual MI thresholds (Marôco, 2010) functionally useless. As a result, MIs were used in a limited manner. More details on how they were implemented are provided in the relevant section. Finally, the measurement invariance could not be tested using χ² comparisons, for the same reasons. Instead, AFIs were used (Meade & Lautenschlager, 2004; Putnick & Bornstein, 2016) in accordance with the stated guidelines for best practice in the literature (Cheung & Rensvold, 2002; Milfont & Fischer, 2010).

Fit Evaluation
Following model estimation, it is necessary to evaluate the model fit. Due to the large number of fit indicators, each representing different features of goodness of fit, it is usual to select one indicator for each category of indicators rather than report the entire suite of indicators (Bentler, 1990). The most common measure of fit is the χ² goodness-of-fit test (Barrett, 2007), which tests the null hypothesis that the population's covariance matrix is identical to the covariance matrix estimated by the model. However, due to the sample-related issues noted above, our evaluation relied heavily on the AFIs listed below.
The first category of fit indices is the absolute indices, which assess the quality of the model on its own, without comparison to other models (Marôco, 2010). Traditionally, this is done using χ²/df, the ratio of the chi-square statistic to the degrees of freedom. However, due to the large sample, it became necessary to use an alternative indicator for this category. We used the goodness-of-fit index (GFI), which is also commonly used in the literature. The second category of indices is the comparative indices, which compare the model fit with the fit of the independence and the saturated models (Bentler, 1990; Marôco, 2010). In this case, we used the comparative fit index (CFI) (Bentler, 1990). For the category of parsimony-adjusted indices, which penalize more complex models (Marôco, 2010), we used the parsimony-adjusted counterpart to the CFI, the PCFI. The fourth category was the population-discrepancy indices, which compare the model fit calculated from the sample moments with the model fit that would be obtained from the population moments (Marôco, 2010). For this category, we used the root mean square error of approximation (RMSEA), which is a popular choice because it is relatively insensitive to index inflation (Steiger, Shapiro, & Browne, 1985). The final category, the information-theory indices, is also dependent on the χ² statistic, but in this scenario this is less problematic, as the values of these indices are devoid of meaning on their own. Rather, they are used to compare multiple models and are read as "less is better" (Anderson, Burnham, & White, 1998; Marôco, 2010). For this category, we used the modified expected cross-validation index (ECVI), which does not require the competing models to be nested (O'Rourke & Hatcher, 2013) and is considered to be particularly useful for CFA purposes (Bandalos, 1993). We used the modified version of ECVI because it is preferable under ML estimation (Marôco, 2010).
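The contrast between the sample-size dependence of χ² and the relative stability of RMSEA can be illustrated with a short sketch. It is not part of the original analysis, which was run in AMOS 24. It assumes the standard ML relation in which the model χ² equals (N − 1) times the minimized discrepancy function, together with the usual point-estimate formula for RMSEA. The discrepancy value, the degrees of freedom, and the function names are hypothetical and chosen only for illustration.

```python
import math


def chi_square_from_discrepancy(f_min: float, n: int) -> float:
    """Model chi-square under ML: (N - 1) times the minimized discrepancy."""
    return (n - 1) * f_min


def rmsea(chi_square: float, df: int, n: int) -> float:
    """RMSEA point estimate, truncated at zero when chi-square is below df."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


# Hypothetical values (assumptions, not results from the study).
F_MIN = 0.05  # minimized ML discrepancy, held fixed across sample sizes
DF = 20       # model degrees of freedom

for n in (200, 500, 2000, 10980):
    chi2 = chi_square_from_discrepancy(F_MIN, n)
    print(f"N={n:>6}  chi2={chi2:8.2f}  RMSEA={rmsea(chi2, DF, n):.4f}")
```

With the misfit held constant, χ² grows in proportion to N, whereas RMSEA settles toward the square root of the discrepancy divided by the degrees of freedom (0.05 for these illustrative values).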

Modification Indices
To improve the model fit, it is possible to carry out model respecifications. The first approach to respecification eliminates nonsignificant trajectories and trajectories with low loadings, which has the additional advantage of increasing the factorial validity (Marôco, 2010). The second strategy involves MIs, which estimate the change (delta) in the χ² statistic that would result from a given adjustment to the model. It is important that these adjustments are coherent at a conceptual level, as otherwise a model can statistically have a good fit but be theoretically implausible (Arbuckle, 2007). This is usually performed by drawing covariances between error terms within the same factors and eliminating variables with cross-loadings, which tend to manifest as high MI values connected to the covariances between error terms of variables in different factors (Marôco, 2010). In AMOS 24, the MIs use the Lagrange multipliers method (Bollen, 2014). MI analysis is usually conducted iteratively. Adjustments are first made for MIs of 11 or higher, corresponding to a Type I error probability of 0.001, and then for MIs of 4 or higher, representing a Type I error probability of 0.05 (Marôco, 2010). In the main study, MIs were used sparingly due to the sample size.
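The correspondence between the MI thresholds and the Type I error probabilities can be checked with a few lines of code. The sketch assumes that each MI is referred to a χ² distribution with one degree of freedom, which is the usual reading when a single parameter is freed. For one degree of freedom, the upper-tail probability can be written with the complementary error function, so only the standard library is needed. The function name is hypothetical.

```python
import math


def p_value_chi2_df1(statistic: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    If Z is standard normal, P(Z**2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(statistic / 2.0))


for mi in (4.0, 11.0):
    print(f"MI = {mi:4.1f}  ->  p = {p_value_chi2_df1(mi):.4f}")
```

Both thresholds are rounded-up versions of the exact critical values, so the resulting probabilities fall slightly below the nominal 0.05 and 0.001 levels.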

Imputation
Missing values were imputed via Markov Chain Monte Carlo multiple imputation, which produced five complete data sets. EFA was carried out for each of the five complete data sets, and pooled estimates were then produced. In the CFA stage, because AMOS does not have built-in integration with the SPSS multiple imputation module, we used a single complete data set.

Scale Level
The original MDRAI and the new MDRAI-R items are scored on a 7-point Likert scale ranging from completely disagree to completely agree. Although Likert scales are technically ordinal, the data are treated as continuous throughout the entirety of the analysis. The rationale for this is as follows. First, various studies indicate that at the 5-point range and beyond, Likert scales can simply be treated as continuous (e.g., Johnson & Creech, 1983;Norman, 2010;Sullivan & Artino, 2013;Zumbo & Zimmerman, 1993). In the context of SEM specifically, Kline (2016) only recommends using alternative estimation methods (i.e., not ML) when the range of the scale is 5 points or smaller. Indeed, this is precisely why we opted to use a 7-point scale, which is less common than the 5-point scale. Second, the skewness and kurtosis values for the individual items indicate that they are sufficient approximations of a normal distribution (as we demonstrate in a later section), further indicating that the items can reasonably be treated as continuous.

Procedures
We conducted several searches on the Scopus database from June 2017 to August 2018 to identify the corresponding authors of articles from all fields of knowledge (based on the Scopus disciplinary area classifications) published from 2010 to 2016. As the Scopus database only shows the results for the first 2,000 matches, several sorting strategies were used to maximize coverage, namely, default sorting, most relevant, least relevant, and highest cited. No further sorting strategies were used, as significant numbers of duplicate records had been obtained by this point. We found 915,447 corresponding authors.
The survey was carried out electronically through an online surveying platform. Invitations to participate were sent out by e-mail in batches from June 2017 to August 2018 (this included an additional wave of invitations to the authors that did not respond to the initial invitation). The invitation included a description of the project and the survey aims and an opt-out link for participants who did not wish to be contacted again about the project. Those who accepted the invitation were directed to a page with an informed consent letter describing the scope, objectives, and purposes of the survey in further detail. The participants were required to give informed consent before they could proceed to the survey itself.
In total, 21,016 individuals agreed to participate. Of these, 8,883 dropped out before completion and were thus removed from the subsequent analysis. The final sample contained 12,183 participants, of whom 4,153 (34.1%) were female and 8,030 (65.9%) were male. The mean age was 49.994 years (SD = 12.285). In regard to geographical distribution, the most represented countries were the United States (N = 2,235; 18.3%), Italy (N = 806; 6.6%), the United Kingdom (N = 760; 6.2%), Spain (N = 554; 4.5%), and France (N = 548; 4.5%). The remaining participants were distributed across a range of other countries, ensuring global coverage. Table 1 summarizes the descriptive statistics for the sample. The geographical distribution is shown in Appendix A, due to its size. Finally, for cross-validation purposes, the working data set was randomly divided into two sub-samples (see, e.g., Johnson & Stevens, 2001): a training data set, with roughly 10% of the participants (N = 1,203), to be used in the EFA, and a holdout data set with the remaining 90% of the participants (N = 10,980), to be used for the CFA.
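A random division of this kind can be sketched as follows. This is only an illustration of the general idea, not the procedure actually used: the function name, the seed, and the exact-fraction rule are assumptions, and a fixed 10% cut yields slightly different sub-sample sizes from the reported 1,203 and 10,980, which correspond to "roughly" 10% and 90%.

```python
import random


def split_sample(ids, train_fraction, seed):
    """Shuffle participant identifiers and cut them into two disjoint sets."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical identifiers 0..12182 standing in for the 12,183 participants.
participant_ids = range(12183)
training, holdout = split_sample(participant_ids, train_fraction=0.10, seed=2018)
print(len(training), len(holdout))  # prints "1218 10965"
```

Keeping the two sets disjoint is what allows the structure explored in the EFA to be tested on data that played no part in deriving it.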

Analytical Roadmap
We describe our analytical strategy as follows. We begin by reporting the results of a pilot study that was conducted prior to the main survey and the subsequent analysis. We then report the EFA results for the main study, which was conducted with the goal of obtaining a preliminary data structure for the new scales to be included in the model. EFA was followed by CFA, where the model was further refined through iterative respecification until an optimal fit had been attained. After reporting the results of CFA, we describe the findings of our validity, reliability, and sensibility analyses, conducted to demonstrate the psychometric properties of the instrument. We conclude with measurement invariance analysis, which was performed to demonstrate that the instrument has similar measurement properties across all fields of knowledge.

Pilot Study
A pilot study was conducted in May and early June 2017 in preparation for the primary survey and the subsequent validation exercise. The pilot study aimed to (a) reinforce any weak preexisting scales (i.e., those with the minimum number of items per dimension or items with relatively lower loadings in the MDRAI); (b) develop new questions related to entirely new themes that had emerged since the development of the original MDRAI; and (c) ensure that the global number of items was reasonable by filtering out unnecessary items without compromising the factorial structure (as an excessively lengthy survey can discourage participants from completing it).
A pool of 92 questions was developed based on these criteria. This pool had 22 items unchanged from the original MDRAI and 13 items that were edited for clarity based on the comments by the participants in the pilot study. The 57 remaining items were original. Of these, 35 items were intended to reinforce the pre-existing scales, with the remaining 22 related to novel themes, most notably orientation (toward institutions, community, or society) and external metric-driven pressure.
Participation in the pilot study was by invitation. We sent invitations to several researchers from a variety of fields of knowledge and institutions around the world. A public invitation was posted on the project's ResearchGate page. Ninety-seven researchers agreed to participate in the pilot study. The questions were presented in random order to each participant.
The data obtained in the pilot study were analyzed using EFA and then CFA. Each scale was analyzed independently due to (a) the small sample size for the pilot study and (b) the expectation of relative independence for each scale (they are meant to be able to be used individually if desired, as each scale measures a separate facet of a research agenda). For the new themes, EFA was conducted using Varimax rotation (Ebrahimy & Osareh, 2014), and the optimal number of factors was determined using the following criteria: (a) Kaiser's criterion, (b) the scree plot's "elbow," and (c) the percentage of extracted variance. The extracted structure was then specified in the CFA stage for further evaluation.
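Two of these retention criteria, Kaiser's criterion and the percentage of extracted variance, reduce to simple arithmetic on the eigenvalues of the item correlation matrix. The sketch below assumes the eigenvalues have already been obtained from statistical software. The eigenvalues shown are hypothetical values for a six-item scale, not figures from the pilot study, and the function names are illustrative.

```python
def retained_factors(eigenvalues):
    """Kaiser's criterion: keep factors whose eigenvalue exceeds 1."""
    return sum(1 for value in eigenvalues if value > 1.0)


def percent_variance_explained(eigenvalues, n_factors):
    """Share of total variance carried by the n_factors largest eigenvalues."""
    ordered = sorted(eigenvalues, reverse=True)
    return 100.0 * sum(ordered[:n_factors]) / sum(ordered)


# Hypothetical eigenvalues; for a correlation matrix they sum to the item count.
eigenvalues = [2.9, 1.4, 0.7, 0.5, 0.3, 0.2]

k = retained_factors(eigenvalues)
print(k, round(percent_variance_explained(eigenvalues, k), 2))
```

The scree plot criterion is a visual judgment on the same eigenvalues, looking for the point at which successive values level off, and is therefore not automated here.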
The two main conclusions of this study relate to the new themes. The item elimination, although necessary, was less interesting, and the results are summarized in a later section. The items related to the new orientation scale originally revealed three factors explaining 67.38% of the variance. Based on their content, the items seemed to be related to the field orientation (e.g., "My choice of topic is determined by the field community"), society orientation (e.g., "I decide my research topic based on societal challenges"), and institutional orientation (e.g., "My research agenda is aligned with my institution's research strategies"). Thus, the CFA specified a model with three lower order latent variables in accordance with this structure. The field and institutional orientation dimensions had reasonable loadings (0.72 and 0.91, respectively), but society orientation loaded poorly onto the higher order factor (0.39). We interpreted this as indicating that a society orientation can sometimes be at odds with an academic orientation, or, in practical terms, that the society orientation factor might be independent of the other two orientation factors. We decided to reinforce the society orientation factor (which had only three items) with an additional three items and repeat the EFA for this factor in the main study. This generated two new subscales: one with the field and institutional orientation scales (which we termed Academia Driven), and a second with the society-related items (which we termed Society Driven). Our second conclusion concerns the metric-driven pressure scale, for which the EFA identified two factors explaining 55.87% of the variance: one related to publication pressure and the other to evaluation metrics pressure. This subscale was tentatively termed Publish or Perish. The pilot study concluded with a preliminary version of the revised survey composed of 68 items, which was used in the main study as described below.

EFA
Before conducting the CFA, a new EFA was conducted on the new scales (Academia Driven, Society Driven, and Publish or Perish) using the training data set similar to the EFA in the pilot study, to obtain a tentative factorial structure for the CFA stage (Bentler & Weeks, 1980). Accordingly, three independent EFAs were conducted, one for each scale. Although we could have conducted a single EFA, we decided to use identical procedures to the pilot to ensure consistency and reflect the modular nature of the inventory.
The EFA for the Academia Driven subscale largely matched that observed during the pilot study, with two extracted factors explaining 69.43% of the variance. Semantic interpretation of the items loading onto each factor exhibited similar behavior to that previously observed, with a factor related to institutional orientation and another to field orientation. The Society Driven scale, with the reinforcement items added in the previous stage, showed that two factors explained 79.26% of the variance. Semantic analysis of the items suggested that one of the factors was related to society (e.g., "I decide my research topics based on societal challenges"), and another was related to interactions with nonacademics (e.g., "I choose my research topics based on my interaction with my nonacademic peers"). We tentatively named these two factors "Society" and "Nonacademic." Finally, in contrast with the observations from the pilot study, the Publish or Perish scale revealed that a single factor explained 47.77% of the variance. Due to the previous findings and because the analysis scree plot suggested a possible two-factor solution, a forced two-factor extraction was attempted. However, this revealed significant cross-loadings on both factors from multiple items, thus confirming that the one-factor solution was optimal. As a result, we decided to use the one-factor solution in the CFA stage and re-evaluate the structure of this scale based on the findings. Table 2 summarizes the results of these analyses.

Model Specification
From this section onwards, the holdout sample is used for the reported analysis. The initial specification strategy replicated the structure obtained during the CFA for the original version of the instrument for the changed scales (Horta & Santos, 2016) and replicated the structure obtained during the EFA stage (see the previous subsection) for the new scales (Marôco, 2010). This specification resulted in a model with an inadmissible solution due to a nonpositive definite covariance matrix. This is a difficult issue to address, as it does not have a clear cause or method of diagnosis. In the literature, this is attributed to small sample sizes, insufficient numbers of manifest variables for each latent variable, misspecification of the model, and multicollinearity (Hair et al., 2007;Kline, 2016;Marôco, 2010). However, the issue needed to be resolved before proceeding with the analysis. As the sample for this exercise was not small and the recommended number of items per latent variable was met or exceeded in each case (Marôco, 2010), the only plausible remaining solutions were misspecification of the model or multicollinearity. As this was a CFA exercise, rather than path analysis, multicollinearity was somewhat expected and desirable (despite conceptual expectations of varying degrees of independence of some of the scales). Nevertheless, we speculated that there could be some degree of overlap leading to a misspecification issue. To diagnose this, we reran an EFA, but this time with the entire pool of items. The issue then became apparent. In the original validation exercise, some competing dimensions had loaded onto separate factors (Horta & Santos, 2016), but in this exercise, they exhibited different behaviors. 
Some of the items in the Conservative scale loaded onto the same factor as the Convergence scale, and some items for the Convergence scale loaded onto the Divergence scale, albeit with a negative loading, simultaneously exhibiting cross-loadings with the remaining items of the Convergence scale. This strongly suggests the redundancy of these scales, in the sense that Convergence/Divergence and Discovery/Conservative can be measured on a spectrum using a single scale rather than independent scales. As a result, it was decided to remove the Convergence and Conservative scales entirely and instead measure these concepts through the Divergence and Discovery scales (i.e., lower scores for Divergence translate to higher scores for Convergence characteristics). An additional issue emerged in the new Publish or Perish scale, which exhibited substantial cross-loadings across the board and thus was considered unviable for inclusion in the instrument. The removal of these scales addressed the issue and allowed an admissible solution to be estimated. An incidental benefit was that this further assisted the stated goal of reducing the number of items in the instrument.
The second step for specification was scanning for items with poor loadings (under 0.50), which indicate poor factorial validity (Kline, 2016;Marôco, 2010). The only such item was one of the new items in the Discovery scale ("I invest most of my time in research that I believe is at the forefront of knowledge"), with λ = 0.44. All of the other items were above the required threshold. This item was removed, and the model was re-estimated.
The third step involved removing redundant items, in line with the stated goal of reducing the number of items. The main candidate scales for item reduction were Mentor Influence, Tolerance of Low Funding, and Discovery, all with six items each. Observing the MIs, it was evident that there were substantial within-scale correlations between the error terms for the respective items, suggesting the redundancy of some of these items and providing grounds for their removal. Although there is no consensus on the optimal number of items for measuring a latent factor, similar analyses have been carried out with as few as two manifest variables (Rammstedt & John, 2007). However, most scholars consider this to be the absolute minimum, with a recommended minimum of three (Hair et al., 2007;Marôco, 2010). We opted to reduce the number of items in these scales to four. We decided to remove the two worst performing items in each of the scales (due to either poor loadings or high cross-loadings). For the Tolerance of Low Funding scale, the two items removed were "I try not to worry about funding availability when I plan my research," with λ = 0.65, and "I think I can progress in my career doing research with limited funding," with λ = 0.58. For the Discovery scale, the items were "I have a preference for new research topics," with λ = 0.62, and "I prefer to work on topics that have a high degree of novelty," with λ = 0.77. Finally, for Mentor Influence, the removed items were "My PhD mentor's opinion carries much weight in my research choices," with λ = 0.71, and "My PhD mentor still often works alongside me," with λ = 0.69. In addition, one of the items on the Prestige subscale of the Scientific Ambition scale ("Standing out from the rest of my peers is one of my goals") performed somewhat worse than its peers, with λ = 0.68. 
As the Scientific Ambition scale was already measured by seven items (four for Prestige and three for Drive to Publish), we decided to also remove this item. After this round of removals, the model was re-estimated.
The fourth and final step was evaluating the MIs. This was a daunting task, as MI values are based on the χ² statistic (Whittaker, 2012). As noted in the methods section, this statistic was substantially inflated by the sample size, which also caused the MIs to be inflated by proxy, resulting in the MIs flagging trivial model changes as highly significant. Specifically, the threshold value of 11, which corresponds to a Type I error probability of 0.001 (Marôco, 2010), applied to nearly all of the proposed changes. We opted to implement modifications following the usual convention of creating covariances between error terms loading onto the same factor (Kline, 2016; Marôco, 2010) and evaluate the effective fit gain through the AFIs. Other than the within-factor error disturbances, two items were removed due to substantial cross-loadings evident from very high MI values, both from the Academia Driven scale: "I often decide my research agenda in collaboration with my field community" and "My institution defines my research agenda." As the χ² statistic could not be used to gauge the quality of the model changes, we opted to evaluate improvements through the CFI instead. For each implemented MI change, the model was re-estimated and re-evaluated in an iterative manner until a CFI above 0.950 was reached. This level is considered the highest possible qualitative threshold for model fit using this index (Hu & Bentler, 1999).
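The link between the MI threshold of 11 and a Type I error probability of 0.001 can be checked numerically. The sketch below assumes that each MI is referred to a χ² distribution with one degree of freedom (one freed parameter), which is our assumption rather than a detail stated in the text; the function name `chi2_sf_1df` is illustrative.

```python
import math

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    With 1 df, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)) for standard normal Z.
    """
    return math.erfc(math.sqrt(x / 2.0))

p_at_11 = chi2_sf_1df(11.0)      # tail probability at the MI threshold of 11
p_at_3_84 = chi2_sf_1df(3.841)   # conventional 5% critical value, for reference
print(p_at_11, p_at_3_84)
```

Under this assumption, an MI of 11 sits just beyond the 0.001 critical value, which is why almost any proposed change clears it when the χ² statistic is inflated by a large sample.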
This multistage specification strategy yielded notable gains in model fit (MECVI initial = 1.941 versus MECVI final = 1.103), accomplished the goal of item reduction, and addressed all of the specification issues. The fit evaluation at each stage is summarized in Table 3.

CFA
Full information ML was used to estimate the final model. For this final iteration, the model was significant, as were the various trajectories (p < 0.001). Based on the fit evaluation, and using the common thresholds (Barrett, 2007; Hair et al., 2007; Hooper, Coughlan, & Mullen, 2008; Kline, 2016; Marôco, 2010), it was determined that the model fit could be qualitatively assessed as very good (GFI = 0.950; CFI = 0.953; PCFI = 0.850; RMSEA = 0.037). Table 4 indicates the factorial loadings for the final model, and Figure 1 provides a visual representation of the model. Finally, Table 5 provides item-level descriptive statistics, from which it can also be observed that all of the items follow univariate normality according to Kline's (2016) thresholds for skewness and kurtosis.
In addition to the factorial loadings, initial insights regarding the interplay of the various dimensions can be obtained by observing the correlations in Figure 1. First, a moderately strong correlation can be observed between the Academia Driven and Society Driven scales  (r = 0.646). A possible explanation is that institutions (and indeed the academy) currently place emphasis on society-focused research, causing them to be somewhat aligned, even if they are still independent (and, as mentioned in the pilot study section, sometimes at odds with each other). The Society Driven scale also exhibits a moderate correlation with Divergence (r = 0.508), which suggests either that the society-focused challenges are requiring more multidisciplinary approaches or that researchers who have a preference for diverging research are also more likely to engage in society-driven research. Divergence also exhibits a moderate correlation with Discovery (r = 0.503), which is expected because these two agendas are core traits of the trailblazing doctrine that was identified in the previous iteration of the MDRAI (Santos & Horta, 2018). Similarly, Collaboration exhibits moderate correlations with Scientific Ambition (r = 0.568) and Divergence (r = 0.554), and thus also resonates with the characteristics of the trailblazing doctrine. Several other correlations, which are not covered here but are relatively easy to interpret, can be identified, but they are not as strong. Overall, the observed correlational matrix can provide insights into how to use the MDRAI-R in future studies.

Validity, Reliability, and Sensitivity
Three types of validity were assessed in this study: factorial validity, convergent validity, and discriminant validity (Hair et al., 2007; Marôco, 2010). James Gaskin's Stats Tool Package (2016), specifically the Validity Master macro, was used for the assessment. This also reflects the same types of validity evaluated in the validation exercise for the first version of the MDRAI. Factorial validity can be attained when the standardized loadings for all items exceed the 0.50 threshold (Marôco, 2010). One of the steps in the previous section ensured that this criterion was met, so the model had factorial validity. The second type, convergent validity, relates to high loadings from the manifest variables onto the latent variables and is evaluated through the average variance extracted (AVE; Fornell & Larcker, 1981). The AVE for a given factor j with k items is given by

$$\mathrm{AVE}_j = \frac{\sum_{i=1}^{k} \lambda_{ij}^{2}}{\sum_{i=1}^{k} \lambda_{ij}^{2} + \sum_{i=1}^{k} \varepsilon_{ij}},$$

where λ_ij is the standardized loading of item i on factor j and ε_ij is the corresponding error variance. Based on this calculation, convergent validity is confirmed when the AVE exceeds the 0.50 threshold (Hair et al., 2007). This was the case for all of the factors, with the exception of Discovery, with a slightly lower AVE of 0.473. Although this could conceivably have been increased by eliminating the lowest-loading item, a minor shift from the threshold is likely to be irrelevant at a practical level. Therefore, we argue that convergent validity was largely demonstrated, although the abovementioned issue must be taken into consideration when using the Discovery scale. We proceeded by evaluating the discriminant validity, which reflects the degree of extra-factorial correlation. Discriminant validity is demonstrated when the square root of the AVE of each factor in a given pair of factors i and j is equal to or greater than the correlation between those two factors. Furthermore, the AVE must be equal to or greater than both the maximum shared variance (MSV) and the average shared variance (ASV; Fornell & Larcker, 1981; Hair et al., 2007; Marôco, 2010).
All of the factors met this criterion, demonstrating the discriminant validity of the instrument.
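The convergent and discriminant validity checks described above can be sketched as follows. The loadings and the correlation are hypothetical, the error variance of each item is taken as one minus its squared standardized loading (an assumption that holds for standardized solutions), and the function names are ours rather than those of the Stats Tool Package.

```python
import math

def ave(loadings):
    """Average variance extracted from standardized loadings of one factor."""
    squared = [l ** 2 for l in loadings]
    errors = [1.0 - s for s in squared]  # assumed error variance of a standardized item
    return sum(squared) / (sum(squared) + sum(errors))

def discriminant_ok(ave_i, ave_j, r_ij):
    """Fornell-Larcker check: sqrt(AVE) of both factors >= |correlation| between them."""
    return min(math.sqrt(ave_i), math.sqrt(ave_j)) >= abs(r_ij)

# Hypothetical standardized loadings for two four-item factors (illustrative only).
ave_a = ave([0.80, 0.70, 0.60, 0.75])
ave_b = ave([0.85, 0.65, 0.70, 0.72])
print(ave_a > 0.50, discriminant_ok(ave_a, ave_b, r_ij=0.55))
```

With standardized loadings the AVE reduces to the mean squared loading, so a factor falls below 0.50 as soon as its typical loading drops under roughly 0.71.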
Following the validity evaluation, we proceeded with the analysis of reliability, which is a measure of consistency (Marôco, 2010). This was done using the composite reliability (CR) (Fornell & Larcker, 1981), which for a given factor j with k items is given by

$$\widehat{\mathrm{CR}}_j = \frac{\left(\sum_{i=1}^{k} \lambda_{ij}\right)^{2}}{\left(\sum_{i=1}^{k} \lambda_{ij}\right)^{2} + \sum_{i=1}^{k} \varepsilon_{ij}},$$

with λ_ij the standardized loading of item i on factor j and ε_ij its error variance. A CR at or above the proposed threshold of 0.7 is considered to indicate scale reliability (Hair et al., 2007). All of the factors exceeded the required threshold, with the exception of Divergence (CR = 0.695). However, as before, a millesimal difference is likely to be trivial. Despite this slight deviation, the instrument can be considered reliable. Table 6 summarizes the validity and reliability findings and the correlations between the factors.
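A minimal sketch of the composite reliability calculation follows, again with hypothetical loadings and with each error variance assumed to equal one minus the squared standardized loading. The function name `composite_reliability` is illustrative.

```python
def composite_reliability(loadings):
    """Composite reliability of one factor from its standardized loadings."""
    total = sum(loadings)
    errors = sum(1.0 - l ** 2 for l in loadings)  # assumed standardized error variances
    return total ** 2 / (total ** 2 + errors)

# Hypothetical standardized loadings for a four-item factor (illustrative only).
cr = composite_reliability([0.80, 0.70, 0.60, 0.75])
print(round(cr, 3), cr >= 0.7)
```

Because the numerator squares the sum of the loadings, CR grows with the number of items as well as with their loadings, which is one reason short scales can sit close to the 0.7 threshold.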
The final factor is sensitivity, which refers to the capability of an instrument to differentiate between two individual items. This is demonstrated when all of the individual items have a reasonably normal distribution (Marôco, 2010). Items are considered to have a reasonable approximation to the normal distribution when their skewness and kurtosis are under the absolute value of 3 (Kline, 2016). All of the items were below this threshold for both parameters, thus demonstrating the sensitivity of the instrument and completing the validation exercise.
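The sensitivity check can be sketched as below. The estimators are an assumption on our part (population moment skewness and excess kurtosis; statistical packages often apply small-sample corrections), the responses are hypothetical, and the limit of 3 for both statistics follows the text above.

```python
def skew_kurtosis(values):
    """Moment-based skewness and excess kurtosis (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

def is_sensitive(values, limit=3.0):
    """True when both |skewness| and |excess kurtosis| are under the limit."""
    skew, kurt = skew_kurtosis(values)
    return abs(skew) < limit and abs(kurt) < limit

# Hypothetical responses to one item on a 1-7 scale (illustrative only).
responses = [1, 2, 3, 4, 5, 6, 7]
skew, kurt = skew_kurtosis(responses)
print(skew, kurt, is_sensitive(responses))
```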

Measurement Invariance
In this step, the goal was to assess and eventually demonstrate measurement invariance across the major fields of knowledge. The fields of knowledge were the exact and natural sciences, health and medical sciences, engineering and technology, social sciences, and humanities. Measurement invariance indicates that the operationalization of a construct has the same meaning in different contexts (Meade & Lautenschlager, 2004). In other words, its metric is universal wherever invariance is tested. To achieve this, we used a multigroup analysis following the procedure outlined by Marôco (2010) and Kline (2016), which involves comparing the unconstrained model with progressively more constrained models. Typically, this is done using χ² tests for difference. However, as noted in the literature and observed in our own data set, this statistic becomes unreliable with larger samples, as all trivial differences are deemed to be significant (Chen, 2007; Cheung & Rensvold, 2002; Kline, 2016; Meade, Johnson, & Braddy, 2008; Putnick & Bornstein, 2016). Scholars have proposed using AFIs in these scenarios instead (Putnick & Bornstein, 2016). Cheung and Rensvold (2002) propose that a CFI change of less than 0.01 indicates measurement invariance. Thus, we estimated the multigroup analysis for fields of knowledge using progressive levels of constraints, based on the hypotheses for testing measurement invariance proposed by Cheung and Rensvold (2002) and following the guidelines recommended by Milfont and Fischer (2010).
We began by testing hypothesis H λ. Metric invariance was demonstrated for the first model, with a ΔCFI of 0.000 (Model II), indicating that the constructs manifest identically across fields of knowledge (Cheung & Rensvold, 2002). For the next hypothesis, H Λ,Θ(δ), invariance of the residual variances and covariances was also demonstrated, with a ΔCFI of 0.002 (Model III), indicating that the internal consistency is identical across the fields of knowledge (Cheung & Rensvold, 2002). The threshold for hypothesis H Λ,ν, scalar invariance, was not met, with a ΔCFI of 0.012 (Model IV). Following the guidelines in the literature for best practice in testing measurement invariance, we then tested for partial scalar invariance (Byrne, Shavelson, & Muthén, 1989; Cheung & Rensvold, 2002; Milfont & Fischer, 2010). This required us to determine which intercepts varied to the greatest degree across the fields of knowledge. Due to the large number of intercepts and groups, a more efficient method than simple visual inspection of the intercept matrix was required. We computed the square root of the sum of the squared differences for each pair of intercepts to identify which intercepts had the largest cross-field of knowledge discrepancies. These intercepts lay in two scales: Tolerance of Low Funding and the new Society Driven scale. This finding can be explained as follows. For Tolerance of Low Funding, it could relate to the widely varying availability of funding across the fields of knowledge, leading to different levels of risk tolerance (Lanahan, Graddy-Reed, & Feldman, 2016; Mejia & Kajikawa, 2018). Similarly, for the Society Driven scale, the finding could relate to the difference between basic and applied research, as basic research has lower levels of Society Driven research agenda characteristics than applied research (see Bentley, Gulbrandsen, & Kyvik, 2015).
Table 7 notes: ΔCFI is calculated with reference to less constrained models using the guidelines in Cheung and Rensvold (2002). Although Cheung and Rensvold (2002) indicate equivalence of construct variance and equivalence of construct covariance as separate hypotheses, they were merged for this exercise. This is a technical limitation, as the AMOS software package bundles these two constraints together.
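One plausible reading of the intercept screening described above is sketched here: for each item, take its intercept in every field of knowledge, sum the squared differences over all pairs of fields, and take the square root. The intercept values and item names are hypothetical, and whether the authors aggregated in exactly this way is an assumption.

```python
from itertools import combinations
import math

def intercept_discrepancy(intercepts):
    """Square root of the summed squared differences over all pairs of groups."""
    return math.sqrt(sum((a - b) ** 2 for a, b in combinations(intercepts, 2)))

# Hypothetical intercepts for two items across five fields of knowledge.
hypothetical = {
    "item_stable": [4.0, 4.1, 3.9, 4.0, 4.1],
    "item_varying": [3.0, 4.5, 2.8, 5.0, 4.0],
}
# Items at the top of this ranking are candidates for freely estimated intercepts.
ranked = sorted(hypothetical,
                key=lambda name: intercept_discrepancy(hypothetical[name]),
                reverse=True)
print(ranked)
```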
Having identified the source of variance, we allowed these intercepts to vary freely across the fields of knowledge and proceeded with the analysis, as per the guidelines provided by Milfont and Fischer (2010). The new model met the threshold for partial scalar invariance, with a ΔCFI of 0.008 (Model V), indicating scalar invariance for all of the scales except Tolerance of Low Funding and Society Driven. The next level of invariance is at the construct level. The next model constrained the construct variances and covariances and tested hypotheses H Λ,Θ( jj) and H Λ,Θ( jj') . The equivalence of construct variance and covariance was not demonstrated, with a ΔCFI of 0.011 (Model VI), indicating that the range of responses and relationships between the constructs were not identical across the groups. Finally, the last model tested for differences in the latent means (hypothesis H Λ,ν,к ), which were demonstrated with a ΔCFI of 0.000 (Model VII). Measurement invariance was demonstrated for the instrument, with full metric, scalar, and partial construct invariance for all of the scales except Tolerance of Low Funding and Society Driven, which nevertheless still possessed metric invariance. The results of the model comparison are summarized in Table 7. Finally, the descriptive statistics for each factor and field of knowledge are presented in Table 8.
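The decision rule applied to each model can be restated compactly. The sketch below takes the ΔCFI values reported in the text for Models II to VII and applies the criterion of a CFI change of less than 0.01 (Cheung & Rensvold, 2002); the variable names are ours.

```python
THRESHOLD = 0.01  # a CFI change below this value is taken to indicate invariance

# ΔCFI values as reported in the text; Model V is the partial scalar model.
reported_delta_cfi = {
    "II": 0.000,   # metric
    "III": 0.002,  # residual variances and covariances
    "IV": 0.012,   # full scalar
    "V": 0.008,    # partial scalar
    "VI": 0.011,   # construct variances and covariances
    "VII": 0.000,  # latent means
}
verdicts = {model: delta < THRESHOLD for model, delta in reported_delta_cfi.items()}
print(verdicts)
```

Only Models IV and VI exceed the threshold, which matches the pattern of full metric, partial scalar, and partial construct-level invariance described above.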

DISCUSSION
In this section, the various scales and their scoring are interpreted. We first discuss the scoring, then focus on the scales. To calculate composite scores for each scale, there are numerous options to choose from (DiStefano, Zhu, & Mindrila, 2009). As with the initial version of the MDRAI, simple summation is discouraged due to the unbalanced number of items across the factors. Although balancing the number of items was one of the goals of this revision, it was unfortunately not possible to do so while maintaining the validity of the scale. Summed scores would therefore have different ranges across the scales, making direct comparison difficult. The simplest alternative way of computing the composite scores, and the approach we encourage for general use, is to calculate the mean score of the items in each scale. This yields a composite nondiscrete score ranging from 1 to 7. In addition, each item can be weighted by the factor loadings provided in Table 4 when computing the mean. Scores can be computed for either the first-order factors or the second-order factors, depending on the specific research purposes.
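The two scoring options can be sketched as follows. The responses and loadings are hypothetical (the actual loadings are those in Table 4), and normalizing the weighted score by the sum of the loadings, so that it stays on the 1 to 7 range, is our assumption rather than a procedure specified in the text.

```python
def mean_score(responses):
    """Unweighted composite: mean of the item responses in a scale."""
    return sum(responses) / len(responses)

def weighted_score(responses, loadings):
    """Loading-weighted composite, normalized by the sum of the loadings."""
    return sum(r * l for r, l in zip(responses, loadings)) / sum(loadings)

# Hypothetical responses (1-7) and loadings for a four-item scale.
responses = [6, 5, 7, 4]
loadings = [0.80, 0.70, 0.60, 0.75]
plain = mean_score(responses)
weighted = weighted_score(responses, loadings)
print(plain, round(weighted, 3))
```

Both scores remain comparable across scales with different numbers of items, which is the reason simple summation is discouraged.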
The Scientific Ambition dimension retained the same importance as it had in the MDRAI, including that of its subdimensions (Prestige and Drive to Publish), stressing the relevance of engaging in research agendas that can provide recognition for one's work from peers and help to achieve positions of intellectual and field authority in the knowledge communities of interest (Latour & Woolgar, 2013;Whitley, 2000). The Collaboration dimension and its subdimensions (Willing to Collaborate and Invited to Collaborate) also retained critical importance in the MDRAI-R, which demonstrates an understanding that collaborative agendas are necessary in all fields of knowledge and that collaborating or not with peers is a key decision when embarking on new research agendas (Siciliano, Welch, & Feeney, 2018). Higher scores for the dimensions and the respective subdimensions mean that the relevance of these factors to the research agenda is more important for researchers (e.g., a higher score for Scientific Ambition means that researchers privilege this dimension when developing their research agendas).
The Tolerance of Low Funding and Mentor Influence dimensions also appear to be critical in influencing the research agendas of researchers, as they were in the MDRAI. Higher scores for Tolerance of Low Funding indicate that researchers are not discouraged by a lack of available funding from pursuing specific research agendas, meaning that they do not place an emphasis on research funding when deciding on a research agenda, and lower scores for this dimension indicate that researchers consider research funding to be a critical element when deciding on specific research agendas. We further argue that a median score in this dimension can indicate that in some cases, researchers follow research funding when opting for specific research agendas but not in others. This scoring could also indicate that researchers are willing to engage in exploratory research agendas that have little to no funding as a way to obtain initial findings that could allow them to then prepare research agendas of greater scope, ambition, and focus that might need research funding to come to fruition. A higher score for Mentor Influence suggests that the PhD supervisor continues to have a say in or a degree of influence on a researcher's research agenda, and a lower score means that the researcher embarks on research agendas without requesting their PhD supervisor's guidance or opinion. These scores can be a proxy for researcher independence, but can also be understood as a measure of a researcher's relationship with his or her PhD supervisor after completing a doctorate (Ooms et al., 2018).
The Discovery dimension in the MDRAI-R combines the MDRAI dimensions of Discovery and Conservative into a single dimension, as discussed in the main study section, thereby placing the previously independent dimensions on a continuum. The higher the score for the Discovery dimension, the more likely the researcher is to engage in research agendas that are riskier and focus on emerging and unexplored themes that have greater potential for breakthroughs but also for failure. Santos and Horta (2018) characterize researchers with high Discovery score research agendas as trailblazers, and Foster et al. (2015) characterize these researchers as having innovative research strategies. A lower score in this dimension indicates a preference for low-risk research agendas that are more focused on the gradual accumulation of knowledge in well-established themes, topics, and fields. Santos and Horta (2018) characterize researchers with low Discovery score research agendas as cohesive, and Foster et al. (2015) characterize them as having traditional research strategies. The Divergence dimension maintained the same structure as in the MDRAI, including its subdimensions (Branching out and Multidisciplinary), but similar to the Discovery dimension, it also combined the MDRAI dimensions of Divergence and Convergence into a one-dimension continuum. A higher score in the Divergence dimension means that researchers establish research agendas that link and involve knowledge from other fields of knowledge and are attuned to the current needs of complex problems (Zuo & Zhao, 2018). Lower scores in this dimension indicate research agendas bounded by a single field of knowledge and are associated with specialization, knowledge mastery, field identity, and a focus on one or few topics rather than diversification (Franzoni & Rossi-Lamastra, 2017).
The first new dimension of the MDRAI-R is Academia Driven, which refers to the extent to which a research agenda is influenced by holistic, valuative, and normative traits and dispositions related to the scholarly and academic environment and social structure with which the researcher identifies. The higher the score in this dimension, the more the research agenda conforms to and is aligned with the questions, topics, and strategic focuses that the academic environment might regard as a priority. A lower score in this dimension indicates that a research agenda is more based on personal interests and not as affected by the scholarly and academic environment. This dimension has two subdimensions. The Field subdimension refers to the extent to which the research agenda is influenced by scientific priorities that the field community determines by consensus (Becher & Trowler, 2001; Collins, 1994). A higher score for this subdimension means that the research agendas are more influenced by a community priority focus. The other subdimension, Institution, refers to the propensity of researchers to align their research agendas with the strategic research targets of their institutions. The higher the score for this subdimension, the greater this propensity will tend to be; the lower the score, the lower the likelihood that the research agenda will be affected by institutional constraints. This propensity is expected to vary according to the sector in which the researcher is working (e.g., academia, industry, government, nonprofit sector) and the career stage of the researcher, such that younger, untenured, and contract-based researchers will be more affected by institutional constraints (Giroux, 2015).
The second new dimension in the MDRAI-R is Society Driven, which measures the likelihood that a research agenda aims to solve challenges in society. The higher the score for this dimension, the greater the focus on such challenges; the lower the score, the lesser the focus on such challenges. This dimension has two subdimensions. The first subdimension is Society, which refers to the incidence of society-related challenges in a research agenda, and the second subdimension, Nonacademics, measures the influence and participation of laymen and nonexperts in the design of a research agenda. The higher the score for this subdimension, the greater the likelihood of engaging with nonresearch communities in an "action research community" or "participatory research" (Mendes et al., 2016;Wooltorton et al., 2015). These two subdimensions reflect the possibility of having a society-focused research agenda that does not involve collaboration with nonexpert communities.

CONCLUSION
This study refines, extends, and optimizes the original MDRAI, which was validated only for the social sciences. Our revised MDRAI-R includes new dimensions and fewer items per dimension, and it expands the scope and applicability of the inventory to all fields of knowledge. The new version exhibits good psychometric properties and satisfactory validity, reliability, and sensitivity. Furthermore, our measurement invariance analysis indicates that the model can be applied equally to all fields of knowledge, thus broadening its scope of application. The new dimensions (Academia Driven and Society Driven) provide new angles for assessing research agendas. This reinforces the usefulness of the instrument by allowing for cross-field studies and also identifying agendas with possible societal impact. Thus, in addition to being of interest to individual researchers, our instrument will be of value to policy makers, research funding agency strategists, and university and research organization leaders. In particular, the updated instrument will enable them to better characterize their research teams and create incentives that can add value to their research. The final validated version is provided as an appendix to this study (Appendix B). The items are presented in no specific order, and randomization is recommended before application to ensure that the gamification or fixed structuring of the questions does not result in biased responses.
This study has the following limitations. First, as with all perception-based measures, there is a risk of bias from the participants, and this possibility needs to be considered when reviewing the response data, especially with smaller samples. Second, the Academia Driven subscales are represented by only two items each. Although this is acceptable and not uncommon, it must be noted that this is the absolute minimum number of items possible per factor. Thus, care should be taken when using the subscales alone rather than the overall Academia Driven measure, especially when data are missing. An additional limitation is that we could not test the external validity with current data. This is something we plan to address in future studies. Finally, some minor issues were identified with the convergent validity of the Discovery scale and the reliability of the Divergence scale. Although these are only