Productivity Gains from Agglomeration and Migration in the People's Republic of China between 2002 and 2013

We evaluate the evolution of productivity gains in cities in the People's Republic of China between 2002 and 2013. In 2002, rural migrants exerted a strong positive externality on the earnings of urban residents, which were also higher on average in cities with access to foreign markets through a seaport. In 2007 and 2013, city size (measured in terms of both employment density and land area) was the crucial determinant of productivity. Market access, whether internal or external, played no direct role. Rural migrants still enhanced urban residents’ earnings in 2007 and 2013, though the effect was less than half that in 2002. Urban gains and their evolution over time are very similar on a total and a per hour earnings basis. Finally, skilled workers and females experienced slightly larger gains than unskilled workers and males.


I. Introduction
Although the empirical literature on productivity gains associated with larger cities has matured over the past decade (Combes and Gobillon 2015), evidence about the characteristics and role of cities in developing economies is still largely incomplete (Chauvin et al. 2017). This gap is striking because urbanization rates, although much lower than in developed countries where they are fairly stable, are increasing rapidly in many developing economies. Therefore, quantifying gains and costs from such a rapidly evolving spatial concentration of economic activity is a highly topical issue. If one expects the gains to outweigh the costs, then urbanization should be promoted, for instance, by facilitating rural-to-urban migration. If costs dominate, appropriate policies must be implemented to restrain urban expansion.
The People's Republic of China (PRC) offers the best example of a large increase in rural-urban migration, a process that has fed the development of its cities for more than 15 years. In an early contribution, Au and Henderson (2006) document the productivity gains that could emerge from larger cities in the PRC. They conclude that, apart from the largest cities, large productivity gains might also emerge from a larger number of medium-sized cities. Using microdata from the mid-2000s, Combes, Démurger, and Li (2015) estimate the specific externality that migrants exert on native urban workers (beyond the impact of standard agglomeration variables such as total employment density and city land area) and find evidence of a large positive effect of the city's migrant share on urban residents' wages. This paper extends that analysis over time by evaluating the evolution of productivity gains in cities in the PRC between 2002 and 2013. Measuring possible changes has important policy implications because the success of urban policies ultimately depends on the nature of the gains, which may have changed over time. Another contribution of this paper consists of estimating urban gains not only in terms of total earnings but also in terms of earnings per hour, which is a closer measure of labor productivity, and of doing so separately for skilled and unskilled workers and by gender. To this end, we combine nationally representative household income surveys conducted in 2002, 2007, and 2013 with city-level data.
We find that city characteristics have a significant impact on labor productivity and that the characteristics that matter have changed over time. In 2002, the migrant externality was very strong. To a lesser extent, access to foreign markets through proximity to a seaport also played a positive role. In 2007 and, to a greater extent, in 2013, agglomeration variables that are standard in the literature and mostly reflect city size (e.g., employment density and land area) had become crucial determinants of a city's productivity. By contrast, market access, whether internal or external, played no significant direct role. As for rural migrants, they still enhanced urban residents' earnings, but the effect was less than half what it had been in 2002. These findings show that the PRC moved from urban gains shaped by specific features (e.g., large disparities across cities in terms of access to the sea and migrant shares) in 2002 to a situation in 2007 and 2013 where variables more typically affecting urban workers' earnings are playing a more influential role. What matters for cities in the PRC now is their capacity to host migrants as well as their overall size in terms of both density and land area. Urban gains and their evolution over time are very similar on a total and per hour earnings basis. Surprisingly, we find little difference in earnings gains between skilled and unskilled workers, yet the city effects are slightly larger for skilled workers. This may be due to the fact that the substitution and externality effects from migrants are difficult to disentangle from one another with the data sets we use. Last, female workers seem to gain slightly more from cities than male workers do.
Section II briefly recalls the theoretical background and the associated empirical strategy, and presents the data we use. Section III is devoted to measuring the impact of city characteristics for all workers taken together on both a total and per hour earnings basis. Section IV evaluates agglomeration and migration gains separately for skilled and unskilled workers, and for male and female workers. Section V concludes.

A. Background and Methodology
Marshall (1890) suggested long ago that gains from cities arise from various local external economies of scale. The underlying mechanisms, which are now well understood, were classified by Duranton and Puga (2004) as sharing (specialized inputs, diversity, or risk); matching (quantity and quality of matches in local labor markets); and learning (creating, diffusing, and accumulating knowledge). These mechanisms either directly affect household utility or increase household productivity, which in turn leads to higher earnings. The largest part of the empirical literature augments standard wage equations with city variables to evaluate the determinants of the nominal gains that result from larger cities. Similar conclusions are usually reached when one works on total factor productivity measures. Comparing nominal gains to the higher cost of living in cities, and therefore assessing real income gains in cities, is the focus of another strand of the literature.
We use a standard two-step procedure for the estimation of agglomeration effects discussed in Combes and Gobillon (2015) and extended by Combes, Démurger, and Li (2015) to properly account for the role of migrants. The procedure consists of two estimation steps. The first-step estimation, equation (1), evaluates the impact on individual i's wage at date t, w_it, of city-time fixed effects, δ_ct, for city c where worker i is employed at date t, and of city c's characteristics for sector s where i is employed, L_cst, net of the role of individual characteristics X_it. We estimate equation (1) for each year separately, which makes all estimated parameters year specific. Controlling for individual characteristics is crucial to remove the bias arising from a possible nonrandom sorting of individuals across cities depending on their abilities.
The sorting of individuals has been documented to be large in some economies, although this does not seem to be the case in the PRC as shown by Combes, Démurger, and Li (2015). Introducing city-sector variables, typically here only the logarithm of the share of sector s in city c employment, allows identifying the role of the characteristics of the sector within the city beyond the role of other nonsector-specific city characteristics captured by δ_ct.
The second step, equation (2), disentangles the characteristics of the city (vector U_ct) that impact city productivity δ_ct. We pool the 3 years of data together and compare estimates when some (or all) estimated parameters are year specific. The two main variables of interest are employment per square kilometer (density) and the share of internal migrants. The literature has shown the former to be the main driver of urban productivity. The role of the latter has been highlighted in the case of the PRC by Combes, Démurger, and Li (2015). These authors discuss the functional forms to be used in order to properly interpret the estimated coefficients. The logarithm of the employment density of urban residents identifies the standard impact of density that can be compared to the one obtained in the literature for other economies. Then, the impact of the logarithm of the inverse of 1 minus the share of migrants in the city's total employment corresponds to the sum of the impact of density (for a given number of urban residents, migrant inflows increase density) and the separate migrant effect net of such density gains. The latter effect results from a complementarity (if positive) or substitution (if negative) effect between urban residents and migrants in the production function and a possible further migrant externality (either positive or negative). Unfortunately, these last two effects cannot be separately identified without appealing to extended data and a more complex approach, as surveyed by Lewis and Peri (2015).
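The two steps and the migrant functional form described above can be written compactly. What follows is only a sketch consistent with the definitions in the text: the logarithmic dependent variable, the coefficient symbols (\mu_t, \theta_t, \gamma, \beta, \phi), the error terms, and the employment symbols N_{ct} (urban residents), M_{ct} (migrants), and A_{ct} (land area) are notation assumed here for illustration.

```latex
% Assumed notation: \mu_t, \theta_t, \gamma and the error terms are not taken from the source.
\begin{align}
\log w_{it} &= \delta_{ct} + L_{cst}\,\mu_t + X_{it}\,\theta_t + \varepsilon_{it} \tag{1} \\
\delta_{ct} &= U_{ct}\,\gamma + \eta_{ct} \tag{2}
\end{align}

% Density decomposition behind the migrant variable
% (N = resident employment, M = migrant employment, A = land area; all assumed symbols).
\begin{align*}
m_{ct} &= \frac{M_{ct}}{N_{ct} + M_{ct}}, \qquad
\frac{N_{ct} + M_{ct}}{A_{ct}} = \frac{N_{ct}}{A_{ct}} \cdot \frac{1}{1 - m_{ct}}, \\
\log \frac{N_{ct} + M_{ct}}{A_{ct}} &= \log \frac{N_{ct}}{A_{ct}} + \log \frac{1}{1 - m_{ct}} .
\end{align*}
```

Under this notation, if \beta is the coefficient on the log density of urban residents and \phi the coefficient on \log\frac{1}{1 - m_{ct}}, then \phi - \beta is the separate migrant effect net of density gains, which is how the text interprets the two coefficients.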
We augment the second-step specification by a number of city characteristics that can shape productivity in cities in the PRC. Two margins shape city size: density and land area. Density corresponds to the role of the intensive margin: that is, a larger number of workers per square kilometer within the city boundary. Land area captures the role of the extensive margin, the city being made larger by extending its boundaries at a given density: that is, under a simultaneous proportional increase in total employment.
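The two margins can be made explicit with a simple accounting identity. The symbols E_{ct} (total employment) and A_{ct} (land area) are introduced here only for illustration.

```latex
% Assumed symbols: E = total city employment, A = city land area.
\begin{align*}
\log E_{ct} &= \underbrace{\log \frac{E_{ct}}{A_{ct}}}_{\text{intensive margin (density)}}
             + \underbrace{\log A_{ct}}_{\text{extensive margin (land area)}}, \\
\left.\mathrm{d}\log E_{ct}\right|_{\text{density fixed}} &= \mathrm{d}\log A_{ct} .
\end{align*}
```

The second line restates the point made in the text: holding density constant, a larger land area necessarily comes with a proportional increase in total employment.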
Since trade takes place between locations and workers are mobile, the literature has also emphasized the role of access to distant markets, which can generate imported external economies in cities. We use two variables to disentangle the roles of internal and international markets. The first variable is a within-PRC market potential based on Harris (1954), which is the sum of the employment density in other cities in the PRC divided by the distance to the city being considered. It assesses how far the city is from other large cities. As a large share of the PRC's exports is shipped through coastal ports, the second variable corresponds to the city's distance to the closest seaport. Lastly, and even if this variable has never proved to be very influential, we also consider the role of the city's industrial diversity measured by the inverse of a Herfindahl index computed on the shares of each industry in the city's total employment.
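The two indices defined above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, the dictionary-based inputs, and the toy numbers are all assumptions, and the market potential is taken literally as each other city's density divided by its distance to the city considered, summed over other cities.

```python
def market_potential(densities, distances, city):
    """Harris-type market potential of `city` (illustrative helper).

    densities: {city_name: employment density}
    distances: {city_name: {other_city_name: distance}}
    Sums density / distance over all cities other than `city`.
    """
    return sum(
        density / distances[city][other]
        for other, density in densities.items()
        if other != city
    )


def inverse_herfindahl(industry_employment):
    """Industrial diversity: inverse of the sum of squared industry employment shares."""
    total = sum(industry_employment.values())
    shares = [employment / total for employment in industry_employment.values()]
    return 1.0 / sum(share * share for share in shares)


# Toy data (hypothetical values, for illustration only).
densities = {"A": 1000.0, "B": 500.0, "C": 250.0}
distances = {
    "A": {"B": 100.0, "C": 50.0},
    "B": {"A": 100.0, "C": 125.0},
    "C": {"A": 50.0, "B": 125.0},
}

mp_a = market_potential(densities, distances, "A")  # 500/100 + 250/50
mp_b = market_potential(densities, distances, "B")  # 1000/100 + 250/125

# Four equally sized industries give the maximum diversity for four industries.
diverse_city = inverse_herfindahl({"i1": 25, "i2": 25, "i3": 25, "i4": 25})
# A single industry gives the minimum value of 1.
specialized_city = inverse_herfindahl({"i1": 100})
```

The inverse Herfindahl index ranges from 1 (one industry holds all employment) to the number of industries (equal shares), which makes it readable as an "equivalent number of industries".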
Although we pool together different years, the impact of local variables is mostly identified over the cross section of cities because time variations are much smaller than across cities and also because few cities are sampled for the 3 years that we consider. The literature emphasizes two types of possible biases. The larger one is due to the spatial distribution of workers, which could depend on unobserved skills that largely affect their productivity, and does not necessarily result from city externalities. This can make the estimated impact of local variables twice as large as it would be for the same individual who would locate in different cities in different periods. Since we do not have access to an individual panel where workers are followed across years, we can only control for observable individual skills. For example, a large spatial sorting according to skills has been observed in France and the United States. In the case of France, Combes, Duranton, and Gobillon (2008) show that controlling for observable skills is not strong enough to remove the bias. By contrast, no sorting according to unobserved heterogeneity seems to be present in the United States according to Baum-Snow and Pavan (2013). As for the PRC, Combes, Démurger, and Li (2015) show that no spatial sorting occurs according to observable skills. Therefore, only a large sorting of unobserved skills, which should not be correlated to observed skills, could significantly bias the estimates. We see this as unrealistic, at least until many years of intensive migration and possibly unequal access to higher education have further shaped the spatial distribution of individual skills.
The second possible bias arises from reverse causality. Local characteristics could be driven by local productivity, and not the reverse, if the location choices of firms and workers are endogenous. The local density variable is the most suspect in this respect. This is why we use the specification proposed by Combes, Démurger, and Li (2015), which decomposes total employment density into the density of urban residents and the share of migrants. The urban residents' density, which is shown to have the same marginal impact on local productivity as total density, should be much less prone to reverse causality as urban residents settled in the city prior to the review period.
By contrast, the migration variable is likely to be affected by reverse causality. Migrants choose locations where employment conditions and earnings, in particular, are favorable. By appealing to specifications considering different sets of control variables and by using various instrumental variable strategies, Combes, Démurger, and Li (2015) find evidence of biases in the PRC very similar in magnitude to those usually documented for other economies. At around 20% at most, this is not very large. Addressing reverse causality is beyond the scope of the present paper, though it deserves further investigation. As emphasized by Lewis and Peri (2015), this would require more sophisticated strategies, which might be difficult to implement with the data available.
Other local variables could be endogenous too, which is why we present estimates with and without controlling for them. Again, the literature that has made some attempts to instrument these variables never reported any large bias. Yet, the problem could be more severe here for the land area variable because city boundaries are regularly reshaped by the authorities to match local demographic evolution. Again, the causality would go the other way round. We have checked that similar magnitudes were obtained when we use a land area definition that is constant over time (using the 2007 value) but this again deserves further investigation provided that more comprehensive data are available.
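The two-step procedure described in this subsection can be illustrated with a deliberately stripped-down sketch. It assumes a single individual characteristic, a single city characteristic (log density), one year of data, and noise-free synthetic data so that the true parameters are recovered exactly; the function names and all numbers are hypothetical. The first step uses within-city demeaning, which is equivalent to ordinary least squares with city fixed effects.

```python
from collections import defaultdict


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def first_step(workers):
    """workers: list of (city, x, log_wage) tuples.

    Returns (theta_hat, {city: delta_hat}): the return to the individual
    characteristic x and the estimated city fixed effects.
    """
    by_city = defaultdict(list)
    for city, x, y in workers:
        by_city[city].append((x, y))
    # City means of x and log wage, used to demean within each city.
    means = {
        city: (mean(x for x, _ in obs), mean(y for _, y in obs))
        for city, obs in by_city.items()
    }
    num = sum((x - means[c][0]) * (y - means[c][1]) for c, x, y in workers)
    den = sum((x - means[c][0]) ** 2 for c, x, _ in workers)
    theta = num / den
    # City fixed effect: mean log wage net of the individual characteristic.
    deltas = {c: y_bar - theta * x_bar for c, (x_bar, y_bar) in means.items()}
    return theta, deltas


def second_step(deltas, log_density):
    """Regress estimated city effects on log density; returns (slope, intercept)."""
    cities = sorted(deltas)
    d_bar = mean(log_density[c] for c in cities)
    f_bar = mean(deltas[c] for c in cities)
    num = sum((log_density[c] - d_bar) * (deltas[c] - f_bar) for c in cities)
    den = sum((log_density[c] - d_bar) ** 2 for c in cities)
    slope = num / den
    return slope, f_bar - slope * d_bar


# Synthetic, noise-free data (hypothetical parameter values).
TRUE_THETA, TRUE_GAMMA = 0.05, 0.10
log_density = {"c1": 5.0, "c2": 6.0, "c3": 7.5}
workers = [
    (city, schooling, TRUE_GAMMA * d + TRUE_THETA * schooling)
    for city, d in log_density.items()
    for schooling in (9.0, 12.0, 16.0)
]

theta_hat, delta_hat = first_step(workers)
gamma_hat, intercept_hat = second_step(delta_hat, log_density)
```

In the paper's setting the first step also includes city-sector variables and many individual controls, and the second step includes several city characteristics with year-specific coefficients; the sketch only shows how the city effects link the two steps.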

B. Data
The data used in this paper come from various sources. Equation (1) is estimated on individual-level data from the household income surveys conducted in 2002, 2007, and 2013. A major feature of the three individual databases is that they cover registered urban residents only (the urban hukou holders). 6 This means that an important segment of the urban labor market, comprising migrants not officially registered in cities, is excluded. Representing about 17% of the PRC's 1.3 billion people, migrants constitute an important social grouping, yet they are clearly marginalized (Cai, Park, and Zhao 2008; Démurger et al. 2009). The exclusion of migrants whose hukou is not in the same location as their place of employment has an important implication for our estimates: our data set refers to urban residents (natives) only and one cannot infer that earnings determinants across the cities under review would equally apply to migrant workers. Nevertheless, we provide an estimate of the impact that migrants exert on local urban workers simultaneously with agglomeration effects.
All explanatory variables are defined consistently across all 3 years for both estimation steps. The only exception is for enterprise ownership in equation (1), which is more detailed in 2002 and 2013 in comparison with 2007.
A. Total Earnings
Table 1 displays the results obtained for all workers' earnings from the estimation of equation (2). Rows labeled with a variable's name alone, without a year (e.g., "Density"), report the effect common to all years. This is also the total effect for year 2013. The effects for 2007 or 2002 are obtained by adding to this number the coefficient reported in the corresponding row (e.g., "Density 2007" or "Density 2002"). The significance test indicates whether the effect for 2007 (or 2002) is significantly different from the 2013 effect. The first five columns provide estimates based on total earnings and the last two columns give estimates based on earnings per hour. The number of hours worked is not available in the microdata for the year 2007, which implies that these per hour earnings estimates are based on 2002 and 2013 only.
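The reading rule for the table can be expressed as a small helper. This is an illustrative sketch: the function name and the dictionary layout are assumptions, the 0.097 value is the column (1) density elasticity quoted in the text, and the two year-specific coefficients are hypothetical placeholders rather than reported estimates.

```python
def total_effect(coefficients, variable, year, reference_year=2013):
    """Total effect of `variable` in `year`.

    The row named after the variable alone is the effect for the reference
    year; other years add the year-specific row (e.g., "Density 2007").
    """
    base = coefficients[variable]
    if year == reference_year:
        return base
    return base + coefficients[f"{variable} {year}"]


table_column = {
    "Density": 0.097,        # column (1) elasticity for 2013, as quoted in the text
    "Density 2007": -0.010,  # hypothetical year-specific coefficient
    "Density 2002": -0.060,  # hypothetical year-specific coefficient
}

effects = {
    year: total_effect(table_column, "Density", year)
    for year in (2002, 2007, 2013)
}
```

The significance stars on a year-specific row therefore test the difference from 2013, not whether that year's total effect differs from zero.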
In column (1), density only is introduced in the specification. We obtain an elasticity of total earnings with respect to density equal to 0.097 for 2013, as reported in the first line of the table. The value estimated for 2007 is not significantly different from the one for 2013. The elasticity for these 2 years is close to the values reported by Combes, Démurger, and Li (2015) and is almost three times larger than usual estimates for developed economies. Large density economies seem to prevail in the PRC, at least from the mid-2000s onward. By contrast, the effect appears much lower in 2002, even though the estimated year-specific effect is not significantly different from zero. We will return to this issue once other variables have been introduced.
6 The hukou (household registration) system is a distinctive institutional feature of the PRC that divides the population based on occupation (agricultural or nonagricultural) and place of residence. By entitling access to social benefits to local hukou holders only, the system has limited population mobility for decades. Most (rural) migrants still hold their hometown hukou, which prevents them from permanently settling in cities.
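To give a sense of magnitude, an elasticity can be converted into the earnings gain associated with a given proportional change in density. The sketch below assumes a constant-elasticity (log-log) relationship; the helper name is hypothetical and the only input taken from the text is the 0.097 elasticity.

```python
def earnings_gain(elasticity, density_ratio):
    """Proportional earnings gain when density is multiplied by `density_ratio`,
    assuming a constant-elasticity (log-log) relationship."""
    return density_ratio ** elasticity - 1.0


# Doubling density with the 2013 elasticity quoted in the text.
gain_from_doubling = earnings_gain(0.097, 2.0)
print(f"{gain_from_doubling:.1%}")
```

With an elasticity of 0.097, doubling density is associated with earnings that are roughly 7% higher, all else equal.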
Column (2) adds the role of migrants, which is found to be largely positive albeit to a decreasing degree between 2002 and 2013. In this augmented specification, the effect of density is lower. This can be due to the fact that if denser cities host more migrants, then estimates in column (1) capture both effects. The correlation of density with other city characteristics is also high; in particular, it is positive with market access and negative with distance to seaports and land area. Meaningful estimates are therefore obtained in column (3), where all variables are introduced together, and even more in columns (4) and (5), our preferred specification, which progressively considers more year-specific effects.
Most of the effects of the control variables offset one another with regard to density, whose impact, at 0.096 in column (5), is very close to that of column (1). While the density impact is 33% lower in 2007, though not significantly so because the estimate is imprecise, it is significantly lower in 2002, to the point that it is fully canceled out. Hence, while there were no density gains in the PRC in 2002, by 2013 these gains were three times larger than in developed economies. Moreover, while the extensive margin of city size is mostly found to have no impact in developed economies, it generates further gains in the PRC from 2007 onward. As with density, the impact of land area is significantly different from zero in 2007, while it is not significant though slightly positive in 2002. 7 By contrast, the impact of migrants was three times larger in 2002 than in 2013, with 2007 presenting an intermediate value that is somewhat closer to the 2013 estimate. The 2002 impact is very large. Typically, moving from the first to the last quartile of the migrant share variable increases natives' earnings by 17.5% in 2002, but by only 8.5% in 2013. 8 Distance to a seaport also presents a significant negative effect in 2002, which vanishes in the later years. This finding suggests that proximity to the sea (and thus access to international markets) was a source of productivity gains in 2002 but not in subsequent years. This could be related to the changing role of international trade in the PRC's economic growth over the past decade. Finally, a city's industrial diversity was found to be only marginally significant in all 3 years under review, as usually found in the literature.
Overall, we find that the two standard city-size variables, density and land area, as well as the presence of migrants, generate higher earnings in cities in 2007 and 2013, with other characteristics playing no significant direct role. 9 The impact of migrants in 2007 and 2013 is, however, less than half of what it was in 2002. At that time, density and land area had no effect, access to the sea being the only other city characteristic significantly impacting earnings. These findings highlight the PRC's move from urban gains shaped by some of its specific features in the early 2000s (large disparities across cities in terms of access to the sea and the presence of significant numbers of rural migrants) to a situation where variables that typically affect urban earnings are now playing an influential role. What matters for cities in the PRC now is their capacity to host migrants as well as their overall size, in terms of both density and land area.
7 These comments are based on additional regressions where a dummy normalization is done so that significance tests for the total effects for 2002 and 2007 are obtained. They are available upon request.
8 Note that the effect is divided by two despite the fact that the interquartile gap in the migrant share variable increased from 2 to 2.6 between 2002 and 2013.
9 Some variables (market potential, for instance) could drive density or the presence of migrants and thus have an indirect effect. Yet, the strategy developed here does not allow this to be assessed.
First-step estimates for equation (1), from which the city fixed effects used in the second step are obtained, are provided in Appendix Table A.1. We do not further comment on the impact of individual characteristics, which are consistent with usual findings for the PRC (Démurger, Li, and Yang 2012). The first-step specification also includes the role of a city-industry-specific variable, specialization, which is measured by the logarithm of the share in the local economy of employment in the firm's industry. Being specialized in an industry increases earnings in a city, which is an additional agglomeration effect typically found in the literature. The effect seems to have strengthened over time: in 2002 it is significant for earnings per hour only, while the magnitude of the impact in 2013 is larger than that usually obtained.
As mentioned above, the cities surveyed for each wave differ across time. The regressions provided in Table 1 are performed on the resulting nonbalanced panel. We checked that the distribution of city characteristics does not differ too much between waves. As a further robustness check, we run the same regressions on the subsample of cities that are common in all three waves of the survey. Results are presented in Table A.2 in the Appendix. Our main conclusions remain unchanged, even though some estimates are less precise because of the much smaller sample size.

B. Earnings per Hour
The agglomeration literature mostly focuses on productivity externalities, which are better tested empirically by earnings per hour than by total earnings. The 2002 and 2013 surveys allow us to compute for each worker a measure of earnings per hour, which should thus be closer to labor productivity. If the number of hours worked varies significantly across cities, as documented by Rosenthal and Strange (2008) for the United States, the city determinants of total and per hour earnings can differ. As shown by the last two columns in Table 1, this appears not to be the case in the PRC, whether only density and migrants are interacted with year (column [6]) or all city variables (column [7]). The estimated impact of density is only very slightly lower on earnings per hour than it is on total earnings. Importantly, both density and land area effects are still fully offset in 2002. Neither the intensive nor the extensive margin of city size increases labor productivity at that date. By contrast, the larger presence of migrants does significantly increase earnings per hour, with an impact equal to the one estimated on total earnings. Again, the effect is less than half as large in 2013, but is still significant. Overall, both total and per hour earnings are influenced by the same city characteristics and to the same extent.

IV. Gender and Skills Heterogeneity
The literature shows that both the returns to individual characteristics and the returns to city characteristics, typically gains from larger city size, should and do differ across skills. High-skilled workers tend to benefit more from agglomeration, in particular when local externalities result from technological spillovers (Bacolod, Blum, and Strange 2009). The comparison of agglomeration effects between genders is less frequent; Phimister (2005) is one of the rare exceptions. The data we use allow us to test whether city characteristics have different impacts on skilled and unskilled workers, and on male and female workers. The results are reported in Table 2.
Consistent with what is often observed for developed economies, skilled workers gain slightly more from larger city size (both higher density and larger land area) than unskilled workers. This finding holds for both total and per hour earnings, suggesting that at least part of city-size gains arise from technological spillovers rather than from market effects. Yet, the skills gap is not large since point estimates for high-skilled workers are only around 10% higher than for their unskilled counterparts. Because the estimates are imprecise, the gap is not significantly different from zero most of the time.
A similar conclusion is reached for the impact of the city's migrant share for the year 2013. The estimated effects are very similar for skilled and unskilled workers, and even slightly larger for unskilled workers on total earnings. In 2002, when the migrant effect is twice as large, the gap in favor of skilled workers is also larger, with the marginal impact being around 25% larger, though the difference is not significant. A larger positive impact of migrants on skilled urban residents was expected as migrants are more likely to substitute for unskilled workers in production functions.
To the best of our knowledge, a gender gap for urban returns has never been documented in the PRC. We do find small differences in favor of females, even though the size of the sample makes it difficult to obtain significant gaps. Gaps are also slightly larger for earnings per hour than for total earnings. Female workers are found to gain slightly more from city size (both higher density and larger land area). They also gain slightly more from the presence of migrants. A possible explanation for the larger benefits for females in cities may relate to household-level choices of which neighborhood to live in. If the location decision is made by men, either because of cultural and social norms or because male earnings are higher, females have to adapt to a location that was not freely chosen with respect to their own job opportunities. Better job matches are more easily found in dense and large areas, and externalities are more important there than in less dense and smaller locations where job opportunities are less abundant. As a result, the gender gap should be lower in cities benefiting from characteristics such as larger size and a larger pool of migrants.

V. Conclusion
Gains from urbanization are evident in the PRC, where they seem to be larger than in developed economies. The nature of these gains has evolved rapidly and the city characteristics that shaped such gains in 2002 are different from those in effect in 2013. The presence of rural migrants and access to a seaport were the largest productivity drivers in cities in the PRC in 2002. City size, which is measured by both the intensive margin (employment density) and the extensive margin (land area), became the most important determinant over the next decade even as the migrant externality remained strong. In 2013, workers in cities benefited from more migrants, increased density, and a larger land area. This contrasts with developed economies, where density has the largest effect (and with much lower elasticity). Internal market access, which is almost the only other important local characteristic in developed countries, does not seem to play a major role in the PRC.
In this paper, we do not assess possible biases due to missing city variables or reverse causality. In earlier studies on developed economies, reverse causality was found to affect estimates by no more than 20%. Yet, this should be carefully evaluated in future studies on the PRC because high migration rates could make such biases larger. Furthermore, the land area of cities is also regularly updated by authorities in the PRC based on past development, which could affect the estimated impact. Similarly, we do not find evidence of any strong spatial sorting according to observed individual skills, but this issue could emerge progressively through the endogenous location choices of an increasingly heterogeneous population with regard to skills.
Lastly, the migrant externality we document is largely a black box that needs to be better understood. The impact of migrants is a mix of externality and substitution effects. Identifying these effects separately is needed to design consistent local policies, and doing so requires working on larger data sets because some differences (for instance, between skilled and unskilled workers) seem to emerge from our results. Investigating in more detail the role of workers' allocation between industries and occupations, which sharply differs between local residents and migrants, is necessary too. Finally, as migrants themselves become a larger share of a city's population, measuring the impact of city characteristics on their own earnings would help to complete the analysis of urbanization gains in the PRC. Data limitations in the surveys used in this study did not allow us to investigate these additional issues.
2002, 2007, and 2013; 38 cities common to 2002 and 2013). Time dummies are included in all specifications. Standard errors in brackets. *** = p < 0.01, ** = p < 0.05, * = p < 0.10. Source: Authors' calculations.