Natural City Growth in the People's Republic of China

This paper analyzes the growth of Metropolitan Statistical Areas in the PRC between 1992 and 2013 by focusing on the night-light radiance—a measure of economic activity—of connected subcity places that we refer to as a natural city. This paper documents the rapid growth of natural cities in the PRC between 1992 and 2009 that was followed by a slight reduction in the size of some natural cities between 2010 and 2013 in the aftermath of the recent global financial crisis. Institutional factors—such as the location of places near Special Economic Zones, the ramifications of legal migration from rural to urban areas following reforms to the hukou (household registration) system, and infrastructure accessibility—are found to be important drivers of the integration of peripheral places into natural cities.


I. Introduction
With the increase in global population, the change in urbanization rates around the world is a dynamic phenomenon. While in 1994 only about 30% of the world's population lived in cities, as defined by national statistical offices, about 54% did in 2014. In the People's Republic of China (PRC), which has been among the most dynamic economies in the world over the last quarter of a century, almost 25% of the population has moved to urban areas during the past 2 decades. The PRC's National New-Type Urbanization Plan, 2014-2020 targets an urbanization rate of 60% by 2020. While urbanization is often measured as the increase in the population within the administrative boundaries of cities, urbanization in a broad sense is driven by three phenomena: (i) the increase in population density (and economic activity) within the administrative boundaries of existing urban zones, (ii) the increase in population density (and economic activity) in areas in the vicinity of administrative urban zones through the growth of Metropolitan Statistical Areas (MSAs), and (iii) (to a lesser extent) the physical growth of the administrative areas of cities. This paper focuses on the first two phenomena, which are objects of interest in the theoretical and empirical urban economics literature focusing on city growth; urban sprawl, which goes hand in hand with the formation of densely populated urban subcenters; and the decentralization of economic activity (see, for example, Fujita and Ogawa 1980, 1982; Henderson and Mitra 1996; Glaeser and Kahn 2001, 2004; McMillen and Smith 2003; Burchfield et al. 2006; Garcia-López, Hémet, and Viladecans-Marsal 2016).
Unlike in many other countries, the recent growth of cities in the PRC has been governed by regulations. The country's one-child policy, which was in place in its most restrictive form between 1978 and 2015, led to a slump in overall population growth, reduced the growth rate of cities, and slowed the average urbanization rate. Furthermore, the hukou (household registration) system has restricted the internal migration of people to urban centers by limiting access to public goods such as health care, schools, universities, and official housing. Finally, the inception of Special Economic Zones (SEZs) has ensured the protection of the private property rights of foreign investors, alleviated taxes and tariffs, regulated the policy of land usage, and liberalized economic and labor laws in geographically confined zones. According to Wang (2013), most major cities in the PRC's 326 municipalities hosted some sort of SEZ by 2006. A consideration of these regulatory provisions, apart from factors capturing the economic attractiveness and amenities in cities, appears relevant as they may lead to a gap between actual and optimal city size in the PRC, thereby affecting the associated economies of scale and scope (see, for example, Au and Henderson 2006a, 2006b; Desmet and Rossi-Hansberg 2013), which can result in potentially significant output losses.
The PRC's extensive investments in transport infrastructure, particularly road and railway networks, have fundamentally reshaped the structure of its urban areas. In the early 1990s, the Government of the PRC began to renew and upgrade its transport infrastructure, which caused previously underdeveloped regions to grow faster as industries started to decentralize (Banerjee, Duflo, and Qian 2012; Faber 2014; Baum-Snow et al. 2016, 2017). For example, Baum-Snow et al. (2017) find that suburban ring roads have displaced an average of about 50% of central city industrial gross domestic product (GDP) to the outskirts of cities, while marginal radial railroads have displaced an additional 20%. Similarly, Baum-Snow et al. (2016) argue that expanded regional highway networks in the PRC have had a negative average effect on local population density, causing a reallocation of economic activity and altering the structure of the country's cities.
The focus of this paper is on the growth of natural cities, which are defined as connected places with a minimum level of night-light radiance as a measure of place- and time-specific economic activity (Henderson, Storeygard, and Weil 2012), and which are associated with the PRC's 300 largest administrative cities over the period 1992-2013. One major merit of using remote-sensing data to define cities is that such data are available at much higher frequency than population census data. Furthermore, the data collection itself is much more homogeneous in terms of timing and concept. The data suggest that the PRC's natural cities grew rapidly between 1992 and 2010 before shrinking to some extent in the last few years of the review period, which might be attributable to the detrimental effects of the recent global financial crisis. We document this phenomenon for all cities in terms of descriptive statistics and illustrate it for two major agglomerations, Beijing and Shanghai. This paper explores these developments using econometric analysis and identifies institutional factors (as reflected in the proliferation of SEZs and the provisions of the hukou system) and infrastructure accessibility as important determinants of natural city growth. We highlight the effects of road and railway accessibility, and illustrate that shocks to infrastructure can be expected to induce relatively rapid adjustments in natural city size over the next 20 years.
The remainder of the paper is organized as follows. Section II introduces the definition of a natural city employed in this paper and outlines the measurement thereof. The data and their descriptive statistics, empirical strategy, and results are presented in Section III. Section IV concludes.

II. Natural City Borders in the People's Republic of China, 1992-2013
In this paper, we employ a definition of city boundaries based on what we call natural borders. Natural city borders relate to the well-known concepts of MSAs and Functional Urban Areas (FUAs), which measure city size by activity rather than administrative boundaries (see, for example, Zipf 1949; Krugman 1996; Eaton and Eckstein 1997; Harris, Dobkins, and Ioannides 2001; Ioannides and Overman 2003; Eeckhout 2004; Rozenfeld et al. 2011). A general motivation to use a city definition based on either MSA or FUA is that they capture more accurately the extent of urban units, going beyond (and sometimes integrating several units with) administrative boundaries. When looking at emerging urban areas, especially in transition economies such as the PRC, the study of MSAs and FUAs follows an economic rather than an administrative logic. We define the boundaries of natural cities based on the City Clustering Algorithm (CCA) (Rozenfeld et al. 2008; Rozenfeld et al. 2011), which we apply to remote-sensing (night-light radiance) data collected from satellites (Burchfield et al. 2006; Henderson, Storeygard, and Weil 2012). We measure the average night-light radiance in places that are 3 kilometers (km) in length by 3 km in width. We face a trade-off between portraying and approximating the boundaries of small cities, especially in the early phases of the sample period, and the tractability of the data, particularly the application of the CCA. The former requires sufficiently small places and the latter sufficiently few places. For those reasons, the consideration of 3 km × 3 km places was the finest-grained grid we could use given the time constraints. In general, one major advantage of using remote-sensing data to define natural cities is that annual data are available between 1992 and 2013, while MSA and FUA data are based on population censuses and therefore only available at lower frequency. We consider the 300 biggest administrative cities in the PRC by population in the year 2000. Figure 1 shows a map of the PRC and the location of the centroids of all 300 cities covered. Very few cities are located in the western PRC, while there is a particularly high density in the vicinity of the coastal belt, which is not surprising given the high degree of economic activity through international trade in that area.

Figure 1. Centroids of the 300 Biggest Administrative Cities in the People's Republic of China by Population, 2000
Source: Authors' illustration.
The objects of interest in this study are the aforementioned 3 km × 3 km places. We define natural city borders on a uniform grid of such places for all cities in the sample. On this grid, we assign a place to a natural city in a year if (i) the average night-light radiance of the place exceeds a value of 40; and (ii) it is located near a cluster of places with average night-light radiance over 40, including the place that contains the city centroid (based on the CCA). We employ Version 4 of the Defense Meteorological Satellite Program-Operational Linescan System to measure night-light radiance at the pixel level (Croft 1978). The remote-sensing (night-light radiance) data therein take on values between 0 (no light) and 63 (maximum light). Night-light radiance data per pixel are available for all years between 1992 and 2013 based on pictures from six different satellites (F10, F12, F14, F15, F16, and F18), with some years covered by two satellites. We chose the data such that the number of satellites they come from is minimized (F10 for 1992-1993, F12 for 1994-1999, F15 for 2000-2004, F16 for 2005-2009, and F18 for 2010-2013). The data comprise a raw-data version as well as a stable-data version, where the latter ensures that the data are not contaminated by fires, fireworks, clouds, or other weather conditions. In this paper, we use the stable-light data version and compute the mean of radiance across all pixels within each place. In the final data set, we include all those places that were assigned to a natural city in any year between 1992 and 2013, and we track these places over the entire review period.
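The two-step assignment rule described above (place-level averaging followed by clustering around the centroid) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the assumption that a place corresponds to a 3 × 3 block of pixels, and the use of an 8-neighbor adjacency rule as the meaning of "near" are all hypothetical choices made for illustration.

```python
from collections import deque

THRESHOLD = 40  # radiance cutoff stated in the text


def place_means(pixels, block=3):
    """Average pixel radiance over non-overlapping block x block squares.

    Assumption: one pixel is roughly 1 km wide, so a 3 x 3 block stands in
    for a 3 km x 3 km place. Incomplete edge blocks are dropped.
    """
    n_rows = len(pixels) // block
    n_cols = len(pixels[0]) // block
    means = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            cells = [pixels[r * block + i][c * block + j]
                     for i in range(block) for j in range(block)]
            row.append(sum(cells) / len(cells))
        means.append(row)
    return means


def natural_city(radiance, centroid, threshold=THRESHOLD):
    """Return the set of (row, col) places clustered around the centroid.

    A place joins the city if its mean radiance exceeds the threshold and
    it touches a place already in the city (8-neighbor adjacency, which is
    an assumption of this sketch). An unlit centroid yields an empty city.
    """
    r0, c0 = centroid
    if radiance[r0][c0] <= threshold:
        return set()
    city = {(r0, c0)}
    queue = deque([(r0, c0)])
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (nr, nc) in city:
                    continue
                inside = 0 <= nr < len(radiance) and 0 <= nc < len(radiance[0])
                if inside and radiance[nr][nc] > threshold:
                    city.add((nr, nc))
                    queue.append((nr, nc))
    return city
```

Running `natural_city` once per city centroid and year on the grid returned by `place_means` would give an annual natural-city indicator for every place. Bright places that are not connected to the centroid cluster remain outside the city, which is the role of condition (ii).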
Figures 2 and 3 delineate the natural city with its city centroid (black dot) and administrative boundaries for Beijing and Shanghai for the years 1992, 1998, 2007, and 2013. In every panel, gray grids represent places that constitute the natural city in that particular year. Prefecture-level administrative city boundaries are indicated in black. In the case of Beijing, we observe that its natural city size grew remarkably over the entire sample period. Especially from 1998 onward, the natural city of Beijing grew outward toward the northeast, which could be partly related to the 1993 opening of the Airport Expressway linking central Beijing to the Beijing Capital International Airport. Additional infrastructure investments to improve airport connectivity (e.g., Airport Express Subway) in preparation for the 2008 Olympic Games may have also contributed to the northeast developing more rapidly than other parts of Beijing. Similar to Beijing, Shanghai's natural city grew over the entire review period and mostly integrated urban areas along the downstream part of the Yangtze River. The example of Shanghai illustrates that, especially toward the end of the review period, several administrative cities merged into one natural supercity. The natural city of Shanghai in 1992 contained only one administrative centroid, while by 2013 it had incorporated a number of formerly distinct administrative and natural cities along the Yangtze River into one natural supercity. However, in spite of the general growth of natural cities through 2007-2013, many natural cities, including Beijing and Shanghai, shrank between 2010 and 2013, most likely as a consequence of the global financial crisis (Figure 4).

Table 1 reports average unconditional transition probabilities for natural city places for the whole sample of places considered. The table suggests that there is a high degree of persistence from one year to the next: 92% of all natural city places keep their status, while about 90% of all places outside the natural city boundary remain outside that boundary from one year to the next. The probability of acquiring natural city status amounts to 10%, while losing natural city status occurs in 7% of cases. Given the transition probabilities in Table 1, the average natural city size is expected to grow over the sample period. Figure 5 draws kernel density estimates of natural city sizes for the years 1992, 1998, 2007, and 2013. In each of the four panels of Figure 5, the horizontal axis shows the number of 3 km × 3 km places in a natural city. We observe that the average natural city size, reflected in the total number of places covered, increases remarkably with time. Especially in the beginning of the review period, the density mass is concentrated in the left tail of the distribution, indicating a great number of relatively small natural cities and only a small number of very large supercities in the sample. Later in the review period, the degree of dispersion in terms of natural city size increases and the density mass in the left tail of the distribution gets smaller.
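Unconditional transition probabilities of the kind reported in Table 1 can be computed from a panel of annual natural-city indicators. The sketch below is illustrative only: the function name and the input layout (one list of 0/1 flags per place, ordered by year) are assumptions, not the authors' data format.

```python
def transition_probabilities(status):
    """Estimate year-to-year transition probabilities of natural-city status.

    status: dict mapping a place identifier to a list of 0/1 flags ordered
    by year (hypothetical layout). Returns {(origin, destination): share},
    where shares are normalized by the number of transitions leaving the
    origin state.
    """
    counts = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for flags in status.values():
        # Pair each year with the following year for the same place.
        for prev, curr in zip(flags, flags[1:]):
            counts[(prev, curr)] += 1
    probs = {}
    for a in (0, 1):
        total = counts[(a, 0)] + counts[(a, 1)]
        for b in (0, 1):
            probs[(a, b)] = counts[(a, b)] / total if total else float("nan")
    return probs
```

When the share of places entering natural cities exceeds the share leaving them, as in the figures quoted above, the stock of natural city places grows from year to year as long as enough places remain outside.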

III. Drivers of Natural City Growth
In this section, we introduce all variables included in the subsequent empirical analysis.

A. Data
We use average (night-light) radiance data in a 3 km × 3 km place i at period t as the dependent variable to measure economic activity in that area. The variable radiance_it is continuous and censored from below as well as from above, ranging from 0 (no light) to 63 (maximum light). Information on the source and the processing of the radiance data can be found in Section II.
We identify three key categories of variables that drive natural city growth: geographical, climate, and institutional. The geographical variables include distance measures, some of which are time variant (indexed by both i and t) and others that are not (indexed by i only): distance to the administrative city center (dist to center_i), distance to the administrative city border (dist to adborder_it), distance to the nearest waterway (dist to water_i), distance to the ocean (dist to ocean_i), distance to the nearest road (dist to road_i), and distance to the nearest railway line (dist to rail_i). The geographical variables also include a binary indicator that is unity if a place lies within the administrative boundary of the city centroid and zero otherwise (within admin boundary_it). Except for dist to adborder_it and within admin boundary_it, which utilize annual data on administrative boundaries (at the county level) from the China Data Center at the University of Michigan, all distances are taken from OpenStreetMap using ArcGIS software. Furthermore, we utilize topographical information in the form of a measurement of altitude (altitude_i) from WorldClim Global Climate Data, and we control for the geographical location of each centroid by using information on its longitude and latitude from ArcGIS. The reason for including the latter two is that they relate to a place's accessibility. For instance, Chinese cities near the coast grew faster due to better accessibility to sea transport, which attracted foreign direct investment and was further stimulated by the formation of SEZs.
We use the following time-invariant climate data: average annual rainfall during the period of observation (rain_i); average annual temperature (temperature_i); and average annual temperature variation (sd temperature_i, as measured by the standard deviation). Gridded climate data are available from WorldClim Global Climate Data.
The institutional variables represent two types of institutional changes that governed the PRC's urban growth: reforms in the hukou system and the formation of SEZs. Between the late 1970s and mid-2000s, a period which is referred to as the first wave of hukou reforms, restrictions on movement and work were eased, which led to a large inflow of rural workers into urban areas. In most provinces, the scale of reforms varied with city size. Generally, reforms have had little impact on institutions in the most attractive urban areas such as provincial capitals and large cities along the coastal belt. To capture the different effects, we introduce three binary indicators (small_it, medium_it, and large_it), which are unity if a province applied its latest hukou reforms to small, medium, or large cities, respectively, and zero otherwise. A combined effect of these reforms is captured in the binary indicator hukou_it, which is unity if at least one of the three city-size indicators is unity, and zero otherwise. Time-variant information on the extent of the latest hukou reforms by province during the period 1998-2008 is available in Organisation for Economic Co-operation and Development (2013).
SEZs are geographic regions that are typically characterized by liberal economic policies designed to attract foreign investors and enhance economic activity. In this paper, we use the term SEZ as a generic term for all types of special economic zones and open areas, including Free Trade Zones, Economic and Technology Development Zones, and open coastal cities, among others. Wang (2013) characterizes four big waves in the formation of SEZs in the PRC (1979-1985, 1986-1990, 1991-1995, and 1996-2007) and lists the corresponding municipalities that were designated as SEZs in each of the first three waves. This allows us to code three different binary indicator variables (firstwave_it, secondwave_it, and thirdwave_it), of which the former two are time variant because of the time variation in administrative city boundaries. The third variable is time variant because in our coding there is no treatment of places and cities prior to 1995. We also include the combined effect of the three waves, captured in the binary indicator SEZ_it, which is unity if at least one of the three SEZ wave indicators is unity, and zero otherwise. Since the information on SEZs provided in Wang (2013) pertains to the municipality level, while the data utilized here vary by place, we assume that all places within the treated municipalities were affected by SEZs in the same way.

B. Descriptive Statistics
Table 2 summarizes the descriptive features of all variables by natural city status (within a natural city, Nat = 1; outside of a natural city, Nat = 0; Average), and reports the mean and standard deviation for each variable. Table 2 indicates that places within a natural city are on average 1.4 times closer to the city centroid than places outside of a natural city. Similarly, places inside are 1.1 times closer to the coast, 1.3 times closer to waterways, 1.6 times closer to the nearest road, and 1.6 times closer to the nearest railway line. As expected, places are on average much closer to the nearest road (0.3 km) than to the nearest railway line (3.5 km). We also observe that places inside a natural city are closer to the nearest administrative border since administrative areas close to the considered city centroids are smaller in the average year than areas outside of the considered administrative city centers. Places within and outside of natural cities do not differ in their average longitude and latitude, but they differ in terms of altitude: places inside natural cities have an average altitude 1.2 times lower than places outside. Only about 30% of all places in the data lie within the administrative boundaries of one of the 300 major city centroids in our sample. By comparison, 62% of all places are located inside natural cities in the average year. Finally, places inside and outside natural cities do not significantly differ in terms of average precipitation and temperature. Table 2 further suggests that places inside natural cities are more densely populated and more luminous in the beginning of our study period (1.5 times and 2.3 times, respectively). Over the entire study period, both places inside and outside of natural cities have a higher radiance level than they did in 1992. Places inside of a natural city appear to experience a relatively stronger increase in radiance during the study period. These places are also an average of 2.5 times more luminous than places outside of a natural city over the entire study period.

Table 3 summarizes descriptive statistics (mean and standard deviation) for all time-variant variables by year (1992, 1998, 2007, and 2013). Table 3 suggests that the latest wave of hukou reforms (1996-2007) started impacting small cities (only 7.5% of all places in the sample were treated in 1998) before reaching medium-sized and large cities after 1998. Given that the hukou data are coded at the provincial level and that we consider the 300 biggest administrative cities in the PRC, it is not surprising that by 2013 almost 93% of all places in the sample had experienced some degree of hukou reform. Concerning the SEZ indicators, the first wave of reforms (1979-1985) included a relatively small number of places, with only 2.6% of all places treated during this wave, whereas the second (1986-1990) and third (1991-1995) waves applied to more than one-third of all places in the sample. Consequently, about 62.3% of all places were assigned to an SEZ in 1995. Finally, Table 3 indicates that the average night-light radiance (radiance_it) increased from 19.4 in 1992 to 53.9 in 2013.

C. Econometric Approach
In this subsection, we outline the econometric model used to estimate coefficients on the suspected determinants of the (night-light) luminosity of place i in year t, radiance_it. Two features of the dependent variable are worth mentioning: (i) it is censored from below at 0 and from above at 63, and (ii) it appears to be serially correlated. To respect both the double censoring and autocorrelation through equicorrelation (accruing to the repeated observation of places over time and the presence of place-specific effects) and through inertia, we postulate a dynamic Tobit model with double censoring and random effects. We account for dynamic adjustment by letting radiance_it be a function of its first-, second-, and third-lagged values, R_it = (radiance_i,t-1, radiance_i,t-2, radiance_i,t-3), and estimate it along the lines of Wooldridge (2005). Accordingly, the endogeneity of the lagged dependent variables on the right-hand side of the model, which arises through the presence of time-invariant random shocks, µ_i, can be acknowledged by properly specifying the initial conditions of the process (Hsiao 2015).
Subsume all exogenous drivers of radiance_it in the common vector X_it and let α = (α_1, α_2, α_3) be the unknown parameters on R_it and β be the unknown parameters on X_it. Furthermore, let ϵ_it be the (normalized) remainder disturbances in the processes. Then, we may introduce a latent, uncensored, normal counterpart to radiance_it, radiance*_it, and relate the two of them as follows:

\[
\mathit{radiance}_{it} =
\begin{cases}
0 & \text{if } \mathit{radiance}^{*}_{it} \le 0, \\
\mathit{radiance}^{*}_{it} & \text{if } 0 < \mathit{radiance}^{*}_{it} < 63, \\
63 & \text{if } \mathit{radiance}^{*}_{it} \ge 63.
\end{cases}
\tag{1}
\]

Moreover, we may specify the latent variable radiance*_it in a linear fashion as a function of the parameters of interest through

\[
\mathit{radiance}^{*}_{it} = R_{it}\alpha + X_{it}\beta + \mu_{i} + \epsilon_{it}. \tag{2}
\]

For the estimation of equation (2), we employ two alternative sets of initial conditions for R_it. One involves the observed radiance in the initial year of the data, radiance_i,1992, and the other one additionally involves the time averages of all time-variant variables in X_it. Since the functional form of the dynamic Tobit model with double censoring is nonlinear and X_it includes squared values of some of the determinants, we report marginal effects only, as is customary with nonlinear models. Table 4 summarizes the estimated effects of the lagged dependent variables associated with α, but only a subset of the effect estimates associated with β. [11] For instance, we do not report the effects pertaining to variables used for the modeling of the initial condition with averages of the time-variant variables. Since the models are dynamic, the reported estimates should be interpreted as short-run effects materializing within a 3-year time window. Moreover, for the binary variables in X_it (e.g., the four variables each relating to either hukou or SEZ), we compare the average of the conditional mean when the variable takes on a value of unity for all places with the one when the variable takes on a value of zero for all places (Greene 2012). In Column (1) of Table 4, we model the initial condition as a function of the radiance in the initial year, radiance_i,1992. In Column (2), the initial condition additionally includes the time averages of all time-variant variables. On a final note, the magnitudes of the total short-run effects of continuous variables in Table 4 should only be compared across such variables after normalization (e.g., by scaling them with the standard deviation of the respective variables in Table 3).
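To make the role of double censoring concrete, the following sketch writes out the log-likelihood and the conditional mean of a Tobit model censored at 0 and 63. It is a deliberately simplified, pooled version: the place-specific random effects and the initial-condition terms used in the paper are omitted, and all function names are hypothetical. The argument `index` stands for the linear predictor (lags times α plus covariates times β) of each observation.

```python
import math

LOWER, UPPER = 0.0, 63.0  # censoring points of the radiance scale


def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_logpdf(z):
    """Log of the standard normal density."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def tobit_loglik(y, index, sigma):
    """Pooled log-likelihood of a Tobit model censored at LOWER and UPPER.

    Simplification: random effects are ignored, so this is not the
    estimator used in the paper, only its censoring structure.
    """
    total = 0.0
    for y_k, m_k in zip(y, index):
        if y_k <= LOWER:      # mass of the latent variable below 0
            total += math.log(norm_cdf((LOWER - m_k) / sigma))
        elif y_k >= UPPER:    # mass of the latent variable above 63
            total += math.log(1.0 - norm_cdf((UPPER - m_k) / sigma))
        else:                 # uncensored observation
            total += norm_logpdf((y_k - m_k) / sigma) - math.log(sigma)
    return total


def expected_radiance(m, sigma):
    """Conditional mean of the doubly censored outcome for linear index m."""
    a = (LOWER - m) / sigma
    b = (UPPER - m) / sigma
    pdf_a = math.exp(norm_logpdf(a))
    pdf_b = math.exp(norm_logpdf(b))
    interior = m * (norm_cdf(b) - norm_cdf(a)) + sigma * (pdf_a - pdf_b)
    return LOWER * norm_cdf(a) + interior + UPPER * (1.0 - norm_cdf(b))
```

Under these assumptions, the effect of a binary variable could be approximated by averaging `expected_radiance` over all places with the variable switched on and subtracting the average with it switched off, which mirrors the comparison of conditional means described above.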

D. Results
As the signs of significant effects do not differ qualitatively between Columns (1) and (2), and since the estimation of Column (2) is less efficient than for Column (1), we focus on the effects in Column (1). While we observe that the hukou and SEZ variables induce significant effects on radiance_it, we skip discussion of those effects here for the sake of brevity. Similarly, we forgo discussion of the effects of geography and climate that are also reported in Table 4. In what follows, we focus on the effects of infrastructure, particularly roads and railways, near a place.
Two things stand out regarding these effects: (i) greater distance to transport infrastructure (such as roads, railway lines, and waterways) reduces the night-light radiance of a place; and (ii) the magnitude of the marginal effect of ln(dist to road_i) is around five times larger than that of ln(dist to rail_i). Clearly, these effects on radiance_it reflect the importance of transport infrastructure, particularly roads, for local economic growth across all places in the sample.

[11] Table A.3 provides effect estimates akin to the dynamic Tobit model in Table 4 based on three alternative specifications that ignore censoring. These alternative models are linear models and always include satellite fixed effects. Apart from the infrastructure variables of interest, they are specified as follows: (i) the model in Column (1) does not include any other variables besides place fixed effects; (ii) the model in Column (2) is the same as in Column (1), but includes control variables; and (iii) the model in Column (3) is the same as in Column (2), but includes lags of the dependent variable and is an immediate linear counterpart to the dynamic Tobit model in Table 4. The results across these models and the dynamic Tobit model in Table 4

Admin = within administrative boundary, Nat = natural city, SEZ = Special Economic Zone.
Notes: Reported coefficients are marginal effects. Standard errors are reported in parentheses. *** = p < 0.01, ** = p < 0.05, * = p < 0.1. All columns include squared terms for the following geography and climate variables: ln(dist to road_i), ln(dist to rail_i), ln(dist to ocean_i), ln(dist to water_i), ln(dist to adborder_it), ln(dist to center_i), ln(altitude_i), ln(rain_i), temperature_i, sd temperature_i. Column (2) includes time averages for all time-variant variables. All columns include satellite effects. All distance measures in the empirical estimation are in meters.
Source: Authors' calculations.
In Columns (3)-(6), we estimate the same model as in Column (1) for various subsamples of the data. Columns (3) and (4) divide the sample between places inside and outside of natural cities, while Columns (5) and (6) separate places inside and outside of the administrative borders of the major cities in our sample. Interestingly, the effect of ln(dist to road_i) in Column (3) is larger than in Column (4), while the opposite is observed for ln(dist to rail_i). Similarly, Column (6) shows a significant negative impact of ln(dist to rail_i), while the corresponding estimate in Column (5) is much smaller and not significant. The differences in the effects between Columns (3) and (4) on one hand and Columns (5) and (6) on the other, both in absolute terms and compared with Column (1), reflect differences in the opportunity costs of certain types of transport infrastructure depending on how central or peripheral a place is relative to the natural city or the administrative city center. In general, these results indicate that a marginal decline in distance to the road network leads places inside the natural city to grow relatively faster than places outside of it. However, a marginal decline in the distance to railway lines benefits peripheral areas more than central ones.

Using the estimated effects from Column (1) in Table 4, we can predict the radiance level of all places from period to period and the change associated with an infrastructure improvement to the road or railway networks. We do so by reducing the distance to roads and railway lines by one standard deviation. We use 2007 as the benchmark year for this thought experiment since it is the year in which there are almost as many places outside (49%) as inside natural cities (51%). We predict the radiance level of all places in 2007 given the estimated coefficients associated with Column (1) of Table 4 and the variables in R_it and X_it as observed. We plot the kernel density estimates of observed and predicted radiance levels in 2007 in Figure 6. [13] Then, we shock ln(dist to road_i) and ln(dist to rail_i) alternatively by one standard deviation in 2007 and let the process run to see how such shocks impact radiance levels in the short and long term. Following the definition of a natural city used in this paper, we assume that any place will be part of a natural city in the counterfactual scenario if (i) its predicted radiance level amounts to at least 40, and (ii) it is connected to other places in the natural city with a radiance level of at least 40.

[13] The kernel density estimates of observed and predicted radiance levels for all years are plotted in Figure 7 and reflect a similar fit as the benchmark year (2007).

Figure 7. Kernel Density Estimates: Observed versus Predicted Radiance Levels, All Years
Source: Authors' calculations.

... (2012), 38.57 (2012), 100; 85.57 (2017), 14.43 (2017), 100; 96.91 (2022), 3.09 (2022), 100; 99.73 (2027), 0.27 (2027), 100
Nat = natural city.
Note: In addition to the radiance threshold, the City Clustering Algorithm condition is a necessary condition for a place to be assigned as a natural city (Nat = 1). The City Clustering Algorithm condition implies that a place is near a cluster of places with average radiance greater or equal to the threshold.
Source: Authors' calculations.
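The mechanics of the thought experiment described above (shock a covariate, let the dynamic process run, and check the radiance threshold each period) can be sketched for a single place as follows. This is a stylized illustration, not the authors' procedure: the lag weights and covariate contributions in the usage note are made-up numbers, disturbances are switched off, the connectivity condition is ignored, and the function names are hypothetical.

```python
def clip(x, lo=0.0, hi=63.0):
    """Restrict a latent prediction to the observable 0-63 radiance scale."""
    return max(lo, min(hi, x))


def simulate(history, alpha, x_effect, periods):
    """Iterate a noise-free third-order autoregressive path with censoring.

    history:  the last three radiance values, oldest first.
    alpha:    (a1, a2, a3), weights on the first, second, and third lag.
    x_effect: constant contribution of the covariates (X times beta);
              a shock to a covariate shifts this value.
    Returns the predicted radiance for each of the simulated periods.
    """
    path = list(history)
    for _ in range(periods):
        latent = (alpha[0] * path[-1] + alpha[1] * path[-2]
                  + alpha[2] * path[-3] + x_effect)
        path.append(clip(latent))
    return path[3:]


def newly_above_threshold(baseline, counterfactual, threshold=40.0):
    """Periods (0-based) in which only the counterfactual reaches the threshold."""
    return [t for t, (b, c) in enumerate(zip(baseline, counterfactual))
            if b < threshold <= c]
```

For example, with hypothetical lag weights `(0.5, 0.2, 0.1)`, a place that rests at a radiance of 30 under `x_effect = 6.0` would drift toward 45 if a shock raised `x_effect` to 9.0, and `newly_above_threshold` would report the periods in which it meets the radiance condition only in the counterfactual. Because the adjustment is spread over the lags, the shift in natural city status appears gradually rather than in the year of the shock.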
In Tables 5 and 6, we report the effects of these shocks on radiance levels in 2007, as well as after 5, 10, 15, and 20 years, compared with the baseline predictions. Table 5 shows the effect of a shock on road infrastructure. Most places predicted to lie inside a natural city in the baseline case remain inside it in the counterfactual scenario after 5 years (99.7% in 2012) and after 20 years (100% in 2027). However, the share of places in the sample that are predicted to lie outside of the natural city in the baseline but inside of it in the counterfactual scenario steadily increases over time in response to the shock, from 0.7% in 2007 to 30.2% in 2017 and to 83.4% in 2027. The magnitude of the effect is relatively high because the actual number of places not in a natural city after 2017 is relatively small by construction of the data set. 14 Figures 8 and 9 illustrate the examples of Beijing and Shanghai, respectively. 15

14 The data set includes only those places that were in a natural city at some point between 1992 and 2013. This implies that all places that are not yet in a natural city in 2007 have a high probability of becoming part of a natural city within a few years.
15 The sample includes only those places that are part of a natural city in any of the years covered.

Baseline   Year   Counterfactual Nat = 1 (%)   Counterfactual Nat = 0 (%)   Total (%)
Nat = 1    2007   ...                          ...                          ...
           2012   ...                          0.08                         100
           2017   99.91                        0.09                         100
           2022   99.96                        0.04                         100
           2027   99.98                        0.02                         100
Nat = 0    2007   0.13                         99.87                        100
           2012   1.02                         98.98                        100
           2017   4.18                         95.82                        100
           2022   10.47                        89.53                        100
           2027   15.34                        84.66                        100
Total      2007   43.22                        56.78                        100
           2012   59.43                        40.57                        100
           2017   80.36                        19.94                        100
           2022   94.31                        5.69                         100
           2027   98.64                        1.36                         100

Nat = natural city.
Note: In addition to the radiance threshold, the City Clustering Algorithm condition is a necessary condition for a place to be assigned as a natural city (Nat = 1). The City Clustering Algorithm condition implies that a place is near a cluster of places with average radiance greater than or equal to the threshold.
Source: Authors' calculations.
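The shares discussed for Tables 5 and 6 can be read as row percentages of a cross-tabulation of baseline against counterfactual membership. The sketch below shows that arithmetic for one year, assuming two equal-length lists of 0/1 indicators (Nat = 1 or Nat = 0) per place. The function name and the sample data are hypothetical and are not taken from the paper's data set.

```python
from collections import Counter

def transition_shares(baseline, counterfactual):
    """Row percentages: for each baseline status b, the share of places
    with counterfactual status c, keyed by (b, c) (hypothetical helper)."""
    counts = Counter(zip(baseline, counterfactual))
    shares = {}
    for b in (1, 0):
        row_total = counts[(b, 1)] + counts[(b, 0)]
        for c in (1, 0):
            # Rows with no places yield NaN rather than a division error.
            shares[(b, c)] = (100.0 * counts[(b, c)] / row_total
                              if row_total else float("nan"))
    return shares

# Illustrative indicators for eight places in a single year.
baseline_nat       = [1, 1, 1, 0, 0, 0, 0, 0]
counterfactual_nat = [1, 1, 1, 1, 0, 0, 0, 0]
shares = transition_shares(baseline_nat, counterfactual_nat)
```

Here `shares[(0, 1)]` is the share of places outside a natural city in the baseline but inside it in the counterfactual, which is the quantity the text tracks over time.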
The picture is similar, albeit of a smaller magnitude, when looking at the effect of a shock on rail infrastructure, as shown in Table 6. The vast majority of places predicted to lie inside a natural city in the baseline are also predicted to lie inside it in the counterfactual analysis after 5 years (99.9% in 2012) and after 20 years (100% in 2027). The share of places predicted to lie outside in the baseline but inside in the counterfactual (Nat = 0 in baseline, Nat = 1 in counterfactual) also increases over time, from 0.1% in 2007 to 4.2% in 2017 and to 15.2% in 2027. The smaller magnitude of the effect reflects the smaller magnitude of the coefficient of ln(dist to rail_i) compared with the coefficient of ln(dist to road_i) estimated in Column (1) of Table 4. 16 Our finding that transport infrastructure has a positive effect on local economic activity is well aligned with the findings in Banerjee, Duflo, and Qian (2012). They indicate that transport networks lead to higher levels of GDP per capita, even though the effect reported is small in magnitude. In line with Baum-Snow (2007) and Baum-Snow et al. (2016), the results in our paper also suggest that better transport connectivity increases local economic activity in suburban areas. Considering the population density in city centers versus suburban areas, Baum-Snow et al. (2016, 2) suggest that "each additional radial highway displaced about 4% of [the] central city population to suburban regions and that the existence of some ring road capacity in a city reduced city population by about 20%." Contrary to these findings, we observe a positive effect of transport infrastructure on natural city growth, with a positive effect on both central and more peripheral areas of an average natural city. These results contrast with Faber (2014), who, looking at peripheral counties outside the commuting zones of metropolitan areas, finds that highway network connections have led to lower GDP growth among peripheral counties.
This difference in findings suggests that transport networks have different effects on economic activity in remote areas than in metropolitan areas.

IV. Conclusions
This paper documents patterns in the size and growth of natural cities in the PRC for the 300 largest urban entities between 1992 and 2013. Rather than using administrative data on economic outcomes and their determinants, the paper identifies the boundaries of a natural city, which is related more closely to the notion of MSAs or Functional Urban Zones, in terms of the night-light radiance of connected places that measure 3 km × 3 km. Ultimately, the boundaries of natural cities are determined by applying the CCA to remote-sensing data for those places during the review period.
The key results of our analysis include the following. First, the number of distinct natural city centers decreased during the review period due to the absorption of some natural cities by others. This was particularly the case for larger cities, such as Shanghai, that formed natural supercities during the review period. Second, we detected rapid growth for the average natural city, which is in accordance with population census data that are available only at less frequent intervals than night-light data, and is consistent with the PRC's goal of raising the urbanization rate. The results suggest that natural cities grew considerably beyond the administrative boundaries of cities, which calls into question policies that target urbanization rates and other related development objectives based on administrative city boundaries. Third, the global financial crisis at the end of the last decade left its mark on natural city growth, as some Chinese natural cities in our sample shrank between 2010 and 2013. Fourth, infrastructure improvements to the road and railway networks benefit agglomerations, although railway network improvements are expected to benefit mainly the peripheral areas of cities, more so than road improvements.