The Effects of Formalization on Small and Medium-Sized Enterprise Tax Payments: Panel Evidence from Viet Nam

Do firms pay more taxes after formalization? The answer to this question is nontrivial. Tax noncompliance can be a persistent behavior among formerly informal firms. Analyzing the relationship between formalization and tax payments can also be challenging if nonswitching and switching firms have different characteristics. I use a panel dataset built from five small and medium-sized enterprise surveys conducted in Viet Nam from 2005 to 2013. By comparing nonswitching informal firms to switchers, I show that switchers are more likely to pay taxes and to pay a higher amount, thereby confirming heterogeneity. By comparing switchers before and after formalization, I find that formalization increases tax payment likelihood by 20% and the tax amount paid by 93%. A control function approach indicates that my results are robust to potential endogeneity of formalization. Therefore, this paper provides supportive evidence for a key public policy rationale to promote formalization: increased tax revenues.


I. Introduction
Existing research has mainly focused on the private costs and benefits of formalization for informal firms, with available evidence suggesting that formalization can have a positive effect on firm performance (McKenzie and Sakho 2010; Fajnzylber, Maloney, and Montes-Rojas 2011; Rand and Torm 2012; Bruhn and McKenzie 2014; Boly 2018). 1 In contrast to most previous studies, this paper asks whether a government can also benefit from formalization through additional tax payments resulting from firms opting out of the informal sector.
Tax payments are costs to firms, and these costs will increase as a direct consequence of formalization only if there is (partial or full) compliance with formal tax regulations. Yet, noncompliance with formal tax or labor regulations can persist among formerly informal firms; noncompliance is a common phenomenon even among formal firms, including in member countries of the Organisation for Economic Co-operation and Development (OECD 2008). 2 In consequence, the claim that formalization increases government revenues is ultimately an empirical question, the answer to which can strengthen (or not) a key public policy rationale for promoting the formalization of small and medium-sized enterprises (SMEs). 3
Analyzing the effects of formalization on informal firms is challenging due to potential firm heterogeneity. Heterogeneity can arise because firms opting out of the informal sector have different characteristics (e.g., owner capabilities and firm preferences) compared with those that remain informal. Moreover, unobserved characteristics that affect firm outcomes may themselves lead to formalization: for example, more successful firms may become more visible and therefore more likely to register formally (McKenzie and Sakho 2010; Fajnzylber, Maloney, and Montes-Rojas 2011).
To study the relationship between formalization and tax payments, I applied regression analysis to a panel dataset compiled using five SME surveys from Viet Nam. I restricted the dataset to informal firms that remained informal throughout the survey periods and to informal firms that became formal at a given point in time (referred to as switching firms or formalized firms). Informality is a multidimensional concept that is difficult to define. However, for my purpose, formal firms are defined as those that registered to pay tax (obtained a tax code), which is a commonly used indicator of formality in the literature (McKenzie and Sakho 2010, Rand and Torm 2012). 4
1 McKenzie and Sakho (2010) hypothesize that a profit-maximizing firm becomes formal if and only if the expected present discounted value of the net benefits from doing so outweighs the upfront costs:
$$\sum_{t=1}^{T} \delta^{t} E(\pi_{F,t} - \pi_{I,t}) + \theta_{\text{law-abiding}} > C_{\text{Money}} + C_{\text{Time}} + C_{\text{Information}}$$
where $\pi_{F,t}$ denotes the firm's profits if it is formally registered at time $t$, and $\pi_{I,t}$ denotes the firm's profits if it is not formally registered at time $t$. $\theta_{\text{law-abiding}}$ denotes the utility benefit to firm owners from obeying the law and feeling they are contributing to national welfare through paying taxes. $C_{\text{Money}}$, $C_{\text{Time}}$, and $C_{\text{Information}}$ denote the monetary, time, and information costs of registering, respectively.
However, the absence of registration does not mean that the informal sector is not taxed, as a nonnegligible fraction of informal firms pay some sort of taxes in Viet Nam, albeit mostly local taxes (Cling, Razafindrakoto, and Roubaud 2011, p. 33). 5 It is therefore important to ascertain that formalization leads to additional tax payments by switching firms over and above what was already paid.
2 See also Basu, Chau, and Kanbur (2010) and Tedds (2010).
3 See Bruhn and McKenzie (2014), who consider that the claim that formalization is socially optimal, because it increases government revenues and reinforces a culture of respecting the rule of law, requires more research.
4 An operational definition based on business registration would be consistent with the International Labour Organization's (2003) definition of informal sector enterprises, which are basically "unregistered and/or small-scale private unincorporated enterprises that produce goods or services for sale or barter." Other criteria of informal employment include employment contract registration, provision of social security protection, and size of the
Using the formal status variable (status equals 0 if a firm is informal and 1 if formal), I constructed a variable called switcher, which equals 1 for all years in which a switching firm has been observed in my sample, including the years before formalization, and 0 if the firm remained informal throughout the survey periods. The switcher variable, acting as a firm-type fixed effect, captures heterogeneity between firms remaining informal and firms that switched out of the informal sector at some point. As a result, when included in my regression, the variable status, which takes a value of 1 only for switching firms after they have formalized, captures the net effect of formalization on switching firms.
By comparing nonswitching informal firms to switchers, I confirm heterogeneity between switching firms and nonswitching firms regarding tax payments: switchers paid a significantly higher amount of taxes than nonswitchers. Then, by comparing switchers before and after formalization, I find that formalization increases the likelihood of tax payment (by 20%) and the amount of taxes (by 93%) relative to preformalization levels. This significant increase in tax payments persists both in the short and long term. Using a control function approach, I show that my results are robust to potential endogeneity of formalization. I also find that the increase in tax payments is mainly driven by a significant increase in the payment of taxes such as license fees or import and export taxes, but not in the payment of revenue taxes that are arguably more difficult to collect. Firm size, previous performance, and compliance inspection all have a positive relationship with both the amount and likelihood of tax payments.
The remainder of this paper is organized as follows. Section II briefly presents an overview of the existing empirical literature on the effects of formalization. I describe my dataset in section III and explain the empirical approach in section IV. Section V presents the main results and section VI concludes.

II. Literature Review
The literature on the relationship between formalization and firm performance has mainly focused on the private benefits for a firm. Comparing firms that were created immediately before and after a business tax reduction and simplification scheme in Brazil, Fajnzylber, Maloney, and Montes-Rojas (2011) find that this scheme led to increased levels of registration and to higher revenues, profits, and employment among registered firms. Likewise, Sharma (2014) suggests that registration leads to significant gains in both sales and value added per employee in India. In Bolivia, McKenzie and Sakho (2010) analyze the effects of formalization on firm profits, using the physical distance between a firm and the tax office as an instrument. The assumption is that being closer to a tax office increases the probability of registration. Their findings suggest that the effect of tax registration is positive but heterogeneous. Related to the research question in the current paper, McKenzie and Sakho (2010) also show that registered firms are more likely to pay taxes, but not significantly more likely to be paying a larger share of their profits as taxes.
4 (continued) employer (Henley, Arabsheibani, and Carneiro 2009). However, compared to having a tax code, these other definitions may not be the best fit, given that I am examining whether formalization leads to additional tax payments.
5 See also Olken and Singhal (2011) for a discussion of informal taxation in 10 developing economies, including Viet Nam.
In contrast to most previous cross-section studies, Rand and Torm (2012) use the same panel data as in this study, but only for 2007 and 2009. Their results indicate that registration leads to an increase in profits and investments for Vietnamese SMEs. Boly (2018) shows that becoming formal can further increase gross profits and the value added of switchers compared with preformalization levels, both in the short and medium term. The present study is closest to Boly (2018) as it uses the same dataset and empirical approach, while focusing on analyzing the relationship between formalization and tax payments. As mentioned previously, the majority of the existing literature focuses on the private benefits of formalization for a firm, leaving an evidence gap on the potential social benefits, specifically those accruing to governments.

III. Data
The description of the dataset used in this study parallels that in Boly (2018), restricted to informal and switching firms. The dataset comes from SME surveys conducted in Viet Nam in 2005, 2007, 2009, 2011, and 2013. The surveys, which were conducted by the Central Institute for Economic Management and the University of Copenhagen, cover about 2,500 firms in each year. They were carried out in 10 locations: the cities of Ha Noi, Hai Phong, and Ho Chi Minh City, and the rural provinces of Ha Tay, Khanh Hoa, Lam Dong, Long An, Nghe An, Phu Tho, and Quang Nam.
The population of nonstate manufacturing enterprises was based on two data sources from the General Statistics Office of Viet Nam (GSO): the Establishment Census 2002 (GSO 2004) and the Industrial Survey 2004-2006 (GSO 2008). A representative sample of registered household and nonhousehold firms was drawn from this population, using a stratified sampling procedure. The aim was to ensure the inclusion of an adequate number of enterprises in each province with different ownership forms. For reasons of implementation, the survey was confined to specific areas in each province or city. In addition, the GSO enterprise census focused only on "visible" firms, which are those with fixed professional premises. Informal household firms were included in the SME surveys based on random onsite identification within the survey districts observed by the enumerator. With such an identification approach, the informal firms included in the survey are those operating alongside officially registered enterprises. These informal firms may be relatively more competitive (and profitable) compared to informal firms clustering in areas with no or very few formal firms (Rand and Torm 2012). In this regard, the sample of informal firms in this study may not be fully representative of the informal sector as a whole in Viet Nam.
Despite the aforementioned weakness, my dataset remains unique in its number of survey years (five), its number of firms, and its focus on the informal sector. I restricted my sample to firms with at least two observations. I also excluded firms that were formal when initially entering the sample, given that my interest is in informal firms and whether they pay more taxes after formalization. The restricted sample is dominated by informal nonswitchers, which account for 66% of the total number of firms, while switchers account for the remaining 34% (Appendix Table A1.1). 6

IV. Empirical Approach
To examine the relationship between formalization and tax payments, I use a specification, referred to below as equation (1), in which $y_{it}$ is the dependent variable; $D^{S}_{i}$ captures firm-type fixed effects (1 if switchers, 0 if nonswitchers); $F_{it}$ indicates whether a firm has become formal or not (0 if a firm is informal, and 1 if the firm is formal); $X_{it}$ represents additional control variables; $\lambda_{t}$ denotes a full set of time dummy variables; $i$ indexes individual firms; and $t$ indexes time. As specified, my approach is comparable to a difference-in-difference approach with varying treatment years.
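A specification consistent with the definitions above can be sketched as follows for the Tobit case. The coefficient symbols ($\alpha$, $\beta$, $\gamma$, $\theta$), the error term $\varepsilon_{it}$, and the latent outcome $y^{*}_{it}$ are notation assumed here for illustration, not taken from the text:

```latex
y^{*}_{it} = \alpha + \beta D^{S}_{i} + \gamma F_{it} + X_{it}\theta + \lambda_{t} + \varepsilon_{it},
\qquad
y_{it} = \max\left(0,\; y^{*}_{it}\right)
```

Under this reading, $\beta$ would capture the difference between switchers and nonswitchers while both are still informal, and $\gamma$ the change for switchers once they have formalized.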
The difference-in-difference specification (between switchers and nonswitchers) rests on the parallel trend assumption in tax payments before switching. A graphical check of this assumption suggests that the parallel trend assumption holds well between 2005 and 2007 and between 2007 and 2009, but not between 2009 and 2011. As a robustness check, I run the above specification for different waves of the survey (see section V.C.3).
A key feature of my approach is the construction of a switcher variable denoted by $D^{S}_{i}$, using the variable $F_{it}$ (0 if a firm is informal, and 1 if the firm is formal). If a firm has shifted out of the informal sector at any time in my survey periods, the switcher variable equals 1 for all years in which the firm has been observed in my sample (including the years before formalization); the switcher variable equals 0 if the firm remained informal throughout the survey periods. The (informal) nonswitcher group is used as the control group in my regressions. The inclusion of firm-type fixed effects in my main regressions (using dummy variable $D^{S}_{i}$) enables me to account for time-invariant heterogeneity between nonswitching and switching firms. As a result, the variable $F_{it}$, which takes a value of 1 only for switching firms after they have formalized, picks up the net effects of formalization. However, a known limitation of the fixed-effects approach is that endogeneity due to time-varying omitted variables is still present, although the bias is smaller than with cross-sectional data. I therefore discuss robustness to endogeneity in section V.C.2.
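The construction of the switcher dummy can be illustrated with a minimal sketch. The firm identifiers, years, and the function name `add_switcher` are hypothetical; the sketch assumes, as in the restricted sample, that every firm is informal when first observed, so "ever observed as formal" identifies a switcher.

```python
from collections import defaultdict

# Hypothetical firm-year records: (firm_id, survey_year, formal_status)
records = [
    ("A", 2005, 0), ("A", 2007, 0), ("A", 2009, 1), ("A", 2011, 1),
    ("B", 2005, 0), ("B", 2007, 0), ("B", 2009, 0),
]

def add_switcher(rows):
    """Append the firm-type dummy to each row.

    A firm is flagged as a switcher (1) if it is formal in at least one
    observed year. The flag is constant across all of the firm's rows,
    including the years before formalization.
    """
    ever_formal = defaultdict(int)
    for firm, _year, status in rows:
        ever_formal[firm] = max(ever_formal[firm], status)
    return [(firm, year, status, ever_formal[firm]) for firm, year, status in rows]

panel = add_switcher(records)
for row in panel:
    print(row)
```

Firm A is flagged as a switcher in every year, including 2005 and 2007, while its formal status turns to 1 only from 2009 onward; firm B is a nonswitcher throughout.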
In addition to the previously mentioned variables, I include several control variables that could affect firms' decisions regarding tax payments. These control variables are summarized in Appendix Table A1.2 for the pooled sample and by firm type. 7 The control variables include (i) gender of the owner or manager (1 if male, 0 otherwise); (ii) education level of the owner or manager (0 if secondary school has not been completed, 1 otherwise) to proxy for the owner's or manager's human capital; (iii) firm's previous performance using previous year's gross profits; (iv) number of regular full-time employees (in logarithmic form), as well as the square, to control for firm-size effects; (v) whether or not the firm holds a certificate of land use rights to proxy for property rights; (vi) government inspections (0 if the firm has not been subject to inspections in a given year, 1 otherwise); and (vii) dummy variables to control for location and time factors.

V. Results
After briefly discussing some summary statistics, I present results on the relationship between formalization and total tax payments, my main variable of interest, using a Tobit regression that considers both the binary participation decision and the amount paid. 8

Figure. Average Total Tax Payments: Evolution around Formalization Year
Note: Formalization year is equal to 0. Source: Author's calculations.

A. Summary Statistics
As mentioned earlier, Appendix Table A1.2 describes the dependent variables (for the pooled sample) by firm type, that is, nonswitchers and switchers (and for switchers: overall, before switching, and after switching). I observe that nonswitchers paid some form of taxes 57% of the time, while switchers paid taxes 90% of the time (overall), 83% of the time before switching, and 97% of the time after switching. The differences are significant at the 1% level between nonswitchers and switchers before switching on the one hand, and between switchers before and after switching on the other hand.
The total tax payments made by switchers before switching (D1,778) are significantly higher, at the 1% level, than those of nonswitchers (D334). 9 Switchers also pay a significantly higher amount of total taxes after joining the formal sector (D3,792). Overall, the average total tax payments of switchers (D2,883) are about 8.6 times those of nonswitchers (D334); switching from the informal to the formal sector resulted in tax payments increasing more than twofold, from D1,778 to D3,792.
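The two ratios quoted above follow directly from the reported means, as the short check below shows. The variable names are illustrative; the figures are the raw sample averages from the text, not regression-adjusted effects.

```python
# Mean total tax payments reported in the text (currency unit D)
nonswitchers = 334
switchers_overall = 2883
switchers_before = 1778
switchers_after = 3792

# Switchers relative to nonswitchers
ratio_overall = switchers_overall / nonswitchers
# Switchers after formalization relative to before
ratio_after_before = switchers_after / switchers_before

print(round(ratio_overall, 1))       # 8.6
print(round(ratio_after_before, 2))  # 2.13
```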
The figure uses an event-study graph to illustrate the evolution of tax payments for switchers. In this graph, the year of formalization is set at 0 on the x-axis. Negative numbers refer to years in the preformalization period and positive numbers to years in the postformalization period. Although there is an increasing trend in tax payments in the preformalization period (−8; −2), I observe a significant jump in the short-term postformalization period (0; +2); this increase in tax payments persists in the long term (+4; +6) compared to the preformalization period (−8; −2). The preliminary analysis above therefore suggests that tax payments increase after formalization.

B. Panel Regression
I also study the relationship between formalization and tax payments using a random effects Tobit regression (left-censored at 0). The results in Table 1 (Model 1) show that switchers' total tax payments are significantly higher than those of nonswitchers. I therefore find heterogeneity between switching and nonswitching firms; that is, switchers tend to pay higher taxes while in the informal sector than nonswitchers do. This heterogeneity has been typically assumed or differenced out in most of the previous studies on formalization.
The amount of switchers' tax payments, both before and after they formalized, is analyzed by looking at the coefficient of switcher (after formalization). I find that formalization increases switchers' tax amounts significantly compared to preformalization levels (Table 1, Model 1). For switchers, the mean marginal effect of formalization on the expected value of the censored outcome is about 93%, and the mean marginal effect on the expected value of the truncated outcome is about 73%, controlling for province and year fixed effects.
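The distinction between the censored and truncated outcomes can be made concrete with the standard Tobit formulas for a model left-censored at 0: the censored mean is Φ(z)·xb + σ·φ(z) and the truncated mean is xb + σ·φ(z)/Φ(z), with z = xb/σ. The sketch below evaluates the effect of a dummy as a difference in these expectations. The index, coefficient, and σ values are hypothetical and are not estimates from Table 1.

```python
from statistics import NormalDist

_N = NormalDist()  # standard normal distribution

def censored_mean(xb, sigma):
    """E[y | x] for a Tobit left-censored at 0: Phi(z)*xb + sigma*phi(z)."""
    z = xb / sigma
    return _N.cdf(z) * xb + sigma * _N.pdf(z)

def truncated_mean(xb, sigma):
    """E[y | y > 0, x]: xb + sigma * phi(z) / Phi(z)."""
    z = xb / sigma
    return xb + sigma * _N.pdf(z) / _N.cdf(z)

# Hypothetical values for illustration only
xb_informal = 0.5   # linear index before formalization
beta_formal = 1.0   # coefficient on the formalization dummy
sigma = 2.0         # standard deviation of the latent error

effect_censored = (censored_mean(xb_informal + beta_formal, sigma)
                   - censored_mean(xb_informal, sigma))
effect_truncated = (truncated_mean(xb_informal + beta_formal, sigma)
                    - truncated_mean(xb_informal, sigma))
print(effect_censored, effect_truncated)
```

With these illustrative values, both effects are smaller than the latent coefficient, and the effect on the truncated mean is smaller than the effect on the censored mean, which matches the ordering of the two figures reported above.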
To study the effects of formalization over the short and long term, I use two dummy variables that reflect the length of time since formalization. The first dummy (short term) is 1 for firms that have been formal for 2 years or less, while the second dummy (long term) is 1 for firms that have been formal for 4 years or more. 10 As can be seen in Table 1 (Model 2), the increase in total tax payments is observed both in the short and long term. However, the coefficient for total tax payments in the short term (2 years or less after formalization) is significantly higher than the coefficient for tax payments in the long term (4 or more years after formalization), suggesting a decrease in total tax payments over time. Alternatively, the higher coefficients in the short term for total tax payments may be capturing initial entry costs into the formal sector.
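A minimal sketch of how the two timing dummies could be coded is given below. It assumes that time since formalization is measured from the first survey wave in which a firm is observed as formal, and that informal firm-years carry no value; both are assumptions of this illustration, as is the function name.

```python
def term_dummies(years_since_formalization):
    """Return (short_term, long_term) dummies for one firm-year.

    years_since_formalization is None while the firm is still informal;
    otherwise it is the survey year minus the first survey year in which
    the firm is observed as formal (assumed timing convention).
    """
    if years_since_formalization is None:
        return (0, 0)
    short_term = 1 if years_since_formalization <= 2 else 0
    long_term = 1 if years_since_formalization >= 4 else 0
    return (short_term, long_term)

# With biennial survey waves the possible values are 0, 2, 4, 6, ...
for years in (None, 0, 2, 4, 6):
    print(years, term_dummies(years))
```

Because the surveys are two years apart, every formal firm-year falls into exactly one of the two categories.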
Several other control variables are noteworthy in Table 1. First, lagged gross profits have a positive and significant relationship with tax payments at the 1% significance level, indicating that more successful firms pay more taxes. Second, firm size also has a positive and significant relationship with tax payments, likely explained by the fact that it is more difficult for larger firms to hide their activities. Finally, undergoing at least one compliance inspection is positively related to the amount of total taxes.

C. Robustness Check
In this section, I test the robustness of my results to measurement errors or possible endogeneity.

Likelihood of Tax Payments
An advantage of the dataset used in this paper is the provision of hard evidence on tax payments made by informal sector firms. However, asking about tax payments through traditional survey techniques can be problematic as firms may decide to misreport. 11 Given these possible reporting errors, I conduct a robustness check by using a dummy variable that equals 1 when tax payments are strictly positive (see summary statistics in Appendix Table A1.2). This dummy dependent variable reflects the binary participation decision to pay or not to pay taxes; it is estimated using a random effects logit regression. Table 2 shows that switchers' likelihood of paying taxes is significantly higher than that of nonswitchers (Model 1). The results also suggest that becoming formal is positively and significantly associated with an increased likelihood of paying taxes when looking at the coefficient of switcher (after formalization). Table 2 also shows that an increased likelihood of paying taxes is observed both in the short and long term (Model 2).

Endogeneity 12
This section analyzes the robustness of my results to potential endogeneity of formalization, using a control function approach (Wooldridge 2015).
As a first step, I estimate a model of the endogenous explanatory variable:
$$F_{it} = 1[Z_{it}\delta + \nu_{it} > 0] \qquad (2)$$
where $1[\cdot]$ is the binary indicator function; $F_{it}$, the dependent variable, is a dummy variable that takes a value of 1 if a firm is formal and 0 otherwise; $X_{it}$ are control variables described earlier in section IV; $I_{it}$ corresponds to a set of exogenous variables that are omitted from equation (1) and that are partially correlated with formalization; $Z_{it} = (X_{it}, I_{it})$; $\delta$ is the corresponding coefficient vector; and $\nu_{it}$ is an error term. As a second step, I compute generalized residuals, $\hat{r}_{it}$, based on results obtained from equation (2) as follows:
$$\hat{r}_{it} = F_{it}\,\lambda(Z_{it}\hat{\delta}) - (1 - F_{it})\,\lambda(-Z_{it}\hat{\delta})$$
where $\lambda(\cdot) = \phi(\cdot)/\Phi(\cdot)$ is the inverse Mills ratio. As a third and final step, I reestimate equation (1) by adding $\hat{r}_{it}$ as an additional regressor to control for endogeneity.
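The second step can be sketched in a few lines. The snippet uses the standard probit generalized residual, which equals the inverse Mills ratio at the fitted index for formal firm-years and minus the inverse Mills ratio at the negated index for informal ones. The function names and the fitted index value are hypothetical; the first-step probit itself is not estimated here.

```python
from statistics import NormalDist

_N = NormalDist()  # standard normal distribution

def inverse_mills(a):
    """Inverse Mills ratio: phi(a) / Phi(a)."""
    return _N.pdf(a) / _N.cdf(a)

def generalized_residual(formal, index):
    """Probit generalized residual for one firm-year.

    formal: observed formal status (0 or 1).
    index:  fitted first-step probit index for that firm-year.
    """
    return formal * inverse_mills(index) - (1 - formal) * inverse_mills(-index)

# Hypothetical fitted index
index = 0.7
p_formal = _N.cdf(index)  # implied probability of being formal
expected = (p_formal * generalized_residual(1, index)
            + (1 - p_formal) * generalized_residual(0, index))
print(generalized_residual(1, index), generalized_residual(0, index), expected)
```

The residual is positive for formal firm-years and negative for informal ones, and its expectation under the probit model is zero, which is what allows it to serve as the added regressor in the third step.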
To construct $I_{it}$, I take the annual provincial-level averages for the following two binary variables: access to powered equipment (1 if with access, 0 otherwise), and bribe payments (1 if the firm has made any bribe payments in a given year, 0 otherwise). Here, I restrict my sample to only always-formal firms and formalized firms (i.e., switchers once they have switched), with the latter included in the year following formalization. As noted in section III, informal firms in the sample were selected based on random onsite identification in the neighborhoods of formal firms. In Viet Nam, over 80% of formal firms consider that registration is beneficial, while nearly 50% of informal firms do not see any value to it (Cling, Razafindrakoto, and Roubaud 2012). My exclusion restriction assumption is that informal firms are more likely to formalize when they can observe formal firms' characteristics and potentially attribute those characteristics to formalization. Yet, these formal firms' observable characteristics will not affect informal firms' tax payment behavior. The results of the first-step regression in the first columns of Table 3 (Formal, Probit) indicate that bribes paid by a formal firm have a negative and significant effect on the likelihood of formalization, while access to powered equipment has a positive and significant effect. As a result, $I_{it}$ fulfills a requirement of the control function approach that at least one exogenous variable that is omitted from equation (1) be partially correlated with the dependent variable in equation (2).
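The province-by-year averages described above amount to a grouped mean of a binary indicator. A minimal sketch follows, with hypothetical records and a hypothetical function name; in the paper the averages are taken over formal firms only.

```python
from collections import defaultdict

def province_year_means(rows):
    """Average a binary firm-level variable within each (province, year) cell.

    rows: iterable of (province, year, value) with value in {0, 1}.
    Returns a dict mapping (province, year) to the cell mean.
    """
    totals = defaultdict(lambda: [0, 0])  # cell -> [sum, count]
    for province, year, value in rows:
        cell = totals[(province, year)]
        cell[0] += value
        cell[1] += 1
    return {key: s / n for key, (s, n) in totals.items()}

# Hypothetical bribe-payment indicators for formal firms
bribes = [
    ("Ha Noi", 2009, 1), ("Ha Noi", 2009, 0),
    ("Ha Noi", 2011, 1), ("Long An", 2009, 0),
]
bribe_share = province_year_means(bribes)
print(bribe_share)
```

Each informal firm-year would then be assigned the value of its own province and year, so the variable varies across provinces and over time but not across firms within a cell.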
The results of the third step (in the control function approach described above) are also presented in Table 3, both for the total tax amount and the likelihood to pay tax. They suggest that endogeneity is not an issue. This provides supportive evidence that formalization can have a positive effect on switchers' total tax payments and that this effect can persist over time for the amount paid but not the likelihood to pay tax.

Unbalanced Panel
This paper uses an unbalanced panel, which includes only firms with at least two observations. However, if firms that remain informal are more likely to drop out of the panel (as formalized firms are more visible and easier to locate in subsequent rounds), then this would bias the estimates derived from firms that remain in the panel. As a first robustness check for the unbalanced panel issue, I run regressions using the full sample, or the sample of firms with at least three, four, or five observations. In the second check, I keep firms with at least two observations but vary the number of survey waves, starting with the first two waves in 2005 and 2007. In all cases, the full model (including controls) is estimated. However, in Appendix Table A2.1 (varying sample size) and Appendix Table A2.2 (varying number of waves), I present the coefficients for only the variables switcher (from informal to formal) and switcher (after formalization) to save space. My results show that switching firms are different from informal nonswitchers and that the effect of formalization is positive and significant at conventional levels for all samples.
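The first robustness check varies the minimum number of observations per firm. A minimal sketch of that sample restriction is shown below; the records and the function name are hypothetical.

```python
from collections import Counter

def keep_min_observations(rows, min_obs=2):
    """Keep firm-wave rows for firms observed in at least min_obs waves.

    rows: list of (firm_id, survey_year) tuples, one per firm-wave.
    """
    counts = Counter(firm for firm, _year in rows)
    return [(firm, year) for firm, year in rows if counts[firm] >= min_obs]

# Hypothetical unbalanced panel
waves = [
    ("A", 2005), ("A", 2007), ("A", 2009),
    ("B", 2005),
    ("C", 2011), ("C", 2013),
]

for min_obs in (1, 2, 3):
    print(min_obs, keep_min_observations(waves, min_obs))
```

Raising the threshold moves the sample toward a balanced panel at the cost of dropping firms that are observed only briefly.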

D. Discussion
To explore where the increase in tax payments after formalization comes from, I make a distinction between revenue taxes and other taxes, with the latter obtained by excluding revenue taxes from total tax payments. Such a distinction is motivated by the difficulty of monitoring and collecting income taxes compared with other types of taxes such as license fees or import and export taxes. This difficulty partly explains the use of presumptive taxes for taxing informal sector activities (Joshi, Prichard, and Heady 2014; Dube and Casale 2016).
Regression analysis indicates that both the likelihood of payments and the amount paid in revenue-related taxes and other taxes by switchers are significantly higher after formalization (Appendix Table A2.3). However, a large share of additional taxes after formalization appears to come from other types of taxes, not revenue-related taxes. Indeed, switchers report only a small increase in revenue taxes paid before and after switching, based on the sample overall average (D1,152 before switching and D1,175 after switching), while other tax payments increase significantly, to more than double the amount paid in revenue-related taxes (D625 before switching and D2,616 after switching). Such a result could suggest that some firms formalize when they realize that in order to start importing or exporting, they need a tax identification number to make subsequent trade-related tax payments. To some extent, these results also confirm that revenue-related taxes can be challenging to collect, even after formalization, compared with other types of taxes such as license fees or import and export taxes. Arguably, collecting trade taxes mainly requires being able to control trade flows at major entry points (e.g., ports, airports, or land borders), while collecting income or sales taxes requires major investments in enforcement and compliance structures throughout the entire economy.
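The decomposition can be checked from the reported averages. The sketch below computes the change in each component and the share of the total increase attributable to other taxes; variable names are illustrative. The two components sum to within D1 of the total payments reported earlier (D1,778 and D3,792), a difference consistent with rounding.

```python
# Average tax payments of switchers reported in the text (currency unit D)
revenue_before, revenue_after = 1152, 1175   # revenue-related taxes
other_before, other_after = 625, 2616        # other taxes

revenue_change = revenue_after - revenue_before
other_change = other_after - other_before
total_change = revenue_change + other_change

# Share of the post-formalization increase coming from other taxes
other_share = other_change / total_change

print(revenue_change, other_change, total_change)  # 23 1991 2014
print(round(100 * other_share, 1))                 # 98.9
```

On these raw averages, nearly all of the increase in tax payments after formalization comes from taxes other than revenue-related taxes.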
Overall compliance with tax obligations is another interesting aspect that can be considered when analyzing tax payments. To obtain an estimate of the degree of overall compliance, I compared reported total taxes paid to revenue levels by switchers, before and after formalization. However, a precise computation of the level of compliance would require a tax calculator, which is beyond the scope of this paper. The results on compliance should therefore be treated with caution. I find that, using an overall average, the total tax-to-revenue ratio of switchers slightly increased from 1.26% before formalization to 1.44% after formalization. 13 Using fractional regression, given that the ratio of total tax to revenue is between 0 and 1, I find that the increase in the compliance level is significant at the 1% level. 14 Notably, the compliance rate of switchers, even after formalization, remains below that of always-formal firms, which stands at 3.47% in my sample, suggesting that switchers may still not be fully compliant with their tax obligations even after joining the formal sector.

VI. Concluding Remarks
Using a panel dataset consisting of five waves of SME surveys in Viet Nam, this paper analyzes the relationship between formalization and tax payments. Such an analysis can be challenging because of potential firm heterogeneity, due to the fact that firms choosing to formalize can have different underlying characteristics compared with the ones that remained informal.
To control for unobserved heterogeneity, I created dummy variables that distinguish between two groups of firms in my sample: those that remain informal and those that switch to the formal sector at some point. As a result, when the variable status, which takes a value of 1 only for switching firms after they have formalized, is included in my regression, it picks up only the net effects of formalization on tax payments.
My results show that switching firms are different from informal nonswitching firms regarding total tax payments. Such heterogeneity is typically assumed in most previous studies on formalization but not explicitly assessed. After formalization, I observe a significant increase in the amount and likelihood of tax payments, both in the short and long term. These results are mainly driven by a significant increase in the payment of other types of taxes (such as license fees, import and export taxes, and property taxes), not in the payment of revenue taxes.