HIGHER EDUCATION EFFICIENCY FRONTIER ANALYSIS: A REVIEW OF VARIABLES TO CONSIDER

The measurement of efficiency in higher education has attracted growing interest in recent years, especially due to the expansion of the university system. This paper reviews the literature on efficiency in higher education institutions, covering empirical articles that applied frontier efficiency measurement techniques from 1997 to 2019. First, we review the methodological approaches used, both parametric and non-parametric, such as Data Envelopment Analysis, the Malmquist index and Stochastic Frontier Analysis. Second, we list the inputs, input prices, outputs, quality and environmental variables applied and, based on this overview, discuss the advantages and drawbacks of the different empirical proxy variables used. We stress the importance of characterizing students and research funding as the raw materials of the teaching and research services, respectively, and we provide suggestions on how to deal with them empirically. We also discuss the difference between quality and environmental variables, and give some practical indications to distinguish them in doubtful cases.


INTRODUCTION
The measurement of efficiency in higher education has attracted growing interest in recent years, especially due to the expansion of the university system. With enrolment rates increasing all over the world, universities are forced to employ ever more resources to achieve their goals. Avkiran (2001) characterizes the productive process of universities as one with a 'lack of profit motive¹, goal diversity…, diffuse decision making, and poorly understood production technology'. Productivity and efficiency improvements are therefore not easy to define, and insiders sometimes view them with distrust or reject them. They are often perceived as quality-insensitive cost reductions, as managerial practices that do not contribute to academic goals, or as a relaxation of academic requirements on students to improve achievement indicators (Gates and Stone, 1997). In service sectors, productivity and efficiency are hard to measure: it is hard to identify and measure outputs, the value added by each input, and the simultaneous role of the consumer in the final outcome and as an input (e.g. personal effort devoted to study), and to account for environmental (contextual) and quality aspects. Productivity measures are rank-free indicators of the rate at which inputs are transformed into outputs. Technical efficiency is defined as the ability to minimize input usage for a given output (or to maximize output for given quantities of inputs). It is not the only efficiency measure. Allocative efficiency is defined as the ability to optimize the input mix given input prices, while economic, overall or cost efficiency considers both technical and allocative efficiency. The variables considered in empirical studies of efficiency depend on the type of efficiency assessed: technical efficiency studies require data on physical inputs and outputs, while cost efficiency studies employ information on costs, physical outputs and input prices.
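As a hedged illustration of how the three efficiency concepts fit together, the sketch below uses the usual multiplicative decomposition in which overall (cost) efficiency equals technical efficiency times allocative efficiency. It assumes that a unit's observed cost, the minimum cost of producing its output at the prevailing input prices, and its input-oriented technical efficiency score are already known; the function name and the figures are hypothetical.

```python
def decompose_cost_efficiency(observed_cost, min_cost, technical_eff):
    """Return (cost_efficiency, allocative_efficiency) for one unit.

    Hypothetical helper assuming the decomposition CE = TE * AE, with
    CE = minimum feasible cost / observed cost.
    """
    if not 0 < min_cost <= observed_cost:
        raise ValueError("need 0 < min_cost <= observed_cost")
    if not 0 < technical_eff <= 1:
        raise ValueError("technical efficiency must lie in (0, 1]")
    cost_eff = min_cost / observed_cost
    # The radially contracted input bundle still produces the same output, so
    # its cost (TE * observed cost) cannot fall below the minimum cost:
    # hence CE <= TE, i.e. allocative efficiency cannot exceed 1.
    if cost_eff > technical_eff + 1e-12:
        raise ValueError("cost efficiency cannot exceed technical efficiency")
    allocative_eff = cost_eff / technical_eff
    return cost_eff, allocative_eff


# Hypothetical university: spends 100, could produce the same output for 60,
# and could contract all inputs radially to 80 percent of their current level.
ce, ae = decompose_cost_efficiency(100.0, 60.0, 0.8)
print(f"cost efficiency {ce:.2f}, allocative efficiency {ae:.2f}")  # 0.60, 0.75
```

In this example, 0.8 of the 0.6 cost score is explained by technical waste, and the remaining factor of 0.75 reflects a suboptimal input mix given prices.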
Universities have multiple objectives and outcomes, sometimes defined in very general terms. Some of them yield externalities or have public-good features (that is, non-rival consumption and non-excludability, as in the building of social values). Their goals, and the relative importance of each, are open to discussion. Many inputs are hard to quantify, which complicates attributing value added to them. In turn, some educational results, in the words of Worthington (2001), "defy parameterization". Defining and measuring quality, a problem common to almost all service activities, adds complexity to the analysis. Outcome quality correlates with the quantity and intensity of the human effort invested in the processes, and it is not easy to save on, replace or automate human involvement in them. This is common in service sectors and distinguishes them from goods production, where productivity can be increased by replacing human effort with machines or software, or by automating it. The effects of e-learning and other forms of information technology on university efficiency are still unknown (D'Elia and Ferro, 2019). This paper contributes to the literature by discussing, in a structured way, the empirical articles on efficiency in higher education institutions that apply frontier efficiency measurement techniques. We review 89 empirical studies and almost 40 methodological and conceptual articles on higher education efficiency frontiers, written in English between 1997 and 2019. We first review the methodological approaches used, both parametric and non-parametric, such as Data Envelopment Analysis, the Malmquist index and Stochastic Frontier Analysis. Second, we list the inputs, input prices, outputs, quality and environmental variables applied. Based on this overview, we discuss the advantages and drawbacks of the different empirical proxy variables used.
Some aspects are outside the scope of this research: e-learning, economies of scale, the analysis of efficiency in departments or other administrative units within universities, and other ways of assessing the performance of universities, such as qualitative analysis by accreditation agencies, partial productivity analysis and student-based value-added measures. For reviews of these aspects, see the surveys by De Witte and López-Torres (2017), which covers all levels of education, Rhaiem (2017), which specializes in studies on research production efficiency, and Gralka (2018a), which focuses on parametric studies. Our research question is: which variables should be included in efficiency frontier studies of universities, and how should they be proxied? To answer it, we review the methodological framework commonly used in empirical research on efficiency in universities. This paper is intended to be useful for researchers planning to conduct an efficiency analysis, e.g. to compare institutions within a country or across nations, either for political planning or to give university administrators guidelines on which issues should be taken into account when dealing with efficiency in universities. After this introduction, the second section briefly summarizes the methodological approaches and materials. The third section analyzes the results in four subsections: outputs, inputs and input prices, quality, and environmental variables. The fourth section discusses the review. Finally, the fifth section presents the concluding remarks.

METHODS
The empirical literature measuring efficiency in education has mainly used frontier methods of two kinds: non-parametric (mathematical programming) and parametric (regression-based) approaches (Furková, 2013). The most popular non-parametric technique is Data Envelopment Analysis (DEA). It determines which decision-making units (in this case, universities) form an envelope surface of the sample to which they belong. The efficient decision-making units are those lying on the frontier, while those below it are deemed inefficient, since with the same inputs they produce less than their "peers" on the frontier. Each decision-making unit is given a score based on how far it lies from its most efficient "peers". There are two types of envelopment surfaces: one assumes constant returns to scale, or CRS (Charnes, Cooper, and Rhodes, 1978), and the other assumes variable returns to scale, or VRS (Banker, Charnes, and Cooper, 1984). Technical efficiency DEA models can also be input-oriented, output-oriented or non-oriented; these orientations differ in how the distance of each decision-making unit to the frontier is measured. Because DEA is generally a deterministic method, the whole distance of each decision-making unit from the frontier is counted as inefficiency: the method does not distinguish randomness or external noise affecting the scores. In its standard variants, it is vulnerable to outliers and measurement errors. There are several extensions of the DEA model, including two-stage DEA, bootstrapping and distance-function analysis. Moreover, when efficiency is studied over different periods, the productivity change of each decision-making unit can be decomposed into catching up with the frontier and an upward shift of the frontier. The Malmquist index separates both effects. The Malmquist index assumes CRS, which can be a restrictive assumption about the underlying technology.
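The decomposition of productivity change into catching up and frontier shift can be sketched for the simplest case of one input, one output and a CRS frontier, where the output distance function reduces to a unit's productivity divided by the best observed productivity. The sketch uses the usual geometric-mean form of the output-oriented Malmquist index (values above 1 indicate productivity growth); the function names and data are hypothetical, and a real application would obtain the four distance values from DEA linear programs.

```python
import math


def crs_output_distance(x, y, frontier):
    """Output distance D(x, y) against a one-input, one-output CRS frontier
    spanned by the observed (input, output) pairs in `frontier`."""
    best = max(yj / xj for xj, yj in frontier)
    return (y / x) / best


def malmquist(unit_t, unit_t1, frontier_t, frontier_t1):
    """Return (M, EC, TC): Malmquist index, efficiency change (catching up)
    and technical change (frontier shift), with M = EC * TC."""
    d_t_t = crs_output_distance(*unit_t, frontier_t)      # period-t data, t frontier
    d_t1_t1 = crs_output_distance(*unit_t1, frontier_t1)  # t+1 data, t+1 frontier
    d_t_t1 = crs_output_distance(*unit_t1, frontier_t)    # t+1 data, t frontier
    d_t1_t = crs_output_distance(*unit_t, frontier_t1)    # t data, t+1 frontier
    ec = d_t1_t1 / d_t_t
    tc = math.sqrt((d_t_t1 / d_t1_t1) * (d_t_t / d_t1_t))
    return ec * tc, ec, tc


# Hypothetical sector of two universities observed in two periods.
frontier_t = [(2, 2), (4, 2)]
frontier_t1 = [(2, 3), (4, 3)]
m, ec, tc = malmquist((4, 2), (4, 3), frontier_t, frontier_t1)
print(round(m, 6), round(ec, 6), round(tc, 6))  # 1.5 1.0 1.5
```

Here the unit's productivity grows by 50 percent, but only because the whole frontier moved up: its relative position (catching up) is unchanged.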
Another popular method is the Hicks-Moorsteen Total Factor Productivity (TFP) index, calculated as the quotient between Malmquist output and input quantity indexes (Russell, 2018). A DEA model evaluates the efficiency performance of $n$ decision-making units (universities), each producing $s$ outputs with $m$ inputs. For each university, DEA solves an optimization problem seeking the input and output weights that maximize the ratio of the weighted sum of outputs to the weighted sum of inputs.

1 We do not consider for-profit universities, although they exist in some contexts. See Sav (2012g).
DEA provides a scalar measure of the efficiency of a collection of decision-making units with a common set of multiple inputs and outputs, jointly with objectively determined weights for outputs and inputs (Charnes, Cooper, and Rhodes, 1978: 429). The objective of DEA is to measure the efficiency of resource utilization in every possible combination present in different organizations and technologies in use, to yield a measure for evaluating accomplishments, or resource conservation possibilities, for every decision-making unit with the resources assigned to it (Charnes, Cooper, and Rhodes, 1978: 443). DEA '…employs mathematical programming to obtain ex post facto evaluations of the relative efficiency of management accomplishments, however they may have been planned or executed…' (Banker, Charnes, and Cooper, 1984: 1078).
Lacking an engineering characterization of the underlying technology, a frequent problem in empirical economics, the DEA method determines the "relative efficiency" of each decision-making unit by reference to "rankings" of the observed results (Charnes, Cooper, and Rhodes, 1978: 430). The efficiency measure (score) of any decision-making unit is obtained as the maximum ratio of weighted outputs to weighted inputs, subject to the condition that the similar ratios for every decision-making unit be less than or equal to unity. Following the notation of Charnes, Cooper, and Rhodes (1978), for $n$ decision-making units ($j = 1, \dots, n$), $s$ outputs and $m$ inputs, the problem is:

$$\max_{u,\,v}\ \theta = \frac{\sum_{r=1}^{s} u_r y_{r0}}{\sum_{i=1}^{m} v_i x_{i0}} \quad \text{subject to} \quad \frac{\sum_{r=1}^{s} u_r y_{rj}}{\sum_{i=1}^{m} v_i x_{ij}} \le 1,\ \ j = 1, \dots, n; \qquad u_r, v_i \ge 0 \qquad (1)$$

where $\theta$ is the maximum ratio for decision-making unit 0, $y_{rj}$ are the outputs ($r = 1, \dots, s$) and $x_{ij}$ are the inputs ($i = 1, \dots, m$), outputs and inputs being positive. The $u_r$, $v_i$ are the weights yielded by the solution of the problem, using the data on all decision-making units as a reference set². The decision-making unit being rated is distinguished by the subscript "0" in the objective function, while it preserves its original subscript in the constraints, so that its efficiency is rated relative to all the others. This unit receives the most favorable weighting allowed by the constraints (Charnes, Cooper, and Rhodes, 1978: 430). An optimal $\theta^* = \max \theta$ always satisfies $0 \le \theta^* \le 1$, with optimal solution values $u_r^*, v_i^* > 0$ (Banker, Charnes, and Cooper, 1984).
The "fractional program" in formula (1) can be converted into a "linear program", as in formula (2):

$$\max_{u,\,v}\ \sum_{r=1}^{s} u_r y_{r0} \quad \text{subject to} \quad \sum_{i=1}^{m} v_i x_{i0} = 1, \quad \sum_{r=1}^{s} u_r y_{rj} - \sum_{i=1}^{m} v_i x_{ij} \le 0,\ \ j = 1, \dots, n; \qquad u_r, v_i \ge 0 \qquad (2)$$

Efficiency is defined as the quotient $E_r = y_r / Y_r$, where $y_r$ is the actual output $r$ produced by the decision-making unit under analysis and $Y_r$ is the maximum feasible output obtainable with the same input set, so that $0 \le E_r \le 1$ (the score is therefore relative to some maximum possibility). The weights are determined objectively, to obtain a dimensionless scalar measure of efficiency $E_r$ from observational data, subject only to the constraints established in (1). Therefore, no other set of common weights gives a more favorable rating relative to the reference set (Charnes, Cooper, and Rhodes, 1978: 431). Model (1) can be converted into a linear program in two ways: an input-oriented and an output-oriented version. Formula (2) presents the first: the linear programming model is set up to determine how much the inputs could be contracted, if used efficiently, while achieving the same output level. The output-oriented version (whose formula we omit for brevity) determines how much the outputs could be expanded if the same input quantities were used efficiently. In the so-called CCR model (named after the initials of its authors, Charnes, Cooper, and Rhodes, 1978), the set of efficient decision-making units forms an envelope relative to the observational data from all $j = 1, \dots, n$ decision-making units. Productivity and technical efficiency are equivalent only when the technology exhibits constant returns to scale (CRS), and the model produces an "overall efficiency" rating.
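The input-oriented CCR score described above can be computed from the multiplier (weights) form of the linear program: maximize the weighted output of unit 0, subject to its weighted input being 1, every unit's weighted output minus weighted input being at most 0, and nonnegative weights. To keep the example free of third-party dependencies, it includes a small textbook tableau simplex; in practice one would pass the same problem to a dedicated LP solver (for instance `scipy.optimize.linprog`). This is a minimal sketch under those assumptions: the names `simplex_max` and `ccr_input_efficiency` and the data are hypothetical.

```python
def simplex_max(c, A, b, eps=1e-12):
    """Maximise c.z subject to A z <= b and z >= 0, assuming every b_i >= 0.

    Dense tableau simplex. Because b >= 0 the all-slack basis is feasible,
    so no phase one is needed. Bland's smallest-index rule avoids cycling on
    the many degenerate (b_i = 0) rows that the DEA multiplier model produces.
    """
    rows, cols = len(A), len(c)
    # Each tableau row: structural coefficients | slack identity | right-hand side.
    T = [[float(a) for a in A[i]]
         + [1.0 if k == i else 0.0 for k in range(rows)]
         + [float(b[i])] for i in range(rows)]
    obj = [-float(cj) for cj in c] + [0.0] * (rows + 1)  # reduced costs | value
    basis = [cols + i for i in range(rows)]
    while True:
        # Entering column: smallest index with a negative reduced cost (Bland).
        enter = next((j for j in range(cols + rows) if obj[j] < -eps), None)
        if enter is None:
            return obj[-1]  # no improving direction: current value is optimal
        # Leaving row: minimum ratio test, ties broken by smallest basic index.
        row, best = None, float("inf")
        for i in range(rows):
            if T[i][enter] > eps:
                ratio = T[i][-1] / T[i][enter]
                if ratio < best - eps or (abs(ratio - best) <= eps
                                          and basis[i] < basis[row]):
                    row, best = i, ratio
        if row is None:
            raise ValueError("linear program is unbounded")
        piv = T[row][enter]
        T[row] = [a / piv for a in T[row]]
        for i in range(rows):
            f = T[i][enter]
            if i != row and f != 0.0:
                T[i] = [a - f * p for a, p in zip(T[i], T[row])]
        f = obj[enter]
        obj = [a - f * p for a, p in zip(obj, T[row])]
        basis[row] = enter


def ccr_input_efficiency(X, Y, o):
    """Input-oriented CCR (CRS) score of unit `o` from the multiplier model.

    X[j] and Y[j] list the inputs and outputs of unit j; the LP variables are
    z = (u_1..u_s, v_1..v_m). The normalisation v.x_o = 1 is relaxed to
    v.x_o <= 1: the ratio constraints are homogeneous in (u, v), so at an
    optimum with a positive score the normalisation binds anyway.
    """
    s, m = len(Y[0]), len(X[0])
    c = [float(y) for y in Y[o]] + [0.0] * m                      # maximise u.y_o
    A = [list(Y[j]) + [-x for x in X[j]] for j in range(len(X))]  # u.y_j - v.x_j <= 0
    b = [0.0] * len(X)
    A.append([0.0] * s + list(X[o]))                               # v.x_o <= 1
    b.append(1.0)
    return simplex_max(c, A, b)


# Hypothetical data: one input (staff) and one output (graduates) per university.
X = [[2], [4], [3]]
Y = [[2], [2], [3]]
print([round(ccr_input_efficiency(X, Y, o), 3) for o in range(3)])  # [1.0, 0.5, 1.0]
```

With this data, units 0 and 2 lie on the CRS frontier (one graduate per unit of input), while unit 1 scores 0.5: it could produce its output with half of its input.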
The BCC model (named after the initials of Banker, Charnes, and Cooper) is meant to 'extrapolate the performance of the most efficient DMUs [decision-making units] with efficient scale sizes (for their given input and output mixes) and identify any scale inefficiencies that may be reflected in the level of operations of other DMUs', leading to a "pure technical efficiency" rating (Banker, Charnes, and Cooper, 1984: 1084). The BCC model applies to technologies with variable returns to scale (VRS), which makes it possible to measure scale efficiency by comparing the maximum average productivity attained at the most productive scale size with the average productivity at the actual scale of production (Ray, 2004). Under VRS, pure technical inefficiency can be separated from scale inefficiency, because each decision-making unit is compared only with units of similar scale. Units deemed inefficient under the CRS assumption can therefore be efficient once VRS is allowed³.
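The CRS/VRS distinction and the resulting scale efficiency (the ratio of the CRS score to the VRS score) can be illustrated in the simplest one-input, one-output case, where both input-oriented scores can be computed without a general LP solver. This is a minimal sketch under that assumption; the function names and the data are hypothetical.

```python
def crs_score(units, o):
    """Input-oriented CRS score with one input and one output:
    the unit's productivity relative to the best observed productivity."""
    best = max(y / x for x, y in units)
    x0, y0 = units[o]
    return (y0 / x0) / best


def vrs_score(units, o):
    """Input-oriented VRS (BCC) score with one input and one output.

    Solves min sum(l_j * x_j) s.t. sum(l_j * y_j) >= y0, sum(l_j) = 1, l_j >= 0
    by enumeration: a basic optimal solution of this two-constraint LP uses at
    most two units, so single units and interpolating pairs are enough.
    """
    x0, y0 = units[o]
    candidates = [x for x, y in units if y >= y0]   # single units (includes o)
    for xa, ya in units:
        for xb, yb in units:
            if ya < y0 < yb:                        # convex mix producing exactly y0
                w = (yb - y0) / (yb - ya)           # weight on the smaller unit
                candidates.append(w * xa + (1 - w) * xb)
    return min(candidates) / x0


def scale_efficiency(units, o):
    """Scale efficiency = CRS score / VRS score (at most 1)."""
    return crs_score(units, o) / vrs_score(units, o)


# Hypothetical (input, output) pairs for four universities.
units = [(2, 1), (4, 4), (8, 5), (6, 3)]
for o in range(len(units)):
    print(o, round(crs_score(units, o), 3), round(vrs_score(units, o), 3),
          round(scale_efficiency(units, o), 3))
```

In this example unit 0 is efficient under VRS but scores 0.5 under CRS, so all of its measured inefficiency is due to scale; unit 3 has a CRS score of 0.5, a VRS score of about 0.556 and a scale efficiency of about 0.9.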
The regression-based approach estimates the parameters of a specific functional form for the production or cost frontier. The most popular approach is Stochastic Frontier Analysis (SFA), which originates in the seminal papers of Aigner, Lovell and Schmidt (1977) and Meeusen and van den Broeck (1977). SFA models can be estimated with many different functional forms and error specifications, and with different types of quantitative data⁴. The technique decomposes the traditional regression error term into two components: a normally distributed pure noise term $v$ (with zero mean and positive variance), and an inefficiency term $u$, for which different statistical distributions are assumed⁵. For cross-sectional data, the production function can be represented as⁶

$$Y_j = f(X_j;\, \beta) + v_j - u_j \qquad (3)$$

where, for each decision-making unit $j$, $Y_j$ is the vector of actual output, $X_j$ is the vector of inputs, $\beta$ is a vector of coefficients to be estimated, $u_j \ge 0$ is the production inefficiency and $v_j$ is a random error. With panel data, repeated observations of the same unit $j$ over several periods allow the estimation of unobserved producer-specific effects that may affect efficiency but are not controlled by the producer. The general specification of the production function can be written as

$$Y_{jt} = f(X_{jt}, t;\, \beta) + v_{jt} - u_{jt}$$

where the variables are the same as in Equation (3) but are also indexed by time $t$. Unit-specific technical inefficiency can vary systematically over time, or it can be constant. Time-varying inefficiency models comprise the Cornwell, Schmidt, and Sickles (1990) and Lee and Schmidt (1993) models, Kumbhakar (1990), and the time-decay and inefficiency-effects models of Battese and Coelli (1992, 1995)⁷. Time-invariant inefficiency models include the random-effects model of Pitt and Lee (1981) and the fixed-effects version of the Schmidt and Sickles (1984) model⁸. These models do not distinguish time-invariant unobserved heterogeneity from inefficiency (Greene, 2005a).
When such heterogeneity is present, fixed- and random-effects SFA models may therefore produce biased inefficiency estimates.
To address these shortcomings, Greene (2005b) proposed two models, the "true fixed effects" (TFE) and the "true random effects" (TRE) models, which separate time-varying inefficiency from unit-specific, time-invariant unobserved heterogeneity.
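The decomposition of the error term into noise $v$ and inefficiency $u$ raises the practical question of how to recover unit-level inefficiency from an estimated residual. For the normal/half-normal specification, a widely used answer is the conditional mean $E[u \mid \varepsilon]$ (the estimator usually attributed to Jondrow, Lovell, Materov and Schmidt, 1982). The sketch below assumes that $\sigma_u$ and $\sigma_v$ have already been estimated and that the inefficiency is half-normal; the function name and the numbers are hypothetical.

```python
import math


def _std_normal_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _std_normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_inefficiency(eps: float, sigma_u: float, sigma_v: float,
                          cost: bool = False) -> float:
    """E[u | eps] in a normal / half-normal stochastic frontier.

    Production frontier: eps = v - u; cost frontier: eps = v + u.
    Given eps, u follows N(mu_star, sigma_star**2) truncated at zero,
    and the mean of that truncated normal is returned.
    """
    sigma2 = sigma_u ** 2 + sigma_v ** 2
    sign = 1.0 if cost else -1.0
    mu_star = sign * eps * sigma_u ** 2 / sigma2
    sigma_star = sigma_u * sigma_v / math.sqrt(sigma2)
    z = mu_star / sigma_star
    # For very negative z the pdf/cdf ratio loses precision; a production
    # implementation would evaluate it in log space instead.
    return mu_star + sigma_star * _std_normal_pdf(z) / _std_normal_cdf(z)


# Hypothetical residuals from an estimated production frontier.
for e in (-0.5, 0.0, 0.5):
    print(e, round(expected_inefficiency(e, sigma_u=0.3, sigma_v=0.2), 4))
```

Units with more negative production residuals receive larger expected inefficiency; with `cost=True` the sign convention flips, because inefficiency raises costs instead of lowering output.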
To deal with observed heterogeneity, the most common approach is to parameterize the mean or the mode of the pre-truncated inefficiency distribution (Greene, 2008). Alternatively, the distribution of inefficiency can be rescaled by parameterizing the variance of the pre-truncated inefficiency distribution (Caudill and Ford, 1993; Caudill, Ford, and Gropper, 1995; Hadri, 1999). Recent methodologies also make it possible to separate transient from persistent (long-term) inefficiency (Badunenko and Kumbhakar, 2016; Kumbhakar, Lien, and Hardaker, 2014; Tsionas and Kumbhakar, 2014; Filippini and Greene, 2016; Kumbhakar and Heshmati, 1995). Empirical results are not directly comparable, since they depend on the sample and on the method used. Nevertheless, Bauer et al. (1998) suggested a protocol to follow when the estimates to be compared result from different techniques. Their point is straightforward: results need not be equal, but they should be consistent. They propose six consistency conditions: (1) similar efficiency distributions; (2) similar rankings of the decision-making units; (3) the same decision-making units identified as the most and the least efficient across rankings; (4) reasonable stability of efficiency over time; (5) consistency with other performance measures (such as partial productivity or average costs); and (6) congruence with the real conditions of the activity under analysis. Of these, conditions (1) to (3) concern internal (methodological) consistency, while conditions (4) to (6) concern external (empirical) consistency. Table 1 groups the examined studies by methodological approach: parametric and non-parametric, production and cost, cross-sectional and panel data, and so on. Of the articles that produce quantitative efficiency estimates, 54 percent use non-parametric methods, mostly production frontiers; 40 percent use SFA, mostly cost frontiers; and 6 percent use both.
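The internal (methodological) consistency conditions (1) to (3) of Bauer et al. (1998) lend themselves to a mechanical check when two sets of scores are available for the same universities, for example one from DEA and one from SFA. In this minimal sketch, condition (1) is summarized only by means and standard deviations, condition (2) by a Spearman rank correlation (Pearson correlation of average ranks), and condition (3) by whether the same units come out best and worst; these summary choices, the function names and the scores are assumptions for illustration.

```python
import math
import statistics


def average_ranks(scores):
    """1-based ranks, with tied scores sharing their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation; undefined if either series is constant."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def best_unit(s):
    return max(range(len(s)), key=s.__getitem__)


def worst_unit(s):
    return min(range(len(s)), key=s.__getitem__)


def internal_consistency(scores_a, scores_b):
    """Indicators for consistency conditions (1)-(3) on two score vectors."""
    return {
        "means": (statistics.mean(scores_a), statistics.mean(scores_b)),        # (1)
        "st_devs": (statistics.stdev(scores_a), statistics.stdev(scores_b)),    # (1)
        "spearman": pearson(average_ranks(scores_a), average_ranks(scores_b)),  # (2)
        "same_best": best_unit(scores_a) == best_unit(scores_b),                # (3)
        "same_worst": worst_unit(scores_a) == worst_unit(scores_b),             # (3)
    }


# Hypothetical DEA and SFA scores for the same four universities.
dea = [0.90, 0.50, 0.70, 1.00]
sfa = [0.85, 0.60, 0.65, 0.95]
print(internal_consistency(dea, sfa))
```

Conditions (4) to (6) need external information (several periods, partial productivity or average costs, knowledge of the sector) and cannot be checked from two score vectors alone.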
Heterogeneity aspects, as well as the distinction between transient and persistent inefficiency, are present in the most recent SFA estimates. In our literature analysis, we also examine 11 conceptual discussions on university efficiency frontiers, 5 surveys and 30 methodological studies. The empirical studies reviewed cover the following countries: 10 for the United Kingdom (UK), 15 for the United States (USA), 14 for Italy, 6 for Australia, 4 for Germany, 5 for Spain, 1 for Greece, 1 for Turkey, 3 for Brazil, 1 for the Philippines, 2 for New Zealand, 2 for India, 1 for Argentina, 1 for Bangladesh, 1 for China, 2 for the Czech Republic and 2 for Poland; in addition, 10 studies cover two countries and 6 are cross-country studies of European countries. With respect to the level of analysis, 76 articles study teaching and 9 study research; none studies extension activities.

2 Because individual inputs and outputs must be aggregated in a suitable and meaningful way, and market prices (the natural weights) are absent, DEA endogenously generates "shadow prices" of inputs and outputs for the aggregation. The estimated weights can therefore be understood as "shadow prices" (Ray, 2004).

3 For brevity we omit the input-oriented formulas of the BCC model, since the underlying reasoning was explained above. In the same vein, with panel data (repeated observations of the same unit $j$ over several periods), the variables should also include the time index $t$.

4 The Cobb-Douglas production function is frequently chosen because it is simple to estimate and interpret. Another commonly used functional form is the translog (transcendental logarithmic), which is flexible because it accommodates quadratic and interaction terms between the independent variables (Laureti, Secondi and Biggeri, 2014).

5 The technical inefficiency term $u_i$ is usually assumed to follow a half-normal, truncated-normal, exponential or gamma distribution (the last giving the normal-gamma model).

6 In the case of a cost function, $Y_i$ is the vector of costs and the composed error term is defined as $(v_i + u_i)$.

7 The "time-invariant" (TI) model is presented in Battese and Coelli (1988), and the "time-varying decay" (TVD) model in Battese and Coelli (1992).

8 The term $u_i$ can be constant over time for each decision-making unit $i$ (that is, $u_{it} = u_i$). This assumption is made in a set of models with time-invariant efficiency: first, Pitt and Lee (1981), where $u_i$ is assumed to follow a half-normal distribution with constant variance; second, Schmidt and Sickles (1984), in which the regression intercept can be fixed or random, and in the fixed-effects case the unmeasured time-invariant inefficiency component is included in the unit-specific intercepts; and third, Battese and Coelli (1988), where $u_i$ has a truncated-normal distribution with non-zero mean and constant variance. If instead $u_i$ varies over time $t$ for each decision-making unit $i$ ($u_{it}$), the model is a time-varying one. These models include, first, Kumbhakar (1990), where $u_{it} = u_i\,[1 + \exp(bt + ct^2)]^{-1}$, a flexible formulation in which no probability distribution is attributed ex ante; second, Battese and Coelli (1992), where $u_{it} = u_i \exp[-\eta(t - T_i)]$, $u_i$ is assumed to follow a truncated-normal distribution with non-zero mean and constant variance, and $\eta$ governs the time pattern of inefficiency; and third, Battese and Coelli (1995), where $u_{it}$ follows a normal distribution truncated at zero.
RESULTS
In this section we review the main variables used to assess efficiency in higher education through the frontier methods discussed in the previous section. We first analyze the output variables considered in the different articles, and then give an overview of the input variables, the quality variables and the contextual (environmental) variables.

Outputs
University outputs can be classified into teaching (knowledge dissemination), research (basic or applied knowledge production), and extension (also known as transfer, public, community or "third mission") activities (see Table 2). The latter comprise services with external effects and public goods aimed at varied audiences beyond the campus. There are complex substitution or complementarity interactions between teaching and research. On the one hand, there are potential economies of scope between teaching and research; on the other hand, both consume resources and their rewards differ in the short and the long run. Omitting research activities implicitly assumes that no complementarities or substitutions exist between teaching and research (Horne and Hu, 2008). Teaching output is proxied by: the number of degrees completed, sometimes distinguishing between undergraduates and graduates; results in standardized tests; the headcount of enrolled students, standardized as full-time equivalents; courses, hours or credits taught, as a proxy for the knowledge added; job or remuneration outcomes of graduates, to capture students' employment potential, earnings or rate of return; and/or the number of graduate students admitted. Research output is commonly proxied by published documents, measured as some weighted sum of articles, books or chapters, conference papers, and so on; the problem is how to weight them by impact factor and age, because practices and traditions differ among disciplines. It is also complex to account for externalities from co-authorship. Other measures of research output include: citation indexes, which measure the impact of published research; the headcount of approved dissertations; patents and other intellectual property rights, measured by the number of registrations with some criterion to weight them; awards, which raise similar weighting problems; and grants, project money and/or partnerships with business.
Several facts make research output harder to measure: (1) some research outcomes are not ex-ante observable or ex-post measurable (D'Elia and Ferro, 2019); (2) unobserved research effort may well lead to no results and, conversely, "serendipity and luck may yield huge returns at little cost" (De Fraja and Valbonesi, 2012); (3) the research prestige of a whole university can originate in a small group of its researchers; and (4) outcomes may be counted on the basis of historical achievements that do not reflect contemporaneous intellectual production (Johnes and Yu, 2008). Extension activities consist of generating public goods or external effects. On the one hand, they can earn the university a good reputation, leveraging fundraising or enrolment, although the connections are hard to establish. On the other hand, because these activities include citizenship development (attitudes and values), they are generally hard to quantify. Extension services can also include cultural, sport and recreational events, which can be difficult to value and weight; opinion or advice on community or societal issues, again difficult to measure and weight; and non-formal education for off-campus groups, disadvantaged or not. Empirical analyses omit extension activities because their outcomes are difficult to quantify meaningfully, since externalities, not only in education, are challenging to measure (Salerno, 2003).

Inputs and input prices
The inputs can be classified into human and non-human resources (see Table 2). The former include the teaching and research effort of the university labor force and the "raw materials", measured through the full-time-equivalent students to be taught; the latter are physical and financial resources. Human resources are measured by academic and non-academic staff, as headcount or as salaries paid to different categories of personnel. Faculty headcount with weights attached to each rank (for example, one weight for full professors, a different one for associate professors and a third one for assistants) is the most frequently considered input variable. Because some academics work in both teaching and research, the ratio of researchers (or of research workload) to full-time academics can be calculated to attribute inputs to outputs. Non-human resources include facilities and materials, which can be measured in physical units (surface area of laboratories or classrooms, classroom seats, computers, library books, and so on) or in financial units (for example, money spent on hardware). When cost frontiers are estimated, the unit prices of inputs are obtained as quotients between expenditure items and the physical units employed: for instance, the average labor cost of full-time academics of a certain rank, or the average cost per square meter of classroom.

Quality
Quality variables are present in fewer than 20 percent of the examined studies (see Table 2). For fair and meaningful comparisons, quality can, and ideally should, be assessed in outputs or inputs, through different coefficients or dummy variables. To address the quality of teaching, researchers use indexes of completion, achievement and recognition, the length, structure and contents of the programs, time dedication, and the qualifications of the staff; in research, quality is related to value and impact. If these elements are ignored, results can be incomplete and probably biased. Quality is costly, and it is up to the universities to allocate resources to improve it. Quality indicators can include drop-out rates as a proportion of the cohort, the faculty-to-student ratio, the ratio of staff expenditure to total expenditure, the share of professors or tenured academics, and the shares of full-time researchers and of teaching and/or management workload in total faculty. Impact factors and citation indexes account for quality in research. In empirical studies, the expected signs of quality variables are negative in production efficiency estimates, since quality consumes inputs, and positive in cost estimates, since quality is costly. Nevertheless, more complex relationships can appear in empirical work, since quality yields prestige, which attracts talented professors and students, provided the system under analysis has a reasonable degree of mobility between universities.

Environmental variables
Environmental variables are included in more than 70 percent of the analyzed studies (see Table 2). They account for observable heterogeneity due to factors beyond the universities' control. The main difference between environmental variables and production or cost drivers is that the former influence the structure of the technology, while the latter influence the efficiency with which the drivers are converted into outputs (costs)⁹. Three groups of environmental variables can be distinguished: students' intellectual, economic and social background (including ethnic, age and gender characteristics); the region where the university is located (poor or rich); and the type of university (large or small, old or new, private or public, for-profit or not-for-profit, secular or religious, specialized or generalist, and focused on labor-intensive disciplines, such as the social sciences or humanities, or on capital-intensive disciplines, such as medical schools). Regarding students' background, contextual variables include: intellectual background, measured through high school grades or results in selection exams; household socioeconomic conditions, proxied by family income relative to per capita GDP; parental qualifications, measured by the parents' years of schooling or degrees attained; full-time students as a share of total students; gender and ethnic composition; the proportion of foreign or out-of-region students; and the age of students. Regarding the university's region, some studies use regional GDP relative to the national average, and some indicator of regional human capital, such as average years of education relative to the national average.
Regarding the type of university, studies include: size; ownership and governance, that is, public or private ownership, not-for-profit or for-profit status where this option exists, or secular or religious affiliation; the degree of specialization in capital-intensive disciplines, to capture differences in hardware intensity, typically measured by the share of natural sciences, engineering and/or medicine in the total; and age, that is, whether the university is old or new within the local system, on the understanding that history could matter for efficiency.
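Several of the environmental proxies listed above are simple ratios or dummy variables built from institutional data. The sketch below constructs four of them for one university: regional GDP relative to the national average, the share of enrolment in capital-intensive disciplines, an "old university" dummy and a public-ownership dummy. The record layout, field names, discipline labels and the founding-year cutoff are hypothetical choices for illustration, not a standard from the reviewed studies.

```python
from dataclasses import dataclass, field


@dataclass
class University:
    """Hypothetical record; field names are illustrative only."""
    name: str
    regional_gdp_pc: float
    founded: int
    public: bool
    enrolment_by_field: dict = field(default_factory=dict)


# Assumed grouping of capital-intensive disciplines
# (natural sciences, engineering and medicine).
CAPITAL_INTENSIVE = {"natural_sciences", "engineering", "medicine"}


def environmental_variables(u, national_gdp_pc, old_before_year):
    """Build illustrative environmental (contextual) variables for one unit."""
    total = sum(u.enrolment_by_field.values())
    capital = sum(n for f, n in u.enrolment_by_field.items()
                  if f in CAPITAL_INTENSIVE)
    return {
        "rel_regional_gdp": u.regional_gdp_pc / national_gdp_pc,  # rich vs poor region
        "capital_intensive_share": capital / total,               # specialization
        "old": int(u.founded < old_before_year),                  # age dummy
        "public": int(u.public),                                  # ownership dummy
    }


uni = University("U1", 30000.0, 1850, True,
                 {"engineering": 2000, "medicine": 1000,
                  "humanities": 3000, "social_sciences": 4000})
print(environmental_variables(uni, national_gdp_pc=25000.0, old_before_year=1950))
```

Whether each of these variables should enter the frontier itself or the inefficiency term, and with which expected sign, remains a case-specific modelling decision.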

DISCUSSION
Universities produce teaching, research, and extension services. The latter are the most elusive, since they adopt mostly the form of external effects, difficult to parameterize. We did not find any empirical study including transfer activities in efficiency frontier studies. Teaching and research services, while simpler to proxy empirically than third mission services, are not always addressed jointly. A priori, it is unknown if economies or diseconomies of scope predominates, nor its intensity. If only teaching or research are included, the implicit assumption is that no scope economies or diseconomies exist. Most of nonparametric studies are intended to address technical efficiency, and in that context, it is easy to consider the multi-output perspective. While in DEA it is possible to consider multiple outputs, it is not possible to do the same in SFA production frontiers (save, when "output" is a composite or a bundle of products or services), while it is possible to consider multiple outputs in a cost frontier SFA estimate. The graduate head count is the more common output of the teaching service activity. It may underestimate outcomes, because of drop-outs consideration, that is students which consumed resources without achieving a certificate. It is important to consider whether using models with ratio variables or absolute variables because the methods for measuring efficiency are fundamentally different for such models. The same consideration is relevant with other input/output ratios. Results in standardized tests as an alternative measure of output is only possible if that kind of exams are practiced. It is worth recalling that student´s grades depend partially on the student´s capabilities, the university marking practices, and the quality of teaching and supervision given to students. 
Even when the number of students is a possible measure of teaching output, students are in fact the "raw material" of the process and should therefore be considered an input (Salerno, 2003). This point is not always addressed, and it is one of the lessons of this study. Below, we propose a practical criterion for dealing with it. Studies that concentrate on research are less frequent. Research output is measured in two ways: through bibliometric indicators of publications and/or by counting patents and other intellectual property rights. Research funding is sometimes used as a proxy for research output. In fact, it is an input, since it guarantees neither that results will be achieved nor that the money will be spent on the final output (Johnes and Yu, 2008). This point is not always treated consistently either, and students, too, are sometimes treated as outputs rather than inputs. Again, we propose below a criterion for dealing with this in empirical work.

The second category of variables consists of inputs. In the textbook production function, output depends on labor and capital. In universities, the corresponding inputs are human and non-human resources (academic and non-academic personnel, and facilities). To these is added the "raw material" of the process: students for teaching services, and project or grant money for research. Nonetheless, as stated, students and research money are sometimes treated as outputs. As a possible solution to this ambiguity, we propose the following procedure. In DEA studies, correlate students with the teaching output measure and research funding with the research output measure. If the correlations are positive, treat these variables as inputs. In SFA studies, analyze the sign of the partial derivative of the estimated frontier with respect to students (or research money). In a production function, the expected sign for inputs is positive.
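The two-part criterion proposed above can be sketched in a few lines. The data are synthetic: a hypothetical Cobb-Douglas technology in staff and students, multiplied by a one-sided inefficiency term exp(-u). As a simplification, the frontier is estimated by corrected ordinary least squares (COLS) rather than by maximum-likelihood SFA. COLS only shifts the OLS intercept upwards, so the slope signs are those of the OLS fit. In a log-log frontier, dy/dx_k = b_k * y / x_k, and since y and x_k are positive, the sign of the partial derivative equals the sign of b_k. The helper names below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
# Hypothetical synthetic data: two inputs drawn independently of each other.
staff = rng.lognormal(mean=5.0, sigma=0.5, size=n)
students = rng.lognormal(mean=8.0, sigma=0.5, size=n)
# One-sided inefficiency u >= 0 places every unit on or below the frontier.
u = np.abs(rng.normal(0.0, 0.2, size=n))
graduates = 0.5 * staff**0.4 * students**0.6 * np.exp(-u)


def classify_by_correlation(candidate, output):
    """DEA-style rule: a positive correlation with the output suggests an input."""
    r = np.corrcoef(candidate, output)[0, 1]
    return ("input" if r > 0 else "unclear"), r


def cols_cobb_douglas(y, X):
    """Log-log frontier by COLS: OLS, then shift the intercept so all residuals <= 0."""
    Z = np.column_stack([np.ones(len(y)), np.log(X)])
    beta, *_ = np.linalg.lstsq(Z, np.log(y), rcond=None)
    beta[0] += np.max(np.log(y) - Z @ beta)
    return beta  # [intercept, elasticity wrt staff, elasticity wrt students]


label, r = classify_by_correlation(students, graduates)
beta = cols_cobb_douglas(graduates, np.column_stack([staff, students]))
print(label, round(r, 2), "elasticity wrt students:", round(beta[2], 2))
```

With real data, the correlation check and the derivative sign could disagree, for instance when students are strongly collinear with staff. In that case, the decision should rest on a careful look at the data rather than on either rule mechanically.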
Human resources are usually proxied by head count or by salary expenditure. Non-human resources can be proxied by various physical measures of facilities or by the financial resources spent on them. Determining meaningful input prices is also an issue when parametric cost functions are estimated. Typically, prices are computed as the ratio between expenditure and some physical input measure. Quality variables try to capture observable characteristics of inputs and outputs that are under the universities' control (present in 20 percent of the examined studies). Omitting them can lead to biased results or to misinterpretation of the results. Environmental variables capture contextual differences outside the universities' control (included in 70 percent of the analyzed studies). Students' socio-economic background is highly correlated with the future performance of graduates, so it should be considered when data are available. At the same time, in some cases universities can deliberately select their students by socio-economic condition.

The expected signs are as follows. Inputs are positive in production estimates, and input prices are positively related to costs in cost estimates. Quality-increasing aspects are positive in cost estimates (quality is costly) and negative in production estimates (quality improvements consume resources). The signs of environmental variables depend on more case-specific aspects. For instance, consider the following possible environmental variables: old versus new universities, public versus private, socially diverse versus elitist, and specialized in arts, humanities, or social science versus specialized in science.
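Input prices are computed as expenditure divided by a physical input measure, and this arithmetic is simple enough to show directly. The field names and figures below are hypothetical. The currency units are arbitrary, and staff are counted as full-time equivalents (FTE).

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class UniversityAccounts:
    """Hypothetical per-university figures; currency units are arbitrary."""
    academic_salary_spending: float
    academic_staff_fte: float
    facilities_spending: float
    floor_area_m2: float


def input_prices(acc: UniversityAccounts) -> Dict[str, float]:
    """Input prices as expenditure divided by a physical input measure."""
    if acc.academic_staff_fte <= 0 or acc.floor_area_m2 <= 0:
        raise ValueError("physical input measures must be positive")
    return {
        "labour": acc.academic_salary_spending / acc.academic_staff_fte,
        "capital": acc.facilities_spending / acc.floor_area_m2,
    }


acc = UniversityAccounts(12_000_000, 200, 3_000_000, 50_000)
prices = input_prices(acc)
print(prices)  # {'labour': 60000.0, 'capital': 60.0}
```

Choosing the denominator matters. Dividing the same salary bill by head count instead of FTE yields a different labour price whenever part-time staff are present.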
Old universities can be more attached to tradition than modern ones, and thus less prone to technical change. Public universities can be very efficient in some environments but not in others. Ethnic diversity can yield a rich, motivating environment, or it can be a burden on efficiency if disadvantaged minorities need more than average resources to reach the same attainment. Nonetheless, a medical or engineering school is unambiguously more expensive than a social science school, because of the different intensity of the facilities needed. Distinguishing between quality and environment is easy in some cases, while in others some ambiguity may arise. In our understanding, the delimitation criterion is that "quality" is under the control of the decision-making units (the unit deliberately spends resources on some aspect), while "environment" is not.
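The delimitation criterion and the expected signs listed above can be encoded as a small lookup. Such a lookup could be used to check estimated coefficients against expectations. Category labels such as "input_price" and the example variable "tutoring_hours_per_student" are hypothetical. The table contains only the sign expectations stated in the text. Environmental variables are marked "?" because their sign is case-specific.

```python
from typing import Optional

# Expected coefficient signs as summarized in the text; "?" = case-specific.
EXPECTED_SIGNS = {
    ("input", "production"): "+",
    ("input_price", "cost"): "+",
    ("quality", "cost"): "+",         # quality is costly
    ("quality", "production"): "-",   # quality improvements consume resources
    ("environment", "production"): "?",
    ("environment", "cost"): "?",
}


def classify_contextual(under_unit_control: bool) -> str:
    """Quality if the unit deliberately spends resources on the aspect,
    environment if the aspect lies outside its control."""
    return "quality" if under_unit_control else "environment"


def sign_matches(category: str, model: str, estimate: float) -> Optional[bool]:
    """True/False if the estimate's sign agrees with expectations; None when
    no expectation is stated or the sign is case-specific."""
    expected = EXPECTED_SIGNS.get((category, model), "?")
    if expected == "?":
        return None
    return estimate > 0 if expected == "+" else estimate < 0


# Hypothetical variables: (name, under the university's control?)
for name, controlled in [("tutoring_hours_per_student", True), ("old_university", False)]:
    print(name, "->", classify_contextual(controlled))
print(sign_matches("quality", "cost", 0.12))             # True
print(sign_matches("environment", "production", -0.3))   # None
```

For doubtful cases, the boolean flag is the place where the analyst's judgement enters. The lookup itself only records the expectations.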

CONCLUSIONS
We explore the worldwide literature on efficiency frontiers in university systems by analyzing 89 specific studies published from 1997 to 2019. Most of the papers we review use non-parametric DEA models to estimate efficiency (54 percent), followed by SFA models (40 percent) and both methods (6 percent). In addition, we analyze 46 conceptual and methodological studies. Specifically, we are concerned with which variables to include in efficiency frontier studies, why to consider them, and how to proxy them. A fundamental part of the estimation is choosing appropriate variables to represent the production or cost process, and good proxies to measure them. In higher education, there is no consensus on which variables to include for outputs, inputs, input prices, quality, and environment, or even on how to model the production process and the cost structure. We concentrate on not-for-profit universities and on university systems as a whole. We do not consider studies of economies of scale and scope in universities, nor studies of departments or other administrative units within a single university, such as Flégl and Vltavská (2013) or Martín (2016). Graduates are the most common output for teaching activities, and publications and patents are the most common outputs for research activities. Inputs are human and non-human resources, with students and research funding as the raw materials of the teaching and research processes, respectively. Quality variables address controllable input and output features, while environmental variables address contextual, uncontrollable differences. From the discussion in the literature, we conclude that it is important to characterize students and research funding as raw materials of the teaching and research services, respectively, and we provide suggestions on how to deal with them empirically. We also clarify the distinction between quality and environmental variables.
In the near future, we expect more research on the role of university heterogeneity and more effort to address quality issues, without which some essential details can be lost. We also expect attempts to develop environmental variables that better capture diversity, and more studies on higher education segments other than universities. Another important aspect is endogeneity and the self-selection of good or wealthy students into good or wealthy universities. Universities may be chosen for a by-product as important as the educational service itself, such as networking.
In service sectors, the productive process and cost attribution are more elusive than in goods sectors. The complexity and subtlety of these processes demand great care in defining and measuring the variables. On the one hand, our discussion could help scholars design empirical studies on university efficiency. On the other hand, it could help policy makers avoid unreflective cost or quality cuts based on partial productivity or average cost measures.

The literature discusses how to include environmental variables in efficiency estimates. In the past, a two-stage approach was common in both parametric and non-parametric studies. However, it has been criticized for its limitations (Coelli et al., 2005; Simar and Wilson, 2007). In the first stage, efficiency scores are estimated without the environmental variables. In the second stage, the scores are regressed on explanatory variables. This procedure has two important econometric problems. First, the frontier model estimated in the first stage assumes that the efficiency terms are identically distributed, whereas the second-stage regression implicitly assumes that the scores are not identically distributed. Second, the explanatory (environmental) variables of the second stage must be assumed to be uncorrelated with the explanatory variables of the first stage. Otherwise, relevant explanatory variables are omitted from the first stage, and the second-stage estimates are therefore biased. For these reasons, Battese and Coelli (1995) recommend a "one-stage" procedure. It avoids these econometric problems by including the environmental variables in a single estimate of the efficiency frontier model.
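The second econometric problem can be illustrated with a small simulation. The sketch below is not the Battese and Coelli (1995) stochastic frontier estimator. To isolate the omitted-variable mechanism, it uses ordinary least squares on a hypothetical log-linear model with no inefficiency term. The environmental variable z is constructed to have correlation rho with the input ln_x, and all parameter values are illustrative assumptions.

```python
import numpy as np


def ols(y, *regressors):
    """OLS coefficients [intercept, b_1, ..., b_k] via least squares."""
    Z = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef


rng = np.random.default_rng(42)
n = 20_000
rho = 0.8     # assumed correlation between the input and the environment
gamma = 0.5   # assumed true effect of the environmental variable z
ln_x = rng.normal(size=n)
z = rho * ln_x + np.sqrt(1 - rho**2) * rng.normal(size=n)  # corr(ln_x, z) = rho
ln_y = 1.0 + 0.7 * ln_x + gamma * z + rng.normal(scale=0.3, size=n)

# Two-stage: the first stage omits z; the second regresses its residuals on z.
b1 = ols(ln_y, ln_x)
resid = ln_y - (b1[0] + b1[1] * ln_x)
two_stage_gamma = ols(resid, z)[1]

# One-stage: z enters the same model as the input.
one_stage_gamma = ols(ln_y, ln_x, z)[2]

print("input slope, first stage:", round(b1[1], 3))
print("gamma, two-stage:", round(two_stage_gamma, 3))
print("gamma, one-stage:", round(one_stage_gamma, 3))
```

Omitting z biases the first-stage input slope towards 0.7 + gamma * rho = 1.1. The residuals then retain only the part of z that is uncorrelated with ln_x, so the two-stage estimate of gamma shrinks towards gamma * (1 - rho**2) = 0.18. The one-stage estimate stays close to 0.5.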