ASSESSING AND CLASSIFICATION OF ACADEMIC EFFICIENCY IN ENGINEERING TEACHING PROGRAMS

This research uses a three-phase method to evaluate and forecast the academic efficiency of engineering programs. In the first phase, university profiles are created through cluster analysis. In the second phase, the academic efficiency of these profiles is evaluated through Data Envelopment Analysis. Finally, a machine learning model is trained and validated to forecast the categories of academic efficiency. The study population corresponds to 265 university engineering programs in Colombia, and the data correspond to the national examination of the quality of education in Colombia in 2018. In the results, two university profiles were identified with efficiency levels of 92.3% and 97.3%, respectively. The Random Forest model presents an Area under ROC value of 95.8% in the prediction of the efficiency profiles. The proposed structure evaluates and predicts the academic efficiency of university programs by comparing institutions with similar characteristics, which avoids a negative bias toward institutions that host students with low educational levels.


INTRODUCTION
The teaching of science, technology, engineering, and mathematics (STEM) is a critical aspect of countries' development. Different studies reveal positive associations between economic growth and the number of professionals in STEM areas (Hoeg and Bencze, 2017; Sharma and Yarlagadda, 2018; Suter and Camilli, 2019). Bianchi and Giorcelli (2020) demonstrate how countries with better levels of science education have higher levels of innovation, represented in patent-for-invention registrations. Corlu and Aydin (2016) show that teaching in STEM areas generates higher levels of business creation. Therefore, it is essential to generate objective assessment tools for teaching STEM-related careers. Thus, this study presents a data-based model to analyze the fundamental characteristics and relationships of engineering education programs and the results of a standardized assessment to achieve academic efficiency. However, it is crucial to highlight the inequalities in terms of access, resources, and opportunities in higher education. To avoid the biases introduced by the different levels of basic academic competencies with which students enter university education, the comparison of the programs must be fair, that is, a comparison between equals. Consequently, this study identifies homogeneous groups of engineering programs to analyze and forecast their level of efficiency within their reference group. This research is aligned with the area of learning analytics, promoting the use of data as input to support decision-making in educational environments. Universities traditionally generate large volumes of data. However, Long and Siemens (2014) show that strategic and operational decision processes are developed under empirical and subjective schemes.
In the educational field, data are mainly used for descriptive purposes, such as generating reports, supporting external and internal communication, and meeting accreditation and government oversight requirements. Nevertheless, specific areas can be identified where educational institutions have begun to implement data-based models to represent complex situations. First, student dropout has been studied from a predictive viewpoint. For example, Berens et al. (2019) and Suresh, Asokan and Vinodh (2016) developed models to predict student dropout using socioeconomic and academic variables. Heidrich et al. (2018) modeled student dropout using contextual variables obtained from the interaction of students in the educational process and from monitoring how frequently students use resources that support education, such as information services, the library, and complementary activities, among others (González et al., 2018). From the efficiency analysis approach, several studies use machine learning techniques and Data Envelopment Analysis (DEA) to generate estimates of productivity and competitiveness. Most of these studies have been developed in the commercial and industrial field (Aldamak and Zolfaghari, 2017). Among these studies, the contributions of Granadillo, Gómez and Herrera (2019) stand out; the authors integrate financial items and levels of operational performance to estimate productivity indicators in the chemical sector in Colombia. Other studies develop multistage models with a similar approach, analyzing the performance of variables and combining supervised and unsupervised learning models with efficiency analysis models (Fuentes, Fuster, and Lillo-Bañuls, 2016; Mikušová, 2017; Visbal-Cadavid, Mendoza and Hoyos, 2019). This study analyzes how the differences between the study units can affect the results of the efficiency metrics in the DEA models.
These models are applied in an educational context, considering the results of the Colombian state tests of quality of education, SABER PRO and SABER 11, as a study base. Cluster analysis, DEA, and a predictive model based on Random Forest (RF) are employed, in that order, allowing identification of the relationships between variables, creation of homogeneous groups, measurement of efficiency, and forecasting of future efficiency categories. To the best of our knowledge, this efficiency analysis approach has not been previously developed in educational contexts. Therefore, one of this study's contributions is to propose an alternative approach for estimating educational efficiency, incorporating the creation of homogeneous groups so that efficiency levels are compared between equals. At the same time, efficiency levels are also estimated directly over all the observations in the database, which serves as a reference base. Consequently, it is necessary to organize a method in three phases that allows the following questions to be answered. How to define university profiles of engineering education considering state exams at the secondary and university level? What is the academic efficiency of the identified engineering profiles? How to predict through machine learning the efficiency category of a university program belonging to the engineering training profiles created? Therefore, the main objective of this study is to evaluate and forecast the academic performance of Colombian engineering programs, creating a replicable and reproducible method that offers objective guidelines for decision-making in a higher education environment. The analysis of the efficiency of educational services is a field of great dynamism in the scientific community.
Since the first approach to the concept of efficiency using linear programming techniques, developed by Charnes, Cooper and Rhodes (1978), and especially in recent decades, there has been growing interest in measuring efficiency in educational environments such as schools, universities, and even national education systems. DEA allows the incorporation of different combinations of input and output variables. Through a bibliographic review, Ferro and D'Elia (2020) classified the output variables in models of educational efficiency into teaching (dissemination of knowledge), research (production of basic or applied knowledge), and extension activities (also known as transfers, public, community, or "third mission" activities), and the input variables into human (teaching and research) and non-human (physical and financial resources).

Efficiency and Education
The type of educational data is a vital aspect in determining efficiency. There are different reports and databases where the results of large-scale tests are presented (e.g., PISA, SABER PRO, GMAT or TIMSS). These data can be the result of micro aggregations represented by average values of each institution or country. On the other hand, there are data at the individual level, which represent the performance of students in a standardized test, the grades obtained in a study period, or external variables related to social, economic, and geographical aspects (Aparicio, Perelman and Santín, 2020; Thanassoulis et al., 2017; Visbal-Cadavid, Martínez-Gómez and Escorcia-Caballero, 2020). These approaches generally assume that all study units have the same conditions, resources, and infrastructure, an assumption that can have fundamental implications for determining efficiency levels. Furthermore, standardized tests have limitations, such as the restricted range of possible student responses, the difficulty students face in relating their own context to questions and answers in predetermined categories, and the dependence of test difficulty on whether students have received specific training on exam topics. The literature related to the measurement of efficiency in educational processes has grown steadily in recent years (Witte and López-Torres, 2017). Therefore, it is possible to find different approaches to evaluate efficiency in this sector (Agasisti, Munda and Hippe, 2019; Gralka, Wohlrabe and Bornmann, 2019; Khan, Khan and Hameed, 2019; Tran and Villano, 2019), in addition to studies applied to the Colombian context (Visbal-Cadavid, Mendoza and Hoyos, 2019). Using global management variables and combining neural networks with DEA models, Visbal-Cadavid, Martínez-Gómez and Guijarro (2017) developed a model to predict the efficiency of public universities.
In a similar approach, Aparicio, Cordero, and Ortiz (2019) make comparisons between efficiency analysis models using PISA tests as input data. Other authors have highlighted the relevance of grouping processes through cluster analysis to define complex association patterns related to the specific performance of variables generated in public reports, such as institutional budget, teacher salaries, campus area, and number of students, among others (Wolszczak-Derlacz, 2017). In addition, Nazir (2019) proposed limitations to carrying out the forecasting process based on the comparability and homogeneity of observations. To guarantee the efficiency of the prediction processes based on machine learning, it is essential to determine and characterize similarities between the study objects. It is also essential to highlight the investigations in which university institutions are defined and described in homogeneous groups. For example, Najadat, Althebyan and Al-Omary (2019)

Data Collection
Through a rational analysis, the generic components of the state test for secondary education, known as SABER 11, were identified as input variables and the components of the state test for higher education, SABER PRO, as output variables of an efficiency model. With the previously refined and selected information, the following phases were carried out: i) a cluster analysis using the unsupervised learning algorithm k-means to identify homogeneous groups in the data, associated with the results of the SABER tests; ii) an academic efficiency analysis under an output-oriented optimization approach to determine academic efficiency profiles (AEP); iii) a predictive model, built with Random Forest (RF) and Decision Tree (DT), to classify and predict the academic efficiency profile to which an engineering program belongs. The process flow and the articulation of the techniques are shown in Figure 1. The database used contains 12,411 observations, each of which represents a student. These observations come from 135 universities in Colombia (public: 30.37%, private: 69.63%) and eight academic degrees (civil, electromechanical, electrical, electronic, industrial, industrial automation, mechatronic, and chemical engineering). The base data were summarized by combining universities and undergraduate programs, leaving a total of 265 observations for analysis. It should be noted that universities do not have the same number of academic programs. The data come from the databases of the Colombian Institute for the Evaluation of Education (ICFES). The names, mean, and standard deviation of the study variables are reported in Table 1. The suffix S11 identifies variables from the high school level test, and the suffix SPRO identifies variables from the college level assessment.
It should be noted that the scale of academic competencies is 0-100, but the scale of the variable Formulation of engineering projects (FEP_SPRO) is 0-200. In addition, the mean and standard deviation belong to the results of the academic competencies evaluations of the school (S11) and the university (SPRO).
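The summarization step described above, from student-level records to one observation per university and program, can be sketched as follows. This is a minimal illustration, assuming that each program-level value is the arithmetic mean of its students' scores; the record layout, the field names `university` and `program`, and the sample values are hypothetical and are not taken from the ICFES database.

```python
from collections import defaultdict
from statistics import mean


def aggregate_programs(students, score_fields):
    """Average student-level scores for each (university, program) pair."""
    groups = defaultdict(list)
    for row in students:
        groups[(row["university"], row["program"])].append(row)

    summary = {}
    for key, rows in groups.items():
        # One observation per program: the mean of each competence score.
        summary[key] = {f: mean(r[f] for r in rows) for f in score_fields}
        summary[key]["n_students"] = len(rows)
    return summary


# Hypothetical records; names and values are illustrative only.
students = [
    {"university": "U1", "program": "civil", "MATH_S11": 60, "QR_SPRO": 70},
    {"university": "U1", "program": "civil", "MATH_S11": 80, "QR_SPRO": 90},
    {"university": "U2", "program": "civil", "MATH_S11": 50, "QR_SPRO": 65},
]
programs = aggregate_programs(students, ["MATH_S11", "QR_SPRO"])
```

Keeping the number of students next to each mean is a design choice of this sketch: it makes visible that programs of very different sizes are reduced to a single observation.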

Academic Competences
SABER 11 is an evaluation of secondary education in Colombia that provides educational institutions with information on the basic skills that a student must develop during their time in school (ICFES, 2020). On the other hand, SABER PRO is an assessment aimed at higher education students close to graduation. Both evaluations are carried out by the Colombian Institute for the Evaluation of Education (ICFES) to measure the quality of all public and private educational institutions. The SABER PRO assessment is a mandatory requirement for all students who wish to obtain a professional degree in Colombia (ICFES, 2020). A student can take the assessment only after passing 75% of the academic credits.

Cluster Analysis
For the development of the first phase of the proposed method, a non-hierarchical cluster analysis was carried out through the k-means algorithm (Clayman, Srinivasan and Sangwan, 2020; Oyelade et al., 2019). This algorithm randomly selects k points from the original data set as the initial cluster centers. Each unit in the data set is considered a point. The distance between each data point and each cluster center is computed using the Euclidean distance, and each point is preliminarily assigned to the nearest center. The center of mass of each group is then recalculated from its observations, and the final grouping is obtained by repeating these steps over multiple iterations. The Silhouette test (Menardi, 2011) assesses the quality of the membership of the observations, providing values between -1 and 1: values near -1 correspond to observations that would be better represented in another group, observations on the boundary between two clusters take values near 0, and those that are well matched to their current group take values near 1.
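The procedure described above can be sketched in a few lines of plain Python. This is an illustrative implementation under simple assumptions (random initial centers drawn from the data, Euclidean distance, a fixed seed), not the software used in the study; the function names and the toy points are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Basic k-means: returns (labels, centers) for a list of tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k random data points as initial centers
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def silhouette(points, labels):
    """Average silhouette width; singleton clusters score 0 by convention."""
    scores = []
    for i, p in enumerate(points):
        dists = {}
        for j, q in enumerate(points):
            if i != j:
                dists.setdefault(labels[j], []).append(math.dist(p, q))
        own = dists.pop(labels[i], [])
        if not own or not dists:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)                            # cohesion
        b = min(sum(d) / len(d) for d in dists.values())   # separation
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Toy data: two well-separated groups of three points each.
points = [(1, 1), (1, 2), (2, 1), (9, 9), (9, 10), (10, 9)]
labels, centers = kmeans(points, k=2)
score = silhouette(points, labels)
```

Running `kmeans` for several values of k and keeping the k with the largest average silhouette width mirrors the selection criterion used in the first phase.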

Data Envelopment Analysis
The key concept of DEA is the evaluation of the efficiency of the decision-making units (DMUs) that interact within a sector of competition and development. Also known as frontier analysis, DEA has become the standard for comparing, measuring, and evaluating efficiency in productive organizations (Pawsey, Ananda and Hoque, 2018). Different approaches can be taken when DEA is applied for educational purposes. For example, Amara, Rhaiem and Halilem (2020) evaluated the research efficiency of Canadian scholars, considering aspects such as public funding, seniority, and university reputation.
The DEA-CCR model measures what is known in the literature as technical efficiency, that is, the ratio between the weighted sum of the outputs and the weighted sum of the inputs. The CCR model seeks to maximize the efficiency of a decision-making unit, within a group of reference organizations, through the optimal weights related to the input and output variables (Benicio and Mello, 2015). The optimization model associated with DEA-CCR endogenously calculates the weighting of the performance criteria and the result of the variables to achieve the maximum or minimum value of the objective function (Sinuany-Stern, Mehrez and Hadad, 2000). In addition, the result of an academic program is the arithmetic mean of the scores of the students of each institution who take the test. However, the arithmetic mean is affected by outliers and, in the case of standardized tests in particular, by the number of students taking the test. Therefore, two hypothetical university programs A and B with the same average exam results but different variability can be classified in the same technical efficiency category.
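To make the idea of relative efficiency concrete, the sketch below uses the special case of one input and one output, where the CCR score of a unit reduces to its output/input ratio divided by the best ratio in the reference set. This is a simplification assumed for illustration; the study uses several inputs and outputs, which requires solving a linear program per unit. The function names, unit names, and scores are hypothetical.

```python
def ccr_efficiency_1x1(units):
    """CCR efficiency for one input and one output.

    units maps a name to (input, output). The score of each unit is its
    output/input ratio relative to the best ratio in the reference set.
    """
    best = max(y / x for x, y in units.values())
    return {name: (y / x) / best for name, (x, y) in units.items()}


def grouped_efficiency(units, groups):
    """Evaluate each unit only against the members of its own group."""
    result = {}
    for g in set(groups.values()):
        members = {n: v for n, v in units.items() if groups[n] == g}
        result.update(ccr_efficiency_1x1(members))
    return result


# Hypothetical programs: (entry score, exit score).
units = {"A": (80, 88), "B": (75, 75), "C": (50, 45), "D": (40, 40)}
groups = {"A": 1, "B": 1, "C": 2, "D": 2}

global_eff = ccr_efficiency_1x1(units)         # one frontier for everyone
group_eff = grouped_efficiency(units, groups)  # one frontier per group
```

In this toy data, program D is inefficient against the global frontier set by A but efficient within its own group. Because restricting the reference set can only remove competitors, a within-group score is never lower than the corresponding global score.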
It should be noted that DEA models assume that the input information is accurate. However, in most cases, the input and output variables can be imprecise, erroneous, or biased. For example, a standardized test that evaluates an individual student's performance is designed by experts who select the questions for each topic and dimension and adjust the order and quantity of the questions according to standardized criteria of reading speed, comprehension, and analysis (Wolszczak-Derlacz, 2017). In summary, the process of creating the standardized test is a sum of subjective decisions, which together generate noise in the final result. This is aligned with the main objective of this research, which is to compare levels of efficiency among equals. Thus, although the exams are the same for all, the students do not come from the same context, and a university can be efficient in generating knowledge for its students once their initial learning inputs are considered in relation to other universities (Duan, 2019; Ghasemi et al., 2020). Finally, the research model uses the following notation: $n$ is the number of DMUs, each of which uses the same inputs (in different quantities) to obtain the same outputs (in different quantities); $u_r$ is the weighting of the virtual output; and $v_i$ is the weighting of the virtual input.
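For reference, the weights described above appear in the textbook ratio form of the CCR model of Charnes, Cooper and Rhodes (1978), written here for the evaluated unit 0. The symbols $h_0$, $x_{ij}$, $y_{rj}$, $m$ and $s$ (efficiency score, input $i$ and output $r$ of unit $j$, number of inputs and number of outputs) are assumed notation, and the output-oriented variant applied in this study may be stated in a different but related form.

```latex
\max_{u,v}\; h_0 = \frac{\sum_{r=1}^{s} u_r \, y_{r0}}{\sum_{i=1}^{m} v_i \, x_{i0}}
\quad \text{subject to} \quad
\frac{\sum_{r=1}^{s} u_r \, y_{rj}}{\sum_{i=1}^{m} v_i \, x_{ij}} \le 1, \quad j = 1, \dots, n,
\qquad u_r \ge 0, \; v_i \ge 0 .
```

A unit is efficient when $h_0 = 1$, that is, when no admissible set of weights lets another unit in the reference group outperform it.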
On the other hand, for the application of the DEA model, we start from the construction of the conceptual scheme that relates the input and output variables of the model.

Random Forest
The RF model is an ensemble method based on the repeated construction of multiple DTs through a bootstrap aggregation process (Breiman, 2001). That is, multiple DTs are created with different compositions of variables in such a way that each tree produces an independent result. A voting process is then carried out in which the category assigned is the class that receives the most votes overall. Generating a separate response for each Decision Tree and then combining the responses into a general prediction produces robust models that are less susceptible to extreme values and overfitting than a single Decision Tree, thus improving the model's prediction and classification performance. The RF model incorporates a variable selection technique, which makes it possible to handle data sets with a large number of variables without a prior dimension reduction step. The model also identifies the importance of the variables for the correct classification of the observations through a permutation test.
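The bootstrap-and-vote mechanism can be illustrated with a deliberately reduced sketch. It is not a full Random Forest: each "tree" here is a one-split decision stump grown on a single randomly chosen feature, whereas a real RF grows deep trees and samples candidate features at every split. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature):
    """Best single threshold on one feature (fewest misclassifications)."""
    values = sorted({r[feature] for r in rows})
    best = None
    for low, high in zip(values, values[1:]):
        t = (low + high) / 2  # candidate threshold between neighbours
        left = [y for r, y in zip(rows, labels) if r[feature] <= t]
        right = [y for r, y in zip(rows, labels) if r[feature] > t]
        l_cls = Counter(left).most_common(1)[0][0]
        r_cls = Counter(right).most_common(1)[0][0]
        errors = (sum(y != l_cls for y in left)
                  + sum(y != r_cls for y in right))
        if best is None or errors < best[0]:
            best = (errors, t, l_cls, r_cls)
    if best is None:  # constant feature in this sample: predict the majority
        majority = Counter(labels).most_common(1)[0][0]
        return (feature, 0.0, majority, majority)
    return (feature, best[1], best[2], best[3])


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Bagging: one stump per bootstrap sample and random feature."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        feature = rng.randrange(n_features)
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feature))
    return forest


def predict(forest, row):
    """Majority vote over the individual stump predictions."""
    votes = Counter(l if row[f] <= t else r for f, t, l, r in forest)
    return votes.most_common(1)[0][0]


# Hypothetical programs described by two competence scores.
rows = [(50, 55), (52, 58), (55, 60), (58, 57),
        (70, 75), (72, 78), (75, 80), (78, 77)]
labels = ["G2"] * 4 + ["G1"] * 4
forest = fit_forest(rows, labels)
```

Calling `predict(forest, (51, 56))` returns the class that most stumps vote for; no single stump decides the outcome, which is the source of the robustness described above.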
The success of the classification process depends on minimizing the difference between the predicted value and the actual value. This relationship is described by the counts of True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN). The metrics used to assess performance are the correct classification rate (C), positive predictive value (PPV), negative predictive value (NPV), sensitivity (S), specificity (E), F1 score, and Area under ROC. It should be noted that specificity indicates the ability of the estimator to detect negative cases; sensitivity indicates the ability of the estimator to detect positive cases; the F1 score is the harmonic mean of precision and recall (Sun, Liu and Wang, 2020), with an optimal value of 1 indicating perfect precision and recall; and the Area under ROC summarizes the rates of TP and FP at various discrimination thresholds. A model with a perfect classification will have Area under ROC = 1. On the other hand, a totally random model will return a value of Area under ROC = 0.5. Finally, 70% of the data set was used to train the model with 10-fold cross-validation, and the remaining 30% was used to evaluate it. The inputs of this model are the academic competencies of SABER 11 and SABER PRO, and the output is the category of the efficiency group (Group 1 efficient, Group 1 Non-efficient, Group 2 efficient, Group 2 Non-efficient).
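The metrics listed above follow directly from the four confusion-matrix counts. The sketch below computes them for a binary case, together with the Area under ROC obtained through its rank interpretation (the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one). The function names and the example labels and scores are hypothetical; for the four-class problem of this study the same quantities would be computed per class in a one-versus-rest fashion, which is an assumption of this sketch.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix metrics for a binary classification."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    sensitivity = tp / (tp + fn)  # recall: share of positives detected
    specificity = tn / (tn + fp)  # share of negatives detected
    ppv = tp / (tp + fp)          # precision
    npv = tn / (tn + fn)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": 2 * ppv * sensitivity / (ppv + sensitivity),
    }


def area_under_roc(y_true, scores, positive=1):
    """AUC as the share of positive/negative pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels, hard predictions, and predicted scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]

metrics = binary_metrics(y_true, y_pred)
auc = area_under_roc(y_true, scores)
```

The sketch omits guards for empty denominators, so it assumes that both classes are present and that at least one case is predicted in each class.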

RESULTS
The proposed methodology consists of three phases. Detailed results for each phase are presented below.

First phase: Cluster Analysis
For the first phase of academic profiling, the k-means algorithm was used, varying the number of centroids from two to ten, to identify the optimal number of groups to which the university programs belonged. The test's highest value corresponds to the formation of two groups with a Silhouette test value of 0.49 (see Figure 2).

Figure 2: Analysis of the Silhouette test for optimal group conformation
Consequently, the representative elements of each profile are analyzed as shown in Figure 3. The university programs of Group 1 have, on average, a higher score in all academic competences than the programs of Group 2. Thus, the competencies that characterize Group 1 are those with an average higher than 70 points: ENG_SPRO, CS_SPRO, CR_SPRO, QR_SPRO, and MATH_S11. On the other hand, the characterizing competencies of Group 2 are those with an average higher than 55 points: QR_SPRO, MATH_S11, CR_S11, CS_S11, and NS_S11. Therefore, Profile 1 can be contextualized as university programs with a high level of university entrance competencies and Profile 2 as those with a medium-low level of university entrance competencies.

Figure 2 axes: number of clusters k; average silhouette width.
Electronic ISSN 1803-1617

Results by Cluster
In this phase, to improve the interpretation of academic profiles, a two-dimensional representation of the groups generated by the k-means algorithm was developed (see Figure 4). The two groups are widely separated. Group 1, located on the left side, is a homogeneous group, showing low variability in the results of the SABER tests among its members. Group 2, located on the right side of the map, presents greater variability among its members, as reflected in the area it occupies.

Second phase: Efficiency Analysis
For the second phase of the method, the efficiency of the academic programs was calculated, adjusted by the previously determined academic profiles. The efficiency results by groups are presented in Table 2. The average efficiency of Profile 1 is 0.973 with a standard deviation of 0.029, and 28 programs are determined as efficient out of a total of 111 members of this profile. In contrast, the average efficiency of Profile 2 is 0.941, with a standard deviation of 0.052, and 34 university programs out of 154 members of this profile are considered efficient. The average efficiency for institutions without clustering is 0.832, and the number of inefficient academic programs falls from 223 in the global analysis to 203 in the group-adjusted analysis.
The integration of the results of phases one and two allows the creation of four AEP: 1) Efficient institutions in group one, 2) Non-Efficient institutions in group one, 3) Efficient institutions in group two, 4) Non-Efficient institutions in group two. Table 3 presents the efficiency value of some study units relative to the group to which they belong. The institutions that present an efficiency level equal to one are classified as "Efficient" and the rest as "Non-efficient". To contextualize the results, the variable "High-quality accreditation" of the academic program is used as an adjustment factor that helps to explain the differences in the levels of efficiency between the profiles. In Table 4, when comparing the results of the global efficiency analysis with the program accreditation variable, there is a higher proportion of accredited universities in the efficient universities profile. Table 7 reports the distribution considering the adjustment by academic group. There are differences in the distribution of efficiency categories with respect to the results of Table 6. For example, the Industrial Engineering program went from having a total of 13.54% of efficient units under the total efficiency measurement scheme to 22.92% considering the sum of the categories of efficient units of Profiles 1 and 2.
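The rule that combines the two phases is simple enough to state as code. The sketch below assigns one of the four AEP labels from a group number and an efficiency score; the numerical tolerance `tol` is an assumption added to absorb rounding in solver output, and the function name is hypothetical.

```python
def aep_label(group, efficiency, tol=1e-9):
    """Combine cluster membership and DEA score into an AEP label.

    A unit is efficient only when its score equals one (within tol).
    """
    status = "EFF" if efficiency >= 1.0 - tol else "Not_EFF"
    return f"G{group}_{status}"


# Hypothetical (group, efficiency) pairs.
examples = [(1, 1.0), (1, 0.973), (2, 1.0), (2, 0.941)]
profiles = [aep_label(g, e) for g, e in examples]
```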

No cluster
One of the greatest advantages of the DEA model is that it determines the number of resource units that a DMU must increase or decrease to reach the efficiency frontier. In Tables 8 and 9, these values are presented as objectives that an academic program should achieve to reach the efficiency frontier established by the proposed model. For example, in Table 8, which reports the results for the global efficiency model, the most significant opportunity for improvement is associated with proficiency in the English language. On the other hand, when analyzing the results of the model adjusted by profiles (Table 9), the most significant opportunity for improvement is different for each of the groups. Group 1 presents more significant opportunities for improvement in mathematical competence. In Group 2, the improvement is mainly due to mastery of English, with mathematics being only the fourth competence with the most significant room for improvement for this profile.
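As a rough illustration of how such objectives arise, the sketch below computes radial targets under an output-oriented model: every output of an inefficient unit is scaled by the reciprocal of its efficiency score. This covers only the proportional part of the projection; any additional slack identified by the linear program for individual outputs is ignored here, so the values are a lower bound on the targets a full DEA solution would report. Function names and scores are hypothetical.

```python
def radial_targets(outputs, efficiency):
    """Proportional output targets on the frontier (slacks ignored)."""
    return {name: value / efficiency for name, value in outputs.items()}


def improvement_gaps(outputs, efficiency):
    """Points each output must gain to reach its radial target."""
    targets = radial_targets(outputs, efficiency)
    return {name: targets[name] - value for name, value in outputs.items()}


# Hypothetical program with an efficiency score of 0.8.
outputs = {"ENG_SPRO": 60.0, "QR_SPRO": 72.0}
targets = radial_targets(outputs, 0.8)
gaps = improvement_gaps(outputs, 0.8)
```

An efficient unit (score equal to one) has targets equal to its observed outputs and therefore zero gaps.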
These results indicate that the comparison between equals allows a better conceptualization of the efficiency results and, therefore, provides objective and specific tools for developing improvement strategies in the higher education sector.

Third phase: Machine Learning
In the third phase, the RF model predicts the academic efficiency profile to which an academic program belongs. The predictors of the model are the results of academic competencies, and the result corresponds to one of the four AEPs. The model obtained a mean precision of 0.833 during the training phase using 10-fold cross-validation. Table 10 shows the performance metrics of the RF model training: the mean, the lower limit of the confidence interval (LL.CI, 5% significance), and the upper limit of the confidence interval (UL.CI, 5% significance) of Sensitivity, Specificity, and F1 Score. The Area under ROC value was equal to 95.8% for the RF predictions (ROC curve). Cross-validation is performed to assess the consistency of the model. In this case, Table 10 shows a reduction in the standard deviation of the precision and Area under ROC metrics, with values of 0.739 and 0.035, respectively (see Table 11). Based on the model results, the importance of the predictors can be determined (see Table 12). Thus, the role that academic competencies play in efficiency can be observed for each group. For example, in Group 1, to detect the efficient study units, the variables that have a positive relationship in the model correspond to QR_SPRO, CR_SPRO, CS_SPRO, WC_SPRO, and FEP_SPRO. On the other hand, the variables that have a negative relationship with the model correspond to MATH_S11, CR_S11, CS_S11, NS_S11, ENG_S11, and ENG_SPRO. In addition, Table 13 presents the Random Forest model's performance metrics, evaluating the model's ability to identify group membership and the associated efficiency. The results show that the model can identify, in most cases, the academic efficiency profile to which each program belongs. In summary, this research consists of three phases: cluster analysis, efficiency analysis, and machine learning.
Exploratory cluster analysis analyzes the data by identifying clustering patterns between programs and characterizing academic profiles between university degrees. The efficiency measurement phases are performed for the profiles resulting from the cluster analysis and also for the raw data to provide a basis for comparison and contrast. The findings in Table 2 reveal the highest efficiency for non-group analysis, but this restricts the scope and complexity of the analyses that can be performed. It is also challenging to compare a university with a high level of reputation, popularity, experience, and positioning with a university for which these characteristics are low.

Model
The efficiency analysis was carried out considering the academic program and its quality accreditations, allowing estimation of how the accreditations influence the level of efficiency in both study groups. The results show that the efficient Group 1 (G1_EFF) consists of 92.86% accredited universities, in contrast to the efficient Group 2 (G2_EFF), with only 26.47% accredited universities (see Table 5). Finally, in this phase, it is possible to determine the weak and strong competencies of each efficient group. In addition, the score that must be increased to reach the efficiency threshold for each study unit can be established (see Table 9). This makes it possible to identify the competencies that higher education institutions must strengthen within their teaching curriculum to improve academic performance and, consequently, the level of efficiency. In the third phase, an RF model predicts the membership of an efficiency profile (G1_EFF, G1_Not_EFF, G2_EFF, G2_Not_EFF) of the academic programs studied. This is very useful because if the university predicts a student's performance in advance, it could take steps to improve or maintain their efficiency.

DISCUSSION
It is essential to compare the results with other studies that use machine learning and artificial intelligence techniques to predict the efficient group (Group 1, Group 2, etc.) and/or the type of efficiency (efficient or not efficient). The Random [...] (2016) points out that the improvement of efficiency levels is not an easy task, since there are no "automatisms" for efficiency, and notes the widespread belief that improvement in educational contexts is associated with technological change. Considering the previous approach, our research uses the outputs of the DEA linear programming model, specifically the slack variables, to objectively identify potential areas for improvement in the institutions. The comparison of institutions with similar characteristics was one of the objectives of our study. This aspect can be understood through the total range of the efficiency scores: as presented in Table 4, the minimum value of the efficiency score for Clusters 1 and 2 is 85.3% and 73.1%, respectively, and these values are higher than the minimum score in the global scenario without grouping, which was 68.1%. In comparison, Johnes (2006), who analyzed 130 universities in the UK using six inputs and three outputs, estimated the minimum efficiency score at around 60%. Similarly, Klumpp (2018), in a study of 17 European universities, identified a minimum efficiency score of 61.60%. In our research, the minimum efficiency score rises above this threshold of about 60% when institutions are grouped by similarity factors.
Kuah and Wong (2011) evaluated universities' efficiency through a DEA model. They affirm that the efficiency of a university is made up of two dimensions: teaching efficiency and research efficiency. Their research indicates an alternative way to measure efficiency. The advantage of our research is that it uses standardized tests as inputs, which are objective measures, although one limitation of our study is that only one aspect of a university's efficiency is measured. Ramzi, Afonso and Ayadi (2016) developed an efficiency analysis of primary and secondary education in Tunisia using a DEA model, highlighting the need for clustering and the importance of calculating educational efficiency. By comparison, our research measures the relative impact colleges have on students by evaluating high school and college exam results. Like Agasisti, Munda and Hippe (2019), our study evaluates the university's contribution to students' professional achievement. Therefore, when our research results are compared with similar approaches, the proposed methodology produces good results and is relevant for the educational context. The results of our work become an objective tool to evaluate the academic performance of university institutions. In university management, it could be useful for independent regulatory entities as a mechanism to identify representative institutions and determine objective evaluation criteria. In the specific case of university decision-makers, the structure proposed in the research allows strategically mapping the position of a program or university in an academic context, thus supporting decision-making in investments, curricular designs, and new academic programs. Finally, students have a tool to support the decision of which career to study, associating their interests with the results delivered by our efficiency analysis structure.
The results indicate that Quality Accreditations support higher academic efficiency for engineering programs. However, the cluster analysis isolates the quality accreditation effect, evidencing that quality accreditations have a greater impact on universities that receive students with better abilities from high school.

CONCLUSIONS
This study comprehensively evaluated the educational efficiency of 265 academic engineering programs. A three-phase method was proposed that assesses the effect that large universities have on the sector's overall efficiency performance. This research's key contribution is the specific description of a method to evaluate and forecast academic efficiency in university education. The first phase (cluster analysis) groups universities with similar academic characteristics in clearly defined profiles. The efficiency analysis is then carried out through DEA (second phase), first without considering cluster analysis and then calculating each profile's efficiency. The evaluation of homogeneous universities makes it possible to correctly determine academic performance. Finally, the third phase corresponds to the application of the machine learning model to predict an academic efficiency profile. From the empirical evidence, the research findings are as follows. The first phase results show the formation of two groups: the first with high results in basic professional skills and the second group with high results in secondary basic skills. The second phase reveals that the average efficiency value for Groups 1 and 2 is 0.973 and 0.941, respectively. Finally, in the third phase, the RF model was trained and validated, and it obtained a high percentage of success in predicting the academic efficiency category. A structured method for analyzing, measuring, and forecasting efficiency in engineering education is presented to the scientific community and the education sector internationally. The proposed structure enables a decision-making process for continuous improvement in educational contexts.