Computational approaches can be used to explore educational data and to answer research questions about education. This paper surveys the most relevant studies carried out in this field to date. First, it identifies the different tasks that can be applied in an educational environment. It then lists the most common tasks in the educational environment that have been resolved through data mining techniques.


According to Wikipedia, “Educational Data Mining (EDM) refers to techniques, tools, and research designed for automatically extracting meaning from large repositories of data generated by or related to people’s learning activities in educational settings”.

EDM is concerned with developing techniques for analyzing the types of data that come from educational environments, and with using these techniques to better understand learners and the settings in which they learn [1]. Raw data from educational systems can be converted into useful information for educational software developers, teachers, educational researchers, instructors, etc. The EDM process consists of three steps:

Pre-processing: The first step, in which the data from the educational system are converted into the format required for data mining techniques to be applied.

Common pre-processing techniques include noise reduction, data cleaning and attribute selection.

Data mining: The second, intermediate step, in which data mining techniques are applied to the pre-processed data. The DM techniques used include clustering, analysis and visualization, regression and classification.

Post-processing: The final step, in which the results or models obtained are interpreted and used to make decisions about the educational environment.
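
The three steps can be illustrated with a small sketch. It assumes raw records arrive as Python dictionaries with hypothetical fields `student`, `minutes`, `score` and `browser`; pre-processing performs attribute selection and simple data cleaning, a deliberately trivial "mining" step computes each student's average score, and post-processing interprets that result by flagging students below an assumed threshold of 50. A real EDM pipeline would use one of the techniques described later in the data mining step.

```python
from collections import defaultdict

# Hypothetical raw records from an educational system; field names are assumptions.
raw_records = [
    {"student": "s1", "minutes": 42, "score": 78, "browser": "firefox"},
    {"student": "s2", "minutes": None, "score": 65, "browser": "chrome"},  # missing value
    {"student": "s1", "minutes": 42, "score": 78, "browser": "firefox"},   # duplicate
    {"student": "s3", "minutes": 15, "score": 40, "browser": "safari"},
    {"student": "s3", "minutes": 20, "score": 44, "browser": "safari"},
]


def preprocess(records, attributes=("student", "minutes", "score")):
    """Attribute selection plus simple cleaning (drop incomplete and duplicate rows)."""
    seen, cleaned = set(), []
    for record in records:
        row = tuple(record.get(name) for name in attributes)  # keep selected attributes only
        if None in row or row in seen:
            continue  # incomplete, or a duplicate of an earlier row
        seen.add(row)
        cleaned.append(dict(zip(attributes, row)))
    return cleaned


def mine(records):
    """Stand-in mining step: the average score of each student."""
    scores = defaultdict(list)
    for record in records:
        scores[record["student"]].append(record["score"])
    return {student: sum(s) / len(s) for student, s in scores.items()}


def postprocess(averages, threshold=50):
    """Interpret the result: students whose average falls below the threshold."""
    return sorted(student for student, avg in averages.items() if avg < threshold)


averages = mine(preprocess(raw_records))
print(averages)
print("Students who may need support:", postprocess(averages))
```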

The analysis of educational data is not a new practice, but recent advances in educational technology have led to growing interest in developing techniques for analyzing the large amounts of data generated in educational environments. This interest led to a series of EDM workshops held from 2000 to 2007 as part of several international research conferences [2]. In 2008, a group of researchers established what has become an annual international research conference on EDM, the first of which took place in Montreal, Canada [3].

Interest in EDM continued to grow, and in 2009 EDM researchers established an academic journal, the Journal of Educational Data Mining, for sharing and disseminating research results. In 2011, EDM researchers established the International Educational Data Mining Society to connect EDM researchers and continue to grow the field.

The emergence of public educational data repositories in 2008, such as the Pittsburgh Science of Learning Center’s (PSLC) DataShop and the National Center for Education Statistics (NCES), has made educational data mining more feasible and accessible, and these public data sets have contributed to EDM’s growth [1].

This survey is organized as follows: Section 2 lists the goals of educational data mining. Section 3 gives a brief overview of the applications of EDM. Section 4 lists the common tasks in educational data mining and the data mining techniques that can be employed for them. Section 5 describes some of the most prominent future research lines. Lastly, Section 6 presents the conclusions.

Goals of Educational Data Mining

Baker and Yacef [4] describe the following four goals of EDM:

  1. Predicting students’ future learning behaviour
  2. Discovering or improving domain models
  3. Studying the effects of educational support
  4. Advancing scientific knowledge about learning and learners

Predicting students’ future learning behaviour – This goal can be achieved by creating student models from data on student behaviour. Such models incorporate the learner’s characteristics, including detailed information such as their knowledge, behaviours and motivation to learn.

Discovering or improving domain models – The various methods and applications of EDM make it possible to discover new domain models and to improve existing ones.

Studying the effects of educational support – This goal can be pursued by analyzing data from learning systems that provide different kinds of educational support.

Advancing scientific knowledge about learning and learners – This goal is pursued by building and incorporating student models, drawing on EDM research and on the technology and software used in learning environments.

Applications Of EDM

The authors of [1] suggest four key areas of application for EDM: improving student models, improving domain models, studying the pedagogical support provided by learning software, and scientific research into learning and learners.

 Improving Student Models:

Student models are models that provide detailed information about a student’s characteristics or states, such as knowledge, motivation, metacognition and attitudes. Modelling the individual differences between students, so that software can respond to those differences, is an important theme in educational software research. These models are also important for finding new patterns.

First, these models have increased the ability to predict student knowledge and future performance: incorporating models of guessing and slipping into predictions of students’ future performance has increased the accuracy of these predictions by up to 48%. Second, these models have enabled researchers to study which factors lead students to make specific choices in a learning setting, a type of scientific discovery with models discussed below.
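
The paragraph does not name the model behind these results. Bayesian Knowledge Tracing (BKT) is one widely used student model that includes guess and slip parameters, and the sketch below shows its per-answer update. The parameter values are illustrative assumptions, not values fitted to any data set.

```python
from dataclasses import dataclass


@dataclass
class BKTParams:
    prior: float = 0.3  # P(L0): probability the skill is already known (illustrative)
    learn: float = 0.1  # P(T): probability of learning at each opportunity
    guess: float = 0.2  # P(G): probability of a correct answer without the skill
    slip: float = 0.1   # P(S): probability of an error despite knowing the skill


def predict_correct(p_known: float, params: BKTParams) -> float:
    """Probability that the next answer is correct, accounting for guess and slip."""
    return p_known * (1 - params.slip) + (1 - p_known) * params.guess


def update(p_known: float, correct: bool, params: BKTParams) -> float:
    """Posterior P(known) after one observed answer, followed by the learning transition."""
    if correct:
        evidence = p_known * (1 - params.slip)
        posterior = evidence / (evidence + (1 - p_known) * params.guess)
    else:
        evidence = p_known * params.slip
        posterior = evidence / (evidence + (1 - p_known) * (1 - params.guess))
    return posterior + (1 - posterior) * params.learn


params = BKTParams()
p_known = params.prior
for answer in [True, False, True, True]:
    print(f"P(correct next) = {predict_correct(p_known, params):.3f}")
    p_known = update(p_known, answer, params)
print(f"P(known) after 4 answers = {p_known:.3f}")
```

Without the guess and slip terms, a single lucky answer or careless error would move the estimate all the way to "known" or "not known"; the two parameters let the model treat each answer as noisy evidence.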

Improving Domain Models:

Improving models of the knowledge structure of the domain can be achieved in educational data mining through methods that have been created for rapidly discovering accurate domain models directly from data. These methods have generally combined psychometric modelling frameworks with advanced space-searching algorithms, and are generally posed as prediction problems for the purpose of model discovery (for example, attempting to predict whether individual actions will be correct or incorrect, using different domain models).
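
To make the "prediction problem" framing concrete, the sketch below compares two hypothetical domain models, each a mapping from items to skills, by how well they predict whether held-out answers are correct. It is a deliberately simple stand-in (a smoothed per-skill success rate scored by log-likelihood), not one of the psychometric frameworks mentioned above; all item names and data are assumptions.

```python
import math
from collections import defaultdict

# Hypothetical observations: (item, correct) pairs pooled over students.
train = [("add1", 1), ("add2", 1), ("sub1", 0), ("sub2", 0),
         ("add1", 1), ("sub1", 1), ("sub2", 0), ("add2", 1)]
test = [("add1", 1), ("sub2", 0), ("add2", 1), ("sub1", 0)]

# Two candidate domain models mapping each item to a skill (assumed for illustration).
one_skill = {"add1": "arith", "add2": "arith", "sub1": "arith", "sub2": "arith"}
two_skills = {"add1": "add", "add2": "add", "sub1": "sub", "sub2": "sub"}


def fit(model, data):
    """Estimate P(correct | skill) with add-one (Laplace) smoothing."""
    counts = defaultdict(lambda: [1, 2])  # [correct + 1, attempts + 2]
    for item, correct in data:
        counts[model[item]][0] += correct
        counts[model[item]][1] += 1
    return {skill: c / n for skill, (c, n) in counts.items()}


def log_likelihood(model, rates, data):
    """Sum of log-probabilities the model assigns to the observed outcomes."""
    total = 0.0
    for item, correct in data:
        p = rates[model[item]]  # assumes every test skill was seen in training
        total += math.log(p if correct else 1 - p)
    return total


results = {}
for name, model in [("one skill", one_skill), ("two skills", two_skills)]:
    results[name] = log_likelihood(model, fit(model, train), test)
print(results)  # the higher (less negative) value indicates the better-predicting model
```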

Studying the pedagogical support:

Modern educational software provides students with a variety of types of pedagogical support. Discovering which pedagogical support is most effective has been a key area of interest for educational data miners. Learning decomposition, a type of relationship mining, fits exponential learning curves to performance data, relating student success to the amount of each type of pedagogical support a student has received.
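
Learning decomposition extends the standard exponential learning curve by weighting different kinds of practice against each other. A minimal form, assuming only two types of pedagogical support whose counts for a student are t_1 and t_2 (symbols chosen here for illustration), is:

```latex
\text{performance} = A \, e^{-b\,(\beta\, t_1 + t_2)}
```

Here the performance measure is one that decreases with practice (for example an error rate or a response time), A is the initial performance, b is the learning rate, and beta is the value of one opportunity of type 1 relative to one opportunity of type 2. A fitted beta greater than 1 would suggest that type 1 support is the more effective of the two; the parameters are typically estimated by nonlinear regression on student performance data.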

Scientific Research into learning and learners:

A fourth important area of application of educational data mining is scientific discovery about learning and learners. This takes different forms. Applying EDM to answer questions in any of the three areas previously discussed (i.e. student models, domain models and pedagogical support) can have broader scientific benefits.

Tasks And Techniques In EDM

The EDM tasks suggested by the authors of [5] are:

  1. Assessment of the student’s learning performance.
  2. Applications that provide course adaptation and learning recommendations based on the student’s learning behaviour.
  3. Approaches dealing with the evaluation of learning material and educational web-based courses.
  4. Applications that involve feedback to both teacher and students in e-learning courses, and
  5. Developments for detection of atypical students’ learning behaviours.

The following paragraphs describe the different tasks, their objectives and the data mining methods used for them. The tasks are:

Analysis And Visualization Of Data:

The objective of the analysis and visualization of data is to highlight useful information and support decision making. In the educational environment, for example, it can help educators and course administrators to analyze the students’ course activities and usage information to get a general view of a student’s learning. Statistics and information visualization are the two main techniques that have been most widely used for this task.

Statistics is a mathematical science concerning the collection, analysis, interpretation or explanation, and presentation of data. Statistical analysis of educational data (log files/databases) can reveal, for example: where students enter and exit, the most popular pages, the browsers students tend to use, patterns of use over time and across various time periods, the number of visits and hits, the origin of visitors, the number of visits and their duration per quarter, top search terms, and the number of downloads of e-learning resources.
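
Several of these statistics are simple counts over log entries. The sketch below assumes a hypothetical log format of `(student, page, action)` tuples and computes the most popular pages, the number of log entries per student and the number of downloads per resource with `collections.Counter`.

```python
from collections import Counter

# Hypothetical access-log entries: (student, page, action); the format is an assumption.
log = [
    ("s1", "/course/intro", "view"),
    ("s1", "/course/quiz1", "view"),
    ("s2", "/course/intro", "view"),
    ("s2", "/files/notes.pdf", "download"),
    ("s3", "/course/intro", "view"),
    ("s3", "/files/notes.pdf", "download"),
    ("s1", "/course/intro", "view"),
]

page_hits = Counter(page for _, page, _ in log)                    # hits per page
events_per_student = Counter(student for student, _, _ in log)     # activity per student
downloads = Counter(page for _, page, action in log if action == "download")

print("Most popular pages:", page_hits.most_common(2))
print("Log entries per student:", dict(events_per_student))
print("Downloads:", dict(downloads))
```

Counting visits (sessions) rather than individual log entries would additionally require grouping each student's entries by time, which this sketch does not attempt.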

Information visualization uses graphic techniques to help people understand and analyze data. Visual representations and interaction techniques take advantage of the human eye’s broad bandwidth pathway into the mind to allow users to see, explore, and understand large amounts of information at once. Many studies have focused on visualizing different kinds of educational data, such as annual, seasonal, daily and hourly patterns of user behaviour on online forums.
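
As a dependency-free illustration of the hourly patterns mentioned above, the sketch below bins hypothetical forum post timestamps by hour of day and renders them as a text bar chart. Published studies would normally use a plotting library; the text output here is an assumption made to keep the example self-contained.

```python
from collections import Counter
from datetime import datetime

# Hypothetical forum post timestamps (ISO 8601 strings).
posts = [
    "2021-03-01T09:15:00", "2021-03-01T09:40:00", "2021-03-01T13:05:00",
    "2021-03-02T21:30:00", "2021-03-02T21:45:00", "2021-03-02T21:50:00",
]

posts_per_hour = Counter(datetime.fromisoformat(ts).hour for ts in posts)


def text_histogram(counts, width=1):
    """Render one bar per hour that has activity; width is characters per post."""
    lines = []
    for hour in sorted(counts):
        lines.append(f"{hour:02d}:00 | {'#' * (counts[hour] * width)} ({counts[hour]})")
    return "\n".join(lines)


print(text_histogram(posts_per_hour))
```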

 Predicting a student’s performance:

The objective is to predict the unknown value of a variable or characteristic describing the student. In education, the values normally predicted are performance, knowledge, score or mark. This value can be numerical/continuous (a regression task) or categorical/discrete (a classification task). Prediction of a student’s performance is one of the oldest and most popular applications of DM in education, and different techniques and models have been applied (neural networks, Bayesian networks, rule-based systems, regression and correlation analysis).

Regression analysis finds the relationship between a dependent variable and one or more independent variables.

Several regression techniques have been used: to predict students’ marks in an open university (using model trees, neural networks, linear regression, locally weighted linear regression and support vector machines), to predict end-of-year accountability assessment scores (using linear regression prediction models), and to predict student performance from log and test scores in web-based instruction (using a multivariable regression model).
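
A minimal example of the simplest of these techniques, ordinary least-squares linear regression with a single predictor, is sketched below. The data (hours spent in an online course and final marks) are hypothetical; a multivariable model would add further predictors and solve the corresponding normal equations.

```python
# Hypothetical data: hours spent in the online course vs. final mark (0-100).
hours = [2.0, 4.0, 6.0, 8.0, 10.0]
marks = [50.0, 58.0, 66.0, 70.0, 81.0]


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


intercept, slope = fit_line(hours, marks)
print(f"mark = {intercept:.1f} + {slope:.2f} * hours")
print(f"predicted mark for 7 hours: {intercept + slope * 7:.1f}")
```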

Classification is a procedure in which individual items are placed into groups based on quantitative information regarding one or more characteristics inherent in the items and based on a training set of previously labelled items.
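
The definition can be illustrated with k-nearest neighbours, a simple instance-based classifier (chosen here for brevity; the paragraph does not name a specific algorithm). A new student receives the majority label among the k most similar students in a labelled training set. The features and labels are hypothetical.

```python
import math
from collections import Counter

# Hypothetical labelled training set: (quiz average, forum posts) -> outcome.
training = [
    ((85, 12), "pass"), ((78, 8), "pass"), ((90, 15), "pass"),
    ((45, 1), "fail"), ((52, 3), "fail"), ((38, 0), "fail"),
]


def knn_classify(features, labelled, k=3):
    """Majority label among the k nearest labelled students (Euclidean distance)."""
    # Features are not rescaled here; in practice they usually should be.
    nearest = sorted(labelled, key=lambda item: math.dist(features, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(knn_classify((80, 10), training))  # a new, unlabelled student
print(knn_classify((40, 2), training))
```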

Providing feedback for supporting instructors:

The objective is to provide feedback that supports course authors, teachers and administrators in decision making (about how to improve students’ learning, organize instructional resources more efficiently, etc.), enabling them to take appropriate proactive and/or remedial action. Several DM techniques have been used for this task, although association rule mining has been the most common. Association rule mining reveals interesting relationships among variables in large databases and presents them in the form of strong rules according to the different degrees of interest they might present.
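
The two most common measures of interest for a rule X -> Y are support (the fraction of records containing both X and Y) and confidence (support of X and Y divided by support of X). The brute-force sketch below only enumerates rules with one item on each side; algorithms such as Apriori scale the same idea to larger itemsets. The per-student event sets and the thresholds are hypothetical.

```python
from itertools import permutations

# Hypothetical "transactions": the set of events recorded for each student.
transactions = [
    {"watched_videos", "did_exercises", "passed"},
    {"watched_videos", "did_exercises", "passed"},
    {"watched_videos", "failed"},
    {"did_exercises", "passed"},
    {"failed"},
]


def support(itemset, data):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in data) / len(data)


def single_item_rules(data, min_support=0.4, min_confidence=0.8):
    """Rules X -> Y with one item on each side that meet both thresholds."""
    items = set().union(*data)
    rules = []
    for x, y in permutations(sorted(items), 2):
        sup = support({x, y}, data)
        if sup >= min_support:
            confidence = sup / support({x}, data)
            if confidence >= min_confidence:
                rules.append((x, y, sup, confidence))
    return rules


for x, y, sup, conf in single_item_rules(transactions):
    print(f"{x} -> {y} (support={sup:.2f}, confidence={conf:.2f})")
```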

Grouping students:

The objective is to create groups of students according to their customized features, personal characteristics, etc. Then, the clusters/groups of students obtained can be used by the instructor/developer to build a personalized learning system, to promote effective group learning, to provide adaptive contents, etc. The DM techniques used in this task are classification (supervised learning) and clustering (unsupervised learning).

Cluster analysis, or clustering, is the assignment of a set of observations into subsets (called clusters) so that observations in the same cluster are similar in some sense. Different clustering algorithms have been used to group students: hierarchical agglomerative clustering, K-means and model-based clustering have been used to identify groups of students with similar skill profiles, and a clustering algorithm based on large generalized sequences has been used to find groups of students with similar learning characteristics based on their traversal path patterns and the content of each page they visited.
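
A plain K-means sketch for grouping students by skill profile is shown below. The two-dimensional profiles are hypothetical, and the hand-picked initial centroids and fixed iteration count are simplifications; library implementations usually use randomized initialization (such as k-means++) and a convergence test.

```python
import math

# Hypothetical skill profiles: (algebra mastery, geometry mastery), each in [0, 1].
profiles = [(0.9, 0.8), (0.85, 0.9), (0.2, 0.3), (0.1, 0.25), (0.8, 0.2), (0.9, 0.1)]


def kmeans(points, initial_centroids, iterations=10):
    """Assign each point to its nearest centroid, then recompute centroids; repeat."""
    centroids = list(initial_centroids)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = mean of its members; keep the old one if a cluster is empty.
        centroids = [
            tuple(sum(coords) / len(members) for coords in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids, clusters


# Deterministic initialization from three of the students, chosen for illustration.
centroids, clusters = kmeans(profiles, [profiles[0], profiles[2], profiles[4]])
for centroid, members in zip(centroids, clusters):
    print([round(c, 2) for c in centroid], members)
```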

Several classification algorithms have been applied to group students; for example, discriminant analysis, neural networks, random forests and decision trees have been used to classify university students into three groups (low-risk, medium-risk and high-risk).

Detecting atypical student behaviours:

The objective of detecting atypical student behaviour is to discover those students who have some type of problem or unusual behaviour, such as erroneous actions, low motivation, playing games, misuse, cheating, dropping out or academic failure. Several DM techniques (mainly classification and clustering) have been used to identify these students so that they can be given appropriate help in time.

Classification algorithms that have been used to detect atypical student behaviour include decision trees, neural networks, naïve Bayes, instance-based learning, logistic regression and support vector machines, applied to predicting and preventing student dropout.
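
Of the algorithms listed, logistic regression is the easiest to sketch without external libraries. The example below trains it by batch gradient descent on a tiny hypothetical data set (weekly logins and submitted assignments, labelled dropped out or not) and then estimates the dropout risk of two new students. It has no regularization or feature scaling, so it is an illustration rather than a production model.

```python
import math

# Hypothetical training data: (logins per week, assignments submitted) -> dropped out (1) or not (0).
X = [(0.5, 0), (1.0, 1), (0.0, 0), (4.0, 5), (5.0, 6), (3.5, 4)]
y = [1, 1, 1, 0, 0, 0]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dropout_risk(features, weights, bias):
    """Predicted probability of dropping out."""
    return sigmoid(bias + sum(w * f for w, f in zip(weights, features)))


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    weights, bias, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for features, label in zip(X, y):
            error = dropout_risk(features, weights, bias) - label
            for j, f in enumerate(features):
                grad_w[j] += error * f / n
            grad_b += error / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


weights, bias = train_logistic(X, y)
for student in [(0.8, 1), (4.5, 5)]:
    print(student, f"dropout risk = {dropout_risk(student, weights, bias):.2f}")
```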

Recommendations for students:

The objective is to make recommendations directly to students with respect to their personalized activities, the next task or problem to be done, etc., and to adapt learning contents, interfaces and sequences to each student. Several DM techniques have been used for this task, but the most common are association rule mining, clustering and sequential pattern mining.

Sequential pattern mining aims to discover relationships between sequential events, that is, to find whether the events tend to occur in a specific order. Sequential pattern mining has been used to personalize recommendations of learning content based on learning style and web usage habits.
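
The sketch below finds sequential patterns of length two: ordered pairs <a, b> where a occurs somewhere before b in a student's session, kept if they appear in at least a minimum fraction of sessions. Algorithms such as GSP or PrefixSpan generalize this to longer patterns efficiently; the sessions and the 0.75 threshold are hypothetical.

```python
from itertools import combinations

# Hypothetical learning sessions: the order in which each student accessed resources.
sessions = [
    ["intro", "video", "quiz", "forum"],
    ["intro", "quiz", "video", "quiz"],
    ["video", "intro", "quiz"],
    ["intro", "video", "forum"],
]


def ordered_pairs(sequence):
    """All (a, b) such that a occurs somewhere before b in the sequence."""
    return {(sequence[i], sequence[j]) for i, j in combinations(range(len(sequence)), 2)}


def frequent_sequential_pairs(sequences, min_support=0.75):
    """Two-step patterns contained in at least min_support of the sequences."""
    counts = {}
    for seq in sequences:
        for pair in ordered_pairs(seq):  # a set, so each session counts a pattern once
            counts[pair] = counts.get(pair, 0) + 1
    n = len(sequences)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


for (a, b), sup in sorted(frequent_sequential_pairs(sessions).items()):
    print(f"<{a}, {b}> support={sup:.2f}")
```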

Social Network Analysis (SNA):

Social network analysis studies relationships between individuals rather than individual attributes or properties. A social network is a group of people, organizations or other social entities connected by social relationships such as friendship, cooperation or information exchange. Different DM techniques have been used to mine social networks in educational environments, but collaborative filtering is the most common.
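
As a small example of studying relationships rather than attributes, the sketch below builds an undirected network from hypothetical forum replies and computes degree centrality (the number of distinct contacts divided by the maximum possible, n - 1), one of the simplest SNA measures. The paragraph itself does not name a specific measure.

```python
from collections import defaultdict

# Hypothetical forum interactions: (student who replied, student replied to).
replies = [("ana", "ben"), ("ben", "ana"), ("carl", "ana"), ("dina", "ana"), ("dina", "carl")]


def degree_centrality(edges):
    """Undirected degree of each student divided by the maximum possible degree (n - 1)."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a != b:
            neighbours[a].add(b)
            neighbours[b].add(a)
    n = len(neighbours)
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


centrality = degree_centrality(replies)
for student, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{student}: {score:.2f}")
```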

Collaborative filtering, or social filtering, is a method of making automatic predictions (filtering) about the interests of a user by collecting taste preferences from many users (collaborating). Collaborative filtering systems can produce personal recommendations by computing the similarity between students’ preferences, so this task is directly related to the previous task of recommendations for students. Collaborative filtering has been used to build context-aware recommendation lists of learning objects, to recommend what a learner should learn before taking the next step, and to develop personal recommender systems for learners in lifelong learning networks.
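
A minimal user-based collaborative filtering sketch follows. It computes cosine similarity between students over the resources both have rated and predicts a student's rating for an unseen resource as the similarity-weighted average of other students' ratings. The ratings are hypothetical.

```python
import math

# Hypothetical ratings (1-5) that students gave to learning resources.
ratings = {
    "ana": {"video1": 5, "quiz1": 4, "reading1": 1},
    "ben": {"video1": 4, "quiz1": 5, "reading1": 2, "lab1": 5},
    "carl": {"video1": 1, "quiz1": 2, "reading1": 5, "lab1": 1},
}


def cosine_similarity(a, b):
    """Cosine similarity over the resources both students rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[r] * b[r] for r in common)
    norm_a = math.sqrt(sum(a[r] ** 2 for r in common))
    norm_b = math.sqrt(sum(b[r] ** 2 for r in common))
    return dot / (norm_a * norm_b)


def predict_rating(student, resource, data):
    """Similarity-weighted average of other students' ratings, or None if nobody rated it."""
    weighted, total = 0.0, 0.0
    for other, their in data.items():
        if other == student or resource not in their:
            continue
        sim = cosine_similarity(data[student], their)
        weighted += sim * their[resource]
        total += abs(sim)
    return weighted / total if total else None


print(f"predicted rating of lab1 for ana: {predict_rating('ana', 'lab1', ratings):.2f}")
```

Because raw ratings are all positive, cosine similarity never goes negative, so even a student with opposite tastes (carl here) pulls the prediction slightly; subtracting each student's mean rating first is a common refinement.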

A Survey on Educational Data Mining. (2021, Dec 02). Retrieved from
