At a glance

  • Ability to highlight students with a high probability of withdrawing to allow targeted support
  • Proof of concept built by Telefónica Tech with close collaboration from King’s College London Analytics Team
  • Model trained with data feeds from two source systems, VLE and Student Records
  • Supports predictive analytics with machine learning
  • Utilising pan-university data lake, data warehouse and analytics platform
  • Models trained and evaluated using Azure Databricks and the scikit-learn and imbalanced-learn (imblearn) libraries
  • Model baseline created with automated ML (AutoML) in Azure Machine Learning
  • Full experiment tracking implemented with MLflow
  • Databricks Model Registry used to version each trained model
  • Built on Microsoft Azure, Databricks and Power BI
  • The King’s / Telefónica Tech project team is now exploring how to implement and manage a production solution

The Challenge

Student withdrawal is a big challenge for the higher education sector.

Today, on average 7 out of every 100 undergraduate students in the UK drop out of their higher education course before their second year. As more students embrace remote learning, it is becoming increasingly difficult for higher education institutions to identify students who are struggling. It is crucial to be able to identify these students and to make timely interventions to improve student retention, performance, and wellbeing.

Establishing a reliable solution to this problem that doesn’t infringe upon the privacy of individuals is challenging, as is predicting when and why students will decide to leave their courses. The decision is often influenced by several contributing factors, including academic (progress, feedback), social (peer group, engagement) and external (family, health, financial).

For King’s College London there are significant student, reputational and commercial benefits in being able to establish an early warning system that would enable staff to identify students who require additional support. King’s and Telefónica Tech have a successful, long-standing relationship developing data platforms. Building on that partnership, they agreed to work together to see whether training a machine learning model on data within the platform could provide an accurate and effective predictive solution.

King’s Director of Analytics, Richard Salter, understands that early identification is the key to success.

‘King’s wants to support every student to reach their potential and achieve their ambitions. Where students are disengaging or experiencing difficulties, speed is of the essence. If we can identify the issue early and ideally predict it before it happens, then we are much better able to support the student. If we are slow to identify and respond, then the chances of redressing the situation are drastically reduced.’

An effective proof of concept (POC) exercise is a necessary first step to prove or disprove whether a machine learning solution is viable.

It is advantageous to prove (or disprove) quickly, and at minimum cost, whether the required predictions can be accurately generated from the available data before committing to a full investment in a machine learning solution. It often turns out that the data isn’t accessible, that there isn’t enough of it, or that even where the data does support the required predictions, they can’t be generated within actionable timescales.

The project team had to be focussed, efficient in their use of time and effective in delivering reliable results. Telefónica Tech’s experience and proven machine learning development approach offered the highest likelihood of success.

What ethical considerations need to be taken into account before embarking on such a project? Where a model focuses on individuals, questions of privacy and consent must be considered. Do subjects consent to the use of their data for the intended purpose?

Also, do any of the key features of the model involve sensitive data about individuals? Student activity and demographics provide key data points for a predictive model of this type. Striking the fine balance needed to use these in an ethical way that complies with privacy regulations such as GDPR, while also respecting the consent of subjects, would be a critical success factor.

The Solution

Project Approach

Telefónica Tech applied their proven AI POC approach to collect, analyse, and prepare data and then to train and evaluate candidate models quickly and rigorously. The required data was identified, sampled, cleaned, and analysed to identify any anomalies or gaps. Once the data was understood, work began to identify and trial candidate features which might help to accurately predict a withdrawal outcome. Different combinations of feature sets and algorithms were explored, analysed, and evaluated.
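
As an illustration of the data-profiling step, the sketch below reports how often each field is missing and which rows hold implausible values, using only Python's standard library. It is not Telefónica Tech's tooling; the field names `age_at_entry` and `programme_code`, the helper names and the 16 to 100 age range are all hypothetical.

```python
from typing import Any

# Hypothetical row type: one dict per student record; field names are illustrative.
Row = dict[str, Any]


def profile_gaps(rows: list[Row], fields: list[str]) -> dict[str, float]:
    """Fraction of rows in which each field is missing (None or empty string)."""
    if not rows:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for row in rows if row.get(field) in (None, "")) / len(rows)
        for field in fields
    }


def out_of_range(rows: list[Row], field: str, low: float, high: float) -> list[int]:
    """Indices of rows whose numeric value for field lies outside [low, high]."""
    return [
        i
        for i, row in enumerate(rows)
        if isinstance(row.get(field), (int, float)) and not low <= row[field] <= high
    ]


rows = [
    {"student_id": "a", "age_at_entry": 19, "programme_code": "P101"},
    {"student_id": "b", "age_at_entry": None, "programme_code": ""},
    {"student_id": "c", "age_at_entry": 240, "programme_code": "P204"},
    {"student_id": "d", "age_at_entry": 21, "programme_code": "P101"},
]
print(profile_gaps(rows, ["age_at_entry", "programme_code"]))  # {'age_at_entry': 0.25, 'programme_code': 0.25}
print(out_of_range(rows, "age_at_entry", 16, 100))  # [2]
```

Missing values are skipped by the range check, so each problem is reported once, under the check that describes it.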

The Telefónica Tech team worked very closely with their King’s counterparts to ensure that ethical considerations were fully addressed during the scoping and planning of the work, and measures were agreed and implemented to ensure the privacy and security of data processing throughout.
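
Pseudonymising student identifiers before modelling is one widely used privacy measure. The case study does not say which measures were adopted, so the following is only a sketch of the idea, assuming a keyed hash (HMAC-SHA256 from Python's standard library); the function name, example ID and key are hypothetical.

```python
import hashlib
import hmac


def pseudonymise(student_id: str, secret_key: bytes) -> str:
    """Replace a student ID with a keyed SHA-256 digest (HMAC-SHA256).

    The same ID and key always give the same token, so VLE and student-record
    rows can still be joined, but without the key the token cannot feasibly be
    traced back to the ID (even by hashing every possible ID).
    """
    return hmac.new(secret_key, student_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key; in practice it would be held outside the analytics platform.
key = b"example-secret-key"
vle_token = pseudonymise("K1234567", key)
records_token = pseudonymise("K1234567", key)
print(vle_token == records_token, len(vle_token))  # True 64
```

A keyed hash is used rather than a plain hash because student IDs tend to follow a predictable format, so anyone could hash every possible ID and match the results. Under GDPR, pseudonymised data is still personal data, so this reduces risk rather than replacing consent and governance.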

Student Withdrawal Project Approach

This is our standard approach for a data science project, covering all the key components needed for a successful implementation.

The Telefónica Tech AI team conducted two iterative model-build phases. The first proved that machine learning could accurately predict student withdrawal, but its predictions weren’t timely: they couldn’t highlight the risk of a student withdrawing with enough lead time for King’s to make a meaningful intervention.
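
Timeliness can be measured separately from accuracy. A minimal sketch, assuming each student's first risk flag and withdrawal date are available as dates: it computes the lead time for each withdrawn student and the share of all withdrawals flagged at least a chosen number of days ahead. The 28-day minimum, the helper names and the dates are hypothetical, not figures from the project.

```python
from datetime import date


def lead_times(first_flagged: dict[str, date], withdrawn_on: dict[str, date]) -> dict[str, int]:
    """Days between each withdrawn student's first risk flag and their withdrawal.

    Students who never withdrew, or were only flagged on or after the day they
    withdrew, are left out.
    """
    return {
        sid: (withdrawn_on[sid] - flagged).days
        for sid, flagged in first_flagged.items()
        if sid in withdrawn_on and flagged < withdrawn_on[sid]
    }


def share_actionable(leads: dict[str, int], n_withdrawn: int, min_days: int) -> float:
    """Fraction of all withdrawals that were flagged at least min_days in advance."""
    if n_withdrawn == 0:
        return 0.0
    return sum(1 for days in leads.values() if days >= min_days) / n_withdrawn


first_flagged = {"s1": date(2021, 10, 4), "s2": date(2022, 1, 10), "s3": date(2021, 11, 1)}
withdrawn_on = {"s1": date(2021, 12, 6), "s2": date(2022, 1, 14), "s4": date(2022, 2, 1)}
leads = lead_times(first_flagged, withdrawn_on)
print(leads)  # {'s1': 63, 's2': 4}
print(share_actionable(leads, len(withdrawn_on), min_days=28))
```

Here only s1 is flagged early enough: s2 is flagged four days before withdrawing, and s4 is never flagged but still counts in the denominator, so a model can be accurate and yet score poorly on this measure.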

Telefónica Tech and King’s reviewed the situation and quickly agreed the scope of a second, short POC phase. For this iteration, an additional data source was made available that offered insight into each student’s engagement with King’s online learning system (the VLE). Data points from this system included the number of logins, interactions with forums and groups, and assessment submissions. These activity-related data points provided far greater insight into student engagement throughout the academic year, and the model trained on them delivered withdrawal indicators that were both accurate and more timely.
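
Raw VLE activity has to be turned into per-student features before a model can use it. The sketch below assumes a hypothetical event log of `(student_id, week, event_type)` rows and builds rolling counts over the last few weeks, plus a count of weeks with no activity. The event type names, helper names and four-week window are illustrative assumptions; the project's actual features are not described here.

```python
from collections import Counter, defaultdict

# Hypothetical event types, standing in for logins, forum/group interactions and submissions.
EVENT_TYPES = ("login", "forum_post", "group_activity", "submission")


def weekly_activity(events):
    """Count events of each type per (student_id, week) from (student_id, week, event_type) rows."""
    counts = defaultdict(Counter)
    for student_id, week, event_type in events:
        if event_type in EVENT_TYPES:
            counts[(student_id, week)][event_type] += 1
    return counts


def features_as_of(counts, student_id, week, window=4):
    """Activity features for one student using only weeks week-window+1 .. week.

    Restricting to past weeks means a prediction made in a given week never
    uses activity that had not yet happened.
    """
    weeks = range(week - window + 1, week + 1)
    recent = Counter()
    for w in weeks:
        recent.update(counts.get((student_id, w), Counter()))
    features = {f"{e}_last_{window}w": recent[e] for e in EVENT_TYPES}
    features["inactive_weeks"] = sum(1 for w in weeks if (student_id, w) not in counts)
    return features


events = [
    ("s1", 1, "login"), ("s1", 1, "login"), ("s1", 2, "forum_post"),
    ("s1", 4, "submission"), ("s1", 6, "login"), ("s2", 1, "login"),
]
print(features_as_of(weekly_activity(events), "s1", week=4))
# {'login_last_4w': 2, 'forum_post_last_4w': 1, 'group_activity_last_4w': 0, 'submission_last_4w': 1, 'inactive_weeks': 1}
```

Because the features can be recomputed for any week, the same code can score students part-way through the year, which is what makes earlier warnings possible.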

The Key Results

Telefónica Tech proved that probability indicators of student withdrawal could be delivered to key King’s staff in a timely manner, and with enough accuracy to give them the confidence to act on the insights. The most recently trained classification model reached an overall accuracy of 92 percent. Accuracy alone, however, is rarely a reliable indicator of success: because withdrawals make up only a small minority of students, a model can score highly on accuracy while still missing many of them. Telefónica Tech therefore optimised the models for Precision, the proportion of students flagged as at risk who actually go on to withdraw, which reduces the number of false positives. This allows the King’s team to focus on a smaller, well-targeted cohort of students; the final model produced a Precision score of 98 percent.
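
A small worked example shows why accuracy alone can mislead. It assumes an illustrative cohort of 100 students of whom 7 withdraw (mirroring the UK average quoted earlier, not King’s data). A "model" that never flags anyone scores 93 percent accuracy yet identifies no one at risk; precision and recall expose the difference. The function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """(tp, fp, fn, tn), treating 1 as 'withdrew' and 0 as 'stayed'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def summarise(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        # Of the students flagged as at risk, the share who actually withdrew.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Of the students who withdrew, the share who were flagged.
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Illustrative cohort: 100 students, 7 of whom withdraw.
y_true = [1] * 7 + [0] * 93
never_flag = [0] * 100                      # predicts that nobody withdraws
model = [1] * 5 + [0] * 2 + [1] + [0] * 92  # flags 5 withdrawals and 1 student who stays

print(summarise(y_true, never_flag))  # {'accuracy': 0.93, 'precision': 0.0, 'recall': 0.0}
print(summarise(y_true, model))
```

The second model reaches 97 percent accuracy, but the more useful facts are that 5 of its 6 flags are correct (precision of about 0.83) and that it still misses 2 of the 7 withdrawals (recall of about 0.71).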

The solution incorporated a trained model as well as the data pipelines needed to generate predictions on an ongoing basis. It also proved a way of working on future machine learning problems. All of this was achieved in a few weeks for minimal investment.
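
For the ongoing predictions, one common way to act on a precision target is to choose the alert threshold on held-out validation data and then apply it at each scoring run. This is a sketch under that assumption, not the project’s pipeline; the function names, scores and student IDs are hypothetical.

```python
def threshold_for_precision(scores, labels, target):
    """Lowest score threshold at which validation precision reaches target, or None.

    Precision need not rise steadily with the threshold, so every candidate is
    checked; taking the lowest qualifying one flags as many students as possible
    (highest recall) while still meeting the target.
    """
    for threshold in sorted(set(scores)):
        flagged = [y for s, y in zip(scores, labels) if s >= threshold]
        if sum(flagged) / len(flagged) >= target:
            return threshold
    return None


def flag_students(probabilities, threshold):
    """Student IDs at or above the threshold, highest predicted risk first."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return [sid for sid, p in ranked if p >= threshold]


# Hypothetical validation scores and outcomes (1 = withdrew).
val_scores = [0.10, 0.20, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90]
val_labels = [0, 0, 1, 0, 1, 0, 1, 1]
threshold = threshold_for_precision(val_scores, val_labels, target=0.75)
print(threshold)  # 0.6

# A later scoring run on current students (hypothetical IDs and probabilities).
print(flag_students({"s1": 0.65, "s2": 0.30, "s3": 0.92}, threshold))  # ['s3', 's1']
```

In this data precision actually falls from about 0.67 at a threshold of 0.35 to 0.6 at 0.40, which is why the search checks every candidate instead of stopping at the first drop.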

‘The results of the different models from the proof of concept were well beyond our expectations. They unambiguously affirmed the potential of this solution and very quickly we moved into thinking about how we could bring it into full production and leverage the value from the insights the model provided.’ Richard Salter, Director of Analytics – King’s College London