King’s College London explore how Data Science can help to predict Student Withdrawals


Case Studies | 29 July 2021

At a glance.

Ability to highlight students with a high probability of withdrawing to allow targeted support
Proof of concept built by Telefónica Tech with close collaboration from King’s College London Analytics Team
Model trained with data feeds from two source systems: the VLE (virtual learning environment) and Student Records
Supports predictive analytics with machine learning
Utilising pan-university data lake, data warehouse and analytics platform
Models trained and evaluated using Azure Databricks, the scikit-learn library and the imbalanced-learn (imblearn) library
Model baseline created with automated ML (AutoML) from Azure Machine Learning Services
Full experiment tracking implemented with MLflow
Databricks Model registry used to version each trained model
Microsoft Azure, Databricks and Power BI
The King’s / Telefónica Tech project team now exploring how to implement and manage production solution
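The list above mentions the imbalanced-learn library. Withdrawals are rare compared with continuing enrolments, so withdrawal training data is heavily imbalanced, and a classifier trained on it tends to favour the majority class. The case study does not say which imbalanced-learn techniques were used. As a hedged illustration only, the sketch below implements plain random oversampling (the idea behind imbalanced-learn's `RandomOverSampler`) in dependency-free Python; the function name `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (plain random oversampling)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Indices of the original rows belonging to this class.
        idx = [i for i, y in enumerate(labels) if y == cls]
        for _ in range(target - n):
            i = rng.choice(idx)
            out_rows.append(rows[i])
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical toy data: 93 students who stayed (label 0), 7 who withdrew (label 1).
X = [[i] for i in range(100)]
y = [1 if i < 7 else 0 for i in range(100)]
X_res, y_res = random_oversample(X, y)
print(Counter(y_res))
```

After resampling, each class has 93 rows. Resampling like this should be applied to the training split only; evaluating on resampled data would inflate the reported scores.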

The Challenge

Student withdrawal is a big challenge for the higher education sector.

Today, on average 7 out of every 100 undergraduate students in the UK drop out of their higher education course before their second year. As more students embrace remote learning, it is becoming increasingly difficult for higher education institutions to identify students who are struggling. It is crucial to be able to identify these students and to make timely interventions to improve student retention, performance, and wellbeing.

Establishing a reliable solution to this problem that doesn’t infringe upon the privacy of individuals is challenging, as is predicting when and why students will decide to leave their courses. The decision is often influenced by several contributing factors, including academic (progress, feedback), social (peer group, engagement) and external (family, health, financial).

For King’s College London there are significant student, reputational and commercial benefits in being able to establish an early warning system that would enable staff to identify students who require additional support. King’s and Telefónica Tech have a successful and long-standing relationship developing data platforms. Building on this partnership, they agreed to work together to see whether training a machine learning model on data within the platform could provide an accurate and effective predictive solution.

King’s Director of Analytics, Richard Salter, understands that early identification is the key to success.

‘King’s wants to support every student to reach their potential and achieve their ambitions. Where students are disengaging or experiencing difficulties, speed is of the essence. If we can identify the issue early and ideally predict it before it happens, then we are much better able to support the student. If we are slow to identify and respond, then the chances of redressing the situation are drastically reduced.’

An effective proof of concept (POC) exercise is a necessary first step to prove or disprove whether a machine learning solution is viable.

It is advantageous to establish quickly, and at minimal cost, whether the required predictions can be generated accurately from the available data before committing to full investment in a machine learning solution. It often turns out that the data isn’t accessible, that there isn’t enough of it, or that even where the data does support the required predictions, they can’t be generated within actionable timescales.

The project team had to be focussed, efficient in its use of time and effective in delivering reliable results. Telefónica Tech’s experience and proven machine learning development approach offered the highest likelihood of success.

What ethical considerations need to be made before embarking on such a project? Where a model’s focus relates to individuals, questions of privacy and consent must be considered. Do subjects consent to the use of their data for the intended purpose?

Also, do any of the key features of the model involve sensitive data about individuals? Student activity and demographics provide key data points for a predictive model of this type. Striking the right balance, using these data in an ethical way that complies with privacy regulations such as GDPR whilst also respecting the consent of subjects, would be a critical success factor.

The Solution

Project Approach

Telefónica Tech applied their proven AI POC approach to collect, analyse, and prepare data and then to train and evaluate candidate models quickly and rigorously. The required data was identified, sampled, cleaned, and analysed to identify any anomalies or gaps. Once the data was understood, work began to identify and trial candidate features which might help to accurately predict a withdrawal outcome. Different combinations of feature sets and algorithms were explored, analysed, and evaluated.

The Telefónica Tech team worked very closely with their King’s counterparts to ensure that ethical issues were fully addressed during the scoping and planning of the work, and measures were agreed and implemented to ensure the privacy and security of data processing throughout.
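The case study does not list the specific privacy measures that were agreed. One common measure when joining data from two systems (here, the VLE and Student Records) is pseudonymisation: replacing each student ID with a keyed hash before the data reaches the modelling environment. The sketch below assumes HMAC-SHA256 from Python's standard library and uses a hypothetical key and hypothetical records; it illustrates the technique and is not a description of what this project actually did.

```python
import hashlib
import hmac


def pseudonymise(student_id: str, secret_key: bytes) -> str:
    """Replace a student ID with a keyed hash (HMAC-SHA256).

    The same ID always maps to the same token, so rows from different
    source systems can still be joined, but the token cannot be recomputed
    or linked back to the ID by anyone who does not hold the key.
    """
    digest = hmac.new(secret_key, student_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# Hypothetical key; in practice it would be held in a secrets store, not in code.
KEY = b"example-key-held-outside-the-analytics-platform"

# Hypothetical rows from the two source systems.
vle_row = {"student_id": "K1234567", "logins": 12}
record_row = {"student_id": "K1234567", "programme": "BSc Example"}

vle_row["student_id"] = pseudonymise(vle_row["student_id"], KEY)
record_row["student_id"] = pseudonymise(record_row["student_id"], KEY)
print(vle_row["student_id"] == record_row["student_id"])  # True
```

A keyed hash is used rather than a plain hash because student IDs follow a predictable format, so unkeyed hashes could be reversed simply by hashing every possible ID.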

Student Withdrawal Project Approach

Telefónica Tech’s standard approach to data science projects covers all the key components needed for a successful implementation.

The Telefónica Tech AI team conducted two iterative model-build phases. The first proved that machine learning could accurately predict student withdrawal, but its predictions weren’t timely: they couldn’t highlight the risk of a student withdrawing with enough lead time for King’s to make a meaningful intervention.
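The distinction between an accurate prediction and a timely one can be made concrete: a correct flag only helps if it arrives early enough before the withdrawal for staff to act. The sketch below is a hedged illustration; the 28-day minimum lead time (`MIN_LEAD_DAYS`), the function names and the dates are all hypothetical, since the case study does not state the lead time King’s required.

```python
from datetime import date
from typing import Optional

# Hypothetical threshold: how much warning support staff need to intervene.
MIN_LEAD_DAYS = 28


def lead_time_days(first_flag: Optional[date], withdrawal: date) -> Optional[int]:
    """Days between the model first flagging a student and the withdrawal."""
    if first_flag is None or first_flag > withdrawal:
        return None  # never flagged, or flagged only after the event
    return (withdrawal - first_flag).days


def is_actionable(first_flag: Optional[date], withdrawal: date,
                  min_days: int = MIN_LEAD_DAYS) -> bool:
    """A correct prediction is only useful if it arrives early enough."""
    lead = lead_time_days(first_flag, withdrawal)
    return lead is not None and lead >= min_days


# Two hypothetical withdrawals, both eventually flagged correctly.
print(is_actionable(date(2021, 1, 4), date(2021, 3, 1)))   # True: 56 days' notice
print(is_actionable(date(2021, 2, 22), date(2021, 3, 1)))  # False: 7 days' notice
```

Both predictions would count as correct in an accuracy metric, but only the first leaves room for an intervention, which is the gap the first POC phase ran into.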

Telefónica Tech and King’s reviewed the situation and quickly agreed the scope of a second, short POC phase. For this second iteration, an additional data source was made available that offered insight into each student’s engagement with King’s online learning system. Data points from this system included the number of logins, interactions with forums and groups, and assessment submissions. These activity-related data points provided far greater insight into student engagement throughout the academic year, and the model trained on them delivered withdrawal indicators that were both accurate and more timely.
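Raw VLE activity arrives as individual events, whereas a model needs per-student features. A minimal sketch of one way to turn events into weekly engagement counts follows; the event names (`login`, `forum_post`, `submission`), the week numbering and the sample events are hypothetical, since the case study does not describe the actual VLE schema or feature definitions.

```python
from collections import defaultdict
from datetime import date

# Hypothetical event types, based on the data points listed above.
EVENT_TYPES = ("login", "forum_post", "submission")

# Hypothetical raw VLE events: (student token, event type, event date).
events = [
    ("s1", "login", date(2021, 1, 4)),
    ("s1", "forum_post", date(2021, 1, 5)),
    ("s1", "submission", date(2021, 1, 8)),
    ("s2", "login", date(2021, 1, 4)),
    ("s1", "login", date(2021, 1, 12)),
]


def weekly_features(events, term_start):
    """Count each event type per student per teaching week (week 1 = first 7 days)."""
    feats = defaultdict(lambda: {t: 0 for t in EVENT_TYPES})
    for student, kind, day in events:
        if kind in EVENT_TYPES:
            week = (day - term_start).days // 7 + 1
            feats[(student, week)][kind] += 1
    return dict(feats)


features = weekly_features(events, term_start=date(2021, 1, 4))
for key in sorted(features):
    print(key, features[key])
```

Weekly counts like these are what make timely predictions possible: the model can be re-scored every week as engagement changes, rather than waiting for end-of-term outcomes.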

The Key Results

Telefónica Tech proved that probability indicators of student withdrawal could be delivered in a timely manner to key King’s staff members, with enough accuracy to give them the confidence to act upon the insights. The most recently trained classification model reached an overall accuracy of 92 percent. Accuracy alone, however, is rarely a reliable indicator of success, so Telefónica Tech optimised the models for precision: the proportion of students flagged as likely to withdraw who actually do withdraw. Optimising for precision reduces the number of false positives, so the King’s team can focus on a smaller cohort of students; the final model achieved a precision of 98 percent.
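Why accuracy alone can mislead becomes clear with the UK-average withdrawal rate quoted earlier (7 in every 100 students). The sketch below computes accuracy, precision and recall from binary labels; the cohort and the predictions are hypothetical and are not the project’s evaluation data.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision and recall for binary labels (1 = withdrew)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    # Precision is undefined when nothing is flagged; report None in that case.
    precision = tp / (tp + fp) if tp + fp else None
    recall = tp / (tp + fn) if tp + fn else None
    return accuracy, precision, recall


# Hypothetical cohort of 100 students, 7 of whom withdraw.
y_true = [1] * 7 + [0] * 93

# A "model" that never flags anyone.
print(classification_metrics(y_true, [0] * 100))

# A model that flags 5 students: 4 who withdraw and 1 who does not.
y_pred = [1] * 4 + [0] * 3 + [1] + [0] * 92
print(classification_metrics(y_true, y_pred))
```

The first "model" scores 0.93 accuracy while identifying nobody at all; the second scores 0.96 accuracy and 0.8 precision, meaning four of the five students passed to staff genuinely needed support. Precision measures how much of staff effort lands on the right students, which is why it was the target here.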

The solution incorporated a trained model as well as the data pipelines to enable predictions to be inferred on an ongoing basis. Furthermore, a way of working on future machine learning problems was proven. This was all achieved in a few weeks for minimal investment.
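Optimising for precision and inferring predictions on an ongoing basis meet at the decision threshold: a threshold is chosen on held-out validation data so that flagged students meet a target precision, and each new batch of model scores from the pipeline is then compared against it. The sketch below assumes the model outputs a withdrawal probability per student; the 0.9 target, the function names and all scores are hypothetical. (scikit-learn’s `precision_recall_curve` computes the same precision-per-threshold table for real models.)

```python
def threshold_for_precision(scores, labels, target):
    """Lowest score threshold whose flagged set meets the target precision.

    Scores and labels come from a held-out validation set; label 1 = withdrew.
    Returns None if no threshold reaches the target.
    """
    best = None
    # Try each distinct score as a candidate threshold, highest first.
    for t in sorted(set(scores), reverse=True):
        flagged = [y for s, y in zip(scores, labels) if s >= t]
        if sum(flagged) / len(flagged) >= target:
            best = t  # a lower qualifying threshold flags more students
    return best


def flag_students(batch, threshold):
    """Student tokens scoring at or above the threshold, highest risk first."""
    at_risk = [(score, sid) for sid, score in batch.items() if score >= threshold]
    return [sid for score, sid in sorted(at_risk, reverse=True)]


# Hypothetical validation scores (model probabilities) and true outcomes.
val_scores = [0.95, 0.90, 0.85, 0.70, 0.60, 0.40, 0.30, 0.10]
val_labels = [1, 1, 1, 0, 1, 0, 0, 0]
t = threshold_for_precision(val_scores, val_labels, target=0.9)

# Hypothetical weekly batch from the scoring pipeline: token -> score.
this_week = {"a1": 0.97, "b2": 0.50, "c3": 0.88, "d4": 0.86}
print(t, flag_students(this_week, t))
```

Here the threshold settles at 0.85 and three of the four students are flagged. Raising the target precision generally raises the threshold and shrinks the flagged cohort, at the cost of missing some students who will withdraw.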

‘The results of the different models from the proof of concept were well beyond our expectations. They unambiguously affirmed the potential of this solution and very quickly we moved into thinking about how we could bring it into full production and leverage the value from the insights the model provided.’ Richard Salter, Director of Analytics – King’s College London
