📊 CSE-328 · Data Science · DIU · Summer 2026

What Really Affects
Academic Performance?

A data-driven investigation into how sleep, social media, study habits, and class preferences shape CGPA — using real survey data from 54 DIU students.

Daffodil International University · Section 65_B · April 2026

Live Predictor

Predict Your Expected CGPA

Enter your real habits below. Both regression models trained on actual DIU survey data run instantly in your browser — and you'll get a personalised suggestion based on your inputs.

Daily Study Hours

Nightly Sleep Hours

Daily Social Media

Concentration Affected?

Class Preference

📐 Linear Regression Score (1–4 scale)

🔀 Logistic: P(High CGPA ≥ 3.5)

💡 Personalised Suggestion

python · model_training.py
# ══════════════════════════════════════════════════════════════════════
# What Really Affects Academic Performance? — Full Model Training Code
# CSE-328 · Daffodil International University · Section 65_B · 2026
# n=54 cleaned survey responses | Python 3.12, Pandas, Scikit-learn
# ══════════════════════════════════════════════════════════════════════

import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import (r2_score, mean_squared_error,
                             accuracy_score, classification_report)

# Load cleaned dataset
df = pd.read_csv('academic_survey_clean.csv')

# Ordinal encoding
study_map  = {'Less than 1hr':1, '1-2hrs':2, '3-4hrs':3, '5+hrs':4}
sleep_map  = {'Less than 5hrs':1, '5-6hrs':2, '7-8hrs':3, '8+hrs':4}
social_map = {'Less than 1hr':1, '1-2hrs':2, '3-4hrs':3, '5+hrs':4}
conc_map   = {'Never':1, 'Sometimes':2, 'Often':3, 'Always':4}
class_map  = {'Online':1, 'No Preference':2, 'Offline':3}
cgpa_map   = {'Below 2.5':1, '2.5-2.99':2, '3.00-3.49':3, '3.5-4.00':4}

df['Study_Hours']  = df['Study_Hours'].map(study_map)
df['Sleep_Hours']   = df['Sleep_Hours'].map(sleep_map)
df['Social_Media']  = df['Social_Media'].map(social_map)
df['Concentration'] = df['Concentration'].map(conc_map)
df['Class_Pref']    = df['Class_Pref'].map(class_map)
df['CGPA_encoded']  = df['CGPA'].map(cgpa_map)

features = ['Study_Hours', 'Sleep_Hours', 'Social_Media', 'Concentration', 'Class_Pref']
X = df[features]
y_linear = df['CGPA_encoded']
y_logistic = (df['CGPA_encoded'] >= 4).astype(int)

# Linear Regression
lr_model = LinearRegression()
lr_model.fit(X, y_linear)

y_pred_lr = lr_model.predict(X)
r2 = r2_score(y_linear, y_pred_lr)
rmse = np.sqrt(mean_squared_error(y_linear, y_pred_lr))

print("Linear Regression")
print(f"Intercept: {lr_model.intercept_:.3f}")
for feat, coef in zip(features, lr_model.coef_):
    print(f"{feat}: {coef:+.3f}")
print(f"R²: {r2:.3f}")
print(f"RMSE: {rmse:.3f}")

# Logistic Regression
log_model = LogisticRegression(max_iter=1000, random_state=42)
log_model.fit(X, y_logistic)

y_pred_log = log_model.predict(X)
acc = accuracy_score(y_logistic, y_pred_log)

print("\nLogistic Regression")
print(f"Intercept: {log_model.intercept_[0]:.3f}")
for feat, coef in zip(features, log_model.coef_[0]):
    print(f"{feat}: {coef:+.3f}")
print(f"Accuracy: {acc:.1%}")
print(classification_report(y_logistic, y_pred_log, target_names=['Lower CGPA', 'High CGPA']))

# Prediction helper
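def predict_student(study, sleep, social, conc, class_pref):
    """Print both model predictions for one student (inputs use the ordinal codes above)."""
    # Build a one-row DataFrame with the training column names so
    # scikit-learn does not warn about missing feature names.
    x_new = pd.DataFrame([[study, sleep, social, conc, class_pref]], columns=features)

    # Linear model: clip the raw score to the 1-4 scale, then round to a band
    lr_raw = lr_model.predict(x_new)[0]
    lr_score = float(np.clip(lr_raw, 1, 4))
    bands = {1:'Below 2.5', 2:'2.50–2.99', 3:'3.00–3.49', 4:'3.50–4.00'}
    lr_band = bands[int(round(lr_score))]

    # Logistic model: probability of the positive class (High CGPA)
    prob_high = log_model.predict_proba(x_new)[0][1]
    verdict = 'High CGPA (≥3.5)' if prob_high >= 0.5 else 'Lower CGPA'

    print(f"Linear score: {lr_score:.3f} -> Band: {lr_band}")
    print(f"P(High CGPA): {prob_high:.1%} -> {verdict}")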
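For a binary scikit-learn LogisticRegression, the probability returned by predict_proba is the sigmoid of the intercept plus the weighted sum of the inputs. The sketch below shows that calculation by hand. The intercept and coefficients are hypothetical placeholders, since the fitted logistic values are not listed on this page.

```python
import math

def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def prob_high_cgpa(x, intercept, coefs):
    """P(High CGPA) for one student given ordinal-coded inputs."""
    z = intercept + sum(c * v for c, v in zip(coefs, x))
    return sigmoid(z)

# Hypothetical values for illustration only; NOT the fitted model.
intercept = -1.0
coefs = [0.1, 0.5, 0.1, -0.5, 0.3]   # study, sleep, social, conc, class_pref

# Ordinal codes: 3-4hrs study, 7-8hrs sleep, 1-2hrs social, Sometimes, Offline
student = [3, 3, 2, 2, 3]

p = prob_high_cgpa(student, intercept, coefs)
print(f"P(High CGPA) = {p:.3f}")
```

Replacing the placeholders with log_model.intercept_[0] and log_model.coef_[0] from a fitted model should reproduce predict_proba for the same input.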

Raw Responses
Collected

Valid Responses
After Cleaning

Features
Analyzed

ML Models
Applied

43%

Students on
Social Media 5+ hrs

Research Question

What Are We Trying to Find Out?

Understanding the drivers of academic performance is critical — not just for grades, but for scholarship eligibility, career outcomes, and on-time graduation. We investigated four specific sub-questions:

Q1 · Study Hours

Does the number of daily study hours significantly relate to CGPA? We hypothesize a moderate positive correlation.

Q2 · Sleep Quality

Does sleeping 7–8 hours per night lead to measurably better outcomes compared to sleeping under 5 hours?

Q3 · Social Media

Does heavy daily social media usage (5+ hours) reduce concentration, and does this affect CGPA?

Q4 · Class Format

Do students who prefer offline classes perform better academically than those preferring online instruction?

Dataset

How We Built the Dataset

Data were collected via Google Forms shared on Telegram (2,700+ subscribers), a WhatsApp Community, and Facebook Messenger, targeting DIU students aged 18–27, on April 3–4, 2026.

Raw Rows → Typos Fixed → Missing Dropped (−6) → Dupes Removed (−5) → Clean Rows
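The three cleaning steps can be sketched in pandas as follows. This is an illustration on made-up rows, not the team's actual cleaning script: the column names, the typo mapping, and the use of exact-row matching to detect duplicates are all assumptions.

```python
import pandas as pd

# Hypothetical raw responses, made up for illustration.
raw = pd.DataFrame({
    "Study_Hours": ["1-2hrs", "1-2 hrs", "3-4hrs", None, "1-2hrs"],
    "Sleep_Hours": ["7-8hrs", "7-8hrs", "5-6hrs", "7-8hrs", "7-8hrs"],
    "CGPA": ["3.5-4.00", "3.5-4.00", "3.00-3.49", "2.5-2.99", "3.5-4.00"],
})

# 1. Fix typos: normalise label variants to the canonical spelling.
typo_fixes = {"1-2 hrs": "1-2hrs"}
cleaned = raw.replace(typo_fixes)

# 2. Drop rows with any missing answer.
cleaned = cleaned.dropna()

# 3. Remove exact duplicate rows.
cleaned = cleaned.drop_duplicates().reset_index(drop=True)

print(f"{len(raw)} raw rows -> {len(cleaned)} clean rows")
```

With purely categorical answers, two different students can legitimately give identical responses, so a real pipeline would more likely deduplicate on a timestamp or respondent identifier than on the answers alone.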

Feature	Type	Values	Role
Age Group	Categorical	Under 18, 18–20, 21–23, 24–26, 27+	Demographic
Gender	Categorical	Male, Female, Other	Demographic
CGPA ★	Ordinal	Below 2.5 → 3.5–4.00 (encoded 1–4)	Target
Study Hours	Ordinal	<1 hr, 1–2 hrs, 3–4 hrs, 5+ hrs	Predictor
Sleep Hours	Ordinal	<5 hrs, 5–6 hrs, 7–8 hrs, 8+ hrs	Predictor
Class Preference	Categorical	Online, Offline, No Preference	Predictor
Social Media	Ordinal	<1 hr, 1–2 hrs, 3–4 hrs, 5+ hrs	Predictor
Concentration	Ordinal	Never, Sometimes, Often, Always	Predictor
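Because the ordinal encoding relies on exact label matches, any response whose text differs from the mapping keys becomes NaN after .map(). A minimal check, assuming the study_map from the training script and a made-up response column, might look like this:

```python
import pandas as pd

study_map = {'Less than 1hr': 1, '1-2hrs': 2, '3-4hrs': 3, '5+hrs': 4}

# Hypothetical responses; the last label matches no key in study_map.
responses = pd.Series(['1-2hrs', '5+hrs', '<1 hr'], name='Study_Hours')

encoded = responses.map(study_map)

# Labels that failed to map show up as NaN after .map().
unmapped = responses[encoded.isna()].unique().tolist()
print("Unmapped labels:", unmapped)
```

Running a check like this for each encoded column before fitting helps catch label mismatches early, since scikit-learn's LinearRegression rejects NaN inputs.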

Exploratory Data Analysis

What the Data Revealed

🌙

50% of students fall in the highest CGPA band (3.5–4.00)

Half of all respondents are high-performing, suggesting a positive self-selection bias through academic outreach channels.

📱

43% use social media 5+ hours daily — the biggest behavioural risk

78% of this group report concentration "often" or "always" affected — the highest of any usage category.
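A share like the one above can be computed by flagging "Often" or "Always" responses and averaging the flag within each usage group. The sketch below uses a small made-up table, so its numbers are not the survey's.

```python
import pandas as pd

# Made-up rows for illustration; not the survey data.
toy = pd.DataFrame({
    "Social_Media": ["5+hrs", "5+hrs", "5+hrs", "5+hrs", "1-2hrs", "1-2hrs"],
    "Concentration": ["Often", "Always", "Often", "Sometimes", "Never", "Often"],
})

# Flag students whose concentration is affected "Often" or "Always".
toy["Affected"] = toy["Concentration"].isin(["Often", "Always"])

# The mean of a boolean column is the share of True values per group.
share = toy.groupby("Social_Media")["Affected"].mean()
print(share)
```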

📚

35% of students study less than 1 hour per day

Yet students studying 3–4 hrs/day average a full CGPA band higher — consistency over raw hours matters most.

😴

Most students get 7–8 hours of sleep — and it shows in their CGPA

Students sleeping 7–8 hrs cluster predominantly in the 3.5–4.00 band. A meaningful portion sleeps under 6 hours nightly.

🏫

48% prefer offline classes — and they tend to perform better

Offline-preferring students trend higher, likely due to fewer distractions and stronger peer accountability.

Correlation with CGPA

Sleep Hours → CGPA

+0.27

Strongest positive correlation among the predictors, though still modest in size: more sleep is associated with higher CGPA.

Concentration → CGPA

−0.23

Strongest negative correlation: students who report more frequent concentration problems tend to have lower CGPA.

Study Hours → CGPA

+0.06

Surprisingly weak — focus quality matters more than raw study time.

Social Media → CGPA

+0.05

Weak direct link. Any harm appears to operate indirectly, through disrupted concentration.
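Correlations of this kind can be computed on the ordinal codes with pandas. The page does not say which coefficient was used, so the sketch below assumes Pearson (the pandas default) and runs on a toy table that is deliberately perfectly correlated so the signs are easy to check.

```python
import pandas as pd

# Toy ordinal codes for illustration; not the survey data.
toy = pd.DataFrame({
    "Sleep_Hours":   [1, 2, 3, 4, 3, 2],
    "Concentration": [4, 3, 2, 1, 2, 3],
    "CGPA_encoded":  [1, 2, 3, 4, 3, 2],
})

# Pearson correlation of every predictor with the encoded CGPA.
corr_with_cgpa = toy.corr()["CGPA_encoded"].drop("CGPA_encoded")
print(corr_with_cgpa.round(2))
```

With ordinal categories, a rank-based coefficient such as Spearman is a common alternative; DataFrame.corr accepts method="spearman" for that.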

Model Results

What the Models Found

Linear Regression · R²

0.164

Explains 16.4% of CGPA variance — expected for a small categorical dataset with many unmeasured factors.

Logistic Regression · Accuracy

59.3%

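Correctly classified 32 of 54 students on the training data, about 9 percentage points above the 50% majority-class baseline. The model was not evaluated on held-out data, so this figure is likely optimistic.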
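The comparison with the majority-class baseline is simple arithmetic, sketched below. The 27/27 class split is an assumption that matches the reported 50% share in the top band; the exact counts are not given on this page.

```python
from collections import Counter

def majority_baseline(labels):
    """Accuracy of always predicting the most common class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)

# Assumption: 27 of 54 students in the high band (the reported 50% share).
y_true = [1] * 27 + [0] * 27

accuracy = 32 / 54                      # reported: 32 of 54 correct
baseline = majority_baseline(y_true)
lift_points = (accuracy - baseline) * 100

print(f"Accuracy {accuracy:.1%} vs baseline {baseline:.1%} (+{lift_points:.1f} points)")
```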

CGPA score (1–4 scale) = 2.343 + 0.016 × Study_Hours + 0.283 × Sleep_Hours + 0.064 × Social_Media − 0.285 × Concentration + 0.198 × Class_Pref
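The fitted equation can be applied by hand. The sketch below plugs ordinal codes into the published coefficients, clips the result to the 1–4 scale, and rounds to the nearest band, mirroring the prediction helper in the training script. The function and constant names are illustrative.

```python
INTERCEPT = 2.343
COEFS = {
    "study": 0.016,
    "sleep": 0.283,
    "social": 0.064,
    "conc": -0.285,
    "class_pref": 0.198,
}
BANDS = {1: "Below 2.5", 2: "2.50–2.99", 3: "3.00–3.49", 4: "3.50–4.00"}

def predict_score(study, sleep, social, conc, class_pref):
    """Apply the published linear equation to ordinal-coded inputs."""
    raw = (INTERCEPT
           + COEFS["study"] * study
           + COEFS["sleep"] * sleep
           + COEFS["social"] * social
           + COEFS["conc"] * conc
           + COEFS["class_pref"] * class_pref)
    # Keep the score on the encoded 1-4 scale.
    return min(max(raw, 1.0), 4.0)

# Example: 3-4hrs study (3), 7-8hrs sleep (3), 1-2hrs social media (2),
# concentration "Sometimes" affected (2), prefers offline classes (3).
score = predict_score(3, 3, 2, 2, 3)
band = BANDS[int(round(score))]
print(f"Score {score:.3f} -> band {band}")
```

For this example the terms sum to 2.343 + 0.048 + 0.849 + 0.128 − 0.570 + 0.594 = 3.392, which falls in the 3.00–3.49 band.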

+0.283

😴 Sleep is the #1 positive driver

Students sleeping 7–8 hours tend to show higher CGPA. Sleep has the largest positive coefficient in the model.

−0.285

📵 Concentration loss is the #1 negative driver

Social media's damage runs via reduced focus: heavy usage → broken concentration → lower CGPA.

+0.198

🏫 Offline class preference helps

Offline-preferring students trend higher — better engagement, fewer distractions, stronger peer accountability.

+0.016

⏱️ Raw study hours barely matter

Counterintuitively, hours alone had almost no direct effect. Quality and consistency of focus appear to matter far more.

Research Team

The People Behind This Study

Gulam Murshed

232-15-336

Tanzina Rahman

232-15-296

Zubaer Rahman Jisan

232-15-152

Tawfiqur Rahman

242310005101497

Tools & Methods

🐍 Python 3.12 · 🐼 Pandas · 📊 Matplotlib · 🎨 Seaborn · 🤖 Scikit-learn · 📋 Google Forms · 📡 Telegram · 💬 WhatsApp · 📘 Facebook Messenger · 📈 Linear Regression · 🔀 Logistic Regression

Data Collection

How We Collected the Survey Responses

We utilized Telegram, WhatsApp, and Facebook Messenger direct communications for primary data collection. The survey link was distributed through group channels and one-to-one conversations to reach real university students and collect authentic responses. The slots below are arranged to showcase seven examples of our collection methods.

Slot 1 — Telegram channel post

Survey link shared through a Telegram channel to reach a larger student audience quickly.

Slot 2 — WhatsApp community outreach

Survey distributed in a WhatsApp community group using a Bangla invitation message and direct form link.

Slot 3 — Personal Communication

Individual outreach through Messenger helped collect direct responses from university students.

Slot 4 — Messenger direct request

Individual outreach through Messenger helped collect direct responses from university students.

Slot 5 — One-to-one personal request

Direct communication used to ask individual students to complete the survey personally.

Slot 6 — DIU student outreach

Reserved for an additional screenshot showing direct communication with a DIU student participant.

Slot 7 — BRAC / EWU outreach

Reserved for an additional screenshot showing direct communication with BRAC University or EWU students.

These screenshots document our real primary data collection process using multiple communication channels. Group sharing increased reach, while direct messaging improved response authenticity and participation quality.
