📊 CSE-328 · Data Science · DIU · Summer 2026

What Really Affects Academic Performance?

A data-driven investigation into how sleep, social media, study habits, and class preferences shape CGPA, using real survey data from 54 DIU students.

Daffodil International University · Section 65_B · April 2026
Predict Your Expected CGPA

Enter your real habits below. Both regression models, trained on actual DIU survey data, run instantly in your browser, and you'll get a personalised suggestion based on your inputs.


python · model_training.py
# ══════════════════════════════════════════════════════════════════
# What Really Affects Academic Performance? Full Model Training Code
# CSE-328 · Daffodil International University · Section 65_B · 2026
# n=54 cleaned survey responses | Python 3.12, Pandas, Scikit-learn
# ══════════════════════════════════════════════════════════════════

import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import (r2_score, mean_squared_error,
                             accuracy_score, classification_report)

# Load cleaned dataset
df = pd.read_csv('academic_survey_clean.csv')

# Ordinal encoding
study_map  = {'Less than 1hr':1, '1-2hrs':2, '3-4hrs':3, '5+hrs':4}
sleep_map  = {'Less than 5hrs':1, '5-6hrs':2, '7-8hrs':3, '8+hrs':4}
social_map = {'Less than 1hr':1, '1-2hrs':2, '3-4hrs':3, '5+hrs':4}
conc_map   = {'Never':1, 'Sometimes':2, 'Often':3, 'Always':4}
class_map  = {'Online':1, 'No Preference':2, 'Offline':3}
cgpa_map   = {'Below 2.5':1, '2.5-2.99':2, '3.00-3.49':3, '3.5-4.00':4}

df['Study_Hours']  = df['Study_Hours'].map(study_map)
df['Sleep_Hours']   = df['Sleep_Hours'].map(sleep_map)
df['Social_Media']  = df['Social_Media'].map(social_map)
df['Concentration'] = df['Concentration'].map(conc_map)
df['Class_Pref']    = df['Class_Pref'].map(class_map)
df['CGPA_encoded']  = df['CGPA'].map(cgpa_map)

features = ['Study_Hours', 'Sleep_Hours', 'Social_Media', 'Concentration', 'Class_Pref']
X = df[features]
y_linear = df['CGPA_encoded']
y_logistic = (df['CGPA_encoded'] >= 4).astype(int)

# Linear Regression
lr_model = LinearRegression()
lr_model.fit(X, y_linear)

# In-sample metrics: the model is scored on the same rows it was fitted on
y_pred_lr = lr_model.predict(X)
r2 = r2_score(y_linear, y_pred_lr)
rmse = np.sqrt(mean_squared_error(y_linear, y_pred_lr))

print("Linear Regression")
print(f"Intercept: {lr_model.intercept_:.3f}")
for feat, coef in zip(features, lr_model.coef_):
    print(f"{feat}: {coef:+.3f}")
print(f"R²: {r2:.3f}")
print(f"RMSE: {rmse:.3f}")

# Logistic Regression
log_model = LogisticRegression(max_iter=1000, random_state=42)
log_model.fit(X, y_logistic)

# In-sample accuracy (no held-out test set)
y_pred_log = log_model.predict(X)
acc = accuracy_score(y_logistic, y_pred_log)

print("\nLogistic Regression")
print(f"Intercept: {log_model.intercept_[0]:.3f}")
for feat, coef in zip(features, log_model.coef_[0]):
    print(f"{feat}: {coef:+.3f}")
print(f"Accuracy: {acc:.1%}")
print(classification_report(y_logistic, y_pred_log, target_names=['Lower CGPA', 'High CGPA']))

# Prediction helper
def predict_student(study, sleep, social, conc, class_pref):
    """Print both model predictions for one student (inputs are the encoded values)."""
    # Build a one-row DataFrame with the training column names so that
    # scikit-learn does not warn about missing feature names
    x_new = pd.DataFrame([[study, sleep, social, conc, class_pref]], columns=features)

    lr_raw = lr_model.predict(x_new)[0]
    lr_score = float(np.clip(lr_raw, 1, 4))  # keep the score on the 1–4 band scale
    bands = {1:'Below 2.5', 2:'2.50–2.99', 3:'3.00–3.49', 4:'3.50–4.00'}
    lr_band = bands[int(round(lr_score))]

    prob_high = log_model.predict_proba(x_new)[0][1]
    verdict = 'High CGPA (≥3.5)' if prob_high >= 0.5 else 'Lower CGPA'

    print(f"Linear score: {lr_score:.3f} -> Band: {lr_band}")
    print(f"P(High CGPA): {prob_high:.1%} -> {verdict}")
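The `predict_proba` call in the helper corresponds to passing a linear score through the logistic (sigmoid) function. The fitted logistic coefficients are not reported on this page, so the intercept and coefficients in this sketch are placeholders chosen only to illustrate the calculation.

```python
from math import exp

def sigmoid(z):
    """Logistic function: maps any real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + exp(-z))

# Placeholder values, NOT the fitted model's coefficients.
intercept = -1.0
coefs = [0.1, 0.5, 0.1, -0.5, 0.3]   # order: study, sleep, social, conc, class_pref
x_new = [3, 3, 2, 2, 3]              # one student's encoded answers

# Linear score, then squash it into a probability.
z = intercept + sum(c * x for c, x in zip(coefs, x_new))
prob_high = sigmoid(z)
verdict = "High CGPA" if prob_high >= 0.5 else "Lower CGPA"

print(f"z = {z:.2f}, P(High CGPA) = {prob_high:.1%} -> {verdict}")
```

The 0.5 cut-off mirrors the threshold used in `predict_student`; a different threshold would trade false positives against false negatives.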
65 Raw Responses Collected
54 Valid Responses After Cleaning
5 Predictor Features Analyzed
2 ML Models Applied
43% of Students on Social Media 5+ hrs
What Are We Trying to Find Out?

Understanding the drivers of academic performance is critical, not just for grades, but for scholarship eligibility, career outcomes, and on-time graduation. We investigated four specific sub-questions:

Q1 · Study Hours

Does the number of daily study hours significantly relate to CGPA? We hypothesize a moderate positive correlation.

Q2 · Sleep Quality

Does sleeping 7–8 hours per night lead to measurably better outcomes compared to sleeping under 5 hours?

Q3 · Social Media

Does heavy daily social media usage (5+ hours) reduce concentration, and does this affect CGPA?

Q4 · Class Format

Do students who prefer offline classes perform better academically than those preferring online instruction?

How We Built the Dataset

Data were collected via a Google Form shared across Telegram (2,700+ subscribers), a WhatsApp Community, and Facebook Messenger, targeting DIU students aged 18–27, on April 3–4, 2026.

65 Raw Rows → 5 Typos Fixed → −6 Missing Dropped → −5 Dupes Removed → 54 Clean Rows
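The cleaning funnel above (fix typos, drop rows with missing answers, remove duplicates) can be sketched in pandas roughly as follows. This is a minimal illustration on a toy frame, not the team's actual cleaning script: the sample rows, the column subset, and the `typo_fixes` mapping are all assumptions.

```python
import pandas as pd

# Toy stand-in for the raw Google Forms export (hypothetical rows and columns).
raw = pd.DataFrame({
    "Sleep_Hours": ["7-8hrs", "7-8 hrs", None, "5-6hrs", "5-6hrs"],
    "Study_Hours": ["1-2hrs", "3-4hrs", "1-2hrs", "5+hrs", "5+hrs"],
})

# Step 1: normalise label typos so equal answers compare equal (hypothetical mapping).
typo_fixes = {"7-8 hrs": "7-8hrs"}
fixed = raw.replace(typo_fixes)

# Step 2: drop rows where any answer is missing.
no_missing = fixed.dropna()

# Step 3: drop exact duplicate rows, keeping the first occurrence.
clean = no_missing.drop_duplicates().reset_index(drop=True)

print(f"{len(raw)} raw -> {len(no_missing)} after dropping missing -> {len(clean)} clean")
```

In a real export, deduplication would normally key on a respondent identifier or timestamp, since two different students can legitimately give identical answers.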
Feature          | Type        | Values                                | Role
-----------------|-------------|---------------------------------------|------------
Age Group        | Categorical | Under 18, 18–20, 21–23, 24–26, 27+    | Demographic
Gender           | Categorical | Male, Female, Other                   | Demographic
CGPA ★           | Ordinal     | Below 2.5 → 3.5–4.00 (encoded 1–4)    | Target
Study Hours      | Ordinal     | <1 hr, 1–2 hrs, 3–4 hrs, 5+ hrs       | Predictor
Sleep Hours      | Ordinal     | <5 hrs, 5–6 hrs, 7–8 hrs, 8+ hrs      | Predictor
Class Preference | Categorical | Online, Offline, No Preference        | Predictor
Social Media     | Ordinal     | <1 hr, 1–2 hrs, 3–4 hrs, 5+ hrs       | Predictor
Concentration    | Ordinal     | Never, Sometimes, Often, Always       | Predictor
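The table lists Class Preference as categorical, while the training script encodes it on a 1–3 scale (Online, No Preference, Offline). If one wanted to avoid implying an order between those answers, one-hot encoding is a common alternative. The sketch below uses hypothetical rows and is not what the training script does.

```python
import pandas as pd

# Hypothetical sample of the Class_Pref column (labels taken from the table above).
sample = pd.DataFrame({"Class_Pref": ["Online", "Offline", "No Preference", "Offline"]})

# One indicator column per category. drop_first removes the alphabetically first
# category ("No Preference"), which then acts as the baseline level.
encoded = pd.get_dummies(sample, columns=["Class_Pref"], drop_first=True, dtype=int)

print(encoded)
```

With this encoding a regression would estimate separate coefficients for Offline and Online relative to No Preference, instead of a single slope across the three answers.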
What the Data Revealed
🌙 50% of students fall in the highest CGPA band (3.5–4.00)
Half of all respondents are high-performing, which suggests positive self-selection bias from recruiting through academic outreach channels.
📱 43% use social media 5+ hours daily, the biggest behavioural risk
78% of this group report that their concentration is "often" or "always" affected, the highest share of any usage category.
📚 35% of students study less than 1 hour per day
Yet students studying 3–4 hrs/day average a full CGPA band higher; consistency appears to matter more than raw hours.
😴 Most students get 7–8 hours of sleep, and it shows in their CGPA
Students sleeping 7–8 hrs cluster predominantly in the 3.5–4.00 band. A meaningful portion sleeps under 6 hours nightly.
🏫 48% prefer offline classes, and they tend to perform better
Offline-preferring students trend higher, likely due to fewer distractions and stronger peer accountability.
Sleep Hours → CGPA: +0.27
Strongest positive predictor. More sleep is associated with higher CGPA.
Concentration → CGPA: −0.23
Strongest negative predictor. More frequent concentration problems are associated with lower CGPA.
Study Hours → CGPA: +0.06
Surprisingly weak, which suggests focus quality matters more than raw study time.
Social Media → CGPA: +0.05
Weak direct link. Any harm appears to operate indirectly, through disrupted concentration.
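The page does not say how these four figures were computed. Assuming they are pairwise Pearson correlation coefficients between each encoded feature and the encoded CGPA band, the calculation would look like the sketch below. The data is made up for illustration, and `pearson` is a hypothetical helper, not a function from the project.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Made-up encoded answers for six students (1-4 scales, as in the survey encoding).
sleep = [1, 2, 3, 3, 4, 2]
cgpa = [2, 3, 3, 4, 4, 2]

print(f"Sleep_Hours vs CGPA: {pearson(sleep, cgpa):+.2f}")
```

For ordinal bands like these, a rank-based coefficient such as Spearman's is a common alternative to Pearson's.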
What the Models Found
Linear Regression · R²: 0.164
Explains 16.4% of the variance in the encoded CGPA band, which is expected for a small categorical dataset with many unmeasured factors.
Logistic Regression · Accuracy: 59.3%
Correctly classified 32 of 54 students, a modest lift over the 50% majority-class baseline. Both figures are measured on the same 54 responses the models were trained on.
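To put the 59.3% figure in context, the arithmetic behind the accuracy and the majority-class baseline can be checked directly. The sketch below also adds a rough 95% interval using the normal approximation for a proportion; that interval is an assumption of this example, not something the study reports.

```python
from math import sqrt

n_students = 54      # cleaned responses
n_correct = 32       # correctly classified by the logistic model
n_majority = 27      # 50% of respondents fall in the 3.5-4.00 band

accuracy = n_correct / n_students
baseline = n_majority / n_students

# Rough 95% interval for a proportion (normal approximation; an assumption here).
std_err = sqrt(accuracy * (1 - accuracy) / n_students)
low, high = accuracy - 1.96 * std_err, accuracy + 1.96 * std_err

print(f"Accuracy: {accuracy:.1%}, baseline: {baseline:.1%}")
print(f"Approx. 95% interval: {low:.1%} to {high:.1%}")
```

Under this approximation the interval spans the 50% baseline, so with only 54 responses the lift is better read as suggestive than conclusive.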
CGPA = 2.343 + 0.016 × Study_Hours + 0.283 × Sleep_Hours + 0.064 × Social_Media − 0.285 × Concentration + 0.198 × Class_Preference
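As a worked example, the fitted equation can be applied by hand to one set of encoded answers. The coefficients below are copied from the equation; the student profile and the `predict_band` helper are hypothetical, and the result is a score on the 1–4 band scale rather than an actual CGPA.

```python
# Coefficients copied from the fitted equation above.
INTERCEPT = 2.343
COEFS = {
    "Study_Hours": 0.016,
    "Sleep_Hours": 0.283,
    "Social_Media": 0.064,
    "Concentration": -0.285,
    "Class_Preference": 0.198,
}
BANDS = {1: "Below 2.5", 2: "2.50-2.99", 3: "3.00-3.49", 4: "3.50-4.00"}

def predict_band(answers):
    """Return (score, band label) for a dict of encoded answers."""
    score = INTERCEPT + sum(COEFS[name] * answers[name] for name in COEFS)
    score = min(max(score, 1.0), 4.0)   # clamp to the 1-4 band scale
    return score, BANDS[round(score)]

# Hypothetical student: studies 3-4 hrs, sleeps 7-8 hrs, 1-2 hrs of social media,
# concentration "Sometimes" affected, prefers offline classes.
student = {"Study_Hours": 3, "Sleep_Hours": 3, "Social_Media": 2,
           "Concentration": 2, "Class_Preference": 3}

score, band = predict_band(student)
print(f"Score: {score:.3f} -> Band: {band}")
```

For this profile the terms sum to 2.343 + 0.048 + 0.849 + 0.128 − 0.570 + 0.594 = 3.392, which rounds to band 3 (3.00–3.49).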
+0.283 · 😴 Sleep is the #1 positive driver
Students sleeping 7–8 hours show consistently higher CGPA, making sleep the strongest single lifestyle predictor in the model.
−0.285 · 📵 Concentration loss is the #1 negative driver
Social media's damage runs via reduced focus: heavy usage → broken concentration → lower CGPA.
+0.198 · 🏫 Offline class preference helps
Offline-preferring students trend higher, with better engagement, fewer distractions, and stronger peer accountability as likely reasons.
+0.016 · ⏱️ Raw study hours barely matter
Counterintuitively, hours alone had almost no direct effect. Quality and consistency of focus matter far more.
The People Behind This Study
Gulam Murshed (232-15-336)
Tanzina Rahman (232-15-296)
Zubaer Rahman Jisan (232-15-152)
Tawfiqur Rahman (242310005101497)
🐍 Python 3.12 · 🐼 Pandas · 📊 Matplotlib · 🎨 Seaborn · 🤖 Scikit-learn · 📋 Google Forms · 📡 Telegram · 💬 WhatsApp · 📘 Facebook Messenger · 📈 Linear Regression · 🔀 Logistic Regression
How We Collected the Survey Responses

We used Telegram, WhatsApp, and Facebook Messenger for primary data collection. The survey link was distributed through group channels and one-to-one conversations to reach real university students and collect authentic responses. The seven slots below show examples of our collection methods.

[Screenshot: Telegram survey sharing example]
Slot 1: Telegram channel post
Survey link shared through a Telegram channel to reach a larger student audience quickly.

[Screenshot: WhatsApp community sharing example]
Slot 2: WhatsApp community outreach
Survey distributed in a WhatsApp community group using a Bangla invitation message and direct form link.

[Screenshot: WhatsApp group proof example]
Slot 3: Personal Communication
Individual outreach through Messenger helped collect direct responses from university students.

[Screenshot: Messenger direct request example]
Slot 4: Messenger direct request
Individual outreach through Messenger helped collect direct responses from university students.

[Screenshot: Personal direct message survey request]
Slot 5: One-to-one personal request
Direct communication used to ask individual students to complete the survey personally.

[Screenshot: DIU student outreach]
Slot 6: DIU student outreach
Reserved for an additional screenshot showing direct communication with a DIU student participant.

[Screenshot: BRAC / EWU outreach]
Slot 7: BRAC / EWU outreach
Reserved for an additional screenshot showing direct communication with BRAC University or EWU students.

These screenshots document our real primary data collection process using multiple communication channels. Group sharing increased reach, while direct messaging improved response authenticity and participation quality.