Asif Mahdin

About Me

Aspiring Data Scientist | UCSD Fall 2024 Graduate

I am an aspiring data scientist with a strong passion for uncovering insights from data and building innovative solutions. I expect to graduate from UCSD in Fall 2024 with a major in Data Science, and I have gained hands-on experience in machine learning, data visualization, and deep learning through a range of projects.

Work Experience & Projects



Sentiment Analysis of Amazon Book Reviews Code and Report 04/01/2024 to 06/10/2024

I conducted a sentiment analysis of Amazon book reviews and predicted review ratings using BERT from Hugging Face and XGBoost. The project covers the full data science lifecycle, from data collection and cleaning to model deployment, demonstrating my ability to deliver end-to-end solutions.

Technologies Used: BERT, XGBoost, Pandas



Detection of AI-Generated Fake News Code | Report 03/01/2023 to 04/10/2023

Developed a classifier that distinguishes AI-generated fake news from human-written news article descriptions using a fine-tuned BERT model, achieving 92% accuracy. As AI technology improves, the number of fake news articles on the internet has increased, which makes reliable detection important for maintaining the integrity of online information.

Technologies Used: BERT, GPT-3 API, Scikit-learn, Pandas, Python



Shakespearean Poem Generator Code 05/01/2023 to 05/24/2023

Developed an RNN-based model that generates poems in the style of Shakespeare. This project highlights my ability to apply deep learning techniques to creative text generation and demonstrates proficiency in handling and preprocessing text data.

Technologies Used: RNN, PyTorch, NLTK, TQDM, Python

Competitions



Taxi Ride Duration Prediction GitHub | Project Presentation 04/15/2023 to 06/10/2023

I competed in a Kaggle competition where my team placed 5th out of 80 teams by predicting the completion time of taxi rides from detailed information about each ride. We used an LSTM and a Multi-Layer Perceptron for time series prediction, complemented by extensive data exploration and an ensemble of XGBoost models.

Technologies Used: LSTM, Multi-Layer Perceptron, Pandas, Matplotlib, Random Forest



Loan Default Prediction GitHub | Project Report 04/01/2024 to 06/10/2024

In another Kaggle competition, my team ranked in the top 600 out of 3500 teams by predicting future loan defaults using CatBoost. Performing well required significant data wrangling, optimization, and domain knowledge, all within the limited RAM and GPU resources available on Kaggle. Many professional developers and software companies also participated, so as a group of undergraduate students we are proud of what we achieved.

Technologies Used: CatBoost, XGBoost, Polars, Pandas, NumPy

Research Work



Multimorbidity States and Mortality in Sepsis Research Paper | Final Report 09/10/2023 to 12/15/2023

For this project, we replicated a published analysis of the Medical Information Mart for Intensive Care III (MIMIC-III) dataset, focusing on 36,390 patients. Our objective was to identify distinct multimorbidity subgroups and assess their association with organ dysfunction, sepsis, and mortality. Using latent class analysis, we identified six distinct subgroups, each characterized by specific disease patterns and demographics. Notably, the hepatic/addiction and complicated diabetics subgroups exhibited significantly higher rates of adverse health outcomes, including organ dysfunction and mortality. This work emphasizes the importance of incorporating multimorbidity into healthcare models, moving beyond the single-disease paradigm to enhance patient management and clinical trial design. Replicating this paper inspired my capstone project, "EARLY DETECTION OF SEPSIS".

Technologies Used: Latent Class Analysis, Scikit-learn (K-means, Chi-Squared), Matplotlib, SQL

My Skills

Programming Languages

  • Python
  • SQL
  • R
  • Stata
  • JavaScript
  • HTML/CSS
  • Java

Machine Learning & Deep Learning

  • TensorFlow
  • Keras
  • PyTorch
  • Scikit-learn (including linear regression, logistic regression, and SVMs)
  • Random Forest
  • Ensemble Methods

Data Visualization

  • D3.js
  • Matplotlib
  • Seaborn
  • Folium

Data Manipulation & Analysis

  • Pandas
  • NumPy
  • Polars
  • Dask
  • Geopandas

Web Development

  • Streamlit
  • HTML/CSS

Cloud Computing

  • AWS

Database Management

  • SQL

General Skills

  • Data Cleaning
  • Machine Learning
  • Data Visualization
  • Statistical Analysis
  • Working with Big Data

Libraries/Tools

  • Pandas
  • TensorFlow
  • Keras
  • Matplotlib
  • Streamlit
  • OpenCV
  • Geopandas

Software

  • Jupyter Notebook
  • Git
  • Docker

Volunteer Experience & Languages

Volunteer, Bangladesh Red Cross Society; Dhaka, Bangladesh (06/2018-12/2020)

  • Participated in awareness and vaccination campaigns.
  • Assisted in organizing events for feeding the homeless and distributing resources.
  • Developed leadership skills by coordinating volunteer teams during events.

Volunteer, Mojar School; Dhaka, Bangladesh (06/2018-05/2019)

  • Volunteered regularly to provide memorable experiences for street children.
  • Took part in activities aimed at improving the lives of underprivileged children.

Languages

English: Fluent

Bengali: Native

Hindi: Conversational

Urdu: Conversational

Contact Me

Contact Info