Aspiring Data Scientist | UCSD Fall 2024 Graduate
I am an aspiring data scientist with a strong passion for uncovering insights from data and building innovative solutions. I expect to graduate from UCSD in Fall 2024 with a major in Data Science, and through a range of impactful projects I have gained hands-on experience in machine learning, data visualization, and deep learning.
For my capstone project at UCSD, I led the development of an LSTM neural network that predicts the onset of sepsis in ICU patients before it occurs, achieving up to 81% accuracy and forecasting up to 50 hours in advance, an early-warning window that is crucial for timely treatment. The project won UCSD's 2024 best capstone award among all data science students. It demonstrates my ability to combine technical skills with medical knowledge to build potentially life-saving tools that save nurses time and guesswork. The project is hosted online and is accessible on both mobile and PC.
Technologies Used: TensorFlow, Pandas, NumPy, Matplotlib, Seaborn, Keras, SQL, AWS
I developed a highly interactive data visualization project using D3.js, featuring five detailed graphs that tell a compelling story about preventable diseases in Africa. The project is accessible online and optimized for both mobile and PC, giving stakeholders clear and impactful insights.
Technologies Used: D3.js, JavaScript, HTML/CSS, AWS
I created a web application called Dino-Detector that lets users upload pictures of dinosaurs for identification. The system uses a ResNet-based CNN to identify the dinosaur and Folium to map where fossils of that dinosaur have been found. Built in just one day, the project placed 3rd out of more than 100 teams at a dinosaur-themed datathon at UCSD. It is available online and works seamlessly on both mobile and PC.
Technologies Used: ResNet, Folium, Streamlit, AWS
I conducted a sentiment analysis of Amazon book reviews and predicted review ratings using BERT from Hugging Face and XGBoost. This project covers the full data science lifecycle, from data collection and cleaning to model deployment, and demonstrates my ability to deliver end-to-end solutions.
Technologies Used: BERT, XGBoost, Pandas
I developed a classifier that distinguishes AI-generated fake news from human-written news article descriptions using a fine-tuned BERT model, achieving 92% accuracy. As AI text generation improves, fake news articles are becoming more common on the internet, which makes tools like this important for maintaining the integrity of online information.
Technologies Used: BERT, GPT-3 API, Scikit-learn, Pandas, Python
I developed an RNN-based model that generates poems in the style of Shakespeare. This project highlights my ability to apply deep learning techniques to creative text generation and demonstrates proficiency in handling and preprocessing text data.
Technologies Used: RNN, PyTorch, NLTK, TQDM, Python
I competed in a Kaggle competition where my team placed 5th out of 80 teams by predicting taxi ride completion times from detailed information about each ride. We used LSTM and multi-layer perceptron models for time series prediction, complemented by extensive data exploration and an ensemble of XGBoost models.
Technologies Used: LSTM, Multi-Layer Perceptron, Pandas, Matplotlib, Random Forest
In another Kaggle competition, my team ranked in the top 600 out of 3,500 teams by predicting future loan defaults using CatBoost. Performing well within Kaggle's limited RAM and GPU resources required significant data wrangling, optimization, and domain knowledge. Many professional developers and software companies also competed, so as a team of undergraduate students, we are proud of what we achieved.
Technologies Used: CatBoost, XGBoost, Polars, Pandas, NumPy
For this research project, we replicated a published analysis of the Medical Information Mart for Intensive Care III (MIMIC-III) dataset, focusing on 36,390 patients. Our objective was to identify distinct multimorbidity subgroups and assess their association with organ dysfunction, sepsis, and mortality. Using latent class analysis, we identified six unique subgroups, each characterized by specific disease patterns and demographics. Notably, the hepatic/addiction and complicated diabetics subgroups exhibited significantly higher rates of adverse health outcomes, including organ dysfunction and mortality. This work emphasizes the importance of incorporating multimorbidity into healthcare models, moving beyond the single-disease paradigm to improve patient management and clinical trial design. Replicating this research paper inspired my capstone project, "Early Detection of Sepsis".
Technologies Used: Latent Class Analysis, Scikit-learn (K-means, Chi-Squared), Matplotlib, SQL
Volunteer, Bangladesh Red Cross Society; Dhaka, Bangladesh (06/2018-12/2020)
Volunteer, Mojar School; Dhaka, Bangladesh (06/2018-05/2019)
English: Fluent
Bengali: Native
Hindi: Conversational
Urdu: Conversational