About Me
I am currently in Barcelona, Spain, attending the 2nd Workshop on Causal Inference and Machine Learning in Practice at ACM KDD 2024.
My Oral Presentation is on August 26th at 11:20 AM, and I will be at the conference from August 25th to August 30th.
If you’re also attending, feel free to connect with me via LinkedIn or email!
Hello, my name is Yoon Tae Park. I’m currently a Data Scientist at Best Buy, where I specialize in Customer and Marketing Data Science. My expertise includes causal inference, marketing science, and customer analysis. I’ve successfully implemented uplift modeling to target customers in marketing campaigns, developed customer lifetime value (CLV) models, and conducted churn predictions. Additionally, I focus on digital data analysis to uncover user characteristics based on app experiences. Recently, I’ve been leveraging Large Language Models (LLMs) to identify new topics from NPS detractor survey data.
I hold a Master’s degree in Data Science from NYU. My Data Science career began at Samsung, where I led a machine learning project as a Finance Manager. To further deepen my knowledge, I moved to the USA and completed my degree at NYU. During my studies, I interned at Walmart Global Tech, where I built a machine learning pipeline to address e-commerce challenges.
My professional interests lie in solving real-world problems through data science, with a particular focus on causal inference and Natural Language Processing (NLP)/Natural Language Understanding (NLU) problems in e-commerce.
Please refer to the Projects section to find out more.
My current skill set includes:
- Languages: Python (including PyTorch/TensorFlow), SQL, PySpark
- Tools: GCP, MS Office, Tableau, Git/GitHub
If you are interested, please contact me via email or LinkedIn.
Resume: Resume_YoonTaePark.pdf
Projects
Creating a model that extracts relevant information from extremely long legal documents
- Created a long legal document dataset through web scraping of US Supreme Court judicial cases
- Developed a summary generation pipeline using SOTA models (BART, T5, and LED), resulting in a ROUGE-L score of 0.47
- Implemented text extraction methods, such as TextRank, to improve the performance of the summarization model
Dissecting Compositional Generalization (mentored by Faculty Fellow)
Link) Dissecting Compositional Generalization
Correlation Analysis on Generalization Capacities of Different Compositional Problems
- Evaluated the accuracy of the model on 9 different compositional tasks, using 20 different initial weights
- Built a codebase for a training evaluation pipeline, primarily used in the team’s research
- Identified a positive correlation on certain tasks, providing evidence of compositional generalization capacity
2022 HUMANA/MAYS Healthcare Analytics Case Competition (top 10)
Link) Healthcare Analytics Case Competition
Identification of Medicare members facing housing insecurity through predictive modeling, with recommendations on how to help them achieve their best health
- Conducted EDA of Medicare members facing housing insecurity and segmented them by three factors: age, region, and medical environment
- Built a LightGBM model that achieved a ROC-AUC score of 0.7573 in classifying Medicare members facing housing insecurity
- Suggested recommendations based on the segmentation, such as direct support for younger generations and government support for older generations
Created a toxicity dataset from Reddit that can be used for any NLP/NLU research
- Created a new toxicity detection dataset using comments from r/WallStreetBets and toxicity labels from Perspective API
- Evaluated the dataset by comparing human baseline to current SOTA models: GPT-3, BERT, RoBERTa, and DeBERTa
Created a movie recommender model that suggests the top 100 movies for each user
- Created a recommender system using PySpark’s ALS method that provides top 100 movies for each user
- Evaluated the model on Normalized Discounted Cumulative Gain (NDCG), and created a comparison to a single-machine implementation using LensKit
Citadel Fall 2021 Central Regional Datathon (1st place) - Tobacco usage data analysis
Link) Tobacco usage data analysis
Created a comprehensive analysis of tobacco usage that improved MPOWER effectiveness
- Generated geographic data visualizations that show the pattern of tobacco usage in relation to MPOWER (WHO Framework Convention on Tobacco Control)
- Built a regression model on tobacco usage using MPOWER and Geographic location
NYU Center for Data Science - Graduate Student Analysis and Visualization
Link) Graduate Student Analysis
Created a comprehensive analysis of graduate students in the NYU Data Science department
- Analyzed past NYU Data Science graduate students’ academic data to uncover distinct correlations
- Visualized student academic performance and career outcomes in Tableau to assist prospective incoming students
Experience
Walmart Global Tech
Data Science Intern
May 2022 - Aug. 2022
Worked on the e-commerce team, leveraging rest-of-market data sources to improve Walmart’s e-commerce platform
- Created a benchmark dataset using rest-of-market and Walmart item information that enriched the team’s research
- Proposed and implemented a boosting model using the benchmark dataset, resulting in a 35% improvement in the current category mapping performance
- Deployed a machine learning pipeline to predict categories for new items, replacing the team’s method
UNDP
Research Assistant
Dec. 2021 - Mar. 2022
Worked as a research assistant in the SDG (Sustainable Development Goals) AI Lab, contributing to the Disaster Risk Reduction project.
- Collaborated with Bilkent University to develop a model for safely routing people from an earthquake
- Evaluated the reference risk of streets and suggested the safest path in Istanbul by calculating the total risk
Samsung Electronics
Machine Learning Project Lead
Oct. 2018 - Jun. 2021
Elected chair of automation projects in the finance division that apply machine learning solutions.
- Improved task accuracy from 60% to 95% through the application of Decision Tree and Naive Bayes methods
- Identified problems with data structure and preprocessing, and proposed categorization of input variables to prevent overfitting
- Analyzed and enhanced over 600 SAP finance programs in ABAP
- Developed training materials and educated 300,000 employees, as well as over 100 foreign subsidiaries
- Integrated two separate systems for approvals and expense management and established automated processes that connect directly to the accounting system, eliminating inefficiency and reducing the number of approvals from 1.8 million to 1.2 million a year
- Created a system that uses OCR technology to pull accounting information from vendor invoices
- Built an automation scenario from invoice to accounting system and proposed learning methodologies for each field with a developer
Finance Manager
Feb. 2014 - Oct. 2018
Worked as a finance manager in the Mobile division
- Analyzed, investigated, and reported financial information of the mobile division for the monthly closing of financial statements
- Compared forecasts with actual data and added new benchmarks to save 20% in annual budget planning
- Developed a chatbot that provides information on expense management and reimbursement on the company intranet; used Node.js and natural language processing and completed prototype production
- Managed facility investments for 8 overseas production plants; reviewed investment plans and saved USD 730 million by comparing plans with executed investments and reallocating excess investment budgets to future years
- Lowered production cost per unit from USD 3.73/unit in 2015 to USD 3.43/unit in 2016, saving USD 240 million in total; analyzed yield and capacity utilization of the production plants and identified ways to boost productivity, such as rescheduling production and optimizing internal manufacturing costs
- Won a prize at the Samsung Electronics Hackathon
Education
New York University
Master of Science in Data Science (GPA 3.95)
Expected Graduation) May 2023
Studying the foundations of data science, with a focus on NLP/NLU
- Studying relevant coursework such as Machine Learning, Big Data, Natural Language Understanding, and Time Series Forecasting
- Working on various projects that use data science to solve real-world problems
- Currently working on two NLP/NLU projects (one with the World Bank, and one with a Faculty Fellow)
Seoul National University
Bachelor of Arts in Business Administration
Graduated Feb. 2014
Studied business administration, with a focus on finance
- Studied finance, marketing, accounting, and calculus, with many team projects
- Developed key skills such as teamwork and working under tight deadlines
- Served as a committee member of the SNU Finance Club (Team Leader) and SNU Mentoring Club (Vice President)
A Little More About Me
I enjoy working on side projects and datathons. I recently placed 1st at the Citadel Regional Datathon. I regularly participate in data challenges (including Kaggle), so contact me if you are interested in those challenges!
Outside of data science work, I enjoy traveling overseas and drinking coffee.