Data Scientist @ Best Buy, MS in Data Science Alumni @ NYU

About Me

I am currently in Barcelona, Spain, attending the 2nd Workshop on Causal Inference and Machine Learning in Practice at ACM KDD 2024. My Oral Presentation is on August 26th at 11:20 AM, and I will be at the conference from August 25th to August 30th.
If you’re also attending, feel free to connect with me via LinkedIn or email!

Hello, my name is Yoon Tae Park. I’m currently a Data Scientist at Best Buy, where I specialize in Customer and Marketing Data Science. My expertise includes causal inference, marketing science, and customer analysis. I’ve successfully implemented uplift modeling to target customers in marketing campaigns, developed customer lifetime value (CLV) models, and conducted churn predictions. Additionally, I focus on digital data analysis to uncover user characteristics based on app experiences. Recently, I’ve been leveraging Large Language Models (LLMs) to identify new topics from NPS detractor survey data.

I hold a Master’s degree in Data Science from NYU. My Data Science career began at Samsung, where I led a machine learning project as a Finance Manager. To further deepen my knowledge, I moved to the USA and completed my degree at NYU. During my studies, I interned at Walmart Global Tech, where I built a machine learning pipeline to address e-commerce challenges.

My professional interests lie in solving real-world problems through data science, with a particular focus on causal inference and Natural Language Processing (NLP)/Natural Language Understanding (NLU).

Please refer to the Projects section to find out more.

My current skill set includes:

Languages: Python (including PyTorch/TensorFlow), SQL, PySpark
Tools: GCP, MS Office, Tableau, Git/GitHub

If you are interested, please contact me via email or LinkedIn.

Resume: Resume_YoonTaePark.pdf

Projects

Automated Judicial Case Briefing (mentored by World Bank)

Link) Automated Judicial Case Briefing

Creating a model that extracts relevant information from extremely long legal documents

Created a long legal document dataset through web scraping of US Supreme Court judicial cases
Developed a summary generation pipeline using SOTA models (BART, T5, and LED), achieving a ROUGE-L score of 0.47
Implemented text extraction methods, such as TextRank, to improve the performance of the summarization model

Dissecting Compositional Generalization (mentored by Faculty Fellow)

Link) Dissecting Compositional Generalization

Correlation Analysis on Generalization Capacities of Different Compositional Problems

Evaluated the accuracy of the model on 9 different compositional tasks, using 20 different initial weights
Built a codebase for a training evaluation pipeline, primarily used in the team’s research
Identified a positive correlation on certain tasks, providing evidence of compositional generalization capacity

2022 HUMANA/MAYS Healthcare Analytics Case Competition (top 10)

Link) Healthcare Analytics Case Competition

Identification of Medicare members facing housing insecurity through predictive modeling, and recommendations on how to help them achieve their best health

Conducted EDA of Medicare members facing housing insecurity and segmented them by three factors: age, region, and medical environment
Built a model using LightGBM that achieved a ROC-AUC score of 0.7573 in classifying Medicare members facing housing insecurity
Suggested recommendations based on the segmentation, such as direct support for younger generations and government support for older generations

A Toxicity Detection Dataset with r/WallStreetBets Comments

Link) Toxicity Detection Dataset

Created a toxicity dataset from Reddit that can be used for any NLP/NLU research

Created a new toxicity detection dataset using comments from r/WallStreetBets and toxicity labels from Perspective API
Evaluated the dataset by comparing a human baseline to current SOTA models: GPT-3, BERT, RoBERTa, and DeBERTa

Collaborative-Filter Based Modeling for Movie Recommendation

Link) Movie Recommendation Model

Created a movie recommender model that suggests the top 100 movies for each user

Created a recommender system using PySpark’s ALS method that provides the top 100 movies for each user
Evaluated the model on Normalized Discounted Cumulative Gain (NDCG), and created a comparison to a single-machine implementation using LensKit

Citadel Fall 2021 Central Regional Datathon (1st place) - Tobacco usage data analysis

Link) Tobacco usage data analysis

Created a comprehensive analysis of tobacco usage that improved MPOWER effectiveness

Generated geographic data visualizations that show the pattern of tobacco usage in relation to MPOWER (WHO Framework Convention on Tobacco Control)
Built a regression model on tobacco usage using MPOWER and geographic location

NYU Center for Data Science - Graduate Student Analysis and Visualization

Link) Graduate Student Analysis

Created a comprehensive analysis of graduate students in the NYU Data Science department

Analyzed the academic data of past NYU data science graduate students to uncover distinct correlations
Visualized students’ academic performance and career outcomes in Tableau to assist prospective students

Experience

Walmart Global Tech

Data Science Intern

May 2022 - Aug. 2022

Worked on the e-commerce team, leveraging rest-of-market data sources to improve Walmart’s e-commerce platform

Created a benchmark dataset using rest-of-market and Walmart item information that enriched the team’s research
Proposed and implemented a boosting model using the benchmark dataset, resulting in a 35% improvement in the current category mapping performance
Deployed a machine learning pipeline to predict categories for new items, replacing the team’s previous method

UNDP

Research Assistant

Dec. 2021 - Mar. 2022

Worked as a research assistant in the SDG (Sustainable Development Goals) AI Lab, contributing to the Disaster Risk Reduction project.

Collaborated with Bilkent University to develop a model for safely routing people from an earthquake
Evaluated the reference risk of streets and suggested the safest path in Istanbul by calculating the total risk

Samsung Electronics

Machine Learning Project Lead

Oct. 2018 - Jun. 2021

Elected as chair of automation projects applying machine learning solutions in the finance division.

Improved task accuracy from 60% to 95% through the application of Decision Tree and Naive Bayes methods
Identified problems in data structure and preprocessing and proposed categorization of input variables to prevent overfitting
Analyzed and enhanced over 600 SAP finance programs written in ABAP
Developed training materials and educated 300,000 employees, as well as over 100 foreign subsidiaries
Integrated two separate systems for approval and expense management and established automated processes that connect directly to the accounting system, eliminating inefficiency and reducing the number of approvals from 1.8 million to 1.2 million a year
Created a system that uses OCR technology to pull accounting information from vendor invoices
Built an automation scenario from invoice to accounting system and proposed learning methodologies for each field with a developer

Finance Manager

Feb. 2014 - Oct. 2018

Worked as a finance manager in the Mobile division

Analyzed, investigated, and reported financial information of the mobile division for the monthly closing of financial statements
Compared forecasts with actual data and added new benchmarks to save 20% in annual budget planning
Developed a chatbot that provides information on expense management and reimbursement on the company intranet; used Node.js and natural language processing and completed a prototype
Managed facility investments of 8 overseas production plants, reviewed the investment plans and saved USD 730 million by comparing plans with executed investments and reallocating excessive investment budgets to future years
Lowered production cost per unit from USD 3.73/unit in 2015 to USD 3.43/unit in 2016, saving USD 240 million in total; analyzed yield and capacity utilization of the production plants and identified ways to boost productivity such as production re-scheduling and optimizing internal manufacturing cost
Won a prize at the Samsung Electronics Hackathon

Education

New York University

Master of Science in Data Science (GPA 3.95)

Expected Graduation) May 2023

Studying the foundations of data science, with a focus on NLP/NLU

Studying relevant coursework such as Machine Learning, Big Data, Natural Language Understanding, and Time Series Forecasting
Working on various projects that use data science to solve real-world problems.
Currently working on two NLP/NLU projects (one with the World Bank, and one with a Faculty Fellow).

Seoul National University

Bachelor of Arts in Business Administration

Graduated Feb. 2014

Studied business administration, with a focus on finance

Studied finance, marketing, accounting, and calculus, with many team projects
Learned key skills such as teamwork and working under tight deadlines.
Served as a committee member of the SNU Finance Club (Team Leader) and SNU Mentoring Club (Vice President)

A Little More About Me

I enjoy working on side projects and datathons. I recently placed 1st at the Citadel Regional Datathon. I regularly participate in data challenges (including Kaggle), so contact me if you are interested in those challenges!

Outside of data science work, I enjoy traveling overseas and drinking coffee.

Yoon Tae Park