
ABOUT ME
Hello and welcome to my website! I am Ran. Nice meeting you virtually :)
With over 2 years of professional experience, I solve business questions with machine learning models, translate large datasets into actionable insights, and boost operational performance, drawing on strong skills in business intelligence development, market trend assessment, and database management.

Proven ability to communicate with technical professionals and end users to translate business requirements.
Adept at SQL, Python, R, Tableau, Microsoft Office suite, Java, Google Analytics. Experienced in Hadoop, MapReduce, Spark, Azure, Stata, MATLAB.
Let's get connected through the channels below
PROJECTS

Truck Delivery Routes Optimization
Problem: Design optimal delivery routes for trucks between cities
Approach: Used a revised version of Dijkstra’s algorithm and graph theory (cities serve as vertices and roads as edges). Data structures such as ArrayList and PriorityQueue were used.
Outcome: Total travel distance fell by roughly 10%.
- Java Data Structure
- Graph Theory
Java
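
For illustration only, here is a minimal sketch of the standard (unrevised) Dijkstra algorithm with a priority queue. The project itself was written in Java; this sketch uses Python, and the road network, city names, and the `shortest_distances` function are hypothetical, not taken from the project.

```python
import heapq

def shortest_distances(graph, source):
    """Textbook Dijkstra. `graph` maps a city to a list of (neighbor, distance)."""
    dist = {source: 0.0}
    queue = [(0.0, source)]  # min-heap of (distance so far, city)
    while queue:
        d, city = heapq.heappop(queue)
        if d > dist.get(city, float("inf")):
            continue  # stale queue entry; a shorter route was already found
        for neighbor, length in graph.get(city, []):
            candidate = d + length
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return dist

# Hypothetical road network: cities are vertices, roads are weighted edges.
roads = {
    "A": [("B", 4), ("C", 1)],
    "B": [("A", 4), ("C", 2), ("D", 5)],
    "C": [("A", 1), ("B", 2), ("D", 8)],
    "D": [("B", 5), ("C", 8)],
}
print(shortest_distances(roads, "A"))
```

Python's `heapq` plays the role that `PriorityQueue` plays in Java: the closest unsettled city is always processed next.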

Trading Strategy Optimization
Problem: Proposed trading strategies under different market regimes
Approach: Detected market regimes using "Correlation Matrix & Clustering" and "PCA & Changepoint Detection"; defined low-risk trading strategies using “Graph Algorithms”
Outcome: Achieved a 20-year cumulative return of 310%, beating the pre-set benchmark (equally weighted approach: 165%).
- Data Science Consulting
- Trading Strategies
- Predictive Modeling
R

Credit Card Fraud Risk Detection
Problem: Identified fraudulent transactions
Approach: Exploratory data analysis, feature engineering, modeling using PyOD (KNN, PCA, IForest, AutoEncoder), model stability, hyperparameter tuning, statistical analysis
Outcome: Detected the outlier group and summarized the insights of each feature
- Unsupervised Machine Learning
- Feature Engineering
- Anomaly Detection, Statistical Analysis
Python, R

Covid Impact on People & Goods Mobility
Problem: Studied the impact of Covid on mobility of people and goods in the U.S.
Approach: Visualized people movement and logistics transportation in Tableau, tracked the correlation between the industrial index and confirmed cases, used an ARIMA model to quantify the Covid effect on goods, and predicted the impact over the next 3 months using XGBoost and Random Forest
Outcome: 2nd prize in Columbia University x Two Sigma 2020 Data Science Hackathon
- Predictive Modeling
- Trends Analysis
PySpark, Python

Mortgage Default Detection
Problem: Identified mortgage defaults
Approach: Missing data imputation, categorical variable encoding, Python H2O random forest modeling, over-sampling/under-sampling of the imbalanced data
Outcome: Improved the model's predictive performance using under-sampling and achieved a lift of 2.12
- Supervised Machine Learning
- Imbalanced Data
Python

Coffee Shop Profit Simulation
Problem: Determined the optimal number of employees to hire for maximizing profit
Approach: Built an event-driven simulation that processes customer arrival and departure events in time order to estimate the tradeoff between a high operational cost (more employees) and a high customer churn rate (fewer employees).
Outcome: Found the number of employees to hire for different customer arrival rates.
- Event-driven Programming
- Java Data Structure & Algorithm
Java

Problem: Compared three models, (1) the Binomial Tree model, (2) Monte-Carlo Simulation, and (3) the analytic Black-Scholes formula, on European and American put options
Approach: Studied the convergence pattern of the Binomial Tree method with different numbers of paths (30 to 39 and 300 to 309); simulated prices and performed statistical analysis to show the price properties of the Monte-Carlo Simulation results; compared how fast the Binomial Tree method converged, using the Black-Scholes price as a benchmark.
Outcome: The three approaches to pricing a European put option converged
- Derivatives Quantitative Modeling
Excel, Python, R
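
As a rough sketch of the comparison described above, the snippet below prices a put with the Black-Scholes formula and with a Cox-Ross-Rubinstein binomial tree. The tree parameterization, the function names, and the sample inputs (spot 100, strike 100, 5% rate, 20% volatility, 1 year) are assumptions made for illustration; the project's actual setup is not stated here. The `steps` argument stands in for the tree sizes (30 to 39, 300 to 309) mentioned in the approach.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_scholes_put(s, k, r, sigma, t):
    """Analytic Black-Scholes price of a European put."""
    d1 = (math.log(s / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return k * math.exp(-r * t) * norm_cdf(-d2) - s * norm_cdf(-d1)

def binomial_put(s, k, r, sigma, t, steps, american=False):
    """Cox-Ross-Rubinstein binomial tree price of a put (assumed variant)."""
    dt = t / steps
    u = math.exp(sigma * math.sqrt(dt))       # up factor
    d = 1.0 / u                               # down factor
    p = (math.exp(r * dt) - d) / (u - d)      # risk-neutral up probability
    disc = math.exp(-r * dt)
    # Payoffs at maturity; j counts the number of up moves.
    values = [max(k - s * u ** j * d ** (steps - j), 0.0) for j in range(steps + 1)]
    # Roll back through the tree one time step at a time.
    for i in range(steps - 1, -1, -1):
        for j in range(i + 1):
            value = disc * (p * values[j + 1] + (1.0 - p) * values[j])
            if american:
                # Allow early exercise at this node.
                value = max(value, k - s * u ** j * d ** (i - j))
            values[j] = value
    return values[0]

benchmark = black_scholes_put(100, 100, 0.05, 0.2, 1.0)
for steps in list(range(30, 40)) + list(range(300, 310)):
    error = binomial_put(100, 100, 0.05, 0.2, 1.0, steps) - benchmark
    print(steps, round(error, 5))
```

Printing the error for consecutive step counts is one simple way to look at how the tree price approaches the Black-Scholes benchmark as the tree grows.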

Vaccine Model Simulation
Problem: Determined vaccine release date
Approach: Built two models (without vaccine and with vaccine), used differential equations to simulate the changes in the population groups (susceptible, infected, immune), and conducted cost & benefit analysis to study the net effect of different vaccine release dates and vaccine efficacy rates
Outcome: Changing the vaccine release date generated a higher net impact than varying the vaccine efficacy rate
- Differential Equation & Regression
- Cost & Benefit Analysis
MATLAB
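
A minimal sketch of this kind of compartment model, assuming a standard SIR-style system with a simple vaccination term and forward-Euler time stepping. The project itself used MATLAB; the equations, the `simulate` function, and every parameter value below are illustrative assumptions rather than the project's actual model.

```python
def simulate(days, beta, gamma, vaccine_day=None, vaccine_rate=0.0,
             efficacy=1.0, dt=0.1):
    """Euler integration of an SIR-style model with optional vaccination.

    s, i, r are population fractions: susceptible, infected, immune.
    From `vaccine_day` on, susceptibles move to the immune group at
    rate vaccine_rate * efficacy (a simplifying assumption).
    Returns the final (s, i, r) and the peak infected fraction.
    """
    s, i, r = 0.99, 0.01, 0.0
    peak = i
    for n in range(round(days / dt)):
        t = n * dt
        vaccinating = vaccine_day is not None and t >= vaccine_day
        vaccinated = vaccine_rate * efficacy * s if vaccinating else 0.0
        infections = beta * s * i
        recoveries = gamma * i
        s += dt * (-infections - vaccinated)
        i += dt * (infections - recoveries)
        r += dt * (recoveries + vaccinated)
        peak = max(peak, i)
    return s, i, r, peak

# Hypothetical comparison: no vaccine vs. release on day 10 vs. day 30.
for release in (None, 10, 30):
    *_, peak = simulate(120, beta=0.3, gamma=0.1,
                        vaccine_day=release, vaccine_rate=0.05, efficacy=0.9)
    print(release, round(peak, 3))
```

Varying `vaccine_day` and `efficacy` separately in a loop like this is one way to compare the effect of release date against efficacy rate.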

Crypto Price Prediction
Problem: Predicted Ethereum return
Approach: Web-scraped 5 years of ETH data using Python Selenium, manipulated and cleaned the raw data, tested stationarity, eliminated the seasonality effect, compared three models, and predicted the price return for the next three months; conducted lead-lag analysis between ETH and Bitcoin; studied the impact of macro news on ETH price fluctuation
Outcome: Mean error 2.79
- Time Series & Predictive Modeling
- Lead-lag analysis
Python
Experiences
Education
Data Analyst
Compass
Proposed actionable business insights from large datasets; updated customer segmentation in response to Covid and built pricing models; created Manhattan & Brooklyn rental market report with data analysis and visualization; led A/B testing on email marketing to increase the response rate
Mar. 2020 - Sep. 2020
Data Analyst
The Alliance
Analyzed the trend and seasonality of global warming time series data over the last 15 years; conducted multivariate linear regression analysis of 8 factors to evaluate the global warming impact; quantified the result by building a gas emission calculator and presented it to local high schools
Oct. 2016 - May 2017

Data Toolkit
SQL, R, Python, Java, Excel (pivot tables, VLOOKUP), MATLAB, Stata

Database & Big Data
ETL, NoSQL (MongoDB, Cassandra), Hadoop, MapReduce, Spark, AWS

Visualization
Tableau, Shiny, PowerPoint, Python (matplotlib, plotly, seaborn, bokeh)

Marketing & Product
A/B Testing, Multivariate Testing
SEO
Marketing Plans & Strategies
Client Research

Machine Learning
Regression, PCA, Clustering, Classification, Tree models, Recommendation System, Anomaly Detection

Soft skills
Enthusiastic self-starter, fast learner, multi-tasker, and problem solver.

"Ran is an amazing student and analyst. I had the pleasure of teaching her in my SQL class at Columbia. She never earned a score less than 100% in that class. But, I am not surprised by that. She had assisted me outside of class on various data-related projects to apply her learned skills. So I was aware of her advanced skills. Her determination to squeeze every learning opportunity for its maximum value will make her a valuable asset to any organization. I am certain of that." - Day Yi, Associate Faculty at Columbia University
"Ran is a very hard working and dedicated team player. She was the data analyst in our company. Her managers reported that she completed all her work on time and were all done well. She is also very responsible, kind and easy to work with." - Angela Sta. Cruz, Associate Manager at Gugnir & Partners
