
Analytics + Machine Learning

Analytics and machine learning (ML) use statistics, programming, and automation to identify patterns in data and to build algorithms, probability estimates, and statistical models that predict outcomes. Whereas data analysis focuses on current and past data, analytics helps uncover trends, patterns, and key performance indicators to guide decisions, and machine learning takes this further by enabling systems to learn from data and make predictions, automating processes without explicit programming. Example use cases include:

  • Customer Churn Prediction – Using historical customer data to forecast which users are at risk of leaving.

  • Sales Forecasting – Leveraging time-series analysis and ML to predict monthly or seasonal sales.

  • Product Recommendation Systems – Recommending products based on user behavior.

  • Operational Risk Scoring – Identifying risks in processes using real-time data inputs and historical patterns.

  • Demand and Customer Behavior Forecasting

Whether your business is looking to improve operational efficiency, boost sales, or understand customer behavior, I will provide you with the analytics and machine learning tools and capabilities to make your data work smarter for you.


Analytics
Descriptive Analytics – Understanding what has happened (dashboards, KPIs, trend analysis).

Diagnostic Analytics – Understanding why it happened (correlation analysis, root cause exploration).

Exploratory Data Analysis (EDA) – Identifying patterns, outliers, and structure within datasets.

Data Cleaning & Preparation – Ensuring data quality for accurate modeling.


Machine Learning
Supervised Learning – Training algorithms on labeled datasets for prediction (e.g., regression, classification).

Unsupervised Learning – Discovering hidden structures in data (e.g., clustering, anomaly detection).

Model Evaluation & Tuning – Using metrics like accuracy, precision, and recall to assess model performance.

Pipeline Automation – Building reproducible and scalable ML workflows.
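
As a minimal sketch of the evaluation metrics named above, the following Python computes accuracy, precision, and recall for a binary classifier from true and predicted labels. The function name `classification_metrics` and the sample churn labels are illustrative assumptions, not taken from any client project.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, and recall for binary labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    # Count the confusion-matrix cells for the chosen positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical churn labels: 1 = churned, 0 = stayed.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(actual, predicted)
print(metrics)
```

Precision answers "of the customers flagged as churners, how many actually left?", while recall answers "of the customers who left, how many were flagged?". Which one matters more depends on the cost of a false alarm versus a missed case.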


Combined Approach
Begin with business goal alignment and data discovery

Conduct exploratory analysis to inform hypotheses

Build and train ML models to predict, classify, or optimize

Deploy models into BI tools or operational systems for real-time insights

Continuously monitor and retrain models as new data emerges

Methodologies

Work Samples

Augmented Analytics: Blend of AI/ML Powered Insights

Key Influencers and Decomposition Trees are just two examples of the powerful AI/ML tools that allow users to dive deeper into root cause analysis. Using regression analysis and the power of AI, I can help you identify individual factors (key influencers) that are driving a specific outcome as well as visualize the data across multiple dimensions (decomposition tree).

Key Influencers AI.png
Decomposition Tree AI.png

Investment Portfolio Optimization

OBJECTIVE

Use statistics and regression analysis to decide how to allocate funds earmarked for mortgage-backed securities (MBS), examining the factors that drive housing prices in order to optimize the portfolio.

ASSUMPTION

Macroeconomic factors such as the unemployment rate, the consumer price index (CPI, a possible indicator of inflation), GDP, and stock market stability are not accounted for in this analysis.

PARAMETERS

Of the 80 variables available, 14 factors (independent variables) were selected for initial testing because they have historically affected housing prices (the dependent variable) in various markets:

Neighborhood

Size (square footage)

Property Size (Lot Footage)

Bedroom/Bathroom

Quality Rating

Age

Sale Type & Condition

Zoning

Remodel

House Style (# of Floors)

Garage 

Basement

Central Air

Seasonality


DATA CLEANSING & TRANSFORMATION

Data was first examined for relevance. If a single value accounted for more than 95% of a column (present in 1,388 or more of the 1,460 rows), the column was dropped. Outliers were also removed, and all remaining data was normalized.
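
The 95% rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `dominated_columns` and `drop_columns`, the column names, and the sample rows are hypothetical and are not the project's actual code or data.

```python
from collections import Counter


def dominated_columns(rows, threshold=0.95):
    """Return names of columns where one value fills more than `threshold` of rows."""
    if not rows:
        return []
    flagged = []
    for column in rows[0]:
        counts = Counter(row[column] for row in rows)
        # Share of rows taken by the single most frequent value.
        top_share = counts.most_common(1)[0][1] / len(rows)
        if top_share > threshold:
            flagged.append(column)
    return flagged


def drop_columns(rows, columns):
    """Return copies of the rows without the given columns."""
    skip = set(columns)
    return [{k: v for k, v in row.items() if k not in skip} for row in rows]


# Hypothetical sample: 100 rows where "Utilities" is almost always "AllPub".
rows = [
    {"Utilities": "AllPub" if i < 98 else "NoSeWa",
     "CentralAir": "Y" if i % 4 else "N"}
    for i in range(100)
]
to_drop = dominated_columns(rows)
cleaned = drop_columns(rows, to_drop)
print(to_drop)
```

With 1,460 rows, a strict "more than 95%" threshold works out to 1,388 or more rows sharing one value, since 1,387 rows is exactly 95%.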

PROCESS

NEIGHBORHOOD

Observation: Although the neighborhood may affect the sale price of a home, neighborhood sale prices alone are not enough to determine the risk associated with an investment. A "high-tier" neighborhood did not necessarily yield the investment opportunities that posed the least risk. In fact, sale prices fluctuated the most within this group.

Strategy: A year-over-year (YoY) percent change analysis was run for each neighborhood to understand how its sale prices performed over a four-year period. A linear regression was then fitted to forecast future valuation and determine whether an investment would be recommended or too risky.
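
A rough sketch of the two steps in this strategy, YoY percent change followed by a linear trend forecast, is shown below. The yearly prices are made-up values for a single hypothetical neighborhood, and the function names `yoy_pct_change` and `linear_trend` are assumptions introduced for illustration.

```python
def yoy_pct_change(prices):
    """Year-over-year percent change between consecutive yearly values."""
    return [
        (curr - prev) / prev * 100.0
        for prev, curr in zip(prices, prices[1:])
    ]


def linear_trend(years, prices):
    """Ordinary least-squares slope and intercept for price against year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(prices) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, prices))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical median sale prices for one neighborhood over four years.
years = [1, 2, 3, 4]
median_prices = [200_000, 210_000, 205_000, 220_000]

changes = yoy_pct_change(median_prices)
slope, intercept = linear_trend(years, median_prices)
# Extend the fitted line one year past the observed data.
forecast_year_5 = intercept + slope * 5
print([round(c, 2) for c in changes], round(slope), round(forecast_year_5))
```

Four years of prices yield three YoY changes. A single unusual year can pull the fitted slope noticeably with so few points, so the YoY changes and the trend line are best read together.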


Results: Based on the YoY analysis, the 25 neighborhoods were divided into three groups: Invest (green), Wild Card (grey), and Risk (red). Nine neighborhoods were identified as potential investment candidates. Linear forecast models were created for these nine and, using a combination strategy, their risk levels were broken down further into Least Risk, Mid-Risk, and Higher Risk. Although the "higher risk" neighborhoods have a negative linear forecast, this is due to an outlier year that may skew the linear fit and mask the continued increase in sale prices over the most recent years.


Neighborhood Evaluation.png
Linear regression of neighborhoods.png
LEAST RISK  |  MID-RISK  |  HIGHER RISK

SQFT  |  LOT SIZE  |  BEDROOMS & BATHROOMS  |  AGE  |  OVERALL QUALITY: REGRESSION ANALYSIS

RESULTS: These variables have a statistically significant effect on the sale price of a home, with an overall R value of 0.75.

SQFT: $57 increase in sale price (SP) for every additional square foot.

Lot Size: Although R-squared is low (the model explains little of the variation in the data), the p-value is < .05, so the effect is statistically significant. SP increases by $0.92 for every additional square foot of lot.

Bedrooms & Bathrooms: Coefficients are unstable. Houses recorded with zero bedrooms and bathrooms should be reviewed for data errors.

Age: $501 decrease in value for every additional year.

Quality Rating: $25,687 increase in SP for every one-point increase in the quality rating.
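
The results above read each coefficient as "dollars of sale price per unit" and treat R-squared as a separate measure of how much variation the model explains. A minimal sketch of that distinction for a single predictor follows. The data points are invented for illustration, `simple_regression` is a hypothetical helper, and the original analysis may have used a different tool or a multi-variable model.

```python
def simple_regression(xs, ys):
    """Fit y = intercept + slope * x and return (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For one predictor, R-squared is the squared correlation of x and y.
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical homes: living area in sqft and sale price in dollars.
sqft = [1000, 1500, 2000, 2500]
price = [150_000, 185_000, 200_000, 245_000]

slope, intercept, r_squared = simple_regression(sqft, price)
print(f"${slope:.2f} per sqft, R^2 = {r_squared:.3f}")
```

The slope is the estimated dollar change in sale price per additional square foot. R-squared can be low even when the slope is statistically significant, which is the situation described for Lot Size.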

Regression Analysis 1.png
Residual Plot.png
S.png
Lot Size.png
Ag.png
Q.png

HOUSE STYLE  |  GARAGE  |  BASEMENT  |  CENTRAL AIR

Strategy: Two-Sample T-Test for each variable and Regression Analysis across all variables

T-test Results: The p-value was < .05 for all t-tests, indicating that each variable has a statistically significant effect on sale price at the 95% confidence level.
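
For readers unfamiliar with the two-sample t-test, here is a hedged sketch using only the Python standard library. It assumes Welch's variant (unequal variances), which the analysis above does not specify, and it approximates the p-value with a normal distribution, which is reasonable only for large samples. The prices and the helper name `welch_t_test` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t_test(sample_a, sample_b):
    """Welch's two-sample t statistic with a large-sample (normal) p-value."""
    # Squared standard error of each group mean.
    var_a = variance(sample_a) / len(sample_a)
    var_b = variance(sample_b) / len(sample_b)
    t_stat = (mean(sample_a) - mean(sample_b)) / sqrt(var_a + var_b)
    # Normal approximation to the t distribution; treat the two-sided
    # p-value as approximate, especially for small samples.
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))
    return t_stat, p_value


# Hypothetical sale prices (in $1,000s) with and without a garage.
with_garage = [210, 195, 230, 250, 205, 240, 220, 215]
without_garage = [150, 140, 165, 155, 130, 160, 145, 170]

t_stat, p_value = welch_t_test(with_garage, without_garage)
significant = p_value < 0.05
print(round(t_stat, 2), significant)
```

The tiny samples here are only for illustration. On a dataset of roughly 1,460 homes the normal approximation is much closer to the exact t distribution, though a statistics package would normally be used to get the exact p-value.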


Regression Analysis Results:

     House Style (# of Floors): $26,441 Increase

     Garage: $63,939 Increase

     Basement: $53,421 Increase

     Central Air: $62,136 Increase

Housing Style.png
Garage.png
basement.png
Central air.png

Assumptions for Housing Style

  • Homes are divided into homes with >=2 floors and <2 floors

  • Unfinished homes (1.7%) were excluded because, as outliers, they can skew the data.

  • Split levels and split foyers are considered as having >=2 floors.

*Error bars do not overlap, indicating that the findings may be statistically significant.

Other factors such as Zoning and Remodeled homes were considered as well:

ZONING:

Homes in zones RL and FV are worth investing in for the MBS portfolio:

1) These two zones make up 83% of the houses being considered.

2) Their average sale prices are above $191k.

After RL, the average price drops 36% for homes in zone RH. Homes in zones RH, RM, and C have average sale prices below $131k.
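
The zoning comparison rests on two simple quantities per zone: its share of homes and its average sale price. The sketch below shows one way to compute them, along with the percent drop between two zones. The records are hypothetical placeholders rather than the project's data, and `summarize_by_zone` and `pct_drop` are illustrative names.

```python
from collections import defaultdict


def summarize_by_zone(homes):
    """Return {zone: (share_of_homes, average_sale_price)}."""
    prices_by_zone = defaultdict(list)
    for zone, price in homes:
        prices_by_zone[zone].append(price)
    total = len(homes)
    return {
        zone: (len(prices) / total, sum(prices) / len(prices))
        for zone, prices in prices_by_zone.items()
    }


def pct_drop(higher, lower):
    """Percent decrease going from `higher` to `lower`."""
    return (higher - lower) / higher * 100.0


# Hypothetical (zone, sale price) records; not the project's data.
homes = [
    ("RL", 200_000), ("RL", 180_000), ("RL", 220_000),
    ("FV", 210_000), ("RH", 130_000), ("RM", 120_000),
]
summary = summarize_by_zone(homes)
rl_share, rl_avg = summary["RL"]
rh_share, rh_avg = summary["RH"]
print(f"RL share {rl_share:.0%}, RL to RH drop {pct_drop(rl_avg, rh_avg):.0f}%")
```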

Zoning.png

REMODEL:

With a p-value > .05, there is no statistically significant evidence that remodeling a house affects its sale price.

Remodel.png
Remodel t-test.png