
Analytics + Machine Learning
Analytics and machine learning (ML) use statistics, programming, and automation to identify patterns in data and to build algorithms, probability estimates, and statistical models that predict outcomes. Where traditional data analysis focuses on describing current and past data, analytics uncovers trends, patterns, and key performance indicators to guide decisions. Machine learning takes this further by enabling systems to learn from data and make predictions, automating processes without explicit programming. Example use cases include:
- Customer Churn Prediction – Using historical customer data to forecast which users are at risk of leaving.
- Sales Forecasting – Leveraging time-series analysis and ML to predict monthly or seasonal sales.
- Product Recommendation Systems – Recommending products based on user behavior.
- Operational Risk Scoring – Identifying risks in processes using real-time data inputs and historical patterns.
Forecast Demand and Customer Behavior
Whether your business is looking to improve operational efficiency, boost sales, or understand customer behavior, I will provide you with the analytics and machine learning tools and capabilities to make your data work smarter for you.
​​​
Analytics
Descriptive Analytics – Understanding what has happened (dashboards, KPIs, trend analysis).
Diagnostic Analytics – Understanding why it happened (correlation analysis, root cause exploration).
Exploratory Data Analysis (EDA) – Identifying patterns, outliers, and structure within datasets.
Data Cleaning & Preparation – Ensuring data quality for accurate modeling.
​
Machine Learning
Supervised Learning – Training algorithms on labeled datasets for prediction (e.g., regression, classification).
Unsupervised Learning – Discovering hidden structures in data (e.g., clustering, anomaly detection).
Model Evaluation & Tuning – Using metrics like accuracy, precision, and recall to assess model performance.
Pipeline Automation – Building reproducible and scalable ML workflows.
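To make the evaluation metrics above concrete, here is a minimal sketch of how accuracy, precision, and recall can be computed for a binary classifier. The function names and the churn labels are hypothetical and only illustrate the arithmetic; a real project would more likely rely on an ML library's built-in metrics.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if actual == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, fp, fn, tn


def evaluate(y_true, y_pred, positive=1):
    """Return accuracy, precision, and recall as a dictionary."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical churn labels: 1 = churned, 0 = stayed.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]
print(evaluate(actual, predicted))
```

Precision answers "of the customers flagged as churners, how many really left?", while recall answers "of the customers who left, how many were flagged?". Which one matters more depends on the cost of a false alarm versus a missed case.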
​
Combined Approach
Begin with business goal alignment and data discovery
Conduct exploratory analysis to inform hypotheses
Build and train ML models to predict, classify, or optimize
Deploy models into BI tools or operational systems for real-time insights
Continuously monitor and retrain models as new data emerges
Methodologies
Work Samples
Augmented Analytics: Blend of AI/ML Powered Insights
Key Influencers and Decomposition Trees are just two examples of the powerful AI/ML tools that allow users to dive deeper into root cause analysis. Using regression analysis and the power of AI, I can help you identify individual factors (key influencers) that are driving a specific outcome as well as visualize the data across multiple dimensions (decomposition tree).


Investment Portfolio Optimization
OBJECTIVE
Use statistics and regression analysis to decide how to allocate funds earmarked for investment in mortgage-backed securities (MBS), examining the factors that drive housing prices in order to optimize the portfolio.
ASSUMPTION
Macroeconomic factors such as the unemployment rate, the consumer price index (CPI, an indicator of possible inflation), GDP, and stock market stability are not accounted for in this analysis.
PARAMETERS
Of the 80 variables in the dataset, 14 factors (independent variables) were selected for initial testing because they have historically affected housing prices (the dependent variable) in various markets.
Neighborhood
Size (square footage)
Property Size (Lot Footage)
Bedroom/Bathroom
Quality Rating
Age
Sale Type & Condition
Zoning
Remodel
House Style (# of Floors)
Garage
Basement
Central Air
Seasonality
​​
DATA CLEANSING & TRANSFORMATION
Data was first examined for relevance: if a single value accounted for more than 95% of a column (present in 1,388 or more of the 1,460 rows), the column was dropped. Outliers were also removed, and all remaining data was normalized.
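The 95% rule can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names, the column names, and the 100-row sample are hypothetical, and the original cleanup may have been done in a spreadsheet or another tool.

```python
from collections import Counter


def dominant_share(values):
    """Fraction of rows taken up by the single most common value (values must be non-empty)."""
    most_common_count = Counter(values).most_common(1)[0][1]
    return most_common_count / len(values)


def drop_near_constant(columns, threshold=0.95):
    """Keep only the columns whose most common value covers at most `threshold` of the rows."""
    return {
        name: values
        for name, values in columns.items()
        if dominant_share(values) <= threshold
    }


# Hypothetical 100-row sample with two categorical columns.
columns = {
    "Street": ["Pave"] * 97 + ["Grvl"] * 3,   # 97% one value: dropped
    "CentralAir": ["Y"] * 93 + ["N"] * 7,     # 93% one value: kept
}
kept = drop_near_constant(columns)
print(list(kept))
```

A column that is almost entirely one value carries very little information for a regression, which is why it is removed before modeling.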
PROCESS
NEIGHBORHOOD
Observation: Although the neighborhood may affect the sale price of a home, neighborhood sale prices alone are not enough to determine the risk associated with an investment. A "high-tier" neighborhood did not necessarily yield the investment opportunities with the least risk. On the contrary, sale prices fluctuated the most within this group.

Strategy: A year-over-year (YoY) percentage change analysis was done for each neighborhood to understand how its sale prices performed over a four-year period, and a linear regression analysis was completed to forecast future valuation and determine whether an investment would be recommended or too risky.

Results: Based on the YoY analysis, the 25 neighborhoods were divided into three groups: Invest (green), Wild Card (grey), and Risk (red). Nine neighborhoods were identified as potential investment candidates. Linear forecast models were created for these nine and, using a combination strategy, their risk levels were broken down further into Least Risk, Mid-Risk, and Higher Risk. Although the "higher risk" neighborhoods have a negative linear forecast, this is due to an outlier year that may skew the linear analysis and overshadow the continual increase in sale price over the most recent years.
​


[Charts: LEAST RISK | HIGHER RISK | MID-RISK]
SQFT | LOT SIZE | BEDROOMS & BATHROOMS | OVERALL QUALITY REGRESSION ANALYSIS
RESULTS: These variables have a statistically significant effect on the sale price of a home, with an overall R value of .75.
SQFT: $57 increase in sale price (SP) for every additional square foot.

Lot Size: Although R-squared is low (the model does not explain much of the variation in the data), the p-value is < .05 and the result is therefore statistically significant. There is a $.92 increase in SP for every additional square foot of lot size.

Bedrooms & Bathrooms: Coefficients are unstable. It is advised that houses listed with zero bedrooms and bathrooms be reviewed for errors.

Age: $501 decrease in value for every additional year of age.
Quality Rating: $25,687 increase in SP every time the quality rating of the neighborhood increases by 1.






HOUSE STYLE | GARAGE | BASEMENT | CENTRAL AIR
Strategy: Two-Sample T-Test for each variable and Regression Analysis across all variables
​
T-test Results: The p-value was < .05 for all t-tests, indicating that each variable has a statistically significant effect on sale price at the 95% confidence level.
​
Regression Analysis Results:
House Style (# of Floors): $26,441 Increase
Garage: $63,939 Increase
Basement: $53,421 Increase
Central Air: $62,136 Increase
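A two-sample t-test of this kind can be sketched as below. Several assumptions apply: the write-up does not say whether a pooled or unequal-variance test was used, so this sketch uses Welch's unequal-variance form; the p-value uses a normal approximation that is only reasonable for large samples; and the sale prices are generated placeholder values, not the real data.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    na, nb = len(sample_a), len(sample_b)
    va, vb = variance(sample_a) / na, variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


def approx_two_sided_p(t):
    """Two-sided p-value from a normal approximation (large samples only)."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


# Hypothetical sale prices for homes with and without central air.
with_air = [180_000 + 1_000 * (i % 21 - 10) for i in range(200)]
without_air = [120_000 + 1_000 * (i % 21 - 10) for i in range(60)]

t, df = welch_t(with_air, without_air)
p = approx_two_sided_p(t)
print(f"t = {t:.1f}, df = {df:.0f}, significant at 95%: {p < 0.05}")
```

For small samples the p-value should come from the t distribution instead, which a statistics package provides directly.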




Assumptions for House Style
- Homes are divided into homes with >=2 floors and <2 floors.
- Unfinished homes (1.7% of the data) were not considered, as they can be treated as outliers that would skew the results.
- Split levels and split foyers are considered as having >=2 floors.
*Error bars do not overlap, indicating that the findings may be statistically significant.
Other factors, such as zoning and remodeling, were considered as well:
ZONING:
Homes in zones RL and FV are worth investing in for the MBS portfolio:
1) These two zones make up 83% of the houses being considered.
2) Their average sale prices are above $191k.
Following RL, there is a 36% drop in price for homes in zone RH. Homes in zones RH, RM, and C have average sale prices of less than $131k.
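The zoning comparison rests on two simple summaries per zone: its share of all homes and its average sale price. A minimal sketch, assuming a hypothetical list of (zone, sale price) pairs rather than the real dataset:

```python
from collections import defaultdict
from statistics import mean


def summarize_by_zone(sales):
    """Map each zone to (share of all homes, average sale price)."""
    prices_by_zone = defaultdict(list)
    for zone, price in sales:
        prices_by_zone[zone].append(price)
    total = sum(len(prices) for prices in prices_by_zone.values())
    return {
        zone: (len(prices) / total, mean(prices))
        for zone, prices in prices_by_zone.items()
    }


# Hypothetical sales records: (zoning code, sale price in dollars).
sales = [
    ("RL", 190_000), ("RL", 200_000), ("RL", 180_000),
    ("FV", 210_000),
    ("RH", 130_000),
]

for zone, (share, avg) in summarize_by_zone(sales).items():
    print(f"{zone}: {share:.0%} of homes, average sale price ${avg:,.0f}")
```

Looking at share and average together helps avoid favoring a zone that has a high average price but too few homes to matter for the portfolio.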

REMODEL:
With a p-value > .05, we can conclude that remodeling a house does not have a statistically significant effect on sale price.

