Python

Preprocessing Text Data for Machine Learning

Unstructured text needs its own preprocessing steps before it can be used for machine learning. This article walks through some of those steps, including tokenization, stopword removal, punctuation removal, lemmatization, stemming, and vectorization.
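As a rough illustration of a few of these steps, here is a minimal sketch using only the Python standard library. It covers lowercasing, punctuation removal, whitespace tokenization, stopword removal, and a simple bag-of-words vectorization. The `STOPWORDS` set, the `preprocess` and `vectorize` helpers, and the sample sentences are all made up for this example; stemming and lemmatization are left out because they usually rely on a dedicated NLP library.

```python
import string
from collections import Counter

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"the", "a", "an", "is", "are", "and", "of", "to", "in", "on"}


def preprocess(text):
    """Lowercase, strip punctuation, split on whitespace, drop stopwords."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in cleaned.split() if tok not in STOPWORDS]


def vectorize(docs):
    """Build a bag-of-words count matrix over a sorted vocabulary."""
    tokenized = [preprocess(doc) for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    rows = []
    for toks in tokenized:
        counts = Counter(toks)  # missing words count as 0
        rows.append([counts[word] for word in vocab])
    return vocab, rows


# Hypothetical sample documents.
docs = ["The cat sat on the mat.", "A dog and a cat!"]
vocab, matrix = vectorize(docs)
print(vocab)   # vocabulary shared by all documents
print(matrix)  # one row of word counts per document
```

Each row of `matrix` lines up with `vocab`, so the result can be handed to a model as a numeric feature matrix.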

Python

Filling Gaps in Time Series Data

Time series data does not always arrive perfectly clean. Some days may be missing entirely or contain missing values. Many machine learning models cannot handle gaps in the data, so you will need to fill missing values as part of the data analysis and cleaning process. This article walks through how to identify and fill those gaps using the pandas resample method.
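A minimal sketch of the idea, assuming a daily series with two days missing: `resample("D").asfreq()` puts the data on a regular daily index and leaves `NaN` where no observation existed, which makes the gaps easy to find and fill. The dates and values below are invented for illustration, and forward fill and linear interpolation are only two of several possible fill strategies.

```python
import pandas as pd

# Hypothetical daily series with 2024-01-03 and 2024-01-04 missing.
s = pd.Series(
    [10.0, 12.0, 18.0],
    index=pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-05"]),
)

# Put the data on a regular daily grid; days with no data become NaN.
daily = s.resample("D").asfreq()

# Identify the gaps.
missing_days = daily[daily.isna()].index
print(missing_days)

# Two common ways to fill them.
filled_ffill = daily.ffill()         # carry the last known value forward
filled_interp = daily.interpolate()  # linear interpolation between neighbors
print(filled_interp)
```

Which fill method is appropriate depends on the data: forward fill suits values that stay constant until they change, while interpolation suits quantities that vary smoothly.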

Python

Automated Exploratory Data Analysis

Exploratory data analysis is a critical first step in building a machine learning model. Understanding your data better makes outlier detection, feature engineering, and ultimately modeling more effective. Some parts of exploratory data analysis, such as reviewing feature histograms and missing values, can be automated. This article walks through an open source library I created that runs some basic automated EDA processes.

Python

Favorite Places to Find Datasets

Interesting datasets can make personal machine learning projects more fun and exciting. Here are some of my favorite places to look for datasets when I want to hone my data science and ML skills.
