
Showing posts from January, 2018

Dimension reduction with Python

Introduction

Sometimes, no matter how good an algorithm is, it just doesn’t work. Or worse, it doesn’t pick up anything. Data can be quite noisy, and sometimes it’s just about impossible to figure out what went wrong. It’s worth noting that the most interesting machine learning challenges always involve some sort of feature engineering, where we use our insight into the problem to carefully craft additional features that the machine learner hopefully picks up.

Garbage in, garbage out: that’s what we know from real life. Not surprisingly, this pattern also holds true when applying machine learning methods to training data. To tackle the issue, we will go in the opposite direction with dimensionality reduction, which involves cutting away features that are irrelevant or redundant. There are several good reasons to trim down the dimensions as much as possible: Most of the models hate high-dimensional spaces and superfluous features, which often irritate or mislead th