When basic addition leads to exponential gains: Feature Engineering for Machine Learning.
Once you join your first Kaggle competition, you’ll probably hear the term “Feature Engineering” tossed around a lot in the discussions and wonder what it means. According to Wikipedia:
Feature engineering is the process of using domain knowledge to extract features (characteristics, properties, attributes) from raw data. Feature engineering has been employed in Kaggle competitions and machine learning projects. ~ Wikipedia
If that didn’t make sense, don’t worry. In this article, I’m going to explain what feature engineering is and walk through some of the feature engineering I did to achieve one of the best metadata-only CV scores in the PetFinder.my competition.
What is feature engineering?
All models have 3 basic components:
- Input (what you feed the model for training)
- The model itself
- Output (the result)
While training can optimize the model itself and, through it, the output, we can’t exactly train the input. Most of the time, we take our input from a pre-made dataset, clean it up, and feed it into our model. However, this underutilizes the input!
Enter feature engineering: combining and transforming the data you already have to create new data. This helps the model by bringing to light new “features” it wouldn’t otherwise have access to.
These new features can make a real difference to your model’s performance. They essentially let the model “skip” a few layers, handing it information up front that it would otherwise have to learn on its own. In other words, you’re giving the model more time to work with the feature than it would have otherwise had (and sometimes you’re giving it features it would never discover on its own).
How exactly do I engineer features?
One of the ways I like to “engineer” features is to play around with the data (Exploratory Data Analysis, or EDA) and figure out what has a major impact on the results. For me, EDA truly shines when there is lots of quantitative data (numbers, booleans, etc.) and less qualitative data (text, images, etc.).
Feature engineering also works best with large amounts of data: the more data you have, the more likely you are to get meaningful results out of it.
For example, maybe you add two channels together to create a new channel, or you multiply two channels. Either way, you should check what impact the new channel has on the results. The stronger the impact, the better.
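As a rough sketch of what that check might look like (the file name and column names here are placeholders, not from any particular dataset):

```python
import pandas as pd

# Placeholder dataset: two numeric channels "a" and "b" plus a "target" column.
df = pd.read_csv("train.csv")

# Engineer two candidate channels from the existing ones.
df["a_plus_b"] = df["a"] + df["b"]
df["a_times_b"] = df["a"] * df["b"]

# Compare each candidate's relationship with the target against the originals.
for col in ["a", "b", "a_plus_b", "a_times_b"]:
    print(col, df[col].corr(df["target"]))
```

If a new channel relates to the target more strongly than the channels it was built from, it’s probably worth keeping.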
In general, there are 4 main ways to “engineer” features:
- 1. Binning
- 2. Transforming
- 3. Splitting
- 4. Combining
You can read more about the ways to “engineer” features here.
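To make those four concrete, here’s a quick sketch of each in pandas (the toy columns below are made up purely for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age_months": [3, 12, 48, 96],
    "price": [10, 100, 1000, 10000],
    "full_name": ["Rex Brown", "Milo Black", "Bella Gray", "Coco White"],
    "height": [20, 35, 50, 65],
    "weight": [4, 10, 22, 30],
})

# 1. Binning: turn a continuous column into categories.
df["age_group"] = pd.cut(df["age_months"], bins=[0, 6, 24, 120],
                         labels=["young", "adult", "senior"])

# 2. Transforming: reshape a skewed column (a log transform here).
df["log_price"] = np.log1p(df["price"])

# 3. Splitting: break one column into several.
df[["first_name", "last_name"]] = df["full_name"].str.split(" ", expand=True)

# 4. Combining: build a new column out of two existing ones.
df["height_to_weight"] = df["height"] / df["weight"]

print(df)
```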
Can I have an example?
Sure!
Let’s take a look at some of the feature engineering I did for the “PetFinder.my” Kaggle competition.
The first feature I engineered: I added two channels and subtracted a third to get an “Overall Focus” channel. Compared to the original channels, this new channel has a significant impact on the result.
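The exact code lived in my notebook, but the idea looks roughly like this (a reconstruction of the approach rather than the original code; the competition metadata has binary columns such as “Subject Focus”, “Eyes”, and “Blur”, plus the “Pawpularity” target):

```python
import pandas as pd

train = pd.read_csv("train.csv")

# Add two focus-related channels and subtract one that works against focus.
train["overall_focus"] = train["Subject Focus"] + train["Eyes"] - train["Blur"]

# Compare the new channel's relationship with the target to the originals'.
for col in ["Subject Focus", "Eyes", "Blur", "overall_focus"]:
    print(col, train[col].corr(train["Pawpularity"]))
```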
Another feature I “engineered”: I combined multiple different factors that contribute to the overall neatness of the picture. Doing this gave me a feature called “good looks,” which had a significant impact on the Pawpularity score.
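Again, a sketch of the idea rather than the exact recipe; which “neatness” factors go into the mix is the part you would tune:

```python
import pandas as pd

train = pd.read_csv("train.csv")

# Combine several neatness-related binary columns into one feature.
# The particular mix below is illustrative, not the original recipe.
train["good_looks"] = (
    train["Eyes"] + train["Face"] + train["Near"]
    - train["Blur"] - train["Occlusion"]
)

print(train["good_looks"].corr(train["Pawpularity"]))
```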
The results of this simple feature engineering were pretty staggering: these features took my CV score from 23.35 all the way down to 19.81 (lower is better here). My public leaderboard score ended up being 19.15.
This score was one of the best scores for metadata-only submissions. I’ll update this with more information once the contest ends and private leaderboard scores are available!
In conclusion, feature engineering can be a very powerful strategy for increasing your model’s accuracy. Hopefully, you were able to learn a thing or two from this blog. Make sure to follow and clap!