Feature Engineering For Machine Learning


What is feature engineering?

Feature engineering is the act of extracting features from raw data and transforming them into formats that are suitable for the machine learn‐ ing model.

If there are too many features, or if most of them are irrelevant, then the model will be more expen‐ sive and tricky to train.

What means good features?

Good features should not only represent salient aspects of the data, but also conform to the assumptions of the model.

\rightarrow Transformations are often necessary.

How to check?

  1. whether the magnitude matters
  2. consider the scale of the features
    1. min-max scaling
    2. standardization(but don’t center sparse data, computational burden)
    3. 2\ell^{2} normalization
  3. consider the distribution of numeric features. linear function assumes the gaussian distribution(use log transforms)
  4. data integration

Theory

  1. Feature Engineering
  2. Discover Feature Engineering, How to Engineer Features and How to Get Good at It
  3. カテゴリカル変数のEncoding手法のまとめ
  4. Feature Engineering: Data scientist’s Secret Sauce !

Practice

  1. DataFrameで特徴量作るのめんどくさ過ぎる。。featuretoolsを使って自動生成したろ
  2. Automated Feature Engineering Basics
  3. Introduction to Manual Feature Engineering
  4. Introduction to Manual Feature Engineering P2
  5. KaggleのWinner solutionにもなった「K近傍を用いた特徴量抽出」のPython実装
  6. カテゴリカル変数のEncoding手法について
  7. fast_feng
  8. 学習アルゴリズム以外のscikit-learn便利機能と連携ライブラリ
  9. 遺伝的プログラミングによる特徴量生成
  10. 遺伝的プログラミングによる特徴量生成でLightGBMの精度向上

Course

  1. How to Win a Data Science Competition: Learn from Top Kagglers

Book

  1. Feature Engineering for Machine Learning

Welcome to share or comment on this post:

Table of Contents