Types of Feature Transformation
Why do we need to transform the features?
Scaling the features is important because, if the values are left unscaled, Euclidean distances are dominated by the large-valued features and gradient descent converges more slowly.
Also, if any feature has large values, the model gives it more priority, because a larger magnitude is implicitly treated as greater importance.
Not every model requires transformation, but models built on Euclidean distance (e.g., K-Means) or gradient descent (e.g., Linear Regression) do.
In deep learning too, transformation is needed because the networks are trained internally with gradient descent. The short sketch below illustrates the distance problem.
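For intuition, here is a minimal sketch (with made-up age and income values, not from the original article) showing how the large-valued feature dominates the Euclidean distance until both features are brought to a similar scale:

```python
import numpy as np

# Two customers: [age in years, annual income in dollars] (illustrative values)
a = np.array([25, 50_000])
b = np.array([35, 52_000])

# Raw distance is driven almost entirely by income, the large-valued feature.
raw_dist = np.linalg.norm(a - b)  # ~2000.02

# After scaling both features to a comparable 0-1 range, age matters again.
mins = np.array([18, 20_000])
maxs = np.array([70, 200_000])
a_scaled = (a - mins) / (maxs - mins)
b_scaled = (b - mins) / (maxs - mins)
scaled_dist = np.linalg.norm(a_scaled - b_scaled)  # ~0.19

print(raw_dist, scaled_dist)
```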
Types of Transformation
- Standardization
- Scaling to Minimum and Maximum
- Scaling to Median and Quartiles
- Gaussian Transformation
a. Logarithmic Transformation
b. Reciprocal Transformation
c. Square root Transformation
d. Exponential Transformation
e. Box Cox Transformation
Standardization
- All the variables or features are brought to a similar scale.
- Standardization also means centering the values around zero (zero mean and unit variance).
- Standardization is done using the z-score formula:
  z = (x - μ) / σ, where μ is the mean of the feature and σ is its standard deviation.
- Standardization is widely used for machine learning algorithms.
- Standardization can be implemented easily with sklearn's StandardScaler, as in the sketch below.
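A minimal sketch of how this might look; the small DataFrame and its column names (Age, Fare) are just illustrative assumptions:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({"Age": [22, 35, 58, 41], "Fare": [7.25, 71.28, 512.33, 13.0]})

scaler = StandardScaler()
# fit() learns the mean and standard deviation of each column,
# transform() then applies z = (x - mean) / std.
df_scaled = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)

print(df_scaled.mean().round(2))       # ~0 for every column
print(df_scaled.std(ddof=0).round(2))  # ~1 for every column
```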
Min Max Scaling
- Widely used in deep learning, e.g., CNNs (Convolutional Neural Networks).
- Min Max Scaling scales the values between 0 and 1.
- Min Max Scaler formula: X_scaled = (X - X_min) / (X_max - X_min)
- Min Max Scaling can be implemented easily with sklearn's MinMaxScaler, as in the sketch below.
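A similar sketch for min-max scaling, again on made-up data:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({"Age": [22, 35, 58, 41], "Fare": [7.25, 71.28, 512.33, 13.0]})

scaler = MinMaxScaler()  # default feature_range is (0, 1)
# Applies X_scaled = (X - X_min) / (X_max - X_min) column-wise.
df_scaled = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)

print(df_scaled.min())  # 0.0 for every column
print(df_scaled.max())  # 1.0 for every column
```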
Robust Scaler
It is used to scale the feature using the median and the quartiles.
Scaling with the median and quantiles consists of subtracting the median from all the observations and then dividing by the interquartile range (IQR).
The interquartile range is the difference between the 75th and 25th percentiles:
IQR = 75th percentile - 25th percentile
Robust Scaler formula: X_scaled = (X - median) / IQR
Robust Scaler can be implemented easily with sklearn's RobustScaler, as in the sketch below.
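A minimal sketch of robust scaling, assuming a small made-up Fare column that contains one outlier:

```python
import pandas as pd
from sklearn.preprocessing import RobustScaler

df = pd.DataFrame({"Fare": [7.25, 8.05, 13.0, 26.55, 512.33]})  # 512.33 is an outlier

scaler = RobustScaler()
# Applies X_scaled = (X - median) / IQR, so the outlier does not
# distort the scale the way it would with mean/std or min/max.
df_scaled = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)

print(df_scaled)
```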
Gaussian Transformation
- Gaussian Transformation is used to convert data into a normal distribution when it is not normally distributed.
- Many models perform better and give higher accuracy when the features are approximately normally distributed.
- To check whether data is normally distributed we can use a Q-Q plot.
- Different types of Gaussian transformations are plotted in the code below.
Code for different types of feature transformation GitHub Link
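As a rough sketch of what such code might look like (the right-skewed synthetic data, the 1/1.2 power used for the exponential transformation, and the qq_plot helper name are assumptions, not taken from the linked code):

```python
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

rng = np.random.default_rng(42)
data = rng.exponential(scale=2.0, size=1000)  # right-skewed, strictly positive


def qq_plot(values, title):
    """Draw a Q-Q plot of the values against the normal distribution."""
    stats.probplot(values, dist="norm", plot=plt)
    plt.title(title)
    plt.show()


qq_plot(data, "Original (skewed)")
qq_plot(np.log(data), "Logarithmic transformation")
qq_plot(1 / data, "Reciprocal transformation")
qq_plot(np.sqrt(data), "Square root transformation")
qq_plot(data ** (1 / 1.2), "Exponential (power) transformation")

# Box-Cox requires strictly positive values and picks lambda automatically.
boxcox_data, lam = stats.boxcox(data)
qq_plot(boxcox_data, f"Box-Cox transformation (lambda={lam:.2f})")
```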