Table of Contents
Dependency: A variable whose value depends on the value assigned to another variable (independent variable). Correlation: The relationship between two or more variables is considered as correlation. The correlation coefficient always assumes linear relationship regardless of whether that assumption is correct or not.
How is a dependent variable linked to independent variable?
The independent variable is the cause. Its value is independent of other variables in your study. The dependent variable is the effect. Its value depends on changes in the independent variable.
What is the impact of correlated independent variables?
When independent variables are highly correlated, change in one variable would cause change to another and so the model results fluctuate significantly. The model results will be unstable and vary a lot given a small change in the data or model.
Why are dependent and independent variables not applicable?
Descriptive studies only describe the current state of a variable, so there are no presumed cause or effects, therefore no independent and dependent variables. Since neither variable in a correlational design is manipulated, it is impossible to determine which is the cause and which is the effect.
The only reason to remove highly correlated features is storage and speed concerns. Other than that, what matters about features is whether they contribute to prediction, and whether their data quality is sufficient.
How do you deal with correlated features in machine learning?
There are multiple ways to deal with this problem. The easiest way is to delete or eliminate one of the perfectly correlated features. Another way is to use a dimension reduction algorithm such as Principle Component Analysis (PCA).