Machine learning interview questions
1. What are outliers, how do you locate them, and what are the treatment methods?
Outlier - A value that "lies outside" (is much smaller or larger than) most of the other values in a set of data
Example - 23, 29, 32, 3, 27, 83, 28
Both 3 and 83 are outliers
There are several methods for locating outliers in a dataset, including:
Visualization - Plotting the data on a graph (for example a box plot or scatter plot) and visually inspecting it can often reveal outliers
Z-Scores - A z-score measures how many standard deviations a data point is from the mean. Data points with a z-score greater than 3 or less than -3 are commonly considered outliers
Interquartile Range - The IQR is the difference between the 75th and 25th percentiles of a dataset. Data points that are more than 1.5 times the IQR below the first quartile or above the third quartile are considered outliers
Mahalanobis distance - This method calculates the distance of each data point from the mean of the dataset, taking into account the covariance between variables
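The z-score and IQR rules above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration on the example values from earlier, not a definitive implementation: the function names `zscore_outliers` and `iqr_outliers` are hypothetical, the z-score uses the population standard deviation, and the quartiles use the "inclusive" interpolation method (other quartile conventions give slightly different fences).

```python
import statistics

data = [23, 29, 32, 3, 27, 83, 28]  # example values from above


def zscore_outliers(values, threshold=3.0):
    """Return values whose absolute z-score exceeds the threshold."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation (assumption)
    if std == 0:
        return []  # all values identical, so nothing can be an outlier
    return [v for v in values if abs((v - mean) / std) > threshold]


def iqr_outliers(values, k=1.5):
    """Return values below Q1 - k*IQR or above Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


z_flagged = zscore_outliers(data)
iqr_flagged = iqr_outliers(data)
print("z-score rule:", z_flagged)   # prints: z-score rule: []
print("IQR rule:", iqr_flagged)     # prints: IQR rule: [3, 83]
```

On this tiny sample the IQR rule flags both 3 and 83, while the z-score rule flags nothing. With n points and the population standard deviation, the largest possible absolute z-score is (n - 1) / sqrt(n), which is about 2.27 for n = 7, so a threshold of 3 is only meaningful on larger samples.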
Once outliers have been located, there are several ways to handle them, including:
Removing them - Outliers can be removed from the dataset if it is determined that they are errors
Replacing them - If the outliers are genuine observations but their values are considered extreme, they can be replaced with more reasonable values (for example, capped at a chosen threshold)
Analysing them separately - If the outliers are considered important, they can be analysed separately from the rest of the dataset
Using robust statistics - Robust statistics, such as the median, are designed to be more resistant to outliers and can be used to analyse the data instead of traditional statistics such as the mean
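The treatment options above can be sketched on the same example values. This is an illustrative sketch under stated assumptions: the 1.5 * IQR fences (with "inclusive" quartiles) decide what counts as an outlier, capping at those fences stands in for "more reasonable values", and the median stands in for a robust statistic. All variable names are hypothetical.

```python
import statistics

data = [23, 29, 32, 3, 27, 83, 28]  # example values from above

# Fences from the IQR rule (assumed detection method for this sketch)
q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Removing them: keep only the values inside the fences
removed = [v for v in data if lower <= v <= upper]

# Replacing them: cap extreme values at the nearest fence
capped = [min(max(v, lower), upper) for v in data]

# Analysing them separately: collect the outliers in their own list
outliers_only = [v for v in data if v < lower or v > upper]

# Using robust statistics: the median is far less affected by 3 and 83
mean_value = statistics.mean(data)
median_value = statistics.median(data)

print("removed:", removed)
print("capped:", capped)
print("outliers:", outliers_only)
print("mean:", round(mean_value, 2), "median:", median_value)
```

Which option fits depends on why the outlier exists: removal suits recording errors, capping keeps the row while limiting its influence, and the median summarises the centre of the data without needing to alter any values.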