Machine learning interview questions


1. What are outliers, how do you locate them, and what are the treatment methods?



Outlier - A value that "lies outside" (is much smaller or larger than) most of the other values in a set of data.

Example -  23, 29, 32, 3, 27, 83, 28 

                    Both 3 and 83 are outliers: every other value falls between 23 and 32.

There are several methods for locating outliers in a dataset, including:

Visualization - Plotting the data on a graph (for example a box plot or scatter plot) and visually inspecting it can often reveal outliers.

Z-Scores - A z-score measures how many standard deviations a data point lies from the mean. Data points with a z-score greater than 3 or less than -3 are commonly considered outliers.

Interquartile Range - The IQR is the difference between the 75th and 25th percentiles of a dataset. Data points that lie more than 1.5 times the IQR below the first quartile or above the third quartile are considered outliers.

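Mahalanobis distance - This method calculates the distance of each data point from the mean of the dataset, taking into account the covariance between variables. It is useful for multivariate data, where a point can be unusual as a combination of values even if each value looks normal on its own.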
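
The three numeric methods above can be sketched in Python with NumPy. This is a minimal illustration, not a definitive implementation: the function names, the default thresholds (3 for z-scores, 1.5 for the IQR rule), and the synthetic two-variable dataset are assumptions made for the example. Note that on the seven-value example above the z-score rule flags nothing, because in a sample this small no value can sit 3 standard deviations from the mean, while the IQR rule flags both 3 and 83.

```python
import numpy as np

# The example values from the text.
data = np.array([23, 29, 32, 3, 27, 83, 28], dtype=float)


def zscore_outliers(x, threshold=3.0):
    """Return the values whose absolute z-score exceeds the threshold."""
    z = (x - x.mean()) / x.std()
    return x[np.abs(z) > threshold]


def iqr_outliers(x, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return x[(x < q1 - k * iqr) | (x > q3 + k * iqr)]


def mahalanobis_distances(X):
    """Return the Mahalanobis distance of each row of X from the column means."""
    diff = X - X.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(X, rowvar=False))
    # Row-wise quadratic form: diff_i^T * inv_cov * diff_i
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, inv_cov, diff))


z_flagged = zscore_outliers(data)
iqr_flagged = iqr_outliers(data)

# Hypothetical two-variable data: y closely follows x, and one appended
# point (2, -2) breaks that relationship.
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = x + 0.1 * rng.normal(size=50)
X = np.vstack([np.column_stack([x, y]), [2.0, -2.0]])

distances = mahalanobis_distances(X)
most_extreme = int(np.argmax(distances))

print("z-score outliers:", z_flagged)
print("IQR outliers:", iqr_flagged)
print("row with largest Mahalanobis distance:", most_extreme)
```

In the synthetic data, neither coordinate of the appended point is extreme by itself; it stands out only because it contradicts the correlation between the two variables, which is what the Mahalanobis distance captures.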

Once outliers have been located, there are several ways to handle them, including:

Removing Them - Outliers can be removed from the dataset if it is determined that they are errors (for example, data entry or measurement mistakes).

Replacing Them - If the outliers are genuine observations but their values are considered extreme, they can be replaced with more reasonable values, for example by capping them at a chosen limit.

Analysing Them Separately - If the outliers are considered important, they can be analysed separately from the rest of the dataset.

Using Robust Statistics - Robust statistics, such as the median, are designed to be more resistant to outliers and can be used to analyse the data instead of traditional statistics such as the mean.
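
As a rough sketch of the treatment options above, the snippet below applies three of them to the example values using NumPy. The choice of the 1.5 x IQR fences as the cut-off for removal and capping is an assumption made for illustration, as are the variable names; in practice the limits depend on the data and the problem.

```python
import numpy as np

data = np.array([23, 29, 32, 3, 27, 83, 28], dtype=float)

# Fences from the 1.5 * IQR rule (an assumed cut-off for this example).
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Removing: keep only the values inside the fences.
removed = data[(data >= lower) & (data <= upper)]

# Replacing: cap extreme values at the fences instead of dropping them.
capped = np.clip(data, lower, upper)

# Robust statistics: the median and the median absolute deviation (MAD)
# are affected far less by 3 and 83 than the mean and standard deviation.
mean, std = data.mean(), data.std()
median = np.median(data)
mad = np.median(np.abs(data - median))

print("after removal:", removed)
print("after capping:", capped)
print("mean / std:", round(mean, 2), "/", round(std, 2))
print("median / MAD:", median, "/", mad)
```

Removal shrinks the dataset, while capping keeps every row, which matters when the other columns of those rows still carry useful information.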









