Descriptive Statistics

Size (n)

This is how many values are in your data set, whether sample or population.

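Note: a common rule of thumb is that a data set should contain at least 30 values for many statistical approximations to be reliable. This is a guideline, not a strict threshold for statistical significance.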

Problems with a smaller (n<30) data set include:



Minimum (min)

When ordering a data set from least to greatest, the minimum is the least value.



Maximum (max)

When ordering a data set from least to greatest, the maximum is the greatest value.



Range (R)

Range is the difference between the greatest and least values:

$$R = x_{\max} - x_{\min}$$
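As a minimal sketch, size, minimum, maximum, and range can all be computed with Python built-ins. The list `data` is made-up illustration data, not taken from any real study.

```python
# Hypothetical data set used only for illustration.
data = [7, 3, 9, 4, 12, 5]

n = len(data)                      # size
smallest = min(data)               # minimum
largest = max(data)                # maximum
value_range = largest - smallest   # range = maximum - minimum

print(n, smallest, largest, value_range)
```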



Arithmetic Mean (sample: x̄, population: μ)

The arithmetic mean, better known as the average, is the sum of the data points divided by the size:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

The basic formula is the same for both the sample and population. The arithmetic mean is best used for quantities measured on a linear scale (e.g., test scores or temperatures).

Note 1: if the mean is for a population, use μ; otherwise use x̄ for a sample. The formula is the same for both.

Note 2: if mean > median, this usually indicates outlying values at the upper end. If mean < median, it usually indicates outlying values at the lower end.
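To illustrate how an outlier at the upper end pulls the mean above the median, here is a small sketch using Python's standard `statistics` module. The scores are invented for the example.

```python
import statistics

# Hypothetical test scores; the last value is a deliberate high outlier.
scores = [70, 72, 75, 78, 80, 150]

mean = statistics.mean(scores)      # sum of the values divided by the size
median = statistics.median(scores)  # middle of the sorted values

# The outlier pulls the mean above the median.
print(mean, median, mean > median)
```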



Geometric Mean (GM)

The geometric mean finds a typical value by multiplying the n data points and taking the nth root of the product, instead of adding them and dividing by the number of data points:

$$GM = \sqrt[n]{x_1 \cdot x_2 \cdots x_n}$$

The geometric mean is best used for anything that compounds or multiplies over time (e.g., growth rates or ratios).
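As a rough sketch of the geometric mean applied to growth, the example below computes it by hand and with `statistics.geometric_mean` (available in Python 3.8 and later). The growth factors are hypothetical.

```python
import math
import statistics

# Hypothetical yearly growth factors: +10%, +20%, -10%.
factors = [1.10, 1.20, 0.90]

# n-th root of the product of the n values.
gm_manual = math.prod(factors) ** (1 / len(factors))

# The standard library provides the same calculation.
gm_library = statistics.geometric_mean(factors)

print(gm_manual, gm_library)
```

The result is the single constant growth factor that, applied every year, would give the same overall growth as the three factors combined.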



Median (x̃)

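The median is the middle value of a data set ordered from least to greatest.

There are two possibilities. If there is an odd number of values, the median is the single middle value:

$$\tilde{x} = x_{\left(\frac{n+1}{2}\right)}$$

If there is an even number of values, the median is the mean of the two middle values:

$$\tilde{x} = \frac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}}{2}$$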
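The odd and even cases can be sketched in a few lines of Python. The function name `median` and the sample lists are illustrative choices, not part of the original text.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd size: the single middle value (1-based position (n + 1) / 2).
        return ordered[middle]
    # Even size: mean of the two middle values (1-based positions n/2 and n/2 + 1).
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median([5, 1, 3]))      # odd number of values
print(median([5, 1, 3, 7]))   # even number of values
```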



Mode (Z)

The mode is the value (or values) that appears most often in a data set.

Note: if every value appears the same number of times (e.g., once or twice), conventions differ: either every value is a mode or the data set has no mode.
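The sketch below shows how one tool handles ties, using `statistics.multimode` from Python 3.8 and later. The data sets are made up, and other tools may follow the "no mode" convention instead.

```python
import statistics

# Hypothetical data sets for illustration.
one_mode = [1, 2, 2, 3]
two_modes = [1, 2, 2, 3, 3]
all_tied = [1, 2, 3]

# multimode returns every value tied for the highest count.
print(statistics.multimode(one_mode))
print(statistics.multimode(two_modes))

# When all values are tied, this function reports all of them as modes;
# other tools may instead report that there is no mode.
print(statistics.multimode(all_tied))
```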



Sum (sum, Σ)

Sum is the total of all values:

$$\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$$



Sum of Squares (SS)

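The sum of squares is the sum of the squared deviations of each data point from the mean. It quantifies the spread of the data and is used in regression analysis.

The basic formula is the same for both the sample and population. For clarity:

For a sample:

$$SS = \sum_{i=1}^{n} (x_i - \bar{x})^2$$

For a population:

$$SS = \sum_{i=1}^{n} (x_i - \mu)^2$$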



Standard Deviation (σ)

Standard deviation measures how far the values in a data set typically lie from the mean. The sample and population formulas differ slightly: the sample formula divides by n − 1 instead of n to account for the added uncertainty of working with a subset of the data. For clarity:

The formula for a sample looks like:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

While the formula for a population looks like:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}}$$



Variance (σ²)

Variance is the average of the squared differences of each data point from the mean. In other words, variance is the square of the standard deviation. Again, the sample and population formulas differ slightly: the sample formula divides by n − 1 instead of n to account for the added uncertainty of working with a subset of the data. For clarity:

The formula for the sample is:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

While the formula for a population is:

$$\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}$$
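To make the difference between the sample and population calculations concrete, here is a small sketch that derives the sum of squares, variance, and standard deviation step by step. The data values are hypothetical; the final line checks the results against Python's `statistics` module.

```python
import math
import statistics

# Hypothetical data set for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n

# Sum of squares: squared deviations from the mean, added up.
ss = sum((x - mean) ** 2 for x in data)

sample_variance = ss / (n - 1)    # sample form divides by n - 1
population_variance = ss / n      # population form divides by n

sample_sd = math.sqrt(sample_variance)
population_sd = math.sqrt(population_variance)

print(ss, sample_variance, population_variance, sample_sd, population_sd)

# The standard library gives the same standard deviations.
print(statistics.stdev(data), statistics.pstdev(data))
```

For the same data, the sample values are always slightly larger than the population values because of the smaller denominator.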



Coefficient of Variation (CV)

The coefficient of variation shows how much variability there is in relation to the mean: it is the standard deviation divided by the mean. Here, the basic formula is the same for sample and population. For clarity:

The formula for a sample is:

$$CV = \frac{s}{\bar{x}}$$

While the formula for a population is:

$$CV = \frac{\sigma}{\mu}$$



Relative Standard Deviation (RSD)

The relative standard deviation is simply a percentage expression of the coefficient of variation. The basic formula is the same for both a sample and a population.

For precision, the formulas are as follows:

For a sample:

$$RSD = \frac{s}{|\bar{x}|} \times 100\%$$

For a population:

$$RSD = \frac{\sigma}{|\mu|} \times 100\%$$
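As a brief sketch, the coefficient of variation and the relative standard deviation can be computed from the mean and standard deviation as shown below. The measurements are invented for the example.

```python
import statistics

# Hypothetical measurements for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)
sample_sd = statistics.stdev(data)        # sample standard deviation
population_sd = statistics.pstdev(data)   # population standard deviation

cv_sample = sample_sd / mean              # coefficient of variation
cv_population = population_sd / mean

rsd_sample = 100 * cv_sample              # the same quantity as a percentage
rsd_population = 100 * cv_population

print(cv_sample, cv_population, rsd_sample, rsd_population)
```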

Quartiles (Q1, Q2, Q3)

Quartiles are three values that split an ordered data set into four equal parts: Q1 is the 25th percentile, Q2 is the 50th percentile (the median), and Q3 is the 75th percentile. Each quartile has its own formula.

First quartile:

Second quartile:

Third quartile:



Interquartile Range (IQR)

The interquartile range is the difference between the third and first quartiles:

$$IQR = Q_3 - Q_1$$
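One way to compute quartiles and the interquartile range is sketched below with `statistics.quantiles` (Python 3.8 and later). Quartile conventions vary between tools, so the "exclusive" method used here is an assumption and may not match the formulas this page intends. The data set is hypothetical.

```python
import statistics

# Hypothetical data set for illustration.
data = [1, 2, 3, 4, 5, 6, 7]

# With n=4, quantiles returns the three quartile cut points.
# method="exclusive" is the default; method="inclusive" can give
# different values because quartile conventions vary between tools.
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")

iqr = q3 - q1  # interquartile range

print(q1, q2, q3, iqr)
```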

Midrange (MR)

The midrange is a type of average that uses only the minimum and maximum values of a data set instead of all values. The formula is:

$$MR = \frac{x_{\max} + x_{\min}}{2}$$



Outliers

Mean Absolute Deviation (MAD)

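Mean absolute deviation is the average of the absolute differences between each data point and the mean.

The basic formula is the same for both sample and population. For clarity:

The sample is as follows:

$$MAD = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$$

The population is as follows:

$$MAD = \frac{\sum_{i=1}^{n} |x_i - \mu|}{n}$$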
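A minimal sketch of the mean absolute deviation, using invented data:

```python
# Hypothetical data set for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n

# Average of the absolute distances from the mean.
mad = sum(abs(x - mean) for x in data) / n

print(mad)
```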



Root Mean Square (RMS)

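Root mean square is the square root of the mean of the squared values. It summarizes the typical magnitude of the values in a data set regardless of their sign.

The formula is as follows:

$$RMS = \sqrt{\frac{\sum_{i=1}^{n} x_i^2}{n}}$$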
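The sketch below contrasts the root mean square with the arithmetic mean for data whose positive and negative values cancel out. The values are hypothetical.

```python
import math

# Hypothetical data set for illustration; the signs cancel in the mean
# but not in the root mean square.
data = [1, -1, 7, -7]

mean = sum(data) / len(data)
rms = math.sqrt(sum(x ** 2 for x in data) / len(data))

print(mean, rms)
```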



Standard Error of the Mean (SE)

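Standard error measures how much a sample mean is expected to vary from the true population mean ("wiggle room"). It is the standard deviation divided by the square root of the sample size.

The basic formula is the same for both sample and population. For clarity:

The sample:

$$SE = \frac{s}{\sqrt{n}}$$

The population:

$$SE = \frac{\sigma}{\sqrt{n}}$$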
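As a small sketch, the standard error of the mean can be computed as follows. The sample values and the "known" population standard deviation `sigma` are assumptions made up for the example.

```python
import math
import statistics

# Hypothetical sample for illustration.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(sample)

# Sample standard deviation divided by the square root of the sample size.
se = statistics.stdev(sample) / math.sqrt(n)

# If the population standard deviation were known (assumed here to be 2),
# it would replace the sample estimate.
sigma = 2
se_known_sigma = sigma / math.sqrt(n)

print(se, se_known_sigma)
```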



Skewness (γ1)

Skewness measures the asymmetry of a data set. Unlike a symmetric distribution such as the normal distribution, a skewed distribution has either a longer left tail (negative skew) or a longer right tail (positive skew).

The sample formula is:

The population formula is:

$$\gamma_1 = \frac{\frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^3}{\sigma^3}$$



Kurtosis (β2)

Kurtosis describes how heavy the tails of a distribution are, that is, how extreme its outliers are on either end. The greater the value, the more extreme the outliers tend to be.

The formula for a sample is:

The formula for a population is:

$$\beta_2 = \frac{\frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^4}{\sigma^4}$$



Kurtosis Excess (α4)

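Kurtosis excess is the kurtosis minus 3, which is the kurtosis of a normal distribution, so a normal distribution has a kurtosis excess of 0. Positive values indicate heavier tails (more extreme outliers) than a normal distribution, and negative values indicate lighter tails.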

The formula for a sample is:

The formula for a population is:

$$\alpha_4 = \beta_2 - 3$$
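As a sketch of the population (moment-based) versions of skewness, kurtosis, and kurtosis excess, the example below builds them from central moments. The data set and the helper `central_moment` are hypothetical, and bias-corrected sample formulas used by some tools will give different numbers.

```python
# Hypothetical data set for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n

def central_moment(k):
    """Return the k-th central moment, dividing by n (population form)."""
    return sum((x - mean) ** k for x in data) / n

m2 = central_moment(2)   # population variance
m3 = central_moment(3)
m4 = central_moment(4)

skewness = m3 / m2 ** 1.5        # third standardized moment
kurtosis = m4 / m2 ** 2          # fourth standardized moment
kurtosis_excess = kurtosis - 3   # 0 for a normal distribution

print(skewness, kurtosis, kurtosis_excess)
```

The positive skewness reflects the longer right tail created by the value 9.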