# Lean Six Sigma Basic Statistics


Common basic statistics:

- Miles per gallon (or per liter): mpg (mpl)
- Median home prices
- Consumer price index
- Inflation rate
- Stock market average
- Airline on-time-arrival rate

Statistics are computed using data.

Statistics summarize the data and help us to predict future performance.

We use basic statistics every day. The list above captures just a few of the many places where simple statistics are used to summarize data. Throughout your project, we will use data to focus our efforts. Our ultimate goal is to find the Y = f(x1, x2, x3, …, xn) relationship, which gives us the power of prediction rather than detection.

### Basic Statistics

Basic statistics:

- Serve as a means to analyze data collected in the Measure phase.
- Allow us to numerically describe the data that characterizes our process's Xs and Ys.
- Use past process and performance data to make inferences about the future.
- Serve as a foundation for advanced statistical problem-solving methodologies.
- Create a universal language based on numerical facts rather than intuition.

The fundamental objective in any Six Sigma project is to find the relationship between outputs and inputs, Y = f(x1, x2, x3, …, xn). Basic statistics allow us to quantify the behavior of the Xs and Ys. By establishing the relationship of each input to the outputs we are interested in improving, we can then predict the response for a given set of input conditions. This is a marked change from counting the scrap at the end of the day.
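
As a minimal sketch of what quantifying Y = f(x) can look like for a single input, the Python snippet below fits a straight line by ordinary least squares and uses it to predict a response. The data values and the names `oven_temp`, `scrap_rate`, `fit_line`, and `predict` are hypothetical, invented only for illustration. A real project would use measured process data and usually more than one X.

```python
# Hypothetical data: one input X (oven temperature) and one output Y (scrap rate, %).
oven_temp = [180, 185, 190, 195, 200]
scrap_rate = [2.0, 2.5, 3.0, 3.5, 4.0]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


slope, intercept = fit_line(oven_temp, scrap_rate)


def predict(x):
    """Predict Y for a given input condition x using the fitted line."""
    return intercept + slope * x


print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
print(f"Predicted scrap rate at 192 degrees: {predict(192):.2f}%")
```

The point of the sketch is the shift in approach: once a relationship is estimated from past data, the output can be predicted for a planned input setting instead of being measured after the fact.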

### Data Visualisation

Before any statistical tools are applied, visually display and look at your data.

A histogram allows us to look at how the data is distributed across our Y scale of measure.

A histogram is an easy way to present data visually. Although you will learn many advanced statistical methods in this class, it is always a good idea to look at the data visually before using basic statistics.
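
To make the idea concrete, here is a small sketch of a text-based histogram in Python. The measurements in `cycle_times` and the helper name `bin_counts` are assumptions made up for this example, and the bin width of 1 is an arbitrary choice. A plotting library would normally be used instead.

```python
from collections import Counter

# Hypothetical measurements of a process output Y (e.g. cycle time in minutes).
cycle_times = [12.1, 12.4, 13.0, 13.2, 13.3, 13.8, 14.1, 14.2, 14.9, 16.7]
bin_width = 1


def bin_counts(values, width):
    """Count values per bin; each bin covers [start, start + width)."""
    counts = Counter(int(v // width) * width for v in values)
    return dict(sorted(counts.items()))


bins = bin_counts(cycle_times, bin_width)

# Print one row per non-empty bin, with one '#' per observation.
for start, count in bins.items():
    print(f"{start:>3}-{start + bin_width:<3} {'#' * count}")
```

Even this crude display shows where most values cluster and that one value sits apart from the rest, which is worth knowing before any summary statistic is computed.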

### Measures of Central Tendency

In addition to counting occurrences and graphing the results, we can describe processes in terms of central tendency and dispersion.


- Mean (μ, Xbar): The arithmetic average of a set of values
  - Uses the quantitative value of each data point
  - Is strongly influenced by extreme values
- Median (M): The number that reflects the middle of a set of values
  - Is the 50th percentile
  - Is identified as the middle number after all the values are sorted from high to low
  - Is largely unaffected by extreme values
- Mode: The most frequently occurring value in a data set
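
The three measures can be computed with Python's standard `statistics` module. The sketch below uses made-up home sale prices (in thousands) and adds one extreme value to show how the mean shifts while the median and mode barely move. The variable names and data are hypothetical.

```python
import statistics

# Hypothetical home sale prices in thousands.
prices = [200, 210, 210, 220, 230]
# Same data plus one extreme value.
prices_with_extreme = prices + [900]

for label, data in [("without extreme value", prices),
                    ("with extreme value", prices_with_extreme)]:
    mean = statistics.mean(data)      # arithmetic average
    median = statistics.median(data)  # middle value (average of the two middle values if n is even)
    mode = statistics.mode(data)      # most frequently occurring value
    print(f"{label}: mean={mean:.1f}, median={median:.1f}, mode={mode}")
```

With the extreme value included, the mean rises by more than half while the median changes only slightly, which is why the median is often preferred for data with a long tail.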

Distribution shape affects the location of our measures of central tendency. The mean is influenced by extreme values and thus moves in the direction of the long tail in the skewed distributions shown above. In these skewed cases, the median is a better measure of the center or balance point of the distribution. Home sale prices are a good example of a positively skewed distribution where the median is preferred over the mean as the statistic of choice.

Also notice that in the normal, or bell-shaped, curve, mean = median = mode.

Any business would be simple to run if all outcomes were perfectly predictable. Since variation exists, we need a way to quantify it. Measures of dispersion help us describe the amount of variation in our product or process.

Range, variance, and standard deviation are all measures of process or product variation. They describe the dispersion of the data. The job of your project is to minimize these statistics.

Note that outliers will greatly influence these computed basic statistics.
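
As a rough illustration, the following Python sketch computes the range, variance, and standard deviation for a small data set, then repeats the calculation with one outlier added. The part lengths and the helper name `dispersion` are hypothetical. The sketch also assumes the sample (n - 1) form of variance, which the text above does not specify.

```python
import statistics

# Hypothetical part lengths in millimetres.
lengths = [10.0, 10.2, 9.8, 10.1, 9.9]
# Same data plus a single outlier.
lengths_with_outlier = lengths + [12.0]


def dispersion(data):
    """Return (range, sample variance, sample standard deviation)."""
    data_range = max(data) - min(data)
    variance = statistics.variance(data)  # sample variance, divides by n - 1 (assumed form)
    std_dev = statistics.stdev(data)      # square root of the sample variance
    return data_range, variance, std_dev


for label, data in [("without outlier", lengths),
                    ("with outlier", lengths_with_outlier)]:
    data_range, variance, std_dev = dispersion(data)
    print(f"{label}: range={data_range:.2f}, "
          f"variance={variance:.4f}, std dev={std_dev:.4f}")
```

A single outlier inflates all three measures, so it is worth checking whether an unusual point is a real process result or a measurement error before interpreting the numbers.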