Key Concepts & Terminology 📚🔍
Last updated: 2026-01-01
Applies to version: 1.0+
Reading time: ~8 min
Quick Summary / TL;DR
Statistics is a language. And just like when you start learning any new language, ya gotta start somewhere bruh! Where? Right from the beginning! What the fuck is wrong with you? 🤦♂️
🤧And so... so that you don't suck in life, or at least make progress toward sucking less, this guide is going to explain to your dumbass how to start speaking Statistics, so you can at some point be cool with it and say До свидания: Do svidaniya, ya 'lil biatch!🔫☠️
So, without further ado, let's begin.
Datapoints & Datasets 📋
🧬 Where we start with Statistics...
Datapoint 🧬
A datapoint is a SINGLE measurement or observation
Example: [78] (one test score)
Dataset 🧬🧬🧬🧬🧬
A Dataset is a COLLECTION of datapoints
This is a dataset: [78 85 90 72 88 92 80 85 76 82 84]
⚡and no matter where you are, you ALWAYS START BY ORDERING A DATASET!⚡
Always from min to max. If you're in school, bro, and your moron teacher asks, do it by hand; if you're using software, it sorts it for you.
Unordered dataset: [78 85 90 72 88 92 80 85 76 82 84]
Ordered dataset: [72 76 78 80 82 84 85 85 88 90 92]
Is this a dataset?: [72]. Yeps, it just so happens to be a dataset with one single datapoint.
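If you've got a computer handy, the ordering step is one line. A minimal Python sketch, using the dataset from above:

```python
# The unordered dataset from above
data = [78, 85, 90, 72, 88, 92, 80, 85, 76, 82, 84]

# sorted() returns a NEW list, ordered min -> max
ordered = sorted(data)
print(ordered)  # [72, 76, 78, 80, 82, 84, 85, 85, 88, 90, 92]
```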
Types of Data 📊
Now, onward to types of data. All data is different bruh! Blonde, Brunette, Black, Asian, I mean bruh...
Back to Statistics. Here, you have 2 types of data: Qualitative and Quantitative!
Qualitative (also called Categorical) Data
Nominal: No rank/order - just names or categories
- Examples: "John", "Mary", "blue", "red"
- Favorite colors, names, brands
Ordinal: Has rank/order but intervals aren't equal
- Examples: "1st place", "2nd place", "like", "dislike"
- Survey ratings (poor/fair/good/excellent)
Quantitative (also called Numerical) Data
Discrete: Countable, whole numbers
- Examples: number of students, marbles
- You can count it: 1, 2, 3...
Continuous: Measurable, can have decimals
- Examples: height, weight, temperature
- Interval: No true zero (Celsius, Fahrenheit). 0 Celsius is different from 0 Fahrenheit
- Ratio: Has a true zero, like height (you can measure 0, get it?) or Kelvin, which is an absolute temperature scale: in Kelvin there are no negative values; the lowest you can go is 0, just like height. Get it? 0.
💡 Memory Tip: if you are dyslexic like me: Remember "NOIR" - Nominal, Ordinal, Interval, Ratio
Random Samples and Random Variables 📊
Now... data goes into a storage place called a variable.
Constant vs Variable
// 5 is a constant - a constant has a fixed value that doesn't change
// V is a variable - a variable can take different values
// constant != variable - a constant is different from a variable
Random Variables
If V is a random variable, it can take on any value from a sample, each with the same likelihood
Examples (even some we be talking about later...don't worry bruh...For now just think: data goes into a Storage place. Let's call our Variable V):
- V = 5 (number assignment)
- V = 5a + b (expression assignment)
- V ~ Bernoulli(0.5) (function assignment. The '~' means V 'follows' the probability distribution (more about this later) Bernoulli, which takes the argument p = 0.5)
Random Samples
// S = {6 Red Marbles, 4 Blue Marbles} - example population/set
Random sample - this means that from a population we gathered a dataset of points, each drawn with the same likelihood
If V is a random variable, it can take on any value from S with the same likelihood as any other; that's what makes the sample random
This concept is called I.I.D. -> Independent, Identically Distributed: the datapoints in the dataset are such that if you pick any one of them at random, each has the same probability.
It just means: no favorites bruh... any datapoint is equally likely to get picked.
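Here's a tiny simulation sketch of the "no favorites" idea, using only Python's standard library (the marble population S is the example from above; the variable names are mine):

```python
import random

# The example population: 6 red marbles, 4 blue marbles
S = ["R"] * 6 + ["B"] * 4

random.seed(42)  # reproducible draws

# random.choice picks each element with equal likelihood (1/10 here),
# so P(red) = 6/10 and P(blue) = 4/10
draws = [random.choice(S) for _ in range(10_000)]
frac_red = draws.count("R") / len(draws)
print(frac_red)  # ≈ 0.6
```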
Population vs Sample 🎯
In Statistics you are always dealing from the Sample side or the Population side. You need to know who you're working with:
In exams, pay attention to the variable letters, 'cause they tell you which one your teacher is referring to.
Usually it's the sample you're working with, but "usually" ain't gonna cut it: sometimes you actually deal with the whole population.
So, bruh: how do you know which one is which? Notice the following table and the variable names.
| Population | ↔ | Sample |
|---|---|---|
| Parameters | Description | Statistics |
| μ | mean | x̄ |
| σ | standard deviation | s |
| σ² | variance | s² |
| p | proportion | p̂ |
| N | size | n |
Note how the description of what you're dealing with is the same, but the letters differ. The letters tell your dumbass (and your teacher) which side you're referring to: the population side or the sample side.
About the "description": the technical term is "descriptive measures". So if you're referring to a descriptive measure of the population, you call it a parameter. If you're referring to the sample side, you call it a statistic. I know bruh... "statistics" is a terrible name, but that's what we're working with over here...
So here are some examples for your dumbass to catch up.
🎯 Example Time:
Population: N = every single person on your college campus: 20,000
Sample: n = 50 people in your class who drank your coffee.
The inference you wanna make: if 80% of your class liked your coffee (p̂ = 0.80, i.e. 40/50 people), you want to infer whether the whole 20,000 on your campus (p) will like it too.☕
In Descriptive Statistics, like I mentioned before, we describe data and show it in graphs. So I'm 'bout to explain to ya dumbass how we describe the data.
We do it in 3 ways using 3 different measure types:
- Measures of Central Tendency
- Measures of Dispersion
- Measures of Shapes
Measures of Central Tendency 🎯
Mean (Average) 📏
You know this one bruh: The sum of all values divided by the count
It's called the Arithmetic Mean: (3 + 6 + 9 + 12) ÷ 4 = 7.50
When to use: For symmetrical data (heights, test scores)
What you might not know is that there are more types of means, for bros in different fields that require some specialized... eheeem... means bruh!
Other types:
- Harmonic Mean: For rates and ratios like average speed
- Geometric Mean: For growth rates like investments (finance bro alert!) and populations (the bacteria people...)
The Three Means Compared
Arithmetic Mean: sum of values divided by count
Use for: symmetrical data (heights, test scores, temperatures)
Geometric Mean: Nth root of the product of N values
Use for: growth rates (investments, populations), ratios, multiplicative processes
Harmonic Mean: reciprocal of the arithmetic mean of the reciprocals
Use for: rates and ratios (average speed, parallel resistances, P/E ratios)
Key Insight: For the same positive values, Harmonic Mean ≤ Geometric Mean ≤ Arithmetic Mean. For our dataset [3, 6, 9, 12]: 5.76 ≤ 6.64 ≤ 7.50
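You can check the Key Insight yourself: Python's `statistics` module (3.8+) ships all three means. A quick sketch on the dataset [3, 6, 9, 12]:

```python
from statistics import mean, geometric_mean, harmonic_mean

data = [3, 6, 9, 12]

am = mean(data)            # (3 + 6 + 9 + 12) / 4
gm = geometric_mean(data)  # (3 * 6 * 9 * 12) ** (1/4)
hm = harmonic_mean(data)   # 4 / (1/3 + 1/6 + 1/9 + 1/12)

print(round(hm, 2), round(gm, 2), am)  # 5.76 6.64 7.5
assert hm <= gm <= am  # always true for positive data
```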
Median (Middle Value) 📍
Median is the Karate chop. It's the value that separates your dataset "log" in half
Bro Miyagi Splitting Log in Half. Doing the Median on Sato's Ass. Bruh...EPIC! AYAAAAAAAAAH🥋💪
Stats Example Dataset: X = [72 76 78 80 82 84 85 85 88 90 92]
Split view: [72 76 78 80 82] ⬅️ [84] ➡️ [85 85 88 90 92]
Median (Q2) = 84 (the middle value)
Mode (Most Frequent) 🔢
The value that appears most often
Example: [72 76 78 80 82 84 85 85 88 90 92]
Mode = 85 (appears twice, others appear once)
Types:
- Unimodal: One mode ([2,4,4,5])
- Bimodal: Two modes ([2,4,4,5,5,6])
- Multimodal: More than 2 modes. ([2,4,4,5,5,6,6,7])
Now note the following: if you have [2,4,4,5,5,6,6,7,7,7], the mode is 7, because it has the highest frequency (three appearances), regardless of the fact that other values repeat too. It's always about the highest count; if two values tie for the highest count, they're both modes. In our case the winner is 7.
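The `statistics` module handles median and mode too; `multimode` is the one to reach for when there might be ties (a quick sketch):

```python
from statistics import median, mode, multimode

X = [72, 76, 78, 80, 82, 84, 85, 85, 88, 90, 92]
print(median(X))  # 84 (the karate-chop middle value)
print(mode(X))    # 85 (appears twice)

# multimode returns ALL values tied for the highest count
print(multimode([2, 4, 4, 5, 5, 6, 6, 7, 7, 7]))  # [7] (7 appears 3 times)
print(multimode([2, 4, 4, 5, 5, 6]))              # [4, 5] (bimodal)
```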
Measures of Dispersion (Spread) 📈
If Central tendency tells you WHERE the data "tends" to, dispersion tells you HOW SPREAD OUT it is. I know bruh...spread... i know bruh...
Range or Amplitude
It's the Maximum value of the dataset - Minimum value of the dataset
Example: [72 76 78 80 82 84 85 85 88 90 92]
Range = 92 - 72 = 20
See why you need to have the data from the smallest to the largest? How are you gonna tell me the amplitude if the data is not ordered bruh!...
Quartiles & IQR
Quartiles divide the dataset into 4 equal parts, so you end up with 3 cut points (the 25th, 50th and 75th percentiles). Same idea as a Quarter (for you Finance Bros) dividing the year into 4 × 3-month parts (4 × 3 = 12 months), giving you Q1, Q2, Q3 and Q4:
- Q1 (25th percentile): 25% of data is below this
- Q2 (50th percentile, Median): 50% of data is below this
- Q3 (75th percentile): 75% of data is below this
IQR (Interquartile Range) is exactly that: the range between the quartiles, namely Q3 (75%) - Q1 (25%), which gives you exactly the middle 50% of the data. And if you're paying attention: Q2 is a special case, Q2 = Median, 'cause the Bro Median divides the data at 50%. Bruh!
Quartiles and the Median are just specific percentiles. The key word here is Percentiles. From now on you don't call everything quartiles; it's percentiles, and they come in any flavor: quintiles, deciles, the 99.97th percentile, and so on.
Now that you know a bit more about percentiles (and likely you couldn't care less if you're using computers right from the get-go), if you're doing calculations by hand, or using your computer tools as sidekicks, I'm 'bout to explain to ya dumbass how to calculate any percentile.
How to Calculate Percentiles...like a BOSS!
Calculating the 80th and 90th Percentiles (n = 20)
Sorted dataset (already in order):
| Index | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Value | 3 | 7 | 8 | 12 | 15 | 18 | 21 | 24 | 27 | 30 | 33 | 36 | 39 | 42 | 45 | 48 | 51 | 54 | 57 | 60 |
80th Percentile – Step by Step
- n = number of values = 20
- Formula for position: (percentile / 100) × (n + 1)
- 80th percentile position = (80 / 100) × (20 + 1) = 0.8 × 21 = 16.8
- Split: whole number k = 16, fraction f = 0.8
- Look at the values at positions 16 and 17 (1-based indexing):
→ Index 16 = 48
→ Index 17 = 51
- Because there is a fraction (f = 0.8), we interpolate:
value = value at 16 + f × (value at 17 – value at 16)
= 48 + 0.8 × (51 – 48)
= 48 + 0.8 × 3
= 48 + 2.4
= 50.4
→ 80th percentile = 50.4
(80% of the values are ≤ 50.4)
90th Percentile – Step by Step
- n = number of values = 20
- Formula for position: (percentile / 100) × (n + 1)
- 90th percentile position = (90 / 100) × (20 + 1) = 0.9 × 21 = 18.9
- Split: whole number k = 18, fraction f = 0.9
- Look at the values at positions 18 and 19 (1-based indexing):
→ Index 18 = 54
→ Index 19 = 57
- Because there is a fraction (f = 0.9), we interpolate:
value = value at 18 + f × (value at 19 – value at 18)
= 54 + 0.9 × (57 – 54)
= 54 + 0.9 × 3
= 54 + 2.7
= 56.7
→ 90th percentile = 56.7
(90% of the values are ≤ 56.7)
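The step-by-step recipe above drops straight into a small helper function. A sketch (the function name is mine) of the (p/100) × (n + 1) position method with interpolation:

```python
def percentile(sorted_data, p):
    """Percentile via the (p/100) * (n + 1) position method."""
    n = len(sorted_data)
    pos = (p / 100) * (n + 1)  # 1-based position in the sorted data
    k = int(pos)               # whole part
    f = pos - k                # fractional part
    if k < 1:                  # position falls before the first value
        return sorted_data[0]
    if k >= n:                 # position falls after the last value
        return sorted_data[-1]
    lo, hi = sorted_data[k - 1], sorted_data[k]  # 1-based -> 0-based
    return lo + f * (hi - lo)  # linear interpolation

data = [3, 7, 8, 12, 15, 18, 21, 24, 27, 30,
        33, 36, 39, 42, 45, 48, 51, 54, 57, 60]
print(round(percentile(data, 80), 2))  # 50.4
print(round(percentile(data, 90), 2))  # 56.7
```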
So again... and heads up bruh: there's more than one convention for the position formula. Above we used (P/100) × (n + 1); the recap below uses (P/100) × (n − 1) + 1, a variant a lot of software defaults to. Slightly different answers, both legit.
Percentiles 📊
The value below which a given percentage of observations fall:
To find the 90th percentile:
1. Order dataset: [1, 3, 5, 7, 9]
2. Index = (P/100) × (n-1) + 1 = (90/100) × 4 + 1 = 4.6
3. Value = 7 + (9-7) × 0.6 = 8.2
→ This means that: 90% of values of this dataset are below 8.2
As for Quartiles
Odd number of observations (n = 7)
| Index | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| Value | 1 | 4 | 5 | 6 | 7 | 12 | 45 |
Q1 (first quartile):
Position = (n + 1)/4 = (7 + 1)/4 = 2 → value = 4
Q2 (median):
Position = (n + 1)/2 = (7 + 1)/2 = 4 → value = 6
Q3 (third quartile):
Position = 3(n + 1)/4 = 3×8/4 = 6 → value = 12
Even number of observations (n = 12)
| Index | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Value | 1 | 2 | 2 | 3 | 3 | 4 | 4 | 5 | 5 | 6 | 7 | 9 |
Q1 (first quartile):
Position = (n + 1)/4 = (12 + 1)/4 = 3.25 → interpolating: 2 + 0.25 × (3 − 2) = 2.25 (≈ 2 if you just take the 3rd position)
Q2 (median):
Position = (n/2) and (n/2 + 1) → 6th & 7th values
(4 + 4)/2 = 4
Q3 (third quartile):
Position = 3(n + 1)/4 = 3×13/4 = 9.75 → interpolating: 5 + 0.75 × (6 − 5) = 5.75 (≈ 6 if you just take the 10th position)
Outliers and Tukey's Fences 🦃🚧
Outliers are values that lie outside the typical range. Think: "Out"side lying data: hence: out-liers:
Say the average male pee-pee is extra small and yours is MASSIVE before even reaching folded-in-half status. Yours is an outlier bruh!
Same thing if the average is a certain dimension and your, eheem, "neighbor's" is smaller than most. He's an outlier too, 'cause his pee-pee is much smaller than most people's... poor dude... poor, poor dude.
Anyways...enough sausage fest. for fuck sakes! 🍆🍑😩👉👌💦
So back to Statistics. To find the outliers, there was this famous statistician with a funny name like a turkey🦃: John Tukey. Bro figured out that all values outside the lower fence and upper fence are the outliers. Here are the formulas:
Lower Fence = Q1 - 1.5 × IQR
Upper Fence = Q3 + 1.5 × IQR
Example: With Q1 = 4, Q3 = 12, IQR = 8
Lower Fence = 4 − 1.5 × 8 = 4 − 12 = −8
Upper Fence = 12 + 1.5 × 8 = 12 + 12 = 24
Value 45 is an outlier (above 24)!
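Turkey Bro's 🦃 fences in a few lines of Python. A sketch (the helper name is mine, and I pass Q1 and Q3 in directly rather than computing them):

```python
def tukey_fences(data, q1, q3):
    """Return (lower_fence, upper_fence, outliers) using Tukey's 1.5 * IQR rule."""
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    outliers = [x for x in data if x < lower or x > upper]
    return lower, upper, outliers

# The dataset from the odd-n quartile example, where Q1 = 4 and Q3 = 12
data = [1, 4, 5, 6, 7, 12, 45]
print(tukey_fences(data, q1=4, q3=12))  # (-8.0, 24.0, [45])
```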
And with this, you've already put together your first Stats Bro graph. It's called a Box Plot... Bruh... proud of ya.
Note how it's formed from the quartiles (including the median), has the Turkey Bro 🦃🚧 fences, and shows the outliers outside as little red marbles 🔴... That's how they do it bruh! Red marbles... for fuck sakes!🤦🏻♀️
Variance & Standard Deviation: Measuring Your Consistency
As a gym bro, you can state your average bench over a certain period: take the max lifts you did over, say, 5 days, and get the average.
But Statistics also lets you say how much you deviated from your average bench in each session: one day above, another below, and by how much.
The Process: From Data to Understanding
Example: Your last 5 bench press sets: 100kg, 102kg, 98kg, 101kg, 99kg
Step 1: The Average (Mean)
(100 + 102 + 98 + 101 + 99) / 5 = 100 kg
You could stop here and just say one day +2, another day −1, and so on. But with many days that starts becoming a problem, so instead you summarize all of it in a measurement called variance, or spread, which is a weird measurement, as you're about to see.
Step 2: Why We Calculate Variance
To measure the "variance" or "spread," we find how far each lift is from the mean:
- Differences from 100kg: 0, +2, -2, +1, -1
We square these differences. This does two things:
- Eliminates negative signs (a -2kg difference is still a 2kg variation).
- Gives more weight to larger deviations (a 5kg off-day is more significant than a 1kg off-day).
Then we average the squares: (0 + 4 + 4 + 1 + 1) / 5 = 2
Variance = 2 kg². "Kilogram-squared"?... What the hell does that even mean bruh?
(Heads up: dividing by n = 5 like this gives the population variance σ². The textbook sample variance s² divides by n − 1 instead, which would give 2.5. We stick with n here to keep the arithmetic clean.)
The "squared units" are a mathematical middle step. They aren't meant to be intuitive on their own.
Step 3: Standard Deviation - The Useful Result
So we take the square root of the variance to undo the squaring and return to the original units (kg).
This buys us something useful: the standard deviation becomes a natural "unit" for measuring distance from the mean (later, when we standardize data, the mean gets mapped to 0 and distances get counted in SDs):
Standard Deviation = √Variance = √2 ≈ 1.4 kg
Interpretation: your lifts typically vary by about 1.4 kg above or below the mean.
So instead of referring to "1.4 kg chunks", you now say 1 SD from the mean, or 2 SDs from the mean (2 × 1.4 kg), or negative 3 standard deviations from the mean. Get the idea?
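Python's `statistics` module ships both flavors, so you can check the worked example. Note that `pvariance`/`pstdev` divide by n (like we did above), while `variance`/`stdev` divide by n − 1:

```python
from statistics import pvariance, pstdev, variance, stdev

lifts = [100, 102, 98, 101, 99]

# Dividing by n (matches the worked example):
print(pvariance(lifts))         # 2   -> "kg squared", the weird middle step
print(round(pstdev(lifts), 2))  # 1.41 -> back to kg

# Dividing by n - 1 (the textbook *sample* variance/SD):
print(variance(lifts))          # 2.5
print(round(stdev(lifts), 2))   # 1.58
```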
What Standard Deviation Reveals
High Standard Deviation
Lifts: 80kg, 120kg, 70kg, 130kg, 100kg
Average = 100kg | SD ≈ 23kg
Performance is unpredictable. This level of variation makes it difficult to gauge true progress or plan effective training.
Low Standard Deviation
Lifts: 98kg, 101kg, 100kg, 99kg, 102kg
Average = 100kg | SD ≈ 1.4kg
Performance is controlled and repeatable. This consistency allows for reliable progression and indicates good technique and recovery management.
Which one is better? Neither; they're just different. It just means one has more spread than the other. You could argue the consistent gym bro is more reliable: if you bet on how much he'd lift on any given day, you'd land close to his mean, since bro doesn't deviate much from it. The other dude is more of a wild card: one day way up, the next, who knows. Get it? It's the spread of the lifts around the mean of what the bro did over those days.
The Empirical Rule (68-95-99.7)
Later on, we'll talk about a concept called a distribution. One of them is called the normal distribution. For data that follows a normal distribution (more on this later; for now, just think "normal" as in lift data from a big, representative sample of gym bros in the same category), the standard deviation creates predictable, known ranges, so you can assess where results will land.
Say if your bench average is 100kg with a standard deviation of 10kg:
- 68% of your lifts will be between 90kg and 110kg (100 ± 10).
- 95% of your lifts will be between 80kg and 120kg (100 ± 20).
- 99.7% of your lifts will be between 70kg and 130kg (100 ± 30).
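Don't take 68-95-99.7 on faith bruh, simulate it. A sketch that draws fake "lifts" from a normal distribution and counts how many land within 1 and 2 SDs of the mean:

```python
import random

random.seed(7)

mu, sd = 100, 10  # bench average 100kg, standard deviation 10kg

# Simulate 100,000 "lifts" drawn from a normal distribution
lifts = [random.gauss(mu, sd) for _ in range(100_000)]

within_1sd = sum(mu - sd <= x <= mu + sd for x in lifts) / len(lifts)
within_2sd = sum(mu - 2 * sd <= x <= mu + 2 * sd for x in lifts) / len(lifts)

print(round(within_1sd, 2))  # ≈ 0.68
print(round(within_2sd, 2))  # ≈ 0.95
```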
What does this mean? The whole point of statistics is to estimate what's gonna happen. With enough data, you feed a model (say your height, weight, consistency and so on) and it will estimate where your results will lie.
Measures of Shape 📐
This measure is about what the data LOOKS like. Fat 'ho, skinny 'ho...
But in Statistics, we describe the 'ho', a.k.a. the data, in 2 ways:
- 1. Skewness
- 2. Kurtosis
Skewness ⚖️
How symmetrical or lopsided your data is. Think skiing ⛷️
- Negative Skew. It skis to the left: tail on the left (low x-values): Mean < Median < Mode
- No Skew (Normal). Stays at the top, right in the middle: perfectly symmetrical: Mean = Median = Mode
- Positive Skew. It skis to the right: tail on the right (high x-values): Mean > Median > Mode
Gym Bro Examples:
- Negatively Skewed: competitive powerlifting gym - most lift heavy weights
- Normal/Centered: typical commercial gym - most lift moderate weights
- Positively Skewed: beginner fitness class - most lift light weights
💡 Memory Tip for dyslexic morons like you and me: the tail tells you the skew type: it's where the data "skis" to.
Key insight: The "peak" of each curve (the highest point) tells you where the majority of people fall. Look at where the curve is tallest - that's the weight range most people in that group can lift.
Kurtosis 📊
Kurtosis comes from the Greek for "curvature". We want to know how "peaked" or "flat" the distribution 'ho is:
- Leptokurtic: High peak
- Mesokurtic: In the middle, close to normal.
- Platykurtic: Low peak
💡 Memory Tip: "Lepto" = leap = high peak, "Platy" = "plate" = low peak
⚠️WARNING!⚠️ GEEK ALERT: when talking about kurtosis, college morons talk a lot about fat tails and thin tails. I never understood this. Fat how, lol...
So I'll "translate" this for you: "fat tails" means more of the data lives out in the extremes; "thin tails" means less of it does.
So: Leptokurtic has fatter (heavier) tails, and Platykurtic has thinner (lighter) tails. For fuck sakes...🤦♂️
| Type | Tails vs. Normal | What it Means for Lifters (in kg) | Likelihood of Seeing an Extreme Lift |
|---|---|---|---|
| Leptokurtic | Fatter Tails | More extreme lifts. You'll see more people failing with just the bar (20kg) AND more monsters lifting 300kg+. | HIGHER |
| Mesokurtic | Normal Tails | Lift distribution is predictable. A 120kg lift is common, a 280kg lift is rare but expected. | BASELINE |
| Platykurtic | Thinner Tails | Fewer extreme lifts. The group is more uniform. You're very unlikely to see a 100kg or 300kg lift here. | LOWER |
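For the curious, here's a bare-bones sketch of moment-based skewness and kurtosis (the helper functions are mine; note that libraries such as SciPy report "excess" kurtosis by default, i.e. this value minus 3):

```python
def skewness(data):
    """Moment-based (population) skewness: m3 / m2^1.5."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

def kurtosis(data):
    """Moment-based (population) kurtosis: m4 / m2^2 (a normal curve is ~3)."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n
    m4 = sum((x - m) ** 4 for x in data) / n
    return m4 / m2 ** 2

symmetric = [1, 2, 3, 4, 5]
print(round(skewness(symmetric), 6))  # 0.0 (no skew)
print(round(kurtosis(symmetric), 2))  # 1.7 (flatter than normal's 3)

right_tail = [1, 1, 2, 2, 3, 10]  # a long tail to the right
print(skewness(right_tail) > 0)   # True (positive skew)
```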
Skewness & Kurtosis Formulas
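For the record, the standard moment-based (population) definitions, with x̄ the mean and n the sample size:

```latex
g_1 = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^3}{\left(\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2\right)^{3/2}} \quad\text{(skewness)}
\qquad
g_2 = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^4}{\left(\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2\right)^{2}} \quad\text{(kurtosis)}
```

A normal distribution has g₂ = 3, which is why "excess kurtosis" (g₂ minus 3) is what most software actually reports.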
Graphical Representations 📊
In Statistics we describe data and show it. This is about the "show it" part. With a bunch of graphs you need to know bruh...
Discrete Data
- Bar Chart (simple, stacked, grouped, 100%)
- Column Chart
- Pie Chart / Donut Chart
- Pareto Chart
- Mosaic Plot / Marimekko Chart
- Heatmap (contingency table)
- Dot Plot / Cleveland Dot Chart
- Stem-and-Leaf Plot
- Frequency Polygon (discrete version with gaps)
- Treemap
- Sankey Diagram
- Alluvial Diagram
- Chord Diagram
- Word Cloud (text frequency)

Continuous Data
- Histogram (no gaps)
- Density Plot / Kernel Density Estimate (KDE)
- Box Plot / Box-and-Whisker Plot
- Violin Plot
- Ridgeline Plot (Joyplot)
- Scatter Plot
- Line Plot / Time Series
- Area Chart
- Step Plot / Step Function / Staircase Plot
- ECDF (Empirical CDF)
- Q-Q Plot
- P-P Plot
- Bee Swarm Plot
- Strip Plot / Jittered Dot Plot
- Raincloud Plot
- Contour Plot / 2D Density
- Hexbin Plot
- Rug Plot

Used for Both Types (especially mixed data)
- Box Plot (continuous values × discrete groups)
- Violin Plot (continuous values × discrete groups)
- Frequency Polygon (both versions exist)
- Scatter Plot (continuous + categorical with jitter/color)
- Dot Plot (both, with/without binning)
We will go through all of them. But for now here's what you need to know:
| Discrete | Continuous |
|---|---|
| Bar Diagram | Histogram |
| Integral Diagram (Step Function) | Frequency Polygon |
| Dispersion Diagram | Integral Polygon |
| | Box Plot |
| | Stem-and-Leaf Plot |
Note: Stem-and-Leaf plots can also work with discrete integer data.
Bar Diagram
Use it to: Compare the exact weight lifted by different individuals. Each bar is separate, perfect for distinct categories like people.
Histogram
Use it to: See how many lifters fall into weight ranges (e.g., 70-80kg). Bars touch to show continuous data, revealing the distribution's shape.
Integral Diagram (Step Function)
Use it to: Track the running total of weight lifted as you add each person. Shows the combined load so far.
Frequency Polygon
Use it to: Compare multiple distributions (e.g., men vs. women) on one chart. The line makes it easy to see trends in frequency across weight classes.
Dispersion Diagram / Scatter Plot
Use it to: Spot relationships between two variables, like a lifter's body weight and their max lift. Shows individual data points.
Integral Polygon (Ogive)
Use it to: Find percentiles. Easily see what percentage of lifters are below a certain weight, like how many lift under 100kg.
Box Plot
Use it to: Show how weightlifting results are spread out. A box plot highlights the median and range, and lets you check for outliers.
Stem-and-Leaf Plot
Dataset (weights lifted in kg):
X = [45, 48, 50, 52, 56, 61, 64, 69, 70]
How it works:
The stem is the tens digit (4, 5, 6, 7).
The leaf is the ones digit.
For example, 6 | 9 represents 69 kg.
The plot breaks down the frequency per class. For example, we have 50, 52, 56, so the stem is 5 | and the leaves are: 0 2 6
Use it to: Display individual weightlifting results while keeping the original data visible. Stem-and-leaf plots are best for small-to-medium datasets where you want to see both exact values and overall distribution.
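Building one by hand is a nice exercise. A sketch (the function name is mine; stems are the tens digit, leaves the ones digit, exactly as described above):

```python
def stem_and_leaf(data):
    """Print a stem-and-leaf plot: stems = tens digit, leaves = ones digit."""
    stems = {}
    for x in sorted(data):
        stems.setdefault(x // 10, []).append(x % 10)
    for stem in sorted(stems):
        print(f"{stem} | {' '.join(str(leaf) for leaf in stems[stem])}")

X = [45, 48, 50, 52, 56, 61, 64, 69, 70]
stem_and_leaf(X)
# 4 | 5 8
# 5 | 0 2 6
# 6 | 1 4 9
# 7 | 0
```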
🎯 Summary: Quick Reference Cheat Sheet
Core Concepts
- Datapoint: Single measurement/observation
- Dataset: Collection of datapoints (always order min→max)
- Variable: Storage place for data
- Random Sample: Subset where each point has equal likelihood of selection (I.I.D.)
- Population vs Sample: Parameters (μ, σ) vs Statistics (x̄, s)
Data Types
- Qualitative: Nominal (names), Ordinal (ranked)
- Quantitative: Discrete (counts), Continuous (measures)
- NOIR: Nominal, Ordinal, Interval, Ratio
Central Tendency
- Mean: Arithmetic (average), Geometric, Harmonic
- Median: Middle value (karate chop!)
- Mode: Most frequent value (uni/bi/multi-modal)
Dispersion
- Range: Max - Min
- IQR: Q3 - Q1 (middle 50%)
- Quartiles: Q1(25%), Q2(50%/Median), Q3(75%)
- Percentiles: Value below which P% of data falls
- Variance: Average squared deviation
- Std Dev: √Variance (in original units)
- Outliers: Values outside Tukey's fences: Q1 − 1.5×IQR and Q3 + 1.5×IQR
Shape
- Skewness: Symmetry/lopsidedness (left/negative, normal, right/positive)
- Kurtosis: Peak height & tail thickness (lepto/meso/platy)
- Tail Types: Fat tails = more extremes, thin tails = fewer extremes
Graphical Representations
- Discrete Data: Bar chart, Step Function, Dispersion
- Continuous Data: Histogram, Frequency Polygon, Integral Polygon, Box-Plot, Stem-and-Leaf