Data Tables – Aggregated vs Disaggregated 📊🧮
Last updated: 2026-01-02
Applies to version: 1.0+
Reading time: ~12 min
Quick Summary / TL;DR
Tables are where your data goes to get organized or get lost forever. 😾
You have two choices: show every single datapoint (disaggregated) or group them into intervals (aggregated) like a lazy ass.
Disaggregated tables are raw, honest, and painful to look at. Aggregated tables are tidy, clean, and hide the ugly truth.
Choose wisely, 'cause if you fuck this up, your graphs will lie and your professor will fail you. 🚫📉
Let's dive in, dumbass.
Table of Contents
- Trick Question – Discrete or Continuous?
- The Chess Dataset – Wins, Draws, and .5 Bullshit
- Table Types: Disaggregated vs Aggregated
- Disaggregated Table – Full Transparency
- Aggregated Table – Binned & Hidden
- Binning Methods – How Many Classes?
- Formulas Change – Know Your Math
- When to Use Which (Don’t Be Stupid)
- Cheat Sheet & Next Steps
1. Trick Question – Discrete or Continuous? 🤔
Alright, check this out:
Dataset values: 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5...
Trick question: Are these discrete or continuous?
If you said "continuous", go back to tutorial #1, bruh. You're falling behind. 😤
If you said "can be either", okay – you're thinking, but still wrong.
The Truth:
This is chess data. A win = 1 point, a draw = 0.5 points.
So 2.5 means: 2 wins + 1 draw. You're counting games, not measuring height or temperature.
Counting = Discrete. Even with decimal points, you're still counting outcomes, not measuring a continuum.
Remember: if you're not measuring (heights, temps, weights), you're counting. Discrete ≠ only integers. Discrete = distinct, separate outcomes.
💡 Memory tip: Decimals don't automatically mean continuous. Think: "Can I have 2.3 of this thing in real life?" For chess games? No. For temperature? Yes.
2. The Chess Dataset – Wins, Draws, and .5 Bullshit ♟️📈
Here's the raw dataset I gathered from analyzing chess player streaks:
| Xi | Fi |
|---|---|
| 2 | 72 |
| 2.5 | 9 |
| 3 | 41 |
| 3.5 | 20 |
| 4 | 28 |
| 4.5 | 12 |
| 5 | 29 |
| 5.5 | 14 |
| 6 | 23 |
| 6.5 | 11 |
| 7 | 23 |
| 7.5 | 8 |
| 8 | 8 |
What this means:
- Xi = The streak value in points (e.g., 2 = two straight wins; 2.5 = two wins plus a draw)
- Fi = Frequency (e.g., 72 players had a streak of 2 wins)
- So the dataset is:
[2, 2, 2, ... (72 times), 2.5, 2.5, ... (9 times), 3, 3, ... (41 times), ...]
Note: X (without subscript) = entire dataset. Xi (with subscript) = individual value.
And as you know, you always order data from min to max before working with it. This shit's already ordered, so we're good.
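If you want to sanity-check what "Fi" means, you can expand the (Xi, Fi) pairs back into the raw list in a couple of lines of Python. A quick sketch using only the first three rows of the table (the variable names are mine):

```python
# Expand (Xi, Fi) pairs back into the raw, ordered dataset.
# Demo on the first three rows of the chess table only.
table = [(2, 72), (2.5, 9), (3, 41)]

raw = [x for x, f in table for _ in range(f)]

print(len(raw))    # 122 datapoints (72 + 9 + 41)
print(raw[:3])     # [2, 2, 2] — and the whole list is already ordered
```

Note the expansion preserves the min-to-max order for free, because the table rows were already sorted.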
3. Table Types: Disaggregated vs Aggregated 🧾↔️📦
Tables come in two flavors. Choose your poison:
1. Disaggregated Data / Raw Data Table
Every single datapoint is shown with its frequency.
Transparent, honest, and painful to look at if you have lots of data.
Used when you have distinct values (like chess streaks) or small datasets.
2. Aggregated Data Table
Data grouped into intervals (classes). You don't see individual values.
Tidy, clean, and hides the ugly details.
Two sub-flavors:
- Same amplitude intervals: All classes have equal width
- Different amplitude intervals: Classes have different widths
Now, why would you use aggregated? Because sometimes you're given data that way, or you have too much data to list individually. Or you're lazy. Probably lazy. 😴
4. Disaggregated Table – Full Transparency 🔍📋
Let's build the full table for our chess data. This is where we calculate everything.
Table Headers & Meanings:
- Xi: Datapoint
- Fi: Absolute Frequency
- CumFi: Cumulative Absolute Frequency
- fi: Relative Frequency (Fi ÷ total)
- cumfi: Cumulative Relative Frequency
- FiXi: For calculating mean
- |Xi - μ|: Absolute deviation from mean
- Fi|Xi - μ|: For Mean Absolute Deviation
- (Xi - μ)²: Squared deviation
- Fi(Xi - μ)²: For variance
- Xi²: Squared value
- FiXi²: Alternative variance calculation
Most of these are intermediate calculations you'll never look at again, but you need 'em to get the real stats.
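The frequency columns are nothing fancy: running sums and ratios. A minimal Python sketch (the n = 427 comes from the stats table further down; variable names are mine):

```python
from itertools import accumulate

Xi = [2, 2.5, 3, 3.5]    # first few datapoints
Fi = [72, 9, 41, 20]     # their absolute frequencies
n = 427                  # total sample size (from the stats table)

CumFi = list(accumulate(Fi))     # running totals of Fi
fi = [f / n for f in Fi]         # relative frequencies
cumfi = list(accumulate(fi))     # running relative totals

print(CumFi)              # [72, 81, 122, 142]
print(round(fi[0], 4))    # 0.1686 — matches the table's first row
```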
The Full Disaggregated Table (shown truncated – the streaks continue past 8, which is why cumfi only reaches 0.6979 here and the stats below use n = 427):
| Xi | Fi | CumFi | fi | cumfi | FiXi | |Xi - μ| | Fi|Xi - μ| | (Xi - μ)² | Fi(Xi - μ)² | Xi² | FiXi² |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 72 | 72.0000 | 0.1686 | 0.1686 | 144.0000 | 7.9801 | 574.5667 | 63.6819 | 4585.0964 | 4.0000 | 288.0000 |
| 2.5 | 9 | 81.0000 | 0.0211 | 0.1897 | 22.5000 | 7.4801 | 67.3208 | 55.9518 | 503.5662 | 6.2500 | 56.2500 |
| 3 | 41 | 122.0000 | 0.0960 | 0.2857 | 123.0000 | 6.9801 | 286.1838 | 48.7217 | 1997.5900 | 9.0000 | 369.0000 |
| 3.5 | 20 | 142.0000 | 0.0468 | 0.3326 | 70.0000 | 6.4801 | 129.6019 | 41.9916 | 839.8323 | 12.2500 | 245.0000 |
| 4 | 28 | 170.0000 | 0.0656 | 0.3981 | 112.0000 | 5.9801 | 167.4426 | 35.7615 | 1001.3226 | 16.0000 | 448.0000 |
| 4.5 | 12 | 182.0000 | 0.0281 | 0.4262 | 54.0000 | 5.4801 | 65.7611 | 30.0314 | 360.3771 | 20.2500 | 243.0000 |
| 5 | 29 | 211.0000 | 0.0679 | 0.4941 | 145.0000 | 4.9801 | 144.4227 | 24.8013 | 719.2387 | 25.0000 | 725.0000 |
| 5.5 | 14 | 225.0000 | 0.0328 | 0.5269 | 77.0000 | 4.4801 | 62.7213 | 20.0712 | 280.9974 | 30.2500 | 423.5000 |
| 6 | 23 | 248.0000 | 0.0539 | 0.5808 | 138.0000 | 3.9801 | 91.5422 | 15.8411 | 364.3464 | 36.0000 | 828.0000 |
| 6.5 | 11 | 259.0000 | 0.0258 | 0.6066 | 71.5000 | 3.4801 | 38.2810 | 12.1111 | 133.2216 | 42.2500 | 464.7500 |
| 7 | 23 | 282.0000 | 0.0539 | 0.6604 | 161.0000 | 2.9801 | 68.5422 | 8.8810 | 204.2620 | 49.0000 | 1127.0000 |
| 7.5 | 8 | 290.0000 | 0.0187 | 0.6792 | 60.0000 | 2.4801 | 19.8407 | 6.1509 | 49.2069 | 56.2500 | 450.0000 |
| 8 | 8 | 298.0000 | 0.0187 | 0.6979 | 64.0000 | 1.9801 | 15.8407 | 3.9208 | 31.3662 | 64.0000 | 512.0000 |
Key Observations:
- Relative frequency (fi) tells you what percentage of the total each value represents. 2 appears 16.86% of the time.
- Cumulative frequency shows running totals. By streak 5.5, you've seen 52.69% of all data.
- This table lets you calculate everything: mean, variance, skewness, kurtosis, percentiles...
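All the "real" stats fall out of weighted sums over (Xi, Fi). Here's a hedged sketch of that pattern – demoed on a tiny made-up table, since the chess rows shown above are truncated and won't reproduce the full-dataset numbers:

```python
def weighted_stats(Xi, Fi):
    """Mean, variance, and MAD straight from a disaggregated frequency table."""
    n = sum(Fi)
    mean = sum(f * x for x, f in zip(Xi, Fi)) / n                # Σ FiXi ÷ n
    var = sum(f * (x - mean) ** 2 for x, f in zip(Xi, Fi)) / n   # Σ Fi(Xi − μ)² ÷ n
    mad = sum(f * abs(x - mean) for x, f in zip(Xi, Fi)) / n     # Σ Fi|Xi − μ| ÷ n
    return n, mean, var, mad

# Toy table: value 1 appears twice, value 2 appears three times.
n, mu, var, mad = weighted_stats([1, 2], [2, 3])
print(n, mu, var, mad)   # 5 1.6 0.24 0.48
```

Same columns as the big table (FiXi, Fi(Xi − μ)², Fi|Xi − μ|), just summed in code instead of by hand.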
Descriptive Stats from This Table:
| Stat | Value | Stat | Value |
|---|---|---|---|
| n | 427.0000 | μ (mean) | 9.9801 |
| Mo (mode) | 2.0000 | Me (median) | 5.5000 |
| σ² (variance) | 242.2051 | σ (std dev) | 15.5629 |
| DAM (MAD) | 8.2875 | Q1 | 3.0000 |
| Q3 | 9.0000 | IQR | 6.0000 |
| CV | 1.5594 | Skewness (G) | 0.8636 |
| Kurtosis (K) | 0.2000 | Range | 111.5000 |
Interpretation: Positive skew (0.8636), leptokurtic (K < 0.263), mean > median > mode. Typical right-skewed distribution with a few long winning streaks pulling the mean up.
5. Aggregated Table – Binned & Hidden 📦🔢
Now for the lazy (or practical) approach: binning data into intervals.
The Problem:
You're given this table already aggregated. You don't know the individual values anymore.
Example: Interval [2-10] has 35 data points. Which 35 values? Fuck if I know. Could be 2, 3, 5.5, 9... anything in that range.
This is common in published data, surveys, or when someone already processed the data for you (probably poorly).
Example Aggregated Table:
| Class | Interval | Linf | Lsup | ai | Fi | CumFi | fi | cumfi | Ci | FiCi | fiCi | fiCi² | fi|Ci - μ| | hi = fi/ai | Hi = Fi/ai |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | [2-10] | 2.0000 | 10.0000 | 8.0000 | 35.0000 | 35.0000 | 0.3500 | 0.3500 | 6.0000 | 210.0000 | 2.1000 | 12.6000 | 2.9138 | 0.0438 | 4.3750 |
| 2 | [10-15] | 10.0000 | 15.0000 | 5.0000 | 25.0000 | 60.0000 | 0.2500 | 0.6000 | 12.5000 | 312.5000 | 3.1250 | 39.0625 | 0.4563 | 0.0500 | 5.0000 |
| 3 | [15-20] | 15.0000 | 20.0000 | 5.0000 | 10.0000 | 70.0000 | 0.1000 | 0.7000 | 17.5000 | 175.0000 | 1.7500 | 30.6250 | 0.3175 | 0.0200 | 2.0000 |
| 4 | [20-25] | 20.0000 | 25.0000 | 5.0000 | 20.0000 | 90.0000 | 0.2000 | 0.9000 | 22.5000 | 450.0000 | 4.5000 | 101.2500 | 1.6350 | 0.0400 | 4.0000 |
| 5 | [25-32] | 25.0000 | 32.0000 | 7.0000 | 10.0000 | 100.0000 | 0.1000 | 1.0000 | 28.5000 | 285.0000 | 2.8500 | 81.2250 | 1.4175 | 0.0143 | 1.4286 |
Column Definitions:
- Linf, Lsup: Lower/upper class limits
- ai: Class width = Lsup - Linf
- Ci: Class midpoint = (Linf + Lsup) ÷ 2
- hi: Frequency density = fi ÷ ai (for histograms)
- Hi: Absolute frequency density = Fi ÷ ai
Note: This is purely an academic exercise. In real life, why would you start with data already in a table? Because your professor is a sadist, that's why.
6. Binning Methods – How Many Classes? 📏🔢
If you're aggregating raw data yourself, how many classes (intervals) should you create?
There are formulas. Take each with a grain of salt – they're mild suggestions, not divine truth.
Example Data:
- N = 100 (sample size)
- Range = 8
- Q1 = 1, Q3 = 4, IQR = 3
- Standard deviation = 3
Different Methods Give Different Answers (round the result to a whole number of classes):
| Method | Formula | Number of Classes |
|---|---|---|
| Square Root Rule | √N | √100 = 10 |
| Sturges' Rule | 1 + 3.322·log₁₀(N) | 1 + 3.322·log₁₀(100) = 7.644 |
| Scott's Rule | Range ÷ (3.5·σ·N⁻¹/³) | 8 ÷ (3.5·3·100⁻¹/³) ≈ 3.536 |
| Freedman-Diaconis | Range ÷ (2·IQR·N⁻¹/³) | 8 ÷ (2·3·100⁻¹/³) ≈ 6.189 |
Which One to Use?
- Square Root: Simple, okay for small datasets
- Sturges: Classic, biased toward normal distributions
- Scott: Good for normal data, uses standard deviation
- Freedman-Diaconis: Robust, uses IQR (good for skewed data)
My advice: Try a few, see which gives meaningful intervals. Don't blindly follow formulas.
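All four rules fit in a few lines of Python, so "try a few" is cheap. A sketch (function name is mine; Scott and Freedman-Diaconis are bin-width rules, so the class count is Range ÷ width):

```python
import math

def n_classes(N, data_range, sigma=None, iqr=None):
    """Suggested class counts from four common binning rules. Suggestions, not law."""
    rules = {
        "sqrt": math.sqrt(N),                        # Square Root Rule
        "sturges": 1 + 3.322 * math.log10(N),        # Sturges' Rule
    }
    if sigma is not None:                            # Scott: width = 3.5·σ·N^(-1/3)
        rules["scott"] = data_range / (3.5 * sigma * N ** (-1 / 3))
    if iqr is not None:                              # F-D: width = 2·IQR·N^(-1/3)
        rules["fd"] = data_range / (2 * iqr * N ** (-1 / 3))
    return rules

# The example data: N = 100, range = 8, σ = 3, IQR = 3.
for name, k in n_classes(100, 8, sigma=3, iqr=3).items():
    print(f"{name}: {k:.3f}")   # sqrt 10.000, sturges 7.644, scott 3.536, fd 6.189
```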
7. Formulas Change – Know Your Math 🧮⚠️
PAY ATTENTION: The math changes when working with aggregated data.
You're not starting from raw data anymore – you're interpolating from binned data. Formulas are different.
Key Differences:
- Mean: μ = Σ(fi·Ci) where Ci = class midpoint
- Variance: σ² = Σ[fi·(Ci - μ)²] (using midpoints)
- Percentiles/Median: Use linear interpolation within classes
- Mode: Use formula based on adjacent class frequencies
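The midpoint versions of mean and variance are two one-liners. A sketch on the aggregated table from section 5 – it reproduces the μ and σ² in the stats block:

```python
# Mean and variance from class midpoints and relative frequencies.
Ci = [6.0, 12.5, 17.5, 22.5, 28.5]   # class midpoints
fi = [0.35, 0.25, 0.10, 0.20, 0.10]  # relative frequencies (sum to 1)

mu = sum(f * c for f, c in zip(fi, Ci))               # μ = Σ fi·Ci
var = sum(f * (c - mu) ** 2 for f, c in zip(fi, Ci))  # σ² = Σ fi·(Ci − μ)²

print(mu, round(var, 4))   # 14.325 59.5569
```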
Stats from Aggregated Table:
| Stat | Value | Interpretation |
|---|---|---|
| N | 100.0000 | Total observations |
| μ | 14.3250 | Mean |
| Mo | 10.8621 | Mode (using interpolation formula) |
| Me | 13.0000 | Median |
| σ² | 59.5569 | Variance |
| σ | 7.7173 | Standard deviation |
| Q1 | 7.7143 | First quartile |
| Q3 | 21.2500 | Third quartile |
| Skewness (G1) | 0.4487 | Positive skew |
| Kurtosis (K) | 0.3267 | Platykurtic (K > 0.263) |
Percentile Calculation Example (Median):
P50 = 0.5 (50th percentile)
cum f(Me-1) = 0.3500 (cumulative frequency before median class)
f(Me) = 0.2500 (frequency of median class)
li(Me) = 10.0000 (lower limit of median class)
a(Me) = 5.0000 (width of median class)
Formula: Me = li(Me) + [(0.5 - cum f(Me-1)) ÷ f(Me)] × a(Me)
Me = 10 + [(0.5 - 0.35) ÷ 0.25] × 5 = 13.0000
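The same interpolation formula works for any percentile, so it's worth writing once. A sketch (function name is mine) that reproduces the median, Q1, and Q3 from the stats table:

```python
def percentile_agg(p, limits, fi):
    """Interpolated percentile from an aggregated table.
    p in (0, 1); limits = [(Linf, Lsup), ...]; fi = relative frequencies."""
    cum = 0.0
    for (lo, hi), f in zip(limits, fi):
        if cum + f >= p:                           # p falls inside this class
            return lo + (p - cum) / f * (hi - lo)  # li + [(p − cumf) ÷ f] × a
        cum += f
    return limits[-1][1]                           # p = 1 → upper limit

limits = [(2, 10), (10, 15), (15, 20), (20, 25), (25, 32)]
fi = [0.35, 0.25, 0.10, 0.20, 0.10]

print(round(percentile_agg(0.50, limits, fi), 4))  # 13.0    (median)
print(round(percentile_agg(0.25, limits, fi), 4))  # 7.7143  (Q1)
print(round(percentile_agg(0.75, limits, fi), 4))  # 21.25   (Q3)
```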
Bottom line: If you use raw data formulas on aggregated data, you'll be wrong. Know which table type you have.
8. When to Use Which (Don't Be Stupid) 🤔✅
Use Disaggregated Tables When:
- You have few distinct values (like chess streaks)
- Data is already categorical or discrete
- You need exact calculations (no approximation)
- Sample size is small (< 100 observations)
- You're presenting data transparently
Use Aggregated Tables When:
- You have continuous data with many unique values
- Sample size is large (> 100 observations)
- Data is already given to you in intervals
- You need to simplify for presentation
- Creating histograms or density plots
- You're lazy (most common reason)
Graph Choice Depends on Table Type:
- Disaggregated discrete data: Bar charts, dot plots, stem-and-leaf
- Aggregated continuous data: Histograms, frequency polygons, ogives
- Mixed situations: Box plots work for both (using raw or binned data)
For our chess example (discrete): bar chart or stem-and-leaf. For the aggregated example: histogram or cumulative frequency graph.
🎯 Cheat Sheet & Next Steps
Key Concepts
- Disaggregated Table: Shows every value with frequency
- Aggregated Table: Groups data into intervals (classes)
- Discrete ≠ Integers: Chess scores with .5 are still discrete
- Binning Methods: √N, Sturges, Scott, Freedman-Diaconis
- Formulas Change: Aggregated data uses different formulas
Table Headers
- Xi: Datapoint
- Fi: Absolute frequency
- fi: Relative frequency
- Ci: Class midpoint (aggregated)
- ai: Class width
When to Use
- Disaggregated: Small datasets, exact calculations, transparency
- Aggregated: Large datasets, continuous data, simplification
- Graphs: Match table type (bar vs histogram)
Common Mistakes
- Using raw formulas on aggregated data
- Blindly following binning rules
- Confusing discrete decimal data with continuous
- Forgetting to check cumulative columns
Next Steps
- Learn: Histograms & frequency polygons
- Practice: Convert raw data to aggregated and compare stats
- Read: Exploratory Data Analysis (Tukey)