Get the most out of the little data you have by resampling it through bootstrapping.


Imagine you are trying to measure the average height of all the trees in a large forest. Measuring every tree is not practical. Instead, you measure a small sample and use those measurements to estimate the forest-wide average. Bootstrapping in statistics works on a similar principle.
It involves repeatedly resampling the data you already have and using those resamples to estimate statistics of the dataset (such as the mean, median, and standard deviation). This technique allows you to make more reliable inferences about a population from a small sample.
This article covers:
- Bootstrap basics: what exactly is it?
- How to achieve bootstrapped samples with BigQuery
- Experiments to understand how results change based on different sample sizes and how it relates to known statistics
- Stored procedures that you can take home and use yourself
At its core, bootstrapping involves randomly selecting observations from a dataset with replacement, meaning each selected observation is returned to the dataset before the next draw, to form what is called a “bootstrap sample.”
Let’s simplify this concept using a scenario where you have a basket of 25 apples and are interested in the average weight of the apples in a larger context, such as a market.
The grab-and-note technique
First, reach into the basket, grab a random apple, weigh it, note the weight, and immediately put it back in the basket instead of setting it aside. This way, every time you reach in, any apple can be picked again, including the ones you just weighed.
Repeat
Now, repeat the action of grabbing, weighing, and replacing as many times as there are apples in the basket, which is 25 times in this example.
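To make the grab, weigh, and replace routine concrete, here is a minimal Python sketch of drawing one bootstrap sample from the basket. It is an illustration only, not the BigQuery approach this article builds toward: the apple weights are made-up values, and the names `basket` and `bootstrap_sample` are hypothetical.

```python
import random


def bootstrap_sample(weights, rng):
    """Draw len(weights) items from weights, with replacement."""
    # rng.choice never removes the picked item, so the same apple
    # can be drawn more than once, just like putting it back in the basket.
    return [rng.choice(weights) for _ in weights]


# A seeded generator keeps the example reproducible.
rng = random.Random(42)

# Hypothetical weights (in grams) for the 25 apples in the basket.
basket = [round(rng.uniform(120.0, 220.0), 1) for _ in range(25)]

# One bootstrap sample: 25 draws with replacement.
sample = bootstrap_sample(basket, rng)
sample_mean = sum(sample) / len(sample)

print(f"Basket mean:           {sum(basket) / len(basket):.1f} g")
print(f"Bootstrap sample mean: {sample_mean:.1f} g")
print(f"Distinct weights in the sample: {len(set(sample))} of {len(basket)}")
```

Because each apple goes back into the basket, some apples will typically show up several times in the sample and others not at all, so the sample mean usually differs a little from the basket mean.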