Learn 7 different ways to visualize data distribution
Exploratory data analysis and data visualization often involve examining the distribution of a dataset. Doing so provides important insights into your data: its range, any outliers or unusual groupings, its central tendency, and any bias. Comparing the distributions of subsets of the data reveals even more. This guide details several options for quickly creating clean and meaningful visualizations of data distributions using Python.
Visualizations covered:
- histogram
- KDE (density) plot
- joy plot (ridge plot)
- boxplot
- violin plot
- strip plot and swarm plot
- ECDF plot
Data and code:
This article uses fully synthetic weather data generated according to the concepts in the previous article. The data for this article and the complete Jupyter notebook are available on the linked GitHub page. Feel free to download both and follow along, or refer to the code blocks below.
The libraries, imports, and settings used in this article are:
# Data Handling:
import pandas as pd
from pandas.api.types import CategoricalDtype
# Data Visualization Libraries:
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
from joypy import joyplot
# Display Configuration:
%config InlineBackend.figure_format='retina'
First, let’s load and prepare the data. This is a simple synthetic weather dataframe showing various temperature measurements for three cities over four seasons.
# Load data:
df = pd.read_csv('weatherData.csv')
# Set season as a categorical data type:
season = CategoricalDtype(['Winter', 'Spring', 'Summer', 'Fall'])
df['Season'] = df['Season'].astype(season)
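To see what the categorical type changes in practice, here is a minimal sketch on a small stand-in dataframe. Only the Season column name and the category list come from the code above; the demo dataframe, the Temp column, and its values are hypothetical and made up for illustration. It compares how the seasons sort as plain strings and as a categorical.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical stand-in for the weather data. Only the 'Season' column
# name comes from the article; 'Temp' and its values are made up.
demo = pd.DataFrame({
    'Season': ['Fall', 'Winter', 'Summer', 'Spring'],
    'Temp': [12.0, -3.0, 27.0, 15.0],
})

# As plain strings, the seasons sort alphabetically.
alphabetical = demo.sort_values('Season')['Season'].tolist()

# As a categorical, they sort in the order the categories were declared.
season = CategoricalDtype(['Winter', 'Spring', 'Summer', 'Fall'])
demo['Season'] = demo['Season'].astype(season)
declared = demo.sort_values('Season')['Season'].tolist()

print(alphabetical)  # ['Fall', 'Spring', 'Summer', 'Winter']
print(declared)      # ['Winter', 'Spring', 'Summer', 'Fall']
```

pandas sorts and groups categorical values by their declared category order rather than lexically, so the seasons stay in calendar order.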
Note that in this code, the Season column is set to a categorical data type. This will…