Understanding how variables are distributed is one of the most fundamental steps in data analysis. Seaborn’s displot() is a figure-level function that provides a flexible, high-level interface for visualizing univariate and bivariate distributions. It combines histplot(), kdeplot(), and ecdfplot() behind a single plotting function.
In this guide, we’ll explore how to create advanced displot() visualizations using the built-in penguins dataset, go through the main parameters with real examples, and apply them in a real-life analytical scenario.
Step 1: Import Libraries and Load the Dataset
We begin by importing the essential libraries and loading Seaborn’s built-in penguins dataset.
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
# Load Seaborn's penguins dataset
df = sns.load_dataset("penguins")
# Drop missing values for clarity
df = df.dropna()
# Set a consistent theme for all plots
sns.set_theme(style="whitegrid", palette="muted", font_scale=1.1)
The penguins dataset contains data about different penguin species and their physical measurements.
Columns include:
- species (Adelie, Gentoo, Chinstrap)
- island
- bill_length_mm
- bill_depth_mm
- flipper_length_mm
- body_mass_g
- sex
Step 2: Basic Displot Example
Let’s start simple—visualizing the distribution of penguin flipper lengths.
sns.displot(data=df, x="flipper_length_mm")
plt.title("Distribution of Penguin Flipper Lengths")
plt.xlabel("Flipper Length (mm)")
plt.ylabel("Count")
plt.show()
This gives you a default histogram, which is equivalent to passing kind="hist" to displot().
Step 3: Understanding the displot() Function
Here’s the complete function signature of sns.displot():
seaborn.displot(
data=None, *, x=None, y=None, hue=None, row=None, col=None,
weights=None, kind='hist', rug=False, rug_kws=None, log_scale=None,
legend=True, palette=None, hue_order=None, hue_norm=None, color=None,
col_wrap=None, row_order=None, col_order=None, height=5, aspect=1,
facet_kws=None, **kwargs
)
Let’s break down every major parameter and use them in a real example.
Step 4: Advanced Univariate Displot (Histogram with KDE and Rug)
Here’s a comprehensive example showing how to combine several parameters for a detailed visualization.
sns.displot(
    data=df,
    x="body_mass_g",            # Variable on x-axis
    hue="species",              # Different colors per species
    kind="hist",                # Plot type: hist | kde | ecdf
    bins=25,                    # Number of bins
    stat="density",             # Show density instead of count
    kde=True,                   # Add KDE curve
    rug=True,                   # Add rug marks for individual observations
    rug_kws={"height": 0.05, "alpha": 0.6, "color": "black"},  # Rug styling
    multiple="stack",           # How hue groups combine: layer, stack, dodge, fill
    common_bins=True,           # All hues share the same bins
    common_norm=False,          # Normalize each hue separately
    palette="Set2",             # Custom palette
    log_scale=(False, False),   # Log scaling for (x, y); disabled here
    legend=True,                # Show legend
    height=5,                   # Height of each facet in inches
    aspect=1.3,                 # Width = aspect * height
)
plt.title("Distribution of Penguin Body Mass by Species", fontsize=14, fontweight="bold")
plt.xlabel("Body Mass (g)")
plt.ylabel("Density")
plt.tight_layout()
plt.show()
Explanation of Parameters Used
- kind="hist" → Draws a histogram (the default).
- bins=25 → Sets the number of bins.
- stat="density" → Normalizes bar heights so that the histogram area equals 1.
- kde=True → Adds a smooth kernel density estimate.
- rug=True → Adds a tick mark for each observation.
- multiple="stack" → Stacks the hue groups vertically.
- common_norm=False → Normalizes each species separately instead of across all species together.
- palette="Set2" → Assigns the Set2 qualitative palette to the hue levels.
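To make the effect of stat="density" and common_norm more concrete, the normalization can be reproduced by hand with NumPy. This is a rough sketch, not seaborn's internal code: the two arrays are synthetic stand-ins for two species' body masses, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-ins for two species' body masses in grams (not real data)
group_a = rng.normal(3700, 450, size=150)
group_b = rng.normal(5000, 500, size=100)

# Shared bin edges across both groups, analogous to common_bins=True
edges = np.histogram_bin_edges(np.concatenate([group_a, group_b]), bins=25)
widths = np.diff(edges)

counts_a, _ = np.histogram(group_a, bins=edges)
counts_b, _ = np.histogram(group_b, bins=edges)
total = counts_a.sum() + counts_b.sum()

# Analogous to common_norm=False: each group is scaled by its own size,
# so each group's histogram has area 1
dens_a_separate = counts_a / (counts_a.sum() * widths)
dens_b_separate = counts_b / (counts_b.sum() * widths)

# Analogous to common_norm=True: both groups are scaled by the overall size,
# so the areas add up to 1 and reflect each group's share of the data
dens_a_common = counts_a / (total * widths)
dens_b_common = counts_b / (total * widths)

area_separate = ((dens_a_separate * widths).sum(), (dens_b_separate * widths).sum())
area_common = ((dens_a_common * widths).sum(), (dens_b_common * widths).sum())
```

With separate normalization, a small group looks as prominent as a large one, which helps compare shapes; with common normalization, the relative group sizes stay visible.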
Step 5: Bivariate Displot (Two Variables)
displot() can also visualize the relationship between two continuous variables using either kind="hist" or kind="kde".
sns.displot(
    data=df,
    x="flipper_length_mm",
    y="bill_length_mm",
    hue="species",
    kind="kde",        # KDE-style contour plot
    fill=True,         # Fill contours
    thresh=0.05,       # Lowest density level drawn, as a proportion
    levels=10,         # Number of contour levels
    palette="Set2",    # One color per species; cmap is ignored (with a warning) when hue is set
    rug=True,          # Show rug marks
    height=6,
    aspect=1.2
)
plt.suptitle("Bivariate KDE: Flipper Length vs Bill Length", fontsize=14, fontweight="bold")
plt.show()
Key Parameters
- x, y → Assign the two variables for 2D plotting.
- kind="kde" → Draws a contour-style kernel density estimate.
- fill=True → Fills the area between contour levels.
- levels=10 → Sets the number of contour levels.
- palette="Set2" → Sets one color per hue level. A colormap such as cmap="coolwarm" applies only when no hue is mapped; with hue, seaborn ignores cmap and emits a warning.
Step 6: Faceting with row and col Parameters
Faceting creates multiple subplots divided by category variables—perfect for multi-dimensional comparison.
sns.displot(
data=df,
x="bill_length_mm",
col="species", # Separate columns for each species
row="sex", # Separate rows for each sex
hue="island", # Color by island
kind="hist",
kde=True,
height=3.8,
aspect=1.1,
palette="muted",
bins=15,
facet_kws={"margin_titles": True}
)
plt.suptitle("Faceted Distribution of Bill Length by Species, Sex, and Island", fontsize=14, fontweight="bold", y=1.03)
plt.show()
Explanation
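- row / col → Create a grid of subplots, one per category level.
- facet_kws → Passes extra options to the underlying FacetGrid (e.g., margin_titles, sharex, sharey).
- col_wrap → Wraps column facets onto multiple rows when there are many columns; it cannot be combined with row.
- height / aspect → Set each facet's height in inches and its width-to-height ratio.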
Step 7: Using ECDF Plots for Cumulative Distribution
An Empirical Cumulative Distribution Function (ECDF) shows, for each value, the proportion of observations that are less than or equal to it.
sns.displot(
data=df,
x="flipper_length_mm",
hue="species",
kind="ecdf", # ECDF plot type
palette="pastel",
height=5,
aspect=1.3,
rug=True,
rug_kws={"height": 0.05, "alpha": 0.5},
)
plt.title("ECDF Plot of Flipper Length by Penguin Species", fontsize=14, fontweight="bold")
plt.xlabel("Flipper Length (mm)")
plt.ylabel("Cumulative Probability")
plt.tight_layout()
plt.show()
Why ECDF is Useful
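- Reads off percentiles directly: the height of the curve at a value is the share of observations at or below it.
- Makes it easy to compare how distributions overlap between species, since the curves do not hide one another.
- Requires no bin-width or bandwidth choice, so every observation is represented as-is.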
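The ECDF itself is simple to compute, which can help when interpreting the curve. Below is a minimal NumPy sketch; the `ecdf` helper and the eight flipper lengths are hypothetical and are not taken from the penguins dataset.

```python
import numpy as np

def ecdf(values):
    """Return sorted values and the proportion of observations <= each value."""
    x = np.sort(np.asarray(values, dtype=float))
    y = np.arange(1, x.size + 1) / x.size
    return x, y

# Hypothetical toy measurements in millimetres
flippers = np.array([181, 186, 195, 193, 190, 181, 195, 182])
x, y = ecdf(flippers)

# Smallest value at which the ECDF reaches at least 0.5
median_estimate = x[np.searchsorted(y, 0.5)]
```

Each observation raises the curve by 1/n, which is why an ECDF plot looks like a staircase for small samples and smooths out as the sample grows.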
Step 8: Real-Life Scenario – Penguin Conservation Research
Let’s imagine a real-world application.
You are part of a wildlife research team analyzing penguin morphometric data to understand species adaptation to environmental conditions across islands.
Using sns.displot(), you can:
- Compare body mass distributions across species and islands.
- Use bivariate KDE plots to identify clusters (e.g., Gentoo penguins tend to have longer flippers and heavier body mass).
- Use faceting to detect sex-based differences in size across species.
- Visualize ECDF to observe which species have more variation or tighter distribution in size measurements.
These visualizations support data-driven insights in ecology, evolution, and conservation policy.
Step 9: Complete Example – Multi-Level Advanced Displot
Let’s combine everything into one powerful visualization.
sns.displot(
data=df,
x="body_mass_g",
hue="species",
col="sex",
kind="kde",
rug=True,
fill=True,
multiple="stack",
common_norm=False,
height=4,
aspect=1.2,
palette="Set1",
facet_kws={"margin_titles": True}
)
plt.suptitle("Body Mass Distribution of Penguins by Species and Sex", fontsize=15, fontweight="bold", y=1.03)
plt.show()
This multi-dimensional visualization simultaneously reveals:
- Body mass variations by species and sex.
- Relative distribution densities with KDE curves.
- Easy comparison across subplots for male and female penguins.
Step 10: Save the Plot
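Export your visuals in high resolution for reports or dashboards. Call savefig() before plt.show(): in a script, the figure is typically closed once show() returns, so saving afterwards can write a blank image.
plt.savefig("penguins_displot_advanced.png", dpi=300, bbox_inches="tight")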
Step 11: Summary of Key Takeaways
✅ sns.displot() is a figure-level function that manages multiple subplots using FacetGrid.
✅ Supports histograms, KDEs, and ECDFs with extensive customization.
✅ Parameters like hue, row, col, multiple, and facet_kws make it ideal for storytelling.
✅ Great for univariate and bivariate visualizations.
✅ Real-world applications include data science, ecology, and behavioral research.
By mastering seaborn.displot(), you can craft visually rich, data-driven stories that clearly communicate patterns, trends, and distributions across complex datasets, an essential skill for any serious data analyst or scientist.