How to Create an Advanced Displot in Seaborn Using the Penguins Dataset (Complete Guide)
In modern data visualization, understanding the distribution of variables is one of the most fundamental and insightful analysis techniques. Seaborn’s displot() is a figure-level function that provides a flexible, high-level interface to visualize univariate and bivariate distributions. It combines the power of histplot(), kdeplot(), and ecdfplot() into one unified and advanced plotting function.
In this guide, we’ll explore how to create advanced displot() visualizations using the built-in penguins dataset, go through all parameters with real examples, and apply them in a real-life analytical scenario.
Step 1: Import Libraries and Load the Dataset
We begin by importing essential libraries and loading Seaborn’s built-in dataset penguins.
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
# Load Seaborn's penguins dataset
df = sns.load_dataset("penguins")
# Drop missing values for clarity
df = df.dropna()
# Set a consistent theme for all plots
sns.set_theme(style="whitegrid", palette="muted", font_scale=1.1)
The penguins dataset contains data about different penguin species and their physical measurements.
Columns include:
- species (Adelie, Gentoo, Chinstrap)
- island
- bill_length_mm
- bill_depth_mm
- flipper_length_mm
- body_mass_g
- sex
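Note that df.dropna() with no arguments removes every row that has a missing value in any column, including columns a given plot never uses. The sketch below contrasts that with dropping only on the column being plotted. It uses a tiny hand-made frame (the values are illustrative stand-ins, not rows from the real dataset) so it runs without downloading anything.

```python
import numpy as np
import pandas as pd

# Hand-made stand-in for a few penguin rows (illustrative values only)
toy = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Chinstrap"],
    "flipper_length_mm": [181.0, np.nan, 217.0, 195.0],
    "sex": ["Male", "Female", None, "Female"],
})

# Count missing values per column before deciding what to drop
missing_per_column = toy.isna().sum()

# Drop rows with a missing value in ANY column
complete_rows = toy.dropna()

# Drop rows only when the plotted column is missing
flipper_rows = toy.dropna(subset=["flipper_length_mm"])

print(missing_per_column)
print(len(complete_rows), len(flipper_rows))
```

Which option is appropriate depends on the analysis: dropping on all columns keeps every plot on the same set of rows, while dropping per column keeps more observations for each individual plot.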
Step 2: Basic Displot Example
Let’s start simple—visualizing the distribution of penguin flipper lengths.
sns.displot(data=df, x="flipper_length_mm")
plt.title("Distribution of Penguin Flipper Lengths")
plt.xlabel("Flipper Length (mm)")
plt.ylabel("Count")
plt.show()
This gives you a default histogram, which is equivalent to kind="hist" in displot().
Step 3: Understanding the displot() Function
Here’s the complete function signature of sns.displot():
seaborn.displot(
    data=None, *, x=None, y=None, hue=None, row=None, col=None,
    weights=None, kind='hist', rug=False, rug_kws=None, log_scale=None,
    legend=True, palette=None, hue_order=None, hue_norm=None, color=None,
    col_wrap=None, row_order=None, col_order=None, height=5, aspect=1,
    facet_kws=None, **kwargs
)
Let’s break down every major parameter and use them in a real example.
Step 4: Advanced Univariate Displot (Histogram with KDE and Rug)
Here’s a comprehensive example showing how to combine several parameters for a detailed visualization.
sns.displot(
    data=df,
    x="body_mass_g",  # Variable on x-axis
    hue="species",  # Different colors per species
    kind="hist",  # Plot type: hist | kde | ecdf
    bins=25,  # Number of bins
    stat="density",  # Show density instead of count
    kde=True,  # Add KDE curve
    rug=True,  # Add rug marks for individual observations
    rug_kws={"height": 0.05, "alpha": 0.6, "color": "black"},  # Rug styling
    multiple="stack",  # How hue groups combine: layer, stack, dodge, fill
    common_bins=True,  # All hues share same bins
    common_norm=False,  # Normalize each hue separately
    palette="Set2",  # Custom palette
    log_scale=(False, False),  # No log scaling on either axis
    legend=True,  # Show legend
    height=5,  # Height of each facet (inches)
    aspect=1.3,  # Width-to-height ratio
)
plt.title("Distribution of Penguin Body Mass by Species", fontsize=14, fontweight="bold")
plt.xlabel("Body Mass (g)")
plt.ylabel("Density")
plt.tight_layout()
plt.show()
Explanation of Parameters Used
- kind="hist" → Specifies a histogram (the default).
- bins=25 → Defines the number of bins.
- stat="density" → Normalizes the histogram so that the bar areas sum to 1.
- kde=True → Adds a smooth density estimate.
- rug=True → Adds a tick mark for each observation.
- multiple="stack" → Stacks the hue groups on top of each other.
- common_norm=False → Normalizes each species separately instead of across the whole dataset.
- palette="Set2" → Colors the hue levels with the Set2 qualitative palette.
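To make the difference between common_norm=False and common_norm=True concrete, here is the normalization arithmetic written out with NumPy. This is a sketch of the idea on synthetic stand-in data with made-up group names, not seaborn's internal code: with separate normalization each group's bars enclose an area of 1, while with shared normalization each group's area equals its share of all observations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for two species' body masses (not real measurements)
groups = {
    "lighter": rng.normal(3700, 450, size=150),
    "heavier": rng.normal(5000, 500, size=120),
}
all_values = np.concatenate(list(groups.values()))

# common_bins=True: one set of bin edges computed from the pooled data
edges = np.histogram_bin_edges(all_values, bins=25)
widths = np.diff(edges)


def density(values, n_norm):
    """Bar heights scaled so the total bar area is len(values) / n_norm."""
    counts, _ = np.histogram(values, bins=edges)
    return counts / (n_norm * widths)


# common_norm=False: each group is divided by its own size
area_separate = {
    name: float((density(vals, len(vals)) * widths).sum())
    for name, vals in groups.items()
}

# common_norm=True: each group is divided by the total number of observations
area_shared = {
    name: float((density(vals, len(all_values)) * widths).sum())
    for name, vals in groups.items()
}

print(area_separate)
print(area_shared)
```

Separate normalization makes the shapes of the distributions easy to compare even when group sizes differ, while shared normalization keeps the relative group sizes visible.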
Step 5: Bivariate Displot (Two Variables)
displot() can also visualize the joint distribution of two continuous variables using either kind="hist" or kind="kde".
sns.displot(
    data=df,
    x="flipper_length_mm",
    y="bill_length_mm",
    hue="species",
    kind="kde",  # KDE-style contour plot
    fill=True,  # Fill contours
    thresh=0.05,  # Lowest iso-proportion level at which a contour is drawn
    levels=10,  # Number of contour levels
    rug=True,  # Show rug marks on both axes
    height=6,
    aspect=1.2,
)
plt.suptitle("Bivariate KDE: Flipper Length vs Bill Length", fontsize=14, fontweight="bold")
plt.show()
Key Parameters
- x, y → Assign the two variables for 2D plotting.
- kind="kde" → Draws a contour-style kernel density estimate.
- fill=True → Fills the area between contour levels.
- levels=10 → Sets the number of contour levels.
- cmap → Chooses the colormap for the contours of a single-group plot. Seaborn ignores it (with a warning) when hue is set, so it is left out of the call above; the species colors come from the hue palette instead.
Step 6: Faceting with row and col Parameters
Faceting creates multiple subplots divided by category variables—perfect for multi-dimensional comparison.
sns.displot(
    data=df,
    x="bill_length_mm",
    col="species",  # Separate columns for each species
    row="sex",  # Separate rows for each sex
    hue="island",  # Color by island
    kind="hist",
    kde=True,
    height=3.8,
    aspect=1.1,
    palette="muted",
    bins=15,
    facet_kws={"margin_titles": True},
)
plt.suptitle("Faceted Distribution of Bill Length by Species, Sex, and Island", fontsize=14, fontweight="bold", y=1.03)
plt.show()
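Explanation
- row / col → Create grid layouts for category-based subplots.
- facet_kws → Passes extra options to the underlying FacetGrid (e.g., margin_titles, sharex, sharey).
- col_wrap → Wraps the column facets onto additional rows when there are too many columns. It cannot be combined with row, so it is not used in this example.
- height / aspect → Set the height of each facet in inches and its width-to-height ratio.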
Step 7: Using ECDF Plots for Cumulative Distribution
An Empirical Cumulative Distribution Function (ECDF) shows, for each value on the x-axis, the proportion of observations that are at or below it.
sns.displot(
    data=df,
    x="flipper_length_mm",
    hue="species",
    kind="ecdf",  # ECDF plot type
    palette="pastel",
    height=5,
    aspect=1.3,
    rug=True,
    rug_kws={"height": 0.05, "alpha": 0.5},
)
plt.title("ECDF Plot of Flipper Length by Penguin Species", fontsize=14, fontweight="bold")
plt.xlabel("Flipper Length (mm)")
plt.ylabel("Cumulative Probability")
plt.tight_layout()
plt.show()
Why ECDF is Useful
- Highlights percentile-based patterns.
- Useful for comparing distribution overlap between species.
- Ideal for continuous variable insights in real-world datasets.
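The curve itself is simple to compute by hand, which helps when reading values off the plot. A minimal NumPy sketch, using five made-up flipper lengths rather than values from the dataset:

```python
import numpy as np

# Five made-up flipper lengths in mm (illustrative, not from the dataset)
flipper_mm = np.array([181.0, 190.0, 195.0, 210.0, 186.0])

# Sort the values; the i-th sorted value has ECDF height i / n
x_sorted = np.sort(flipper_mm)
ecdf = np.arange(1, len(x_sorted) + 1) / len(x_sorted)


def proportion_at_or_below(value):
    """Height of the ECDF step function at the given value."""
    return np.searchsorted(x_sorted, value, side="right") / len(x_sorted)


# Proportion of observations with flipper length <= 190 mm
share_190 = proportion_at_or_below(190.0)

# Smallest observed value at which the ECDF reaches 0.5
first_at_half = x_sorted[np.searchsorted(ecdf, 0.5)]

print(ecdf, share_190, first_at_half)
```

Unlike a histogram or KDE, this involves no bin count or bandwidth, so the shape of the curve does not depend on a smoothing choice.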
Step 8: Real-Life Scenario – Penguin Conservation Research
Let’s imagine a real-world application.
You are part of a wildlife research team analyzing penguin morphometric data to understand species adaptation to environmental conditions across islands.
Using sns.displot(), you can:
- Compare body mass distributions across species and islands.
- Use bivariate KDE plots to identify clusters (e.g., Gentoo penguins tend to have longer flippers and heavier body mass).
- Use faceting to detect sex-based differences in size across species.
- Visualize ECDF to observe which species have more variation or tighter distribution in size measurements.
These visualizations support data-driven insights in ecology, evolution, and conservation policy.
Step 9: Complete Example – Multi-Level Advanced Displot
Let’s combine everything into one powerful visualization.
sns.displot(
    data=df,
    x="body_mass_g",
    hue="species",
    col="sex",
    kind="kde",
    rug=True,
    fill=True,
    multiple="stack",
    common_norm=False,
    height=4,
    aspect=1.2,
    palette="Set1",
    facet_kws={"margin_titles": True},
)
plt.suptitle("Body Mass Distribution of Penguins by Species and Sex", fontsize=15, fontweight="bold", y=1.03)
plt.show()
This multi-dimensional visualization simultaneously reveals:
- Body mass variations by species and sex.
- Relative distribution densities with KDE curves.
- Easy comparison across subplots for male and female penguins.
Step 10: Save the Plot
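Export your visuals in high resolution for reports or dashboards. Call plt.savefig() before plt.show(): once the figure has been shown and closed, Matplotlib typically starts from a new, empty figure, and the saved file comes out blank.
# Run this before plt.show() in the example above
plt.savefig("penguins_displot_advanced.png", dpi=300, bbox_inches="tight")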
Step 11: Summary of Key Takeaways
✅ sns.displot() is a figure-level function that manages multiple subplots using FacetGrid.
✅ Supports histograms, KDEs, and ECDFs with extensive customization.
✅ Parameters like hue, row, col, multiple, and facet_kws make it ideal for storytelling.
✅ Great for univariate and bivariate visualizations.
✅ Real-world applications include data science, ecology, and behavioral research.
By mastering seaborn.displot(), you can craft visually rich, data-driven stories that clearly communicate patterns, trends, and distributions across complex datasets, which is essential for any serious data analyst or scientist.