How to Create an Advanced Displot in Seaborn Using the Penguins Dataset (Complete Guide)

Farook Mohammad

2 days ago

Understanding how a variable is distributed is one of the most fundamental steps in data analysis. Seaborn’s displot() is a figure-level function that provides a flexible, high-level interface for visualizing univariate and bivariate distributions. It wraps histplot(), kdeplot(), and ecdfplot() in a single function and adds faceting through FacetGrid.

In this guide, we’ll explore how to create advanced displot() visualizations using the built-in penguins dataset, go through all parameters with real examples, and apply them in a real-life analytical scenario.

Step 1: Import Libraries and Load the Dataset

We begin by importing essential libraries and loading Seaborn’s built-in dataset penguins.

import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

# Load Seaborn's penguins dataset
df = sns.load_dataset("penguins")

# Drop missing values for clarity
df = df.dropna()

# Set a consistent theme for all plots
sns.set_theme(style="whitegrid", palette="muted", font_scale=1.1)

The penguins dataset contains data about different penguin species and their physical measurements.
Columns include:

species (Adelie, Gentoo, Chinstrap)
island
bill_length_mm
bill_depth_mm
flipper_length_mm
body_mass_g
sex

Step 2: Basic Displot Example

Let’s start simple—visualizing the distribution of penguin flipper lengths.

sns.displot(data=df, x="flipper_length_mm")
plt.title("Distribution of Penguin Flipper Lengths")
plt.xlabel("Flipper Length (mm)")
plt.ylabel("Count")
plt.show()

This produces a histogram, because kind="hist" is the default for displot().

Step 3: Understanding the `displot()` Function

Here’s the complete function signature of sns.displot():

seaborn.displot(
    data=None, *, x=None, y=None, hue=None, row=None, col=None,
    weights=None, kind='hist', rug=False, rug_kws=None, log_scale=None,
    legend=True, palette=None, hue_order=None, hue_norm=None, color=None,
    col_wrap=None, row_order=None, col_order=None, height=5, aspect=1,
    facet_kws=None, **kwargs
)

Let’s break down every major parameter and use them in a real example.

Step 4: Advanced Univariate Displot (Histogram with KDE and Rug)

Here’s a comprehensive example showing how to combine several parameters for a detailed visualization.

sns.displot(
    data=df,
    x="body_mass_g",                # Variable on x-axis
    hue="species",                  # Different colors per species
    kind="hist",                    # Plot type: hist | kde | ecdf
    bins=25,                        # Number of bins
    stat="density",                 # Show density instead of count
    kde=True,                       # Add KDE curve
    rug=True,                       # Add rug marks for individual observations
    rug_kws={"height": 0.05, "alpha": 0.6, "color": "black"},  # Rug styling
    multiple="stack",               # Overlay options: stack, layer, dodge, fill
    common_bins=True,               # All hues share same bins
    common_norm=False,              # Normalize each hue separately
    palette="Set2",                 # Custom palette
    log_scale=(False, False),       # Log scaling for (x, y); both off here
    legend=True,                    # Show legend
    height=5,                       # Height of each facet
    aspect=1.3,                     # Width-to-height ratio
)
plt.title("Distribution of Penguin Body Mass by Species", fontsize=14, fontweight="bold")
plt.xlabel("Body Mass (g)")
plt.ylabel("Density")
plt.tight_layout()
plt.show()

Explanation of Parameters Used

kind="hist" → Specifies histogram type plot (default).
bins=25 → Defines number of bins.
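stat="density" → Scales bar heights so the total area of the histogram equals 1.
kde=True → Overlays a smooth kernel density estimate.
rug=True → Adds a tick mark along the axis for each observation.
multiple="stack" → Stacks the bars of each species on top of one another instead of overlapping them.
common_norm=False → Normalizes each species independently, so each one has area 1 instead of all species sharing a total area of 1.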
palette="Set2" → Assigns visually balanced colors.
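To make stat="density" and common_norm more concrete, here is a small NumPy sketch of the arithmetic behind them. It illustrates the idea only and is not seaborn's internal code; the two groups and their values are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two synthetic groups of unequal size (hypothetical body masses in grams)
light = rng.normal(3700, 450, 150)
heavy = rng.normal(5100, 500, 50)

# Shared bin edges for both groups, as with common_bins=True
edges = np.histogram_bin_edges(np.concatenate([light, heavy]), bins=25)
widths = np.diff(edges)


def density(values, total):
    """Bar heights for a density histogram: count / (total * bin width)."""
    counts, _ = np.histogram(values, bins=edges)
    return counts / (total * widths)


n_all = light.size + heavy.size

# common_norm=False: each group is divided by its own size
area_light = (density(light, light.size) * widths).sum()
area_heavy = (density(heavy, heavy.size) * widths).sum()

# common_norm=True: both groups are divided by the combined size
area_common = ((density(light, n_all) + density(heavy, n_all)) * widths).sum()

print(area_light, area_heavy)  # each group integrates to about 1
print(area_common)             # the groups together integrate to about 1
```

With common_norm=False the smaller group is drawn as tall as the larger one, which makes shapes easy to compare but hides the difference in group size.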

Step 5: Bivariate Displot (Two Variables)

displot() can also visualize the joint distribution of two continuous variables, using either kind="hist" or kind="kde".

sns.displot(
    data=df,
    x="flipper_length_mm",
    y="bill_length_mm",
    hue="species",
    kind="kde",                     # KDE-style contour plot
    fill=True,                      # Fill contours
    thresh=0.05,                    # Lowest iso-proportion level at which contours are drawn
    levels=10,                      # Number of contour levels
    # cmap is left out: seaborn ignores it (with a warning) when hue is mapped
    rug=True,                       # Show rug marks
    height=6,
    aspect=1.2
)
plt.suptitle("Bivariate KDE: Flipper Length vs Bill Length", fontsize=14, fontweight="bold")
plt.show()

Key Parameters

x, y → Assigns variables for 2D plotting.
kind='kde' → Enables contour-style kernel density estimate.
fill=True → Fills between contour levels.
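levels=10 → Sets the number of contour levels.
cmap → Sets the contour colormap only when hue is not mapped; with hue, seaborn ignores cmap (with a warning) and takes the colors from the palette.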

Step 6: Faceting with `row` and `col` Parameters

Faceting creates multiple subplots divided by category variables—perfect for multi-dimensional comparison.

sns.displot(
    data=df,
    x="bill_length_mm",
    col="species",                  # Separate columns for each species
    row="sex",                      # Separate rows for each sex
    hue="island",                   # Color by island
    kind="hist",
    kde=True,
    height=3.8,
    aspect=1.1,
    palette="muted",
    bins=15,
    facet_kws={"margin_titles": True}
)
plt.suptitle("Faceted Distribution of Bill Length by Species, Sex, and Island", fontsize=14, fontweight="bold", y=1.03)
plt.show()

Explanation

row / col → Create grid layouts for category-based subplots.
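facet_kws → Passes extra options to the underlying FacetGrid (e.g., margin_titles, sharex, sharey).
col_wrap → Wraps the column facets onto a new row after the given number of columns. It cannot be combined with row, so it is not used in the example above.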
height / aspect → Adjust subplot size and shape.

Step 7: Using ECDF Plots for Cumulative Distribution

An Empirical Cumulative Distribution Function (ECDF) shows, for each value on the x-axis, the proportion of observations that are less than or equal to it.

sns.displot(
    data=df,
    x="flipper_length_mm",
    hue="species",
    kind="ecdf",                    # ECDF plot type
    palette="pastel",
    height=5,
    aspect=1.3,
    rug=True,
    rug_kws={"height": 0.05, "alpha": 0.5},
)
plt.title("ECDF Plot of Flipper Length by Penguin Species", fontsize=14, fontweight="bold")
plt.xlabel("Flipper Length (mm)")
plt.ylabel("Cumulative Probability")
plt.tight_layout()
plt.show()

Why ECDF is Useful

Lets you read percentiles directly: the x value where a curve crosses 0.5 is the median.
Useful for comparing distribution overlap between species.
Requires no bin count or bandwidth choice, so every observation is represented directly.
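To see what an ECDF curve is made of, here is a small NumPy sketch that computes one by hand and reads a percentile from it. The ecdf() helper and the flipper lengths are hypothetical, written only for this illustration.

```python
import numpy as np


def ecdf(values):
    """Return the sorted values and the proportion of observations <= each one."""
    x = np.sort(np.asarray(values, dtype=float))
    y = np.arange(1, x.size + 1) / x.size
    return x, y


# Hypothetical flipper lengths in mm
flippers = [181, 186, 195, 193, 190, 210, 217, 222]
x, y = ecdf(flippers)

# Smallest value at which the curve reaches 0.5 (a median-like reading)
half_point = x[np.searchsorted(y, 0.5)]
print(half_point)  # 193.0
```

Each observation raises the curve by 1/n, which is why a steep stretch of the curve means many observations packed into a narrow range.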

Step 8: Real-Life Scenario – Penguin Conservation Research

Let’s imagine a real-world application.

You are part of a wildlife research team analyzing penguin morphometric data to understand species adaptation to environmental conditions across islands.

Using sns.displot(), you can:

Compare body mass distributions across species and islands.
Use bivariate KDE plots to identify clusters (e.g., Gentoo penguins tend to have longer flippers and heavier body mass).
Use faceting to detect sex-based differences in size across species.
Use ECDF plots to see which species vary more in size: a steeper curve indicates a tighter distribution.

These visualizations support data-driven insights in ecology, evolution, and conservation policy.

Step 9: Complete Example – Multi-Level Advanced Displot

Let’s combine everything into one powerful visualization.

sns.displot(
    data=df,
    x="body_mass_g",
    hue="species",
    col="sex",
    kind="kde",
    rug=True,
    fill=True,
    multiple="stack",
    common_norm=False,
    height=4,
    aspect=1.2,
    palette="Set1",
    facet_kws={"margin_titles": True}
)
plt.suptitle("Body Mass Distribution of Penguins by Species and Sex", fontsize=15, fontweight="bold", y=1.03)
plt.show()

This multi-dimensional visualization simultaneously reveals:

Body mass variations by species and sex.
Relative distribution densities with KDE curves.
Easy comparison across subplots for male and female penguins.

Step 10: Save the Plot

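Export your visuals in high resolution for reports or dashboards. Call savefig() before plt.show(): once plt.show() has finished, the current figure is typically closed, and a later savefig() call writes a blank image.

# Run this before plt.show() in the example above
plt.savefig("penguins_displot_advanced.png", dpi=300, bbox_inches="tight")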

Step 11: Summary of Key Takeaways

✅ sns.displot() is a figure-level function that manages multiple subplots using FacetGrid.
✅ Supports histograms, KDEs, and ECDFs with extensive customization.
✅ Parameters like hue, row, col, multiple, and facet_kws make it ideal for storytelling.
✅ Great for univariate and bivariate visualizations.
✅ Real-world applications include data science, ecology, and behavioral research.

By mastering seaborn.displot(), you can build clear, data-driven visualizations that communicate patterns, trends, and distributions across complex datasets, an essential skill for any data analyst or scientist.