
How to Create a Histogram Plot in Seaborn Using the penguins Dataset (Advanced Guide)

When analyzing real-world datasets, understanding the distribution of continuous variables is crucial. The histogram plot is one of the most powerful visual tools for this purpose. In this advanced guide, we will show you how to create, customize, and enhance histogram plots using Seaborn’s built-in penguins dataset. We will also explore all key parameters of the sns.histplot() function to take your visualizations to a professional and publication-ready level.


Step 1: Import Required Libraries and Load Dataset

Let’s begin by loading Seaborn’s famous penguins dataset and preparing it for plotting.

import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

# Load the built-in penguins dataset
df = sns.load_dataset("penguins")

# Drop missing values for clean plotting
df = df.dropna()

# Set a modern Seaborn theme
sns.set_theme(style="whitegrid", palette="deep", font_scale=1.1)

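The penguins dataset contains measurements for three penguin species (Adelie, Chinstrap, and Gentoo), including these continuous variables:

bill_length_mm
bill_depth_mm
flipper_length_mm
body_mass_g

We’ll use these continuous variables to create advanced histograms.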
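
Before dropping rows, it can help to see where the missing values actually are. The sketch below is only an illustration: it uses a small hand-made frame named toy (made-up values, with column names borrowed from the penguins dataset) and contrasts dropna() with dropna(subset=...).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a few penguins rows; the values are made up.
toy = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Chinstrap"],
    "body_mass_g": [3750.0, np.nan, 5000.0, 3700.0],
    "flipper_length_mm": [181.0, 186.0, 217.0, 195.0],
    "sex": ["Male", "Female", None, "Female"],
})

# Count missing values per column before deciding what to drop.
missing_per_column = toy.isna().sum()
print(missing_per_column)

# dropna() removes a row if ANY column is missing.
all_clean = toy.dropna()

# dropna(subset=...) only checks the columns a given plot needs.
mass_only = toy.dropna(subset=["body_mass_g"])

print(len(all_clean), len(mass_only))  # 2 3
```

Dropping on a subset keeps more rows when a plot only needs one or two columns; whether that matters for the real dataset depends on how its missing values are spread.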


Step 2: Basic Histogram Plot

Let’s start with a simple histogram showing the distribution of penguin body masses.

sns.histplot(data=df, x="body_mass_g")
plt.title("Distribution of Penguin Body Mass")
plt.xlabel("Body Mass (g)")
plt.ylabel("Count")
plt.show()

This basic plot provides an overview of how penguin body mass is distributed, but we can do much more to refine and enrich the visualization.


Step 3: Advanced Histogram with All Parameters Explained

Seaborn’s histplot() function is incredibly flexible.
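An abridged signature (omitting binrange and the options that only apply to bivariate histograms, such as thresh and cbar) is: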

seaborn.histplot(
    data=None, *,
    x=None, y=None, hue=None, weights=None, stat='count', bins='auto',
    binwidth=None, discrete=None, cumulative=False, common_bins=True,
    common_norm=True, multiple='layer', element='bars', fill=True,
    shrink=1, kde=False, kde_kws=None, line_kws=None, color=None,
    palette=None, hue_order=None, hue_norm=None, log_scale=None,
    legend=True, ax=None, **kwargs
)

Let’s build a comprehensive example using most of these parameters in a meaningful way.
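
The stat parameter is easier to reason about with numbers. The sketch below reproduces the normalizations with plain NumPy on a synthetic sample; the variable names are illustrative, and the formulas follow the definitions in the seaborn documentation (heights summing to 1 for probability, areas summing to 1 for density).

```python
import numpy as np

rng = np.random.default_rng(1)
values = rng.normal(loc=4200, scale=800, size=500)  # made-up sample

counts, edges = np.histogram(values, bins=20)
widths = np.diff(edges)
n = counts.sum()

frequency = counts / widths        # observations per unit of x
probability = counts / n           # bar heights sum to 1
percent = 100 * probability        # bar heights sum to 100
density = counts / (n * widths)    # bar areas sum to 1

print(probability.sum())
print((density * widths).sum())
```

Density is the natural choice when a KDE curve is overlaid, because both are then on the same scale.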


Step 4: Advanced Histogram Example (Multi-Feature Encoding)

We will visualize how penguin body mass is distributed across species, adding a KDE curve, custom bins, and style enhancements.

plt.figure(figsize=(12,6))

sns.histplot(
    data=df,
    x="body_mass_g",              # Continuous variable for the x-axis
    hue="species",                # Different species shown by color
    multiple="stack",             # How hue groups are combined: layer, stack, dodge, fill
    bins=25,                      # Number of bins
    binwidth=None,                # Can specify exact bin width instead of count
    stat="density",               # Options: count, frequency, probability, percent, density
    kde=True,                     # Show smooth density curve
    line_kws={"linewidth": 2, "alpha": 0.7},  # KDE line appearance (kde_kws tunes the KDE computation instead)
    element="bars",               # Draw as bars (can be 'bars', 'step', 'poly')
    fill=True,                    # Fill bars with color
    common_bins=True,             # Use same bins for all subsets
    common_norm=False,            # Normalize densities independently for each hue
    palette="Set2",               # Color palette
    shrink=0.9,                   # Reduce bar width for spacing
    alpha=0.8,                    # Transparency
    legend=True,                  # Display legend
    linewidth=1.2,                # Border width of bars
    edgecolor="white",            # Edge color for clean separation
)
plt.title("Distribution of Penguin Body Mass by Species", fontsize=14, fontweight="bold")
plt.xlabel("Body Mass (grams)")
plt.ylabel("Density")
# Move seaborn's own legend; plt.legend() would replace it with an empty one
sns.move_legend(plt.gca(), "upper left", bbox_to_anchor=(1.05, 1), title="Species")
plt.tight_layout()
plt.show()

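Explanation of Parameters Used:

hue and palette color the bars by species.
multiple="stack" stacks the species on top of each other instead of layering them.
bins=25 sets the number of bins; binwidth would set an exact bin width instead.
stat="density" with common_norm=False scales each species so that its own bars have a total area of 1.
kde=True adds a smoothed density curve for each species.
shrink, alpha, linewidth, and edgecolor control the look of the bars.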
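
The effect of common_bins and common_norm can be checked by hand. This sketch uses two synthetic groups of different sizes (hypothetical stand-ins for two species) and plain NumPy; the names independent and shared are illustrative labels for what common_norm=False and common_norm=True are documented to do.

```python
import numpy as np

rng = np.random.default_rng(2)
# Two made-up groups of different sizes, standing in for two species.
groups = {
    "small_group": rng.normal(3700, 450, size=60),
    "large_group": rng.normal(5000, 500, size=240),
}

# common_bins=True: one set of edges computed from all observations.
all_values = np.concatenate(list(groups.values()))
edges = np.histogram_bin_edges(all_values, bins=25)
widths = np.diff(edges)

areas = {}
for name, values in groups.items():
    counts, _ = np.histogram(values, bins=edges)
    independent = counts / (counts.sum() * widths)  # like common_norm=False
    shared = counts / (len(all_values) * widths)    # like common_norm=True
    areas[name] = ((independent * widths).sum(), (shared * widths).sum())
    print(name, areas[name])
```

With independent normalization every group has area 1, so shapes are comparable regardless of group size. With shared normalization each group's area equals its share of the observations (here 0.2 and 0.8).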


Step 5: Combining Multiple Variables with Faceting

Seaborn’s FacetGrid draws one histogram per category combination (here, species by sex) in a single figure.

g = sns.FacetGrid(df, col="species", row="sex", margin_titles=True, height=3.5, aspect=1.3)
g.map_dataframe(
    sns.histplot,
    x="flipper_length_mm",
    bins=20,
    color="#1f77b4",
    kde=True,
    stat="count",
    edgecolor="black",
    alpha=0.8
)
g.set_axis_labels("Flipper Length (mm)", "Count")
g.fig.suptitle("Distribution of Flipper Length by Species and Sex", fontsize=15, fontweight="bold", y=1.05)
plt.show()

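Benefits of Faceting

Each species and sex combination gets its own panel, and the panels share their axes by default, so the groups can be compared directly.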


Step 6: Adding Log Scale and Custom Binning

If your counts span a wide range, a logarithmic count axis keeps small bars visible, and custom bin edges let you place the bins at meaningful values.

plt.figure(figsize=(10,5))
sns.histplot(
    data=df,
    x="flipper_length_mm",
    hue="species",
    bins=[170,180,190,200,210,220,230,240],
    log_scale=(False, True),      # Log scale for y-axis
    stat="count",
    multiple="dodge",
    palette="coolwarm",
    element="bars",
    linewidth=1,
    edgecolor="black"
)
plt.title("Flipper Length Distribution with Custom Bins", fontsize=13, fontweight="bold")
plt.xlabel("Flipper Length (mm)")
plt.ylabel("Count (Log Scale)")
plt.tight_layout()
plt.show()

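Explanation

bins takes a list of explicit bin edges, here 10 mm wide from 170 to 240.
log_scale=(False, True) keeps the x-axis linear and puts the count axis on a log scale.
multiple="dodge" draws the species side by side within each bin.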
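
How observations fall into explicit bin edges can be checked with np.histogram. The flipper lengths below are made up for illustration; the edges are the ones used in the plot above.

```python
import numpy as np

# Made-up flipper lengths in mm (not taken from the dataset).
flipper = np.array([172, 181, 181, 190, 195, 198, 203,
                    210, 215, 222, 231, 240, 245])
edges = [170, 180, 190, 200, 210, 220, 230, 240]

counts, _ = np.histogram(flipper, bins=edges)

# Bins include their left edge and exclude the right one,
# except the last bin, which also includes its right edge (240).
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left}-{right}: {count}")

# 245 lies outside the outermost edge, so it is not counted.
print(counts.sum(), "of", len(flipper), "values fall inside the bins")
```

Choosing edges that cover the full data range avoids silently dropping observations.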


Step 7: Overlaying a Histogram with a Rug Plot and KDE

To further emphasize individual data points, overlay a rug plot.

plt.figure(figsize=(10,5))
sns.histplot(
    data=df,
    x="bill_depth_mm",
    bins=20,
    color="skyblue",
    stat="density",
    kde=True,
    edgecolor="black",
    alpha=0.7
)
sns.rugplot(data=df, x="bill_depth_mm", height=0.05, color="darkblue")
plt.title("Bill Depth Distribution with KDE and Rug Plot", fontsize=13, fontweight="bold")
plt.xlabel("Bill Depth (mm)")
plt.ylabel("Density")
plt.show()

This adds a subtle layer of granularity—each rug mark represents one observation along the x-axis.


Step 8: Real-Life Use Case

Let’s consider a wildlife research scenario:
You are studying how body mass distributions differ among penguin species to estimate feeding habits and energy needs.

The advanced histogram we created can help biologists compare how body mass is distributed within each species.

By using hue, density normalization, and KDE overlays, we can easily interpret data that would otherwise be overwhelming in tabular format.


Step 9: Saving and Exporting High-Resolution Plots

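Always save your final figure in publication-quality format. Call plt.savefig() before plt.show(); once the figure has been shown and closed, savefig() writes an empty canvas.

plt.savefig("penguins_histogram.png", dpi=300, bbox_inches='tight')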

Use .png for high-resolution raster images or .svg for vector graphics that scale without loss of quality (ideal for reports).
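
One way to make saving less fragile is to keep a reference to the Figure and save through it, so the result does not depend on which figure is current. This is a minimal sketch with made-up data; the temporary output directory is an assumption for the example, not a recommended location.

```python
import os
import tempfile

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(5)
values = rng.normal(size=200)  # made-up data

fig, ax = plt.subplots()
ax.hist(values, bins=20)

out_dir = tempfile.mkdtemp()  # hypothetical output location
png_path = os.path.join(out_dir, "penguins_histogram.png")
svg_path = os.path.join(out_dir, "penguins_histogram.svg")

# Save before showing or closing the figure.
fig.savefig(png_path, dpi=300, bbox_inches="tight")  # raster, 300 dpi
fig.savefig(svg_path, bbox_inches="tight")           # vector
plt.close(fig)

print(os.path.getsize(png_path) > 0, os.path.getsize(svg_path) > 0)
```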


Step 10: Summary of Key Takeaways

✅ sns.histplot() combines power and simplicity for advanced histogram visualizations.
✅ Use hue, multiple, kde, and element for rich, multi-dimensional plots.
✅ Faceting is perfect for comparisons across multiple groups.
✅ Custom bins and log scales make your plots precise and insightful.
✅ Combine KDE and rug plots for distributional storytelling.
✅ Always export plots in high resolution for dashboards or publications.


By mastering these techniques, you’ll be able to create publication-ready, data-rich histogram plots that are both visually stunning and analytically powerful — perfect for professional analytics, academic research, or executive dashboards.
