When analyzing real-world datasets, understanding the distribution of continuous variables is crucial. The histogram plot is one of the most powerful visual tools for this purpose. In this advanced guide, we will show you how to create, customize, and enhance histogram plots using Seaborn’s built-in penguins dataset. We will also explore all key parameters of the sns.histplot() function to take your visualizations to a professional and publication-ready level.
Step 1: Import Required Libraries and Load Dataset
Let’s begin by loading Seaborn’s famous penguins dataset and preparing it for plotting.
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
# Load the built-in penguins dataset
df = sns.load_dataset("penguins")
# Drop missing values for clean plotting
df = df.dropna()
# Set a modern Seaborn theme
sns.set_theme(style="whitegrid", palette="deep", font_scale=1.1)
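The penguins dataset contains measurements for different species of penguins, including:
- species (Adelie, Gentoo, Chinstrap)
- island
- bill_length_mm
- bill_depth_mm
- flipper_length_mm
- body_mass_g
- sex
We’ll use the four continuous measurements (bill length, bill depth, flipper length, and body mass) to create advanced histograms, and the categorical columns (species and sex) for grouping.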
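Before plotting, it can help to confirm which columns are numeric and how many rows dropna() will remove. The sketch below uses a tiny hand-made DataFrame with the same column names as the penguins dataset; the values are invented for illustration and are not taken from the real data, so the snippet runs without downloading anything.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the penguins table: same column names, invented values.
toy = pd.DataFrame({
    "species": ["Adelie", "Gentoo", "Chinstrap", "Adelie"],
    "island": ["Torgersen", "Biscoe", "Dream", "Dream"],
    "bill_length_mm": [39.1, 46.5, np.nan, 38.2],
    "bill_depth_mm": [18.7, 14.3, 18.4, 17.9],
    "flipper_length_mm": [181.0, 215.0, 195.0, 190.0],
    "body_mass_g": [3750.0, 5000.0, 3700.0, np.nan],
    "sex": ["Male", "Female", None, "Female"],
})

# Numeric columns are the natural candidates for a histogram's x variable.
numeric_cols = toy.select_dtypes(include="number").columns.tolist()

# Count missing values per column before deciding to call dropna().
missing_per_column = toy.isna().sum()

# dropna() removes every row that has at least one missing value.
clean = toy.dropna()

print(numeric_cols)
print(missing_per_column[missing_per_column > 0])
print(f"{len(toy)} rows before dropna(), {len(clean)} after")
```

The same three calls should apply unchanged to the DataFrame returned by sns.load_dataset("penguins").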
Step 2: Basic Histogram Plot
Let’s start with a simple histogram showing the distribution of penguin body masses.
sns.histplot(data=df, x="body_mass_g")
plt.title("Distribution of Penguin Body Mass")
plt.xlabel("Body Mass (g)")
plt.ylabel("Count")
plt.show()
This basic plot provides an overview of how penguin body mass is distributed, but we can do much more to refine and enrich the visualization.
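To make concrete what a histogram computes, here is a NumPy-only sketch: choose bin edges, then count the observations in each bin. The data are synthetic (normally distributed values loosely resembling body masses in grams), and NumPy's "auto" rule is used as an assumption; the edges seaborn picks by default may differ.

```python
import numpy as np

# Synthetic body-mass-like values in grams (illustrative only).
rng = np.random.default_rng(0)
mass = rng.normal(loc=4200, scale=800, size=300)

# Let NumPy pick the bin edges with its "auto" rule.
edges = np.histogram_bin_edges(mass, bins="auto")

# Count how many observations fall into each bin.
counts, _ = np.histogram(mass, bins=edges)

# Each bin is half-open [left, right), except the last one, which includes its right edge.
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"[{left:7.1f}, {right:7.1f}): {n}")
```

The bar heights in a count histogram are these per-bin counts, which is why the choice of edges changes the picture even though the data stay the same.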
Step 3: Advanced Histogram with All Parameters Explained
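Seaborn’s histplot() function is incredibly flexible.
An abridged signature (it omits binrange and the options that apply only to bivariate histograms, such as thresh, pthresh, pmax, and cbar) is: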
seaborn.histplot(
data=None, *,
x=None, y=None, hue=None, weights=None, stat='count', bins='auto',
binwidth=None, discrete=None, cumulative=False, common_bins=True,
common_norm=True, multiple='layer', element='bars', fill=True,
shrink=1, kde=False, kde_kws=None, line_kws=None, color=None,
palette=None, hue_order=None, hue_norm=None, log_scale=None,
legend=True, ax=None, **kwargs
)
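Because the parameter list can change between seaborn releases, one way to see what your installed version accepts is to read the parameters from the function itself. This is a small sketch using the standard-library inspect module; the printed list depends on the version you have installed.

```python
import inspect

import seaborn as sns

# Read the parameters from the installed function rather than from a static listing.
sig = inspect.signature(sns.histplot)

for name, param in sig.parameters.items():
    if param.kind is inspect.Parameter.VAR_KEYWORD:
        # Extra keyword arguments are forwarded to the underlying matplotlib artists.
        print(f"**{name}")
    else:
        print(f"{name}={param.default!r}")
```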
Let’s build a comprehensive example using most of these parameters in a meaningful way.
Step 4: Advanced Histogram Example (Multi-Feature Encoding)
We will visualize how penguin body mass is distributed across species and sex, adding a KDE curve, custom bins, and style enhancements.
plt.figure(figsize=(12,6))
sns.histplot(
data=df,
x="body_mass_g", # Continuous variable for the x-axis
hue="species", # Different species shown by color
multiple="stack", # Overlay style: stack, layer, dodge, fill
bins=25, # Number of bins
binwidth=None, # Can specify exact bin width instead of count
stat="density", # Options: count, frequency, probability, percent, density
kde=True, # Show smooth density curve
line_kws={"linewidth":2, "alpha":0.7}, # KDE line styling (kde_kws controls the estimate itself, e.g. bw_adjust)
element="bars", # Draw as bars (can be 'bars', 'step', 'poly')
fill=True, # Fill bars with color
common_bins=True, # Use same bins for all subsets
common_norm=False, # Normalize densities independently for each hue
palette="Set2", # Color palette
shrink=0.9, # Reduce bar width for spacing
alpha=0.8, # Transparency
legend=True, # Display legend
linewidth=1.2, # Border width of bars
edgecolor="white", # Edge color for clean separation
)
plt.title("Distribution of Penguin Body Mass by Species", fontsize=14, fontweight="bold")
plt.xlabel("Body Mass (grams)")
plt.ylabel("Density")
sns.move_legend(plt.gca(), "upper left", bbox_to_anchor=(1.05, 1), title="Species") # plt.legend() would discard the legend entries seaborn created
plt.tight_layout()
plt.show()
Explanation of Parameters Used:
- hue="species" → Different species represented by different colors.
- multiple="stack" → Bars stacked on top of each other (use "dodge" for side-by-side comparison).
- bins=25 → Histogram divided into 25 bins.
- stat='density' → Normalizes bar heights so that bar areas sum to 1 (here per species, because of common_norm=False).
- kde=True → Adds a smooth density estimate curve.
- element='bars' → Bars as the visual element (alternatives: 'poly', 'step').
- palette='Set2' → Soft color palette for professional look.
- fill=True → Filled bars for visual clarity.
- shrink=0.9 → Slightly narrows bars to add spacing.
- common_norm=False → Each species normalized separately.
- edgecolor='white' → Separates bars clearly for cleaner visuals.
- linewidth=1.2 → Slight outline for better contrast.
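The difference between shared and independent normalization is easiest to see as arithmetic. The following NumPy sketch works through stat="density" for two synthetic groups of unequal size; it illustrates the idea under the assumption of shared bin edges and is not seaborn's internal code.

```python
import numpy as np

rng = np.random.default_rng(1)
# Two synthetic groups of unequal size (illustrative values, not penguin data).
groups = {
    "small_group": rng.normal(3700, 400, size=50),
    "large_group": rng.normal(5000, 500, size=150),
}

# Shared bin edges, as with common_bins=True.
all_values = np.concatenate(list(groups.values()))
edges = np.histogram_bin_edges(all_values, bins=25)
widths = np.diff(edges)
n_total = all_values.size

areas_common, areas_separate = {}, {}
for name, values in groups.items():
    counts, _ = np.histogram(values, bins=edges)
    # Shared normalization: divide by the total number of observations.
    density_common = counts / (n_total * widths)
    # Independent normalization: divide by the group's own size.
    density_separate = counts / (values.size * widths)
    areas_common[name] = float((density_common * widths).sum())
    areas_separate[name] = float((density_separate * widths).sum())

print(areas_common)    # group areas add up to 1 in total
print(areas_separate)  # each group has area 1 on its own
```

With shared normalization the larger group visibly dominates, because its area reflects its share of the data. With independent normalization the shapes are comparable but the group sizes are no longer visible.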
Step 5: Combining Multiple Variables with Faceting
Seaborn’s FacetGrid allows creating histograms by categories (e.g., species and sex) in one figure.
g = sns.FacetGrid(df, col="species", row="sex", margin_titles=True, height=3.5, aspect=1.3)
g.map_dataframe(
sns.histplot,
x="flipper_length_mm",
bins=20,
color="#1f77b4",
kde=True,
stat="count",
edgecolor="black",
alpha=0.8
)
g.set_axis_labels("Flipper Length (mm)", "Count")
g.fig.suptitle("Distribution of Flipper Length by Species and Sex", fontsize=15, fontweight="bold", y=1.05)
plt.show()
Benefits of Faceting
- Quickly compares distributions across multiple groups.
- Shares axis limits across panels by default, so panels can be compared directly.
- Reduces clutter compared to overlaid hue plots.
Step 6: Adding Log Scale and Custom Binning
If your data spans wide ranges, using a logarithmic scale or custom bin edges enhances visibility.
plt.figure(figsize=(10,5))
sns.histplot(
data=df,
x="flipper_length_mm",
hue="species",
bins=[170,180,190,200,210,220,230,240],
log_scale=(False, True), # Log scale for y-axis
stat="count",
multiple="dodge",
palette="coolwarm",
element="bars",
linewidth=1,
edgecolor="black"
)
plt.title("Flipper Length Distribution with Custom Bins", fontsize=13, fontweight="bold")
plt.xlabel("Flipper Length (mm)")
plt.ylabel("Count (Log Scale)")
plt.tight_layout()
plt.show()
Explanation
- bins=[...] → Defines exact bin edges manually.
- log_scale=(False, True) → Applies log scale to y-axis only.
- multiple='dodge' → Bars for different species shown side-by-side.
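Manual bin edges do not have to be typed out one by one. As a sketch, np.arange can generate evenly spaced edges, and np.histogram shows which bins would be empty, which matters because a zero count cannot be drawn on a log-scaled axis. The flipper lengths here are synthetic values for illustration.

```python
import numpy as np

# Edges every 10 mm from 170 to 240; arange excludes the stop value, hence 241.
edges = np.arange(170, 241, 10)

# Synthetic flipper lengths (illustrative only), clipped to the covered range.
rng = np.random.default_rng(3)
flipper = np.clip(rng.normal(200, 14, size=300), 170, 240)

counts, _ = np.histogram(flipper, bins=edges)

# Bins with zero observations have no finite position on a log-scaled count axis.
empty_bins = [
    (int(lo), int(hi))
    for lo, hi, n in zip(edges[:-1], edges[1:], counts)
    if n == 0
]

print(edges.tolist())
print(counts.tolist())
print("empty bins:", empty_bins)
```

The resulting array can be passed directly as bins=edges to sns.histplot(), since bins accepts a sequence of edges.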
Step 7: Overlaying a Histogram with a Rug Plot and KDE
To further emphasize individual data points, overlay a rug plot.
plt.figure(figsize=(10,5))
sns.histplot(
data=df,
x="bill_depth_mm",
bins=20,
color="skyblue",
stat="density",
kde=True,
edgecolor="black",
alpha=0.7
)
sns.rugplot(data=df, x="bill_depth_mm", height=0.05, color="darkblue")
plt.title("Bill Depth Distribution with KDE and Rug Plot", fontsize=13, fontweight="bold")
plt.xlabel("Bill Depth (mm)")
plt.ylabel("Density")
plt.show()
This adds a subtle layer of granularity: each rug mark represents one observation along the x-axis.
Step 8: Real-Life Use Case
Let’s consider a wildlife research scenario:
You are studying how body mass distributions differ among penguin species to estimate feeding habits and energy needs.
The advanced histogram we created can help biologists:
- Identify overlapping mass ranges between species.
- Estimate species health variations.
- Detect outliers (e.g., unusually heavy or light penguins).
- Decide optimal feeding portions or conservation priorities.
By using hue, density normalization, and KDE overlays, we can easily interpret data that would otherwise be overwhelming in tabular format.
Step 9: Saving and Exporting High-Resolution Plots
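Save your final figure in a publication-quality format. Call plt.savefig() before plt.show(): once the figure has been shown and closed, a later savefig() call can write an empty image.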
plt.savefig("penguins_histogram.png", dpi=300, bbox_inches='tight')
Use .png for high quality images or .svg for vector graphics (ideal for reports).
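As a sketch of the export step, the snippet below saves one figure in both formats through the Figure object, which avoids depending on which figure is "current". It uses a plain matplotlib histogram of synthetic values and a temporary directory; the file names are only examples.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(5)
values = rng.normal(4200, 800, size=300)  # synthetic values for illustration

fig, ax = plt.subplots(figsize=(8, 4))
ax.hist(values, bins=25, edgecolor="white")
ax.set_xlabel("Body Mass (g)")
ax.set_ylabel("Count")

# Save through the Figure object, before any call to plt.show().
out_dir = Path(tempfile.mkdtemp())
png_path = out_dir / "penguins_histogram.png"
svg_path = out_dir / "penguins_histogram.svg"
fig.savefig(png_path, dpi=300, bbox_inches="tight")  # raster: dpi sets the pixel density
fig.savefig(svg_path, bbox_inches="tight")           # vector: scales without pixelation
plt.close(fig)  # release the figure once it has been written

print(png_path.stat().st_size > 0, svg_path.stat().st_size > 0)
```

For a seaborn axes-level plot, the same approach should work with fig = ax.get_figure() on the Axes that sns.histplot() returns.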
Step 10: Summary of Key Takeaways
✅ sns.histplot() combines power and simplicity for advanced histogram visualizations.
✅ Use hue, multiple, kde, and element for rich, multi-dimensional plots.
✅ Faceting is perfect for comparisons across multiple groups.
✅ Custom bins and log scales make your plots precise and insightful.
✅ Combine KDE and rug plots for distributional storytelling.
✅ Always export plots in high resolution for dashboards or publications.
By mastering these techniques, you’ll be able to create publication-ready, data-rich histogram plots that are both visually stunning and analytically powerful, well suited to professional analytics, academic research, or executive dashboards.