Violin Plot with Matplotlib

One thing I have sorely missed in Matplotlib for a very long time is a violin plot implementation. Many times I thought about implementing one myself, but I never found the time.
Today, browsing through Matplotlib's documentation, I found the recently added fill_betweenx function. With it, implementing a violin plot finally looked like a piece of cake. I Googled for "violin plot" and "Python", to no avail, so I decided to write it myself.
Violin plots are very similar to box-and-whisker plots, but they offer a more detailed view of a dataset's variability, because they show the estimated density of the data rather than only its quartiles. It's frequently a good idea to combine the two on the same plot. So here is what I came up with:
# -*- coding: utf-8 -*-
from matplotlib.pyplot import figure, show
from scipy.stats import gaussian_kde
from numpy.random import normal
from numpy import arange


def violin_plot(ax, data, pos, bp=False):
    '''
    create violin plots on an axis
    '''
    dist = max(pos) - min(pos)
    w = min(0.15 * max(dist, 1.0), 0.5)
    for d, p in zip(data, pos):
        k = gaussian_kde(d)  # calculates the kernel density
        m = k.dataset.min()  # lower bound of violin
        M = k.dataset.max()  # upper bound of violin
        x = arange(m, M, (M - m) / 100.)  # support for violin
        v = k.evaluate(x)  # violin profile (density curve)
        v = v / v.max() * w  # scaling the violin to the available space
        ax.fill_betweenx(x, p, v + p, facecolor='y', alpha=0.3)
        ax.fill_betweenx(x, p, -v + p, facecolor='y', alpha=0.3)
    if bp:
        ax.boxplot(data, notch=1, positions=pos, vert=1)


if __name__ == "__main__":
    pos = range(5)
    data = [normal(size=100) for i in pos]
    fig = figure()
    ax = fig.add_subplot(111)
    violin_plot(ax, data, pos, bp=1)
    show()
The next step is to contribute this plot to Matplotlib, but before I do that, I'd like to get some comments on this particular implementation. Moreover, I don't know whether it would be acceptable for Matplotlib to add SciPy as a dependency. Since re-implementing kernel density estimation for a simple plot would be overkill, maybe the destiny of this implementation is to live on as an example for others to adapt and use.
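One possible way around the dependency question is to keep SciPy optional. The sketch below is only an illustration, not a finished patch: it imports gaussian_kde lazily and lets the caller supply a different density estimator. The names violin_profile, _default_kde and kde_factory are hypothetical, and linspace is used for the support instead of the arange call above, so the upper bound is included.

```python
import numpy as np


def _default_kde(sample):
    """Build a SciPy gaussian_kde, importing SciPy only when it is needed."""
    try:
        from scipy.stats import gaussian_kde
    except ImportError as exc:
        raise ImportError(
            "violin plots need SciPy for the default density estimate; "
            "install scipy or pass your own kde_factory"
        ) from exc
    return gaussian_kde(sample)


def violin_profile(sample, width, kde_factory=None, points=100):
    """Return (x, v): the support and the scaled half-width of one violin.

    kde_factory is a hypothetical hook: any callable that takes the sample
    and returns a callable density, f(x) -> array of density values.
    """
    if kde_factory is None:
        kde_factory = _default_kde
    sample = np.asarray(sample, dtype=float)
    density = kde_factory(sample)
    # Evaluate the density between the smallest and largest observation.
    x = np.linspace(sample.min(), sample.max(), points)
    v = np.asarray(density(x), dtype=float)
    # Scale so the widest part of the violin equals `width`.
    return x, v / v.max() * width
```

Inside violin_plot, the two fill_betweenx calls would then use the x and v returned by violin_profile(d, w), and SciPy would only be required when no kde_factory is passed.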
WARNING: This code requires Matplotlib 0.99 (maybe 0.99.1rc1) to work, because it relies on the fill_betweenx function.
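If you are unsure whether your installation is recent enough, a small guard can fail early with a readable message instead of an AttributeError in the middle of plotting. This is a minimal sketch; require_fill_betweenx is a hypothetical helper, and it only checks that the axes object has a callable fill_betweenx attribute.

```python
def require_fill_betweenx(ax):
    """Raise a clear error if this axes object lacks fill_betweenx."""
    if not callable(getattr(ax, "fill_betweenx", None)):
        raise RuntimeError(
            "violin_plot needs Axes.fill_betweenx; "
            "please upgrade Matplotlib to a version that provides it"
        )
```

Calling require_fill_betweenx(ax) as the first line of violin_plot would be enough.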

4 comments:
Nice!
I guess you can import scipy from the violin function, and/or add an optional parameter that would accept the kde.
Definitely post a patch to mpl please.
Cool! I hadn't heard of violin plots before. A beautiful way to summarize some data.
@Ondrej: I will certainly post a patch to MPL, maybe this weekend. Maybe I can put the import inside a try/except clause and warn the user that SciPy is needed in order to make violin plots.
Thanks Flavio! My group loves violin plots, but had been relying on R. We'll be sure to use this in the future.