Thursday, September 17, 2009

Violin Plot with Matplotlib


One of the things I sorely missed in Matplotlib for a very long time was a violin plot implementation. Many a time I thought about implementing one myself, but never found the time.

Today, browsing through Matplotlib's documentation, I found the recently added fill_betweenx function. With it, implementing a violin plot finally looked like a piece of cake. I Googled for "violin plot" and Python, to no avail, so I decided to write it myself.

Violin plots are very similar to box-and-whisker plots, but they offer a more detailed view of a dataset's variability, because they draw an estimate of the data's density instead of only its quartiles. It's frequently a good idea to combine the two on the same plot. So here is what I came up with:


# -*- coding: utf-8 -*-
from matplotlib.pyplot import figure, show
from scipy.stats import gaussian_kde
from numpy.random import normal
from numpy import arange


def violin_plot(ax,data,pos, bp=False):
    '''
    create violin plots on an axis
    '''
    dist = max(pos)-min(pos)
    w = min(0.15*max(dist,1.0),0.5)
    for d,p in zip(data,pos):
        k = gaussian_kde(d) #calculates the kernel density
        m = k.dataset.min() #lower bound of violin
        M = k.dataset.max() #upper bound of violin
        x = arange(m,M,(M-m)/100.) # support for violin
        v = k.evaluate(x) #violin profile (density curve)
        v = v/v.max()*w #scaling the violin to the available space
        ax.fill_betweenx(x,p,v+p,facecolor='y',alpha=0.3)
        ax.fill_betweenx(x,p,-v+p,facecolor='y',alpha=0.3)
    if bp:
        ax.boxplot(data,notch=1,positions=pos,vert=1)

if __name__=="__main__":
    pos = range(5)
    data = [normal(size=100) for i in pos]
    fig=figure()
    ax = fig.add_subplot(111)
    violin_plot(ax,data,pos,bp=1)
    show()


The next step now is to contribute this plot to Matplotlib, but before I do that, I'd like to get some comments on this particular implementation. Moreover, I don't know if it'd be acceptable for Matplotlib to add Scipy as a dependency. But since re-implementing kernel density estimation for a simple plot would be overkill, maybe the destiny of this implementation will be to live on as an example for others to adapt and use.
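One possible way around the dependency question is to import Scipy only when a violin is actually drawn, and to let the caller supply their own density estimator instead. The following is a minimal sketch of that idea, not a tested patch: the names default_kde_factory, violin_profile and simple_gaussian_kde are made up for illustration and are not part of Matplotlib or Scipy, and the pure-Python estimator is only a stand-in that assumes Gaussian kernels with a Scott's-rule bandwidth.

```python
import math


def default_kde_factory(sample):
    """Build a scipy gaussian_kde, importing scipy only when it is needed."""
    try:
        from scipy.stats import gaussian_kde
    except ImportError as exc:
        raise ImportError(
            "violin plots need scipy unless a kde_factory is supplied"
        ) from exc
    return gaussian_kde(sample).evaluate


def violin_profile(sample, width, kde_factory=default_kde_factory, n=100):
    """Return (x, v): the support and the scaled half-width of one violin."""
    density = kde_factory(sample)          # callable: points -> densities
    lo, hi = min(sample), max(sample)
    step = (hi - lo) / float(n)
    x = [lo + i * step for i in range(n)]  # n evenly spaced points from lo
    v = [float(d) for d in density(x)]
    peak = max(v)
    return x, [d / peak * width for d in v]  # widest point equals `width`


def simple_gaussian_kde(sample):
    """Hypothetical scipy-free stand-in (assumption: Scott's rule bandwidth)."""
    n = len(sample)
    mean = sum(sample) / n
    std = math.sqrt(sum((s - mean) ** 2 for s in sample) / (n - 1))
    h = std * n ** (-1.0 / 5.0)
    norm = n * h * math.sqrt(2.0 * math.pi)

    def density(points):
        return [sum(math.exp(-0.5 * ((p - s) / h) ** 2) for s in sample) / norm
                for p in points]

    return density


# Works without scipy installed because a factory is passed explicitly.
x, v = violin_profile([0.0, 1.0, 2.0, 3.0, 4.0], 0.5,
                      kde_factory=simple_gaussian_kde)
```

The returned x and v could then be drawn at position p with something like ax.fill_betweenx(x, [p - d for d in v], [p + d for d in v]), so the plotting code itself never has to import Scipy.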

WARNING: This code requires matplotlib 0.99 (maybe 0.99.1rc1) to work because of the fill_betweenx function.

8 comments:

Ondřej Čertík said...

Nice!

I guess you can import scipy from the violin function, and/or add an optional parameter that would accept the kde.

Definitely post a patch to mpl please.

Alex said...

Cool! I hadn't heard of violin plots before. A beautiful way to summarize some data.

Flavio Coelho said...

@Ondrej: I will certainly post a patch to MPL, maybe this weekend. Maybe I can put the import inside a try/except clause and warn users that they need scipy in order to make violin plots.

anand said...

Thanks Flavio! My group loves violin plots, but had been relying on R. We'll be sure to use this in the future.

Lou said...

Hi Flavio,

I am very excited to find your post on violin plots, as I am trying to integrate them into my work. I tried to run your post script and get the following error...

"File "numpy.pxd", line 30, in scipy.stats.vonmises_cython (scipy\stats\vonmises_cython.c:2939)
ValueError: numpy.dtype does not appear to be the correct type object"

I am using NumPy version 1.4.0rc1.

Flavio Coelho said...

@Lou: I think this is an issue with your numpy or scipy installation. The script works fine for me.

Lou said...

Thanks for your quick reply. What version of numpy and scipy are you using? Thanks again, Lou

Stjórn said...

Is there any news about what is happening with the violin inclusion?

http://www.mail-archive.com/matplotlib-users@lists.sourceforge.net/msg13532.html
