Python in Science: April 2010

Monday, April 19, 2010

New Liveplots Release

I recently blogged about Liveplots. I have just released another version with a very important improvement.

Before, even though plot server was running on a separate process, there was still a small overhead for the calling code since the plotting function had to be evaluated (and data piped to Gnuplot) before the execution returned to your simulation. To improve matters, I have implemeted queueing of the plotting calls, so that now, any call to the plot server returns immediately, allowing the main code to run at maximum speeds. In the simple.py example the speed improvement was greater the 10x!!. This of course mean that the plotting runs asynchronously to the simulation, and depending on the rate of plot creation, plotting can lag behind somewhat.

To make this new feature even nicer, it is implemented as a queuing decorator so that future expansion of the plotting API can be done whithout worrying about the queueing machinery a simple @enqueue decorator to the plot function will do the magic.

Enjoy!

Thursday, April 15, 2010

Liveplots package

I have recently released a package I had been using privately for a while. It's called Liveplots and is a package which provides the tools for monitoring computationally intensive, long running numerical simulations. I am aware that similar tools for Java, like Livegraph. But I needed something that was light, simple, and in Python.

Liveplots provide a daemonized plot server (i.e. running as a separate process) which accepts plot commands via xmlrpc, and a file system monitor, which can be used to produce visualizations when data files are created or modified.

Liveplots is fast enough for most applications. The simple example script available with the source code is able to generate about 15 plots per second on my laptop, but your mileage may vary, depending on how fast you computer is. For the file monitor, it doesn't make sense talking about speed since it is event driven but the same upper limmit should apply. An example is provided for it as well. The file monitor works only in Linux since it depends on pyinotify, but the plot server should work everywhere Python and Gnuplot run.

A very simple API is offered by the plot server, basically consisting of three types of plots: histogram, scatter, and line plots. All of these commands are able to plot multiple data-sets on the same set of axes or on a multi-plot page layout. I think this should suffice for now, since the purpose of this package is to provide a quick visual inspection of long running computations, not to produce publication quality plots. I any case, If you would like to see its API extended, please send me patches. New plot commands should be really easy to implement if you are familiar with Gnuplot.

The visualization part is done by Gnuplot, which I chose because it is fast and ubiquitous. It requires Gnuplot.py, which unfortunately is not yet available through easy_install, thus requiring an extra installation step. Liveplots itself is easy_installable.

Monday, April 5, 2010

Automatic Parameterization of Scipy's Random Number Generators

I am working on a new version of Model-Builder, which will incorporate Uncertainty analysis features based on my BIP package.

One key aspect of doing uncertainty analysis, is the ability to attribute prior distributions to the model's parameters, and programatically, this is easily accomplished through the use of scipy.stats random number generators which provide a large variety of both continuous and discrete distributions. However, since Model-Builder is a GUI-based software, I need to offer a unified interface for the parameterization of such distribution.

This should not be as hard as it seems because in the stats package, all distributions are either a subclass of stats.rv_continuous or stats.rv_discrete, and thus [almost] all of them can be parameterized by calling the class constructor with a tuple of shape parameters, the length of which will vary from distribution to distribution but is available as the attribute numargs. This is exemplified in the doc string of all the distributions as the following pseudo-doctest:

>>> import matplotlib.pyplot as plt
>>> numargs = norm.numargs
>>> [ ] = [0.9,]*numargs
>>> rv = norm()

Display frozen pdf

>>> x = np.linspace(0,np.minimum(rv.dist.b,3))
>>> h=plt.plot(x,rv.pdf(x))

So I set out to test if this would actually work for all generators. If it worked for at least the most important ones, it would mean I could just have the user provide me with a -tuple of numbers, and pass them to whichever distribution the user chose. To run this test I came up with the following script:


# -*- coding: utf-8 -*-
import pylab as P
import numpy as np
f1 = P.figure()
f2 = P.figure()
a1 = f1.add_subplot(111)
a1.set_title("Continuous Distributions")
a2 = f2.add_subplot(111)
a2.set_title("Discrete Distributions")
for d in dir(st):
   obj = getattr(st, d)
   if isinstance(obj, st.rv_continuous):
       try:
           numargs = obj.numargs
           shape_pars = [0.9]*numargs
           rv = obj(shape_pars)
           x = np.linspace(0,np.minimum(rv.dist.b,3))
           h=a1.plot(x,rv.pdf(x),  label=obj.name)
           a1.legend()
           print "+++>", obj.name
       except:
           print "--->", obj.name
   if isinstance(obj, st.rv_discrete):
       try:
           numargs = obj.numargs
           shape_pars = [0.9]*numargs
           rv = obj(shape_pars)
           x = np.linspace(np.maximum(-3, rv.dist.a),np.minimum(rv.dist.b,3))
           h=a2.plot(x,rv.pmf(x),  label=obj.name)
           a2.legend()
           print "+++>", obj.name
       except:
           print "--->", obj.name
P.show()

When I ran this simple script, I not got the plots of each probability distribution that conformed with the parameterization via the shape tuple, but also a report which did (+++>) or did not(--->) conform. Here are the results:

+++> alpha
---> anglit
---> arcsine
+++> bernoulli
---> beta
---> betaprime
---> binom
---> boltzmann
+++> bradford
---> burr
---> cauchy
+++> chi
+++> chi2
---> cosine
+++> dgamma
+++> dlaplace
+++> dweibull
+++> erlang
---> expon
+++> exponpow
---> exponweib
---> f
+++> fatiguelife
+++> fink
+++> foldcauchy
+++> foldnorm
+++> frechet_l
+++> frechet_r
+++> gamma
---> gausshyper
---> genexpon
+++> genextreme
---> gengamma
+++> genhalflogistic
+++> genlogistic
+++> genpareto
+++> geom
---> gilbrat
+++> gompertz
---> gumbel_l
---> gumbel_r
---> halfcauchy
---> halflogistic
---> halfnorm
---> hypergeom
---> hypsecant
+++> invgamma
+++> invnorm
+++> invweibull
---> johnsonb
---> johnsonsu
+++> ksone
---> kstwobign
---> laplace
---> levy
---> levy_l
---> levy_stable
+++> loggamma
---> logistic
+++> loglaplace
+++> lognorm
+++> logser
+++> lomax
---> maxwell
---> mielke
+++> nakagami
---> nbinom
---> ncf
---> nct
---> ncx2
---> norm
+++> pareto
+++> planck
+++> poisson
+++> powerlaw
---> powerlognorm
+++> powernorm
---> randint
---> rayleigh
+++> rdist
+++> recipinvgauss
---> reciprocal
+++> rice
---> semicircular
+++> t
+++> triang
+++> truncexpon
---> truncnorm
+++> tukeylambda
---> uniform
+++> vonmises
---> wald
+++> weibull_max
+++> weibull_min
+++> wrapcauchy
+++> zipf

From the list above, it is clear that although the majority passed the test, many important ones did not (n, beta, uniform, etc.). This don't mean those cannot be used or won't work, but only that they cannot be parameterized in the same way as the rest. The norm distribution, for instance, has numargs = 0, (WTF?)

Although initially I was pretty excited about the possibility of handling every probability distribution in the same standard way, in the end I was disappointed that the Scipy people, after having come up with a such nice design for the stats package, did not think about running their own doc-tests (or making them runnable in the first place).

I am still looking for a solution to this. If you have an idea of how to elegantly work around this issue, please comment below.

Python in Science