Thursday, 22 October 2009

ASCII Histograms

I recently came across an interesting problem while working on my random variable implementation for BIP. Any type in Python is expected to have a __str__(self) method which returns an adequate and expressive string representation of the object. As far as I could tell, the most straightforward representation of a random variable is its probability distribution. Probability distributions are most often depicted graphically by a continuous density function or a histogram. So my challenge was: how can the information conveyed by a histogram be brought into a concise ASCII string, suitable as the output of a print statement?
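As a minimal sketch of the idea, assuming a hypothetical RandomVariable class (the class name, its samples attribute and the one-line rendering are illustrative only, not BIP's actual implementation), __str__ can summarize the samples as a compact histogram string:

```python
import numpy as np


class RandomVariable(object):
    """Hypothetical stand-in for a random variable backed by samples."""

    # Characters ordered from "empty bin" to "fullest bin" (an arbitrary choice).
    LEVELS = " .:|"

    def __init__(self, samples, bins=10):
        self.samples = np.asarray(samples, dtype=float)
        self.bins = bins

    def __str__(self):
        # numpy.histogram returns (counts per bin, bin edges)
        counts, edges = np.histogram(self.samples, bins=self.bins)
        top = len(self.LEVELS) - 1
        peak = float(counts.max())
        # Map each bin count to an index into LEVELS (0 .. top).
        idx = [int(round(top * float(c) / peak)) for c in counts]
        bars = "".join(self.LEVELS[i] for i in idx)
        return "%.2f [%s] %.2f" % (edges[0], bars, edges[-1])


rv = RandomVariable([0, 0, 0, 0, 1, 1, 2], bins=3)
print(rv)
```

With this in place, a plain print of the object shows the range of the samples and the rough shape of their distribution on a single line. The sketch assumes at least one sample, otherwise the peak count is zero.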

I immediately rejected the boring solution of representing the distribution by its moments (mean, variance, skewness, etc.). I wanted a full histogram in as few ASCII characters as possible. So I set out to implement my own ASCII histogram generator. It turned out to be a very simple task, given the handy histogram function in NumPy and how easy string formatting is in Python. It was nevertheless a fun couple of hours of programming. I ended up implementing a horizontal and a vertical histogram. The ASCII histogram proved to be very useful, since it helped enormously in debugging code involving probability calculations with simple print statements. Probabilistic simulations are extremely hard to test because the results of a given operation are never strictly the same. However, they should have the same probability distribution, so by looking at the rough shape of the histogram, you can tell whether your calculations are going in the right direction.
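To illustrate the debugging argument, here is a small sketch (the helper name bar_heights, the fixed value range and the seeds are assumptions made for this example). Two independent samples from the same distribution differ value by value, yet their scaled histograms have the same rough shape:

```python
import numpy as np


def bar_heights(data, bins=10, height=5, value_range=(-4.0, 4.0)):
    """Return integer bar heights (0..height) for a histogram of data.

    Hypothetical helper: numpy.histogram returns (counts, bin_edges);
    the counts are scaled so that the fullest bin gets exactly `height`.
    """
    counts, _edges = np.histogram(data, bins=bins, range=value_range)
    peak = float(counts.max())
    return [int(round(height * float(c) / peak)) for c in counts]


# Two runs of the "same" simulation with different random seeds.
run_a = np.random.RandomState(1).normal(size=1000)
run_b = np.random.RandomState(2).normal(size=1000)

for name, run in (("a", run_a), ("b", run_b)):
    # One digit per bin: the bar height of that bin
    print(name, "".join(str(h) for h in bar_heights(run)))
```

The individual numbers in the two runs never match, but both printed profiles should rise towards the middle bins and fall off at the tails, which is the kind of check a simple print statement makes possible.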

Curiously, such a simple and expressive representation for probability distributions is not available in any package I know of, so I decided to share the code with the scientific Python community so that people may put it to good use. The code below is part of BIP and consequently under the GPL license. Any suggestions for improvement are welcome.

# -*- coding: utf-8 -*-
from numpy import histogram, ceil  # used by the Histogram class below

class Histogram(object):
    """
    Ascii histogram
    """
    def __init__(self, data, bins=10):
        """
        Class constructor
        
        :Parameters:
            - `data`: array like object
        """
        self.data = data
        self.bins = bins
        self.h = histogram(self.data, bins=self.bins)
    def horizontal(self, height=4, character ='|'):
        """Returns a multiline string containing
        a horizontal histogram representation of self.data
        :Parameters:
            - `height`: Height of the histogram in characters
            - `character`: Character to use
        >>> d = normal(size=1000)
        >>> h = Histogram(d,bins=25)
        >>> print(h.horizontal(5,'|'))
        106            |||
                      |||||
                      |||||||
                    ||||||||||
                   |||||||||||||
        -3.42                         3.09
        """
        his = """"""
        # Scale counts to bar heights; float() avoids integer division in Python 2
        bars = self.h[0]*float(height)/max(self.h[0])
        for l in reversed(range(1,height+1)):
            line = ""
            if l == height:
                line = '%s '%max(self.h[0]) #histogram top count
            else:
                line = ' '*(len(str(max(self.h[0])))+1) #add leading spaces
            for c in bars:
                if c >= ceil(l):
                    line += character
                else:
                    line += ' '
            line +='\n'
            his += line
        his += '%.2f'%self.h[1][0] + ' '*(self.bins) +'%.2f'%self.h[1][-1] + '\n'
        return his
    def vertical(self,height=20, character ='|'):
        """
        Returns a multiline string containing
        a vertical histogram representation of self.data
        :Parameters:
            - `height`: Height of the histogram in characters
            - `character`: Character to use
        >>> d = normal(size=1000)
        >>> h = Histogram(d,bins=10)
        >>> print(h.vertical(15,'*'))
                              236
        -3.42:
        -2.78:
        -2.14: ***
        -1.51: *********
        -0.87: *************
        -0.23: ***************
        0.41 : ***********
        1.04 : ********
        1.68 : *
        2.32 :
        """
        his = """"""
        xl = ['%.2f'%n for n in self.h[1]]
        lxl = [len(l) for l in xl]
        # Integer bar lengths, so they can be used to repeat strings below
        bars = [int(round(c*float(height)/max(self.h[0]))) for c in self.h[0]]
        his += ' '*(max(bars)+2+max(lxl))+'%s\n'%max(self.h[0])
        for i,c in enumerate(bars):
            line = xl[i] +' '*(max(lxl)-lxl[i])+': '+ character*c+'\n'
            his += line
        return his
            
if __name__ == "__main__":
    from numpy.random import normal
    d = normal(size=1000)
    h = Histogram(d,bins=10)
    print(h.vertical(15))
    print(h.horizontal(5))

Tuesday, 29 September 2009

The Internet Manifesto

This document is a must read (see link at the end).

I want to add my own items to it:

1. Net neutrality is not only about protecting the profits of Internet content provider corporations

We need to defend net neutrality in a way that goes beyond what is currently done: we need to establish fixed IP addresses for every private individual, so that publishing rights are not gatekept by large corporations such as Google and the like.

2. Copyright should not be a sellable item.

The source of most of the confusion about whether copyright is a good thing in the information age is the fact that most of the commercial exploitation of copyrighted material is done not by the original authors but by large publishing houses, which either coerce authors into giving up the commercial rights to their work in exchange for pennies, or exploit material which should be in the public domain for as long as 70 years after the author's death.

3. Network infrastructure ownership should not be a privilege of large corporations.

The right to form open "ad hoc" wireless networks should be guaranteed globally, as we have had with amateur radio for decades. This is the only way to assure the basic freedoms of association and expression.

in reference to: Internet-Manifesto (view on Google Sidewiki)

Monday, 28 September 2009

Python(x,y) A Scientific Python Distribution

I recently came across this interesting open-source scientific Python distribution for Windows. I normally don't pay much attention to Windows tools, but it's good to have something to recommend to Windows users when you want them to try out some Python code.

For Linux users, it's not really relevant, because we all have powerful package managers to help us get most Python packages installed very easily.

The downside is that it is a really big download, 300+ MB, and the only mirror available was giving me only about 15 Kbps today (I am on an 18 MBps connection). It is so big that it even includes the Enthought Tool Suite, and much more.

For users of more civilized operating systems, like Linux, it's worth checking out one of the bundled editors, spyder, which is available through PyPI ("easy_install spyder"). It's a reincarnation of pydee and, despite its beta status, it is already very good.

in reference to: python(x,y) - Python for Scientists (view on Google Sidewiki)
