Thursday, November 19, 2009
Developing a Wave Robot in Python
For those of you who have been living under a rock for the last several months, Google Wave offers an API for developing robots, which are... well, robots: when added to a wave, they perform automated tasks on its contents. There are robots for translating text, syntax-highlighting code snippets, and so on.
My idea for a robot was to create one which would allow for easy insertion of Google Trends results into a conversation. Why? Because everyone's favorite kind of online conversation is centered around what's hot and what's not... thus the need to quickly check the latest trends in mindshare (as proxied by search volume).
My initial approach was to somehow fetch raw trend data from Google and generate a small plot with the results (using Google Charts). But I ran into the fact that Google has yet to release an API for access to trend data. After a couple of frustrating days, which almost made me give up, I stumbled upon a solution that was both easier to code and faster on the server. Speed matters here: robots are hosted on App Engine, and a robot must strive to keep latency as low as possible.
The solution involved embedding Google's own Trends gadget into the wave, in place of the markup. Did I mention the markup? After fiddling with various possibilities, I decided on this one: ~~topic1,topic2,topic3,topic4.
Whenever the robot finds such a string, it tries to insert a trends graph for those topics in the blip. Make sure you don't forget the period at the end of the topic list, and add some text after the end of the markup; otherwise the robot gets confused... (I'll fix this bug at some point).
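To make the markup concrete, here is a minimal sketch of how such strings could be detected in a blip's text. It is not the robot's actual source: the regular expression and the function name find_trend_markup are assumptions made for illustration, and the trailing-text quirk described above is not modelled.

```python
import re

# Hypothetical pattern: "~~", then a comma-separated topic list,
# then the closing period that ends the markup.
TREND_MARKUP = re.compile(r"~~([^.~\n]+)\.")


def find_trend_markup(text):
    """Return a list of (start, end, topics) tuples, one per markup match."""
    found = []
    for match in TREND_MARKUP.finditer(text):
        # Split the captured list on commas and drop empty entries
        topics = [t.strip() for t in match.group(1).split(",") if t.strip()]
        if topics:
            found.append((match.start(), match.end(), topics))
    return found
```

The start and end offsets mark the span of text that a robot would swap for the gadget, and the topic list is what it would pass along to it.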
Enough talk. If you want to know how I did it, go read the source. If you're curious about how it works and have a Wave account, add it to a wave and start playing: trendy-robot@appspot.com
Thursday, October 22, 2009
ASCII Histograms
I have recently come across an interesting problem while working on my random variable implementation for BIP. Any type in Python is expected to have a __str__(self) method which returns an adequate and expressive string representation of the object. Well, as far as I could think, the most straightforward representation of a random variable is its probability distribution. Probability distributions are most often depicted graphically, by a continuous density function or a histogram. So my challenge was: how to bring the information conveyed by a histogram to a concise ASCII string, suitable to be the output of a print statement?
I immediately rejected the boring solution of representing the distribution by its moments (mean, variance, skewness, etc.). I wanted a full histogram in as few ASCII characters as possible. So I set out to implement my own ASCII histogram generator. I can tell you up front that it was a very simple task, given the handy histogram function in NumPy and how easy it is to do string formatting in Python. It was nevertheless a fun couple of hours of programming. I ended up implementing a horizontal and a vertical histogram. The ASCII histogram proved to be very useful, since it helped enormously in debugging code involving probability calculations with simple print statements. Probabilistic simulations are extremely hard to test because the results of a given operation are never strictly the same. However, they should have the same probability distribution, so by looking at the rough shape of the histogram, you can tell whether your calculations are going in the right direction.
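For readers who have not used it, the following is a minimal sketch of what NumPy's histogram function returns and how string formatting turns it into bars. The fixed seed, the variable names, and the bar width of 20 characters are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.RandomState(42)  # fixed seed so the run is repeatable
sample = rng.normal(size=1000)

# numpy.histogram returns the per-bin counts and the bin edges
# (one more edge than there are bins)
counts, edges = np.histogram(sample, bins=10)

# One row per bin: the left edge, then a bar scaled to at most `width` characters
width = 20
for left, count in zip(edges[:-1], counts):
    bar = '*' * int(round(width * count / counts.max()))
    print('%6.2f: %s' % (left, bar))
```

Running it twice with different seeds gives different counts but the same bell-shaped outline, which is the kind of visual check described above.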
Curiously, such a simple and expressive representation for probability distributions is not available in any package I know of, so I decided to share the code with the scientific Python community so that people may put it to good use. The code below is part of BIP and consequently under the GPL license. Any suggestions for improvement are welcome.
# -*- coding: utf-8 -*-
from __future__ import division  # true division under Python 2 as well

from numpy import histogram


class Histogram(object):
    """ASCII histogram"""

    def __init__(self, data, bins=10):
        """
        Class constructor

        :Parameters:
            - `data`: array-like object
            - `bins`: number of bins
        """
        self.data = data
        self.bins = bins
        # self.h[0]: counts per bin; self.h[1]: bin edges
        self.h = histogram(self.data, bins=self.bins)

    def horizontal(self, height=4, character='|'):
        """
        Returns a multiline string containing a
        horizontal histogram representation of self.data

        :Parameters:
            - `height`: Height of the histogram in characters
            - `character`: Character to use

        >>> d = normal(size=1000)
        >>> h = Histogram(d, bins=25)
        >>> print(h.horizontal(5, '|'))
        106            |||
                      |||||
                     |||||||
                    ||||||||||
                  |||||||||||||
        -3.42                         3.09
        """
        his = ""
        peak = max(self.h[0])
        # Scale the counts so that the tallest bar spans `height` rows
        bars = self.h[0] / peak * height
        for l in reversed(range(1, height + 1)):
            if l == height:
                line = '%s ' % peak  # histogram top count
            else:
                line = ' ' * (len(str(peak)) + 1)  # add leading spaces
            for c in bars:
                if c >= l:
                    line += character
                else:
                    line += ' '
            line += '\n'
            his += line
        # x axis: lowest and highest bin edges
        his += '%.2f' % self.h[1][0] + ' ' * self.bins + '%.2f' % self.h[1][-1] + '\n'
        return his

    def vertical(self, height=20, character='|'):
        """
        Returns a multiline string containing a
        vertical histogram representation of self.data

        :Parameters:
            - `height`: Height of the histogram in characters
            - `character`: Character to use

        >>> d = normal(size=1000)
        >>> h = Histogram(d, bins=10)
        >>> print(h.vertical(15, '*'))
                              236
        -3.42:
        -2.78:
        -2.14: ***
        -1.51: *********
        -0.87: *************
        -0.23: ***************
        0.41 : ***********
        1.04 : ********
        1.68 : *
        2.32 :
        """
        his = ""
        peak = max(self.h[0])
        xl = ['%.2f' % n for n in self.h[1]]
        lxl = [len(l) for l in xl]
        # Bar lengths must be integers in order to repeat `character`
        bars = [int(round(c * height / peak)) for c in self.h[0]]
        his += ' ' * (max(bars) + 2 + max(lxl)) + '%s\n' % peak
        for i, c in enumerate(bars):
            line = xl[i] + ' ' * (max(lxl) - lxl[i]) + ': ' + character * c + '\n'
            his += line
        return his


if __name__ == "__main__":
    from numpy.random import normal
    d = normal(size=1000)
    h = Histogram(d, bins=10)
    print(h.vertical(15))
    print(h.horizontal(5))
Tuesday, September 29, 2009
The Internet Manifesto
This document is a must-read (see link at the end).
I want to add my own items to it:
1. Net neutrality is not only about protecting the profits of Internet content provider corporations.
We need to defend net neutrality in a way which goes beyond what is currently done: we need to establish fixed IP addresses for every private individual, so that publishing rights don't have to be gatekept by large corporations such as Google and the like.
2. Copyright should not be a sellable item.
The source of most of the confusion about whether copyright is a good thing or not in the information age is the fact that most of the commercial exploitation of copyrighted materials is not done by the original authors, but by large publishing houses, which either coerce authors into giving them the commercial rights to their work in exchange for pennies, or exploit materials which should be in the public domain for as much as 70 years after the author's death.
3. Network infrastructure ownership should not be a privilege of large corporations.
The right to form open "ad hoc" wireless networks should be guaranteed globally, as it has been for amateur radio for decades. This is the only way to assure the basic freedoms of association and expression.