## Thursday, October 30, 2008

### Fast(?) Gillespies Direct Algorithm in Python

Today I took the day off to implement the Gillespie SSA algorithm. For those of you who have never heard of it is a solver for stochastic equations. There is no implementation of it in Python (to my knowledge) and the only other easily accessible implementation I found was the one on the GillespieSSA package for R.

To help the unfamiliar to understand the application and relevance of this algorithm, consider the following dynamical system derived from an epidemiological problem:

Suppose an infected individual arrives in his home town after his/her vacation carrying an infectious disease entirely new to his fellow citizens. This means no one in town has antibodies for that disease. Let this set of people be called the "susceptibles" and let S be the number of people in this group. Our Infected individual will initially be the only member of a group called the "infectious" which size is denoted by I. The rate of encounters between susceptibles and infectious is proportional to their numbers (S*I). Everytime an infectious meet a susceptible there is a chance b that the disease is transmitted, so the probability of a transmission event is b*S*I. This is called an individual state transition, since the individual went from a susceptible state to an infectious one.
The dynamical system used as an example here contains this and other state transitions described in the code below:

`"""Example of an SEIR model with two Infectious classes: subclinical(Is) and clinical(Ic)        Is       /  \S -> E     R       \  /        Ic        States:S: SusceptibleE: ExposedIs: Infectious subclinicalIc: Infectious clinicalR: RecoveredTransition rates:b,ks,kc,rs,rc = (0.001, 0.1, 0.1, 0.01, .01)Transitions:S -> E : b*S*(Is+Ic)E -> Is : ks*EE -> Ic : kc*EIs -> R : rs*IsIc -> R : rc*Ic"""from gillespie import Modelimport timefrom numpy import arrayvars = ['S','E','Is','Ic','R']#rates: b,ks,kc,rs,rcr = (0.001, 0.1, 0.1, 0.01, .01)ini = (490,0,10,0,0)prop = (lambda r,ini:r*ini*(ini+ini),        lambda r,ini:r*ini,        lambda r,ini:r*ini,        lambda r,ini:r*ini,        lambda r,ini:r*ini        )tmat = array([[-1,0,0,0,0],            [1,-1,-1,0,0],            [0,1,0,-1,0],            [0,0,1,0,-1],            [0,0,0,1,1]            ])#for e in prop:#    print e()M=Model(vnames=vars,rates = r,inits=ini,tmat=tmat,propensity=prop)t0 = time.time()M.run(tmax=80,reps=100)print 'total time: ',time.time()-t0t,series,steps = M.getStats()print steps,'steps'from pylab import plot , show, legend, errorbarplot(t,series.mean(axis=2),'-o')legend(vars,loc=0)show()`
As can be seen there are five different states in this model with the last one R, being an absorbing state, since all individuals converge to that state with time.

the code that simulates this stochastic dynamic model is in the gillespie module:

`from numpy.random import uniform, multinomial, exponential,randomfrom numpy import arange, array, empty,zeros,log#from math import logimport timeimport multiprocessingimport psycopsyco.full()#global ini#ini=[500,1,0]class Model:    def __init__(self,vnames,rates,inits, tmat,propensity):        '''         * vnames: list of strings         * rates: list of fixed rate parameters         * inits: list of initial values of variables         * propensity: list of lambda functions of the form:             lambda r,ini: some function of rates ans inits.        '''        self.vn = vnames        self.rates = rates        self.inits = inits        self.tm = tmat        self.pv = propensity#[compile(eq,'errmsg','eval') for eq in propensity]        self.pvl = len(self.pv) #length of propensity vector        self.nvars = len(self.inits) #number of variables        self.time=None        self.series=None        self.steps=0        def getStats(self):        return self.time,self.series,self.steps        def run(self,method='SSA', tmax=10, reps=1):        self.res = zeros((tmax,self.nvars,reps),dtype=float)        tvec = arange(tmax, dtype=int)        if method =='SSA':            for i in xrange(reps):                steps = self.GSSA(tmax,i)            print steps,' steps'        elif method == 'SSAct':            pass        self.time=tvec        self.series=self.res        self.steps=steps    def GSSA(self, tmax=50,round=0):        '''        Gillespie Direct algorithm        '''        ini = self.inits        r = self.rates        pvi = self.pv        l=self.pvl        pv = zeros(l,dtype=float)        tm = self.tm                tc = 0        steps = 0        self.res[0,:,round]= ini        a0=1        for tim in xrange(1,tmax):            while tc < tim:                for i in xrange(l):                    pv[i] = pvi[i](r,ini)                #pv = abs(array([eq() for eq in pvi]))# #propensity vector                a0 = pv.sum() #sum of all transition probabilities#                print tim, pv, a0                tau = (-1/a0)*log(random())                event = multinomial(1,pv/a0) # event which will happen on this iteration                ini += tm[:,event.nonzero()]                #print tc, ini                tc += tau                steps +=1                if a0 == 0: break            self.res[tim,:,round] = ini            if a0 == 0: break#        tvec = tvec[:tim]#        self.res = res[:tim,:,round]        return steps    def CR(self,pv):        """        Composition reaction algorithm        """        pass        def main():    vars = ['s','i','r']    ini= [500,1,0]    rates = [.001,.1]    tm = array([[-1,0],[1,-1],[0,1]])        prop=[lambda r, ini:r*ini*ini,lambda r,ini:r*ini]    M = Model(vnames = vars,rates = rates,inits=ini, tmat=tm,propensity=prop)    t0=time.time()    M.run(tmax=80,reps=1000)    print 'total time: ',time.time()-t0    #print res#    from pylab import plot , show, legend#    plot(t,res,'-.')#    legend(M.vn,loc=0)#    show()    if __name__=="__main__":    import cProfile    cProfile.run('main()',sort=1)    main()`

The Gillespie SSA is a very simple algorithm, so I thought it would be nice to blog about it and have other Pythonistas take a look at my code and possibly find a way to optimize it further. Having the core code run as fast as possible is very important for this application, because you normally have to run thousands of replicates of a every run due to its stochastic nature. This "need for speed", made me try Cython, but due to my inexperience with it, I couldn't squeeze much performance out of it:

`from numpy.random import uniform, multinomial, exponential#from numpy import arange, array, empty,zerosimport numpy as npcimport numpy as npimport timeDTYPE = np.doublectypedef np.double_t DTYPE_tctypedef np.int_t INT_tcdef class Model:    cdef object vn,rates,inits,pv    cdef np.ndarray tm,res,time,series    cdef int pvl,nvars,steps    cdef object ts    def __init__(self,vnames,rates,inits, tmat,propensity):        '''         * vnames: list of strings         * rates: list of fixed rate parameters         * inits: list of initial values of variables         * propensity: list of lambda functions of the form:             lambda r,ini: some function of rates ans inits.        '''        self.vn = vnames        self.rates = rates        self.inits = inits        self.tm = tmat        self.pv = propensity#[compile(eq,'errmsg','eval') for eq in propensity]        self.pvl = len(self.pv) #length of propensity vector        self.nvars = len(self.inits) #number of variables        self.time = np.zeros(1)        self.series = np.zeros(1)        self.steps = 0    def run(self, method='SSA', int tmax=10, int reps=1):        cdef np.ndarray[DTYPE_t,ndim=3] res = np.zeros((tmax,self.nvars,reps),dtype=float)        tvec = np.arange(tmax)        self.res = res        cdef int i, steps        if method =='SSA':            for i from 0 <= i<reps:                steps = self.GSSA(tmax,i)            print steps,' steps'        elif method == 'SSAct':            pass        self.time=tvec        self.series=self.res        self.steps=steps        def getStats(self):        return self.time,self.series,self.steps        cpdef int GSSA(self, int tmax=50,int round=0):        '''        Gillespie Direct algorithm        '''        ini = self.inits        r = self.rates        pvi = self.pv        cdef int l,steps,i,tim        cdef double a0,tc, tau        #cdef np.ndarray[INT_t] tvec        cdef np.ndarray[DTYPE_t] pv        l=self.pvl        pv = np.zeros(l, dtype=float)        tm = self.tm        #tvec = np.arange(tmax,dtype=int)        tc = 0        steps = 0        self.res[0,:,round]= ini        a0=1.        for tim from 1<= tim <tmax:            while tc < tim:                for i from 0 <= i <l:                    pv[i] = pvi[i](r,ini)                #pv = abs(array([eq() for eq in pvi]))# #propensity vector                a0 = a_sum(pv,l) #sum of all transition probabilities                #print ini#,tim, pv, a0                tau = (-1/a0)*np.log(np.random.random())                event = multinomial(1,(pv/a0)) # event which will happen on this iteration                ini += tm[:,event.nonzero()]                #print tc, ini                tc += tau                steps +=1                if a0 == 0: break            self.res[tim,:,round] = ini            if a0 == 0: break#        tvec = tvec[:tim]#        self.res = res[:tim,:,round]        return steps    def CR(self,pv):        """        Composition reaction algorithm        """        pass        def main():    vars = ['s','i','r']    cdef np.ndarray ini= np.array([500,1,0],dtype = int)    cdef np.ndarray rates = np.array([.001,.1],dtype=float)    cdef np.ndarray tm = np.array([[-1,0],[1,-1],[0,1]])        prop = [l1,l2]    M = Model(vnames = vars,rates = rates,inits=ini, tmat=tm,propensity=prop)    t0=time.time()    M.run(tmax=80,reps=1000)    print 'total time: ',time.time()-t0    cdef double a_sum(np.ndarray a, int len):    cdef double s    cdef int i    s=0    for i from 0 <=i <len:        s+=a[i]    return s        def l1(np.ndarray r,np.ndarray ini):    return r*ini*inidef l2(np.ndarray r,np.ndarray ini):    return r*ini`
That's it. I hope this code will serve to inspire other scientists which need to use this type of simulations. In the next few days I will organize it in a package and make it available somewhere.

I would appreciate any suggestions to speed up the gillespie module, both the Python and the Cython versions.

vumavu said...

Could you give the reference for the SEIR model example which you have taken here?

Unknown said...

@vumavu: This is a standard textbook epidemiological model. You can read about it on Wikipedia
http://en.wikipedia.org/wiki/Compartmental_models_in_epidemiology

vumavu said...

"Example of an SEIR model with two Infectious classes: subclinical(Is) and clinical(Ic)"

Where can I find about this particular example? I was browsing for this example. I could not find the reference any where.

Unknown said...

@vumavu: I have a paper with coleagues submitted to Plos One. And I am writing another one that uses this model. This particular variation is a good match for the epidemiology of Influenza.

vumavu said...

can you send me the paper which you send to Plos One?
my email: vumavu@gmail.com

Unknown said...

@Coehlo. Hi there. I have written in python examples of the exact ssa and tau-leap ssa, from the book of M. Keeling & P. Rohani "Modeling Infectious Diseases in Humans and Animals". You can find the specific programs in chapter 6 (6.3 to 6.6) either on the online material of the book @ http://modelinginfectiousdiseases.org/ or on the wiki of the website i am keeping @ http://wiki.deductivethinking.com/wiki/Python_Programs_for_Modelling_Infectious_Diseases_book

To my view in order to make an workable programming algorithm out of gillespie ssa needs a meta-programming approach, that will give the user the flexibility to try different scenarios, such as the classes of the individuals and enable more complex models. I didn't have the time to look at R implementation of gillespieSSA to see how it handles it.

Using numpy arrays is fast by default. I managed to optimize for my paper on S. typhimurium at pig farms on risk analysis (DOI:http://dx.doi.org/10.1111/j.1539-6924.2009.01274.x) the tau-leap method using Pyrex and parallel computing, but trying to optimize the ssa algorithm I had to leave some arrays and functions of numpy out leading to the same or worst results with the C translated code than with the Python code.

@ vumavu. Hi there. The theory of different infectious classes is gaining more respect time to time. These different classes may differ in observation or not of symptoms, duration of stage, different effect in transmission. ISVEE conference XII (2009) (http://blog.deductivethinking.com/?p=167) had examples of Salmonella and E. coli on this. There are published papers also on these (salmonella and e.coli in cattle and pig farms). I have also a paper published on S. typhimurium at pig farms on risk analysis (DOI:http://dx.doi.org/10.1111/j.1539-6924.2009.01274.x) using two different infectious classes with two different propagation rates. Other human diseases like hepatitis B are cases of two different rates of infection.

Unknown said...
This comment has been removed by the author.
Unknown said...
This comment has been removed by the author.
Unknown said...

@Ilias: First of all congratulatin on your effort to port the examples in Keeling&Rohani's book to Python and also release the code under GPL. Good Job!.

I took a look at the Tau-leap implementation and I am currently in the precess of trying to optimize it and also modify it so that it can be easily used for any kind of model. Given that the code is GPL licensed, I'll probably add the implementation to BIP (http://code.google.com/p/bayesian-inference/) which is where my Gillespie code is maintained (more specifically in the subpackage SDE). You are naturally welcome to contribute code directly to BIP if you want. If you feel inclined to do it, we should talk more over e-mail since the project's roadmap page is a bit outdated.

Regarding you idea of metaprogramming for implementing more complex models: I fully agree, and I also have a student that did just that to implement a 200+ reactions model, using BIP. She did it using string formatting + eval.

I also liked you paper on Risk Analysis.

Kimberly said...

hello friends I really liked this information, a few days ago I read something similar on a site called wound infections, I would like to receive updates on this issue, as it is very interesting, thanks!

బీపీ said...

Hi there,

Would mind explaining how you arrived at the tmat?

I see that it has to do with the reaction equations but cant quite get it easily.. .

u made my work much easier btw, thanks :)