Thursday, February 19, 2009

Computer Supported Collaborative Science

The title of this post is intentionally the same as this one by Greg Wilson. His post brings up the important issue of the lack of efficient collaborative science tools, and asks for opinions on the subject. Here is my personal view:

Though science invented collaborative open-source development centuries ago, it is currently not the best example of such practices, having been surpassed on many fronts by the OSS (Open Source Software) community. The blame for this situation can be partly attributed to commercial scientific publishers, the current methods of evaluating scientific productivity, and so on. But discussing that is not the goal of this article.

The OSS community has matured a series of tools and practices to maximize the rate of collaborative production of good quality software. By good quality we mean not only bug-free working software, but software which meets criteria such as efficiency, desirability, readability (you can't form a developer community around unreadable code), modularity, etc.

Science currently fails to meet even the most basic criterion it sets for itself: reproducibility. Most papers do not include sufficient information for their results to be replicated independently. You can compare a scientific paper to the compiled binary version of a piece of software: it shows its purpose but does not help those who would like to re-create it independently. In OSS, however, binaries typically carry information about where their complete source code can be found and downloaded freely. This closes the circle of reproducibility.
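As a rough illustration of what "closing the circle" could mean for a paper, here is a minimal sketch of a provenance record that points back to the code and data behind a result. Everything in it is an assumption made for illustration: the function names `build_provenance` and `changed_files`, the record layout, and the example URL and revision are hypothetical and do not describe any existing tool.

```python
import hashlib


def build_provenance(source_url, revision, data_files):
    """Describe where the analysis code lives and which data it used.

    source_url and revision are free-form strings (hypothetical layout);
    data_files maps a file name to the raw bytes of that file.
    """
    checksums = {
        name: hashlib.sha256(content).hexdigest()
        for name, content in data_files.items()
    }
    return {
        "source_url": source_url,
        "revision": revision,
        "data_sha256": checksums,
    }


def changed_files(record, data_files):
    """Return the sorted names of files that are missing or differ from the record."""
    changed = []
    for name, expected in record["data_sha256"].items():
        content = data_files.get(name)
        if content is None or hashlib.sha256(content).hexdigest() != expected:
            changed.append(name)
    return sorted(changed)


# Hypothetical example: one small data file and a placeholder repository.
data = {"measurements.csv": b"t,value\n0,1.0\n1,1.5\n"}
record = build_provenance("https://example.org/my-analysis", "rev-1", data)
unchanged = changed_files(record, data)  # no file differs from the record
```

A record like this could travel with the manuscript, so that a reader can fetch the stated revision of the code and check that the data they obtained is the data the authors used.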

When it comes to collaborating with potentially hundreds of peers in developing code, the OSS community has perfected tools such as distributed version control systems (DVCS), bug trackers, wikis and so on, which have proven indispensable to the production and maintenance of serious OSS projects. Last but not least, OSS projects are never done, which is also a fundamental rule for science, but does not apply to scientific papers. Unfinished papers in science are almost worthless (with the notable exceptions of working papers and pre-prints).

So, heading back to the focus of this article, what would be the desirable features of a productive Computer Supported Collaborative Science (CSCS) tool?
  • Free software
  • Web based interface
  • DVCS for code and manuscripts
  • Wikis for outlining of ideas and hypotheses
  • Bug tracking for reporting of errors in the analysis
  • Database browsing capabilities for uploading experimental data and interactively exploring it
  • Simple visualization tools to explore data. These could be based on the Google graph/visualization APIs.
  • For my research area at least: Integrated Sage system to foster interactive/collaborative development of computational analytical methods.
  • Your wish here....
This is my take on the issue, Greg. I even have some grant money to help realize this; the hard part has been finding like-minded collaborators who believe in the idea.

If you read this and think this has already been accomplished by some OSS project, PLEASE let me know.


anand said...

Hi Flavio,

I agree with what you say, and would happily support the project if there were some indication that it would be accepted, at least by a viable community of early adopters.

The internet already provides some form of the tools you propose, scattered around. Why don't people use them?

Scientists and open-source developers have different motives, and it might be that a more collaborative, open process just wouldn't help most scientists get where they want to go.

I know at least one other person who feels that the publication system needs an overhaul, but understandably he isn't willing to jeopardize his career by taking chances on new low-impact journals or their replacements.

I'm not trying to rain on your parade, but I do think you need to identify a credible target audience if you're going to be successful.

anand said...

So lots of scientists I know avoid talking about work in progress to people who have the skills to 'scoop' them: implement one of their ideas quickly and beat them to publication. Is there any way to license a working paper to protect against that?

usagi said...

Hi Anand,

I'll try to answer both your comments here.

You are right when you say that the internet already provides the tools I mention and yet people don't use them.

What is needed is the aggregation of such tools on a unified platform, which does not compete with traditional publication tools but complements them. Maybe I'll detail the way I view such integration in a new post.

Regarding the fear of scooping: I believe this fear has been highly exaggerated. The widespread adoption of pre-print servers, such as arXiv or Nature Precedings, has helped protect ideas and data by associating them with their authors in an agile way, before a paper even reaches its final format for submission to a standard journal. If someone publishes data from another author, which already appeared in a pre-print, as his/her original data, this person can be sued for stealing those results, since there is published evidence that those results had appeared before in someone else's work. With traditional publication channels, "scooping" charges are much harder to prove.
Preprint servers have strict licenses on published working papers, which will make it even harder for information "thieves" to claim they didn't know the information was not public.

Anonymous said...

Hi, Flavio.

You might want to bring this subject to the debian-science mailing list in