TheContentMine is a project to extract all facts from the scientific literature. It has now been going for about 6 weeks – this is a soft-launch. We continue to develop it and record our progress publicly. It’s a community project and we are starting to get offers of help right now. We welcome these but we shan’t be able to get everything going immediately.
We want people to know what they are committing to and what they can expect in return. So yesterday I drafted an initial Philosophy – we welcome comments.
Our philosophy is to create an Open resource for everyone, created by everyone. Ownership and control of knowledge by unaccountable organisations is a major current threat; our strategy is to liberate and protect content.
We are a meritocracy. We are inspired by Open communities such as the Open Knowledge Foundation, Mozilla, Wikipedia and OpenStreetMap, all of which have huge communities that have developed trustable governance models.
We are going ahead on several fronts – “breadth-first”, although some areas have considerable depth. Just like Wikipedia or OSM, you’ll come across stubs and broken links – it’s the sign of an Open, growing organisation.
There’s so much to do, so we are meeting today to draft maps, guidelines and architecture. We’re gathering the community tools – wikis, mailing lists, blogs, GitHub, etc. As the community grows we can scale in several directions:
- primary source. Contributors can choose particular journals or institutions/theses to mine from.
- subject/discipline. You may be interested in Chemistry or Phylogenetic Trees, Sequences or Species.
- technology. Concentrate on OCR, Natural Language Processing, Crawling or Syntax, or develop your own extraction techniques.
- advocacy and publicity. A major aim is to influence scientists and policy makers to make content Open.
- community – its growth and practice.
We are developing a number of subprojects which will demonstrate our technology and how the site will work. We hope to report more tomorrow.