“We think ContentMine has the potential to transform the way we work!”

Malcolm Macleod is Professor of Neurology and Translational Neuroscience at the Centre for Clinical Brain Sciences, University of Edinburgh, and Head of Neurological Diseases and Stroke at NHS Forth Valley.

Professor Macleod and his team have a problem: they simply do not have the time to read all of the literature that they need to.

Prof. Macleod explains this in more detail in this short video from the Wellcome Trust:


Data and text mining tools automate the extraction of data and text from the scientific literature, so that specific measures and details can be searched for and filtered within published papers. These technologies enable researchers to refine their searches and assimilate relevant information on a specific subject quickly, helping them make informed decisions about their research.

To help Prof. Macleod and his team, we held a ContentMine workshop in Edinburgh. One of the main purposes of this workshop was to tailor our text & data mining software tools to the specific needs of this group. The group needs to extract data from images in a large number of papers – a task that would take a human many months:

We have around 3500 publications to extract data from. These publications have, on average, 3 graphs per publication that we would extract data from.

The CAMARADES workshop

The ContentMine team visited the CAMARADES team at the Centre for Clinical Brain Sciences, University of Edinburgh on 26th – 27th May, 2015.

From ContentMine, Peter Murray-Rust, Graham Steel & Mark MacGillivray were in attendance, along with Gillian Currie, Emily Sena and around six other members of Malcolm’s group, CAMARADES.

Part of Malcolm’s research involves systematic reviews of animal studies. Emily has to measure hundreds of diagrams by hand to extract the data from graphs – perhaps half a day per graph and, perhaps worse, really boring work. Gillian, a senior researcher like Emily, had to read 30,000 papers last year – that’s one every five minutes.
The group was excited about our technology and within a day had not only learnt how to use it, but also how to create their own applications. We can process a paper – automatically – in a few seconds. That means that huge amounts of Gillian and Emily’s time can be saved for productive work.
Day one commenced with Gillian giving a presentation about the group’s work. This was then followed by a short presentation from Peter.
Gillian handed out copies of the ARRIVE Guidelines which outline how to report experiments that use animals – the research group wanted to be able to figure out whether a given paper was conforming to the guidelines. Automating this task would become the theme of the workshop. Before we could start tackling the problem, we needed to equip ourselves with some tools…
Next up to speak was Mark who gave a whirlwind tour of some of the features of ContentMine.
After lunch, we showed the participants two platforms we use for content mining analysis. First, everyone downloaded VirtualBox along with the latest version of our virtual machine. Second, we demonstrated our web-based canary UI.
Mark and Peter then took us through how to harvest literature for mining with various tools, including our own getpapers and quickscrape. Peter then took us through the normalisation and basic analysis of the downloaded material with our software norma and ami. We broke into teams and physically highlighted sections of a paper that we would like to extract information from. Peter then showed us how to extract information from diagrams. After an afternoon break, Peter reviewed the papers that the groups had annotated earlier, and then we started looking at how to automate the annotation using regular expressions.


With that, it was time for all to get their hands dirty until close of play on day one.
All was not yet finished for the day, though! Time for some refreshments, and then we were joined by Malcolm, who took us all out for a truly splendid dinner at The Apartment at Barclay Place, Edinburgh.
Day two commenced with a briefing from Peter, recapping day one and outlining how best to move forward to assist the team.
We moved into the team office to do some further hacking, with more work on creating regexes. In all, the team came up with over 200 regexes to tag and identify the following metadata:
  1. Ethical Statement
  2. Sample Size
  3. Experimental outcomes
  4. Allocating animals to experimental groups
  5. Experimental Animals
  6. Housing and Husbandry
  7. Study Design and Procedures
  8. Statistical Methods
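To illustrate the approach (with hypothetical patterns, not the group's actual 200+ regexes), here is a minimal sketch of how regexes can tag sentences in a paper with ARRIVE-style reporting categories:

```python
import re

# Hypothetical, illustrative patterns for a few of the categories above.
# The workshop's real regex set was far larger and more carefully tuned.
PATTERNS = {
    "Ethical Statement": re.compile(r"\bethic(s|al)\s+(committee|approval)\b", re.I),
    "Sample Size": re.compile(r"\b(sample\s+size|power\s+(analysis|calculation))\b", re.I),
    "Allocating animals": re.compile(r"\brandomi[sz](ed|ation)\b", re.I),
    "Statistical Methods": re.compile(r"\b(ANOVA|t.test|Mann.Whitney)\b", re.I),
}

def tag_sentences(text):
    """Return (category, sentence) pairs for sentences matching any pattern."""
    hits = []
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for category, pattern in PATTERNS.items():
            if pattern.search(sentence):
                hits.append((category, sentence))
    return hits

sample = ("All procedures were approved by the local ethics committee. "
          "Animals were randomised to treatment groups. "
          "Group differences were assessed by ANOVA.")
for category, sentence in tag_sentences(sample):
    print(category, "->", sentence)
```

Running a tagger like this over the full text of each paper makes it easy to flag which ARRIVE items a paper does, or does not, appear to report.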

We then ran this on six PLOS ONE papers successfully – it was a eureka moment for all to see the results!

Malcolm – who was ultra busy – joined us briefly to recap the day.

Day two concluded with Peter giving a seminar to around 20 people. All in all, this was a very productive two days for all concerned and we will be continuing to work with Malcolm’s group.
Gillian Currie summarised the event nicely:
In our work performing systematic review and meta-analysis of preclinical studies we spend many research hours extracting data from publications. This workshop has introduced us to tools that could, with some fine-tuning, make this process semi-automated and much less painful! We think it has the potential to transform the way we work!
