ContentMining: My Video to Shuttleworth about our proposed next year

I have had two very generous years of funding from the Shuttleworth Foundation to develop TheContentMine. Funding is in yearly chunks and each Fellow must reapply if s/he wants another year (up to 3). The mission is simple: change the world. As with fresh applicants, we write a 2-page account of where the world is at, what we want to change, and how.

TL;DR I have reapplied and submitted a 7-minute video.

These two years have been a roller-coaster – they have seriously changed my life. I can honestly say that the Fellowship is one of the most wonderful organizations I know. We meet twice a year with about 20 fellows/alumni/team committed to making sure the world is more just, more harmonious, and that humanity and the planet have a better chance of prospering.

There’s no set domain of interest for applying, but Fellows have a clear sense of something new that could be done or something that badly needs mending. Almost everyone uses technology, but as a means, not as an end. And almost everyone is in some way building or enhancing a community. I can truly say that my fellow Fellows have achieved amazing things. Since we naturally live our lives openly you’ll find our digital footprints all over the Internet.

I’m not going to describe all the projects – you can read the web site and you may know several Fellows anyway.

  • Some are trying to fill a vacuum – do something exciting that is truly visionary – and I’ll highlight Dan Whaley’s project. It (and ContentMine is proud to be an associate) will bring annotation to documents on the Web. That sounds boring – but it’s as exciting as what TimBL brought with HTML and HTTP (which changed the world). Annotation can create a read-write web where the client (that’s YOU!) can alter/enhance our existing knowledge, and it’s so exciting it’s impossible to see where it will go. The web has evolved to a server-centric model where organizations pump information at dumb clients and build walled gardens where you are trapped in their model of the world. Annotation gives you the freedom to escape, either individually or in subcommunities.
  • Others are challenging injustice – I’ll highlight two. Jesse von Doom is changing the way music is distributed – giving artists control over their careers. Johnny West is bringing transparency to the extractive industries. Did you know “BP” consists of over 1000 companies? Or where the fracking contracts in the UK are?

So when I launched TheContentMine as a project in 2014 we were in the first category. Few people were really interested in ContentMining and fewer were doing it. We saw our challenge as training people, creating tools, running workshops, and that was the theme of my first application. Our vision was to create a series of workshops which would train trainers and expand the knowledge and practice of mining. And the world would see how wonderful it was and everyone would adopt it.


In the first year we searched around for likely early adopters, and found a few. We built a great team – where everyone can develop their own approaches and tools – and where we don’t know precisely what we want for the future. And gradually we got known. So for the second year our application centred on tools and mining the (Open) literature. It’s based on the idea that we’d work with Open publishers, show the value, and systematically extend the range of publishers and documents that we can mine. And that’s now also part of our strategy.

But then in 2014 politics…

The UK has already pushed for and won a useful victory for mining. We are allowed to mine any documents we have legal access to for “non-commercial research”. There was a lot of opposition from the “rights-holders” (i.e. mainstream TollAccess publishers to whom authors have transferred the commercial rights of their scientific papers). They’d also been fighting in Europe under “Licences for Europe” to stop the Freedom to mine. Indeed I coined the phrase “The Right to Read is the Right to Mine” and the term “Content Mining”. So perhaps when the UK passed the “Hargreaves” exception for mining, the publishers would agree that it was time to move on.

Sadly no.

2015 has seen the eruption of a full-scale conflict in the EU over the right to mine. In 2014 Julia Reda MEP was asked to create a proposal for reform of copyright in Europe’s Digital Single Market. (The current system is basically unworkable – laws are different in every country and arcanely bizarre [1]). Julia’s proposal was very balanced – it did not ask for copyright to be destroyed – and preserved rights for “rights-holders” as well as for re-users.

ContentMining (aka Text and Data Mining, TDM) has emerged as a totemic issue. There was massive publisher pushback against Julia’s proposal, epitomised in the requirement for licences [2]. There were over 500 amendments, many being simply visceral attacks on any reform. And there has been huge lobbying, with millions of Euros. Julia could get a free dinner several times over every night!

There is no dialogue and no prospect of reconciliation. There is simply a battle. (I am very sad to have to write this sentence.)

So ContentMine is now an important resource for Freedom. We are invited to work with reforming groups (such as LIBER who have invited us to be part of FutureTDM, an H2020 project to research the need for mining). And we accept this challenge by:

  • Advocacy. This includes working with politicians, legal experts, reformers, etc.
  • Software. Our software is unique, Open, and designed to help people discover and use ContentMining either with our support or independently.
  • Science. We are tackling real problems such as endangered species, and clinical trials.
  • Hands-on. We’ve developed training modules and also run hands-on workshops to explore scientific and technical challenges.
  • Partners. We’re working with university and national libraries, open publishers, and others.

So I’ve put this and more into the video. [3] This tells you what we are going to do and with whom. And I’ll explain the detail of what we are going to do in a future post.


[1] Read and laugh, then weep. You cannot publish photos of the Eiffel Tower taken at night….

[2] Licensing effectively means that the publishers have complete control over who is allowed to mine content, and when, where, and how (and we have seen Elsevier forbidding Chris Hartgerink to do research without their permission; see earlier blog posts).

[3] It’s a non-trivial amount of work. Approximately 1 PMR-day per minute of final video. It took time for the narrative to evolve (thanks to Jenny Molloy and Richard Smith-Unna for the polar bear theme). And it’s CC-BY.