Digital Scholarship: Enlightenment or Devastated Landscape? The Right to Read is the Right to Mine

(Glen Feshie, remains of forest, CC-BY-SA 2.0 Ian Shiell )


I am honoured to have been invited to give a talk in ten days’ time at the University of Edinburgh (“Research: doing it faster, doing it differently”). I have strong connections with Edinburgh and Scotland (having lived in Stirling for 15 years), and have worked closely with the Informatics Forum (School). Here’s the abstract, and then commentary.



Digital Scholarship: Enlightenment or Devastated Landscape? The Right to Read is the Right to Mine

Peter Murray-Rust and University of Cambridge


Over 5,000 scholarly articles are published per day and it’s becoming impossible to keep up. For example, researchers in systematic medical reviewing have to “read” 10,000 articles in 10 days to filter out those suited for meta-analysis. Another researcher spends an hour per day “reading” the literature to find where her work has been mentioned. An HEP researcher measures data off graphs by hand at the rate of one per day.
Machines can solve many of these problems and I’ll demonstrate our software [2]. But the major problem is political. It’s a fight for the soul of the Digital Enlightenment and reformers are in danger of losing. The major publishers “control” access to, and re-use of, scholarship. A Dutch statistician, Chris Hartgerink, is interested in the use (and misuse) of statistical measures (such as P-values); he downloaded 30,000 articles so that programs could select those of interest. The mega-publisher Elsevier wrote to his University and demanded that he stop his research, and the University apparently complied.
In the UK we won a small freedom in 2014 – we can now mine the literature for “non-commercial research”, although we probably can’t publish the bulk of the results because of copyright. The European Parliament and Commission are trying to follow, and 2015 has seen massive political activity over Text and Data Mining (TDM, aka ContentMining). I express this as “The Right to Read is the Right to Mine”. Julia Reda MEP has drafted coherent and positive proposals, but they have met massive counter-lobbying from vested interests such as scholarly publishers.
Publishers are increasingly adding legal, contractual, technical and political barriers to assert their control over the whole of electronic scholarship. Worryingly, they are building an infrastructure which will coerce scholars into becoming digital serfs without academic rights or power.
I believe that the Enclosure of the Digital Commons is potentially as serious as the Highland Clearances – a “devastated landscape” [3]. A major part of the digital Enlightenment is machines and humans working together under a fair and just system.


[2] PMR, Mark MacGillivray, Informatics PhD graduand, Richard Smith-Unna, Cambridge

[3] Fraser Darling see



In 2002, the pioneers of “Open Access” created a vision in the Budapest Declaration of Open Access:


The new technology is the internet. The public good they make possible is the world-wide electronic distribution of the peer-reviewed journal literature and completely free and unrestricted access to it by all scientists, scholars, teachers, students, and other curious minds. Removing access barriers to this literature will accelerate research, enrich education, share the learning of the rich with the poor and the poor with the rich, make this literature as useful as it can be, and lay the foundation for uniting humanity in a common intellectual conversation and quest for knowledge.


This has inspired many of us to work towards creating a communal global knowledge base for Science and Medicine, with Free and Open Standards, Content, Software. I’m proud to have been part of the group which created – a vision and statement of our rights in the Digital Age.


New technologies are revolutionising the way humans can learn about the world and about themselves. These technologies are not only a means of dealing with Big Data, they are also a key to knowledge discovery in the digital age; and their power is predicated on the increasing availability of data itself. Factors such as increasing computing power, the growth of the web, and governmental commitment to open access to publicly-funded research are serving to increase the availability of facts, data and ideas.

However, current legislative frameworks in different legal jurisdictions may not be cast in a way which supports the introduction of new approaches to undertaking research, in particular content mining. Content mining is the process of deriving information from machine-readable material. It works by copying large quantities of material, extracting the data, and recombining it to identify patterns and trends.

At the same time, intellectual property laws from a time well before the advent of the web limit the power of digital content analysis techniques such as text and data mining (for text and data) or content mining (for computer analysis of content in all formats). These factors are also creating inequalities in access to knowledge discovery in the digital age. The legislation in question might be copyright law, law governing patents or database laws – all of which may restrict the ability of the user to perform detailed content analysis.

Researchers should have the freedom to analyse and pursue intellectual curiosity without fear of monitoring or repercussions. These freedoms must not be eroded in the digital environment. Likewise, ethics around the use of data and content mining continue to evolve in response to changing technology.

Computer analysis of content in all formats, that is content mining, enables access to undiscovered public knowledge and provides important insights across every aspect of our economic, social and cultural life. Content mining will also have a profound impact for understanding society and societal movements (for example, predicting political uprisings, analysing demographical changes). Use of such techniques has the potential to revolutionise the way research is performed – both academic and commercial.


But not everyone shares this vision, and Free and Open access to knowledge hangs in the balance. For every step towards freedom, there is a massive encircling and enclosing of the scholarly commons. If we are not in control of the infrastructure, we will have given up our freedom to develop the benefits of digital knowledge.


