ContentMine featured in Horizon magazine article “Copyright shift would put Europe ahead in ‘future of research’ data mining”

Horizon magazine featured an article on text and data mining and specifically the European Commission proposal for a copyright exception, currently covering “public or private organisations that are carrying out scientific research in the public interest”.


Dr Peter Murray-Rust is director of ContentMine, a not-for-profit organisation which has developed software that enables researchers to search through scientific papers on a particular subject. He gives the example of the Zika outbreak as an area where TDM can help to enhance knowledge.

‘We’re going to need to know a lot more about Zika, and much of it may already be in the scientific literature that’s been published but that we don’t read. We don’t read it because there’s so much, so we’ve built a machine, ContentMine, that will liberate the facts from the literature.’

Latest leak on EU copyright reforms including TDM and its scope – commercial included.

Statewatch is a non-profit organisation founded in 1991 that monitors the state and civil liberties in the European Union.

Yesterday evening, it released the following tweet:-

The document itself (PDF) is 182 pages in length and the section relating to text and data mining (TDM) can be found on pages 93–108.

INTRO

4.3. TEXT AND DATA MINING
4.3.1. What is the problem and why is it a problem?
Problem: Researchers are faced with legal uncertainty with regard to whether and under which conditions they can carry out TDM on content they have lawful access to.
Description of the problem: Text and Data Mining (TDM) is a term commonly used to describe the automated processing (“machine reading”) of large volumes of text and data to uncover new knowledge or insights. TDM can be a powerful scientific research tool to analyse big corpuses of text and data such as scientific publications or research datasets.

Continued

It has been calculated that the overall amount of scientific papers published worldwide may be increasing by 8 to 9% every year and doubling every 9 years. In some instances, more than 90% of research libraries’ collections in the EU are composed of digital content. This trend is bound to continue; however, without intervention at EU level, the legal uncertainty and fragmentation surrounding the use of TDM, notably by research organisations, will persist. Market developments, in particular the fact that publishers may increasingly include TDM in subscription licences and develop model clauses and practical tools (such as the Cross-Ref text and data mining service), including as a result of the commitments taken in the 2013 Licences for Europe process to facilitate it, may partly mitigate the problem. However, fragmentation of the Single Market is likely to increase over time as a result of MS adopting TDM exceptions at national level which could be based on different conditions.
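As a rough check on the growth figures quoted above: steady growth at a rate r per year doubles the total after ln 2 / ln(1 + r) years. The short Python sketch below works this out for 8% and 9%. The function name `doubling_time` is an illustrative choice and does not come from the Commission document.

```python
import math


def doubling_time(annual_growth: float) -> float:
    """Years needed for a quantity to double at a steady annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


# The two ends of the "8 to 9% every year" range quoted in the text.
for rate in (0.08, 0.09):
    print(f"{rate:.0%} a year -> doubles in about {doubling_time(rate):.1f} years")
```

At 8% a year the doubling time comes out at about nine years, and at 9% about eight, so the quoted “doubling every 9 years” corresponds to the lower end of the range.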

Four options are suggested on TDM reform.

Option 1 – Fostering industry self-regulation initiatives without changes to the EU legal framework.

Option 2 – Mandatory exception covering text and data mining for non-commercial scientific research purposes.

Option 3 – Mandatory exception applicable to public interest research organisations covering text and data mining for the purposes of both non-commercial and commercial scientific research.

Option 4 – Mandatory exception applicable to anybody who has lawful access (including both public interest research organisations and businesses) covering text and data mining for any scientific research purposes.

The recommendation is for Option 3, which allows Public Interest Research Organisations (universities and research institutes) to mine for non-commercial AND commercial purposes. This appears mainly to support industrially funded research. On the whole it seems to be slight progress.

Option 3 is the preferred option. This option would create a high level of legal certainty and reduce transaction costs for researchers with a limited impact on right holders’ licensing market and limited compliance costs. In comparison, Option 1 would be significantly less effective and Option 2 would not achieve sufficient legal certainty for researchers, in particular as regards partnerships with private operators (PPPs). Option 3 allows reaching the policy objectives in a more proportionate manner than Option 4, which would entail significant foregone costs for rightholders, notably as regards licences with corporate researchers. In particular, Option 3 would intervene where there is a specific evidence of a problem (legal uncertainty for public interest organisations) without affecting the purely commercial market for TDM where intervention does not seem to be justified. In all, Option 3 has the best costs-benefits trade off as it would bring higher benefits (including in terms of reducing transaction costs) to researchers without additional foregone costs for rightholders as compared to Option 2 (Option 3 would have similar impacts on right holders but through a different legal technique i.e. scope of the exception defined through the identification of specific categories of beneficiaries rather than through the “non-commercial” purpose condition). The preferred option is also coherent with the EU open access policy and would achieve a good balance between copyright as a property right and the freedom of art and science.

Some reactions thus far via social media.


Sci-Hub and my personal position on legality 6/n

I have just blogged on the legal aspects of ContentMining:

https://blogs.ch.cam.ac.uk/pmr/2016/05/06/sci-hub-and-legal-aspects-of-contentmining/ (which also contains links to previous blogs).

These are general considerations but also relevant to the current issue of Sci Hub.

I am now going to set out my personal position and, where it impacts legally or organizationally, how TheContentMine might behave. I am not going to impose any other non-legal requirement on my colleagues. In a non-ContentMine scenario they can think and act however they feel best.

When I came to Cambridge, ca 17 years ago, I was in love with Universities. My first job was Assistant Lecturer at the University of Ghana when I was 22 years old (sic). My first UK job was 4 years later at the new University of Stirling. I helped build the University. It was wonderful and I loved it and have pride in what I helped with. After my time at Glaxo I moved to a part-time chair at the University of Nottingham to set up virtual science education and thence to Cambridge to a new Centre. I threw my heart into it. I love all of them. I love the people.

But the system is getting worse. And one of the causes, or symptoms, is scholarly publishing. When I started this blog – almost 10 years ago – I was starry-eyed about the possibilities. I was going to build an artificially intelligent computer. It would have the chemical intelligence of a first year undergraduate. But – since much “intelligence” is based on knowledge – it needed knowledge in chemistry journals “published” by “publishers”. And they have completely failed me. I spend my days as a reformer as well as – hopefully – an innovator.

So I have been dissatisfied with scholarly publishing for about 10 years.

  • I’ve tried to change it constructively. Working with publishers. Little or no interest.
  • I’ve tried to get scientists interested. Little or no interest.
  • I’ve tried to get libraries involved. Little or no interest.
  • I’ve got some funding. Mainly JISC and M$. No one interested in the output.

… it’s clear that trying to do things through conventional university channels will take longer than my lifetime.
So what do I do?

  • I keep buggering on (W.S.Churchill). This issue is too important to drop.
  • I appeal to the non-academic world.
  • I build stuff. Stuff is wonderful.

And I think like a revolutionary.

How can I and others change the world?

It’s clear that laws and practice are broken. Copyright is being used to muzzle speech and creativity, not create it. Universities chase glory rather than public good.

I’ve got two main options:

  • Work within the law
  • Work outside the law

Both can be effective, and both can be ineffective. And reformers sometimes move from one to the other.

But you can’t do both at once.

There is a balance between the state and the individual. If you want the state to change the law then the state should respect the individual (e.g. https://en.wikipedia.org/wiki/Due_process ) and the individual should respect the state. This happened in UK in 2012 when the government asked for views on Copyright reform. They were prepared to listen to anyone – citizens, universities, publishers, etc. This is a fair and balanced approach – Government listens to everyone, makes a balanced decision through the Civil Service, recommends it to parliament (in this case the House of Lords), takes further amendments, asks for a vote and it passes into law (or at least a Statutory Instrument).

It’s a fair process. It’s democracy. Yes, democracy is the worst form of government, except for all the others (WSC). I wish the result had allowed commercial mining. Others wish the SI hadn’t been passed at all. But I accept it.

But the publishers have not accepted the result.

They have been using other methods to make it harder to use the law. These include lobbyists, disinformation and technical barriers. These are probably legal, but unethical and immoral. They have been spreading disinformation. This isn’t just my judgment, it’s that of Julia Reda MEP and many other policy makers. She had 80 offers to dinner in the first week after her report. She’s been fed information which may be at variance with reality. Not l**s but https://en.wikipedia.org/wiki/Terminological_inexactitudes or those that are https://en.wikipedia.org/wiki/Economical_with_the_truth .

That’s why we worked to make ContentMine runnable by any MEP. And why Rik SU is building a GUI for it. So that Julia and her colleagues can challenge any assertions.

This is very sad. The Hargreaves SI was passed. The publishers fought in Europe to require miners to get permissions through licences (“Licences for Europe”). It was a standoff. Publishers are still trying to get libraries and academics to sign licences rather than accept Hargreaves.

What should I do?

In 2013 I thought that very few people understood what mining was about. So I was delighted that Shuttleworth offered me a Fellowship. I was delighted that I could continue to use University resources. I was delighted that I could build a system and devote my “retirement” to getting the practice of mining adopted by everyone.

Not just academics, but citizens. Taxi-drivers. Patients, planners…

But the publishers have made it very hard. And there is massive lobbying against legal reform in Brussels. The new phraseology “by Public Interest Research Organizations” is almost meaningless. It’s sufficiently frightening that very few will actually do it.

And currently I’m the only person in Europe doing legal content-mining (unless you tell me different).

And that’s a minor success for the Luddites in the publishing industry. They’ve frightened people (I’ve talked with some), bewildered others, offered no technical support.

So I have to make the decision.

  • Work within the law
  • Work without the law

I do not like breaking the law, I wouldn’t ask anyone else to, and in addition:

  • My funders expect me to work within the law. The project is to develop mining tools, strategy, policy, practice that will convince lawmakers
  • My University requires me to work within the law
  • The politicians that I talk to in UK and Brussels expect me to work within the law.

You can’t break the law just-a-little. So I’m not breaking it at all.

The downside is that the law we have in UK, and the law we might have some-time-in-Europe is very restrictive. I’m pioneering it to see what use it can be and where it needs enlarging. I can only do this if I am legal. I can say to lawmakers:
“It’s working here but failing here and here and here and…”

It’s not what I want, but it’s the bargain that I make with legislation.

And in addition I work with other legal groups such as Open Forum Europe to help and be helped in getting legal progress.

History will decide whether trying to stop progress has been successful…

… or whether ContentMine has made a difference.

Sci-hub and Legal aspects of ContentMining

I have written today to my collaborators in ContentMine – staff, volunteers, advisory board and Shuttleworth funders and mentors. It’s on the legal aspects of mining. It’s long, but laws are complex. It’s meant to put everyone’s minds at rest – us, universities, Shuttleworth, etc. It’s not authoritative, but may be a useful guide. We’d love to have your feedback. tl;dr I’ve assessed the main problems and most people should assume we have taken a responsible and public approach.
ContentMine is preparing to mine the complete scholarly literature every day – about 10,000 scholarly articles. People from inside CM and from outside have recently raised the question of whether CM is breaking or intends to break the law. This has arisen in part because of our intention to use the UK Copyright exception to mine the whole literature, and because of speculation about the possible use of our technology by “illegal” sites such as Sci-Hub.

NOTE: I am not a lawyer (IANAL) but I have spoken to several and am aware of general principles and practice.

The simple answer is simple:

CM does not intend to break the law and intends not to break the law.

and to my colleagues.
Do not worry. You will not end up in court. If anyone does – and it is unlikely – it will be me and I am prepared.

I shall expand on this in blog posts, but please be assured that I am actively assessing areas where the laws might be broken, especially inadvertently. Note, of course, that there are many other laws which we have to observe on a continual basis, including health and safety, employment, racial discrimination, libel, immigration, etc. I get frequent updates from the Chemistry Department as to what procedures we have to observe. You, I, contentmine.org and everyone are bound to observe and practice these laws. They are complex in detail, extent and interpretation, and we generally manage by knowing the outline of the law. We don’t steal, and we don’t read the small print of what is and is not a theft (e.g. “illegal borrowing”). But in other areas, e.g. animal experiments or immigration, the small print is critical. “Ignorance of the law is no defence”.

But I will take the responsibility of guiding you and making sure that you don’t transgress inadvertently.

The laws particularly relevant to contentmine.org include:

* copyright law

* sui generis database rights (Europe only)

* computer fraud law

* technological protection measures (TPM) and digital rights management (DRM)

* national security laws

Most of these laws have a concern about geo-location. We shall attempt to make sure that all our activities are carried out by UK staff, “in the UK”, on UK machines. But what is legal here may be illegal elsewhere and vice versa. Note also that many laws, especially new ones, cannot have definite answers until they are tested in a court case. Lawyers may give opinions (for fees) but ultimately the court decides.
These laws are complex and often recent and – like many laws – it is possible to transgress unknowingly. We have to educate ourselves and to behave responsibly in actions and language. If anyone is unsure they should raise the issue.
Note that by discussing this in public we will show our good faith and also be alerted by others to potential problems and misinterpretations.
Copyright law is exceedingly complex and also depends on the country. What is legal in the US may not be in Britain and vice versa. It includes:
* the process of copying for the purpose of mining for non-commercial research
* storage of copied material
* republication of the (transformed) output as part of the research/audit/verifiability requirement.

We continually discuss this with lawyers and with librarians. No one can predict precisely what is allowed and what is not – it may depend on “impact on the market of the rights-holder”. All law includes a balance of risks. It is my responsibility, and (for some content) the librarians’, to make sure that we have a balanced assessment.

We believe that our mining is fully allowed under the UK 2014 reform (“Hargreaves”). It would not be allowed if we took money from commercial companies and mined the literature solely for their benefit. Europe has noted that much research is a public/private partnership (I worked for 15 years in the Cambridge Unilever Centre, for example). Was this non-commercial? I would take the view that all the projects I worked on were. If I was paid extra to do private contract research for a company which would not be published it would be commercial.

Since I and ContentMine are probably the only group in UK at present who publicly intend to use Hargreaves there is no case law to answer these questions. We read the current public discourse and form a balanced judgment.

What copyright material can we hold on our machines? It is common for researchers to have thousands of copies of copyright material on their machines and no one is challenged. Unlike them, our material is in a secure computer room in Cambridge with physical access only by trusted staff and e-access only to 2-3 named and authorised people. If anyone wishes to “steal” the literature from our server we will actively prevent and report this. We are not, of course, ourselves redistributing any of the University subscription content other than facts and fair quotations. If, as we hope, the resource becomes useful in the University, we will work with library staff to create a legally acceptable approach where any Cambridge scholar can use the system.

How long can we hold it for? Mining is often an iterative process, so we may wish to re-run searches with new parameters. It would be a technical waste to have to re-download everything every day. It would also put additional workload on the publishers’ servers. We can’t give an answer in days or months or years until we know what the likely usage patterns are.

What can we republish? Since facts are uncopyrightable we can publish them without permission (although in Europe we cannot systematically republish the contents of databases protected by sui generis rights; journals and supplemental data are not databases). But:

“42”

is not a useful fact.

“The average snout-vent-length (SVL, see https://sizes.com/natural/lizards.htm ) of the common lizards (Zootoca vivipara) found on Borchester Common ( https://en.wikipedia.org/wiki/Borchester ) was 42 mm (± 5) measured by 3 independent researchers using the Graduated Ruler and Eyeball Method (see http://www.wikihow.com/Use-a-Ruler )”

is a useful fact. We intend to publish some or all of the facts we extract without formal permission from the publisher.

Note that a fact does not have to be “true”. I don’t actually know the sizes of newborn sandlizards. But what I have stated is a fact. The result might be a misprint for 142 mm (which is possible for an adult). It is still a (potentially falsifiable) fact. It remains a fact regardless of further lizard research.
I will blog more on facts as “facts” are uncopyrightable.
* sui generis database rights. We do NOT currently intend to systematically extract facts from factual databases described as such and specifically created for the purpose of holding facts.
* computer fraud laws. We scrupulously avoid breaking these laws. They carry the additional feature that they are criminal, and so prosecution would be by the police. The UK takes these very seriously and wishes to extend the maximum term of imprisonment to 10 years: http://arstechnica.com/tech-policy/2016/04/uk-file-sharing-10-years-jail-time/ (I personally protest against this, but I do it legally). You should therefore take especial care not to share files “illegally”. This means that ContentMine cannot have any dealings with Sci-Hub as it is seen by many as an “illegal” filesharing site. Read Ars Technica:

<quote>The UK government has responded to that issue by saying that it accepts there are concerns, and writes: “the policy intention is that criminal offences should not apply to low level infringement that has a minimal effect or causes minimum harm to copyright owners, in particular where the individuals involved are unaware of the impact of their behaviour.”

Another major worry was the use of the term “affect prejudicially” in judging copyright infringements, which many felt was too vague and could mean a single infringing file would fulfil the requirement—for example, if it were widely shared online. Many thought this set the threshold for committing an offence far too low.

The UK government said it was not aware of any cases where minor infringement had resulted in a criminal prosecution, but “agrees that the undefined term ‘affect prejudicially’ could give rise to an element of ambiguity.” The government is now proposing to introduce “re-worded offence provisions” to address that.

</quote>

It is extremely unlikely that we will trigger this law as we don’t deliberately intend to break it and deliberately don’t intend to break it. However #icanhazpdf is almost certainly “illegal” and also breaks the rules of the University. I have never used #icanhazpdf in either direction and never sent files to people who weren’t subscribed. ContentMine staff should not use #icanhazpdf.

In some cases crawling has been held to be a violation of the CFA acts of various flavours. I am not aware of any cases where scholarly publishers have used this to prosecute bona fide researchers, nor where the police have.

Note also that many publishers know that I and others (e.g. Crystallography Open Database) have been crawling their sites for many years and by implication permit it. This includes Nature, Elsevier, American Chemical Society, Royal Society of Chemistry, Acta Crystallographica, Science. We are careful to adhere to responsible mining practice (see https://contentmining.files.wordpress.com/2015/06/responsible-content-mining-1.pdf )
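To make the idea of responsible crawling concrete, here is a minimal Python sketch of two common courtesies: checking a site's robots.txt rules before requesting a page, and leaving a pause between requests. It is not taken from the responsible content mining document linked above and is not how ContentMine's own tools work. The class name `PolitePolicy`, the user agent `hypothetical-miner`, the example rules and the `publisher.example` URLs are all assumptions made for illustration. It uses only the standard library's `urllib.robotparser` and makes no network requests.

```python
import time
import urllib.robotparser


class PolitePolicy:
    """Decide whether and when a crawler may request a URL (illustrative only)."""

    def __init__(self, robots_lines, user_agent, min_delay=1.0):
        self.user_agent = user_agent
        self.parser = urllib.robotparser.RobotFileParser()
        self.parser.parse(robots_lines)  # rules are supplied as lines of text
        # Use the site's Crawl-delay if it asks for more than our own minimum.
        site_delay = self.parser.crawl_delay(user_agent)
        self.delay = max(min_delay, float(site_delay or 0))
        self.last_request = None

    def allowed(self, url):
        """True if the robots.txt rules permit this user agent to fetch url."""
        return self.parser.can_fetch(self.user_agent, url)

    def seconds_to_wait(self, now=None):
        """Seconds still to wait before the next request would be polite."""
        now = time.monotonic() if now is None else now
        if self.last_request is None:
            return 0.0
        return max(0.0, self.delay - (now - self.last_request))

    def record_request(self, now=None):
        """Note the time of a request so the next one can be spaced out."""
        self.last_request = time.monotonic() if now is None else now


# Hypothetical robots.txt content, written out here instead of downloaded.
EXAMPLE_ROBOTS = [
    "User-agent: *",
    "Disallow: /private/",
    "Crawl-delay: 5",
]

policy = PolitePolicy(EXAMPLE_ROBOTS, user_agent="hypothetical-miner")
print(policy.allowed("https://publisher.example/articles/123"))
print(policy.allowed("https://publisher.example/private/draft.pdf"))
```

A real crawler would call `allowed()` before each download, sleep for `seconds_to_wait()`, and call `record_request()` afterwards. Following robots.txt is a courtesy and not a statement about what the law permits.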

Aaron Swartz’s case was – for many, including me – a serious miscarriage of justice. From Wikipedia:

(https://en.wikipedia.org/wiki/Computer_Fraud_and_Abuse_Act#Aaron_Swartz )

<quote>In the wake of the prosecution and subsequent suicide of Aaron Swartz, lawmakers have proposed to amend the Computer Fraud and Abuse Act. Representative Zoe Lofgren has drafted a bill that would help “prevent what happened to Aaron from happening to other Internet users”.[35] Aaron’s Law (H.R. 2454, S. 1196[36]) would exclude terms of service violations from the 1984 Computer Fraud and Abuse Act and from the wire fraud statute, despite the fact that Swartz was not prosecuted based on Terms of Service violations.[37]

In addition to Lofgren’s efforts, Representatives Darrell Issa and Jared Polis (also on the House Judiciary Committee) raised questions about the government’s handling of the case. Polis called the charges “ridiculous and trumped up,” referring to Swartz as a “martyr.”[38] Issa, who also chairs the House Oversight Committee, announced an investigation of the Justice Department’s prosecution.[38][39]

As of May 2014, Aaron’s Law was stalled in committee, reportedly due to tech company Oracle’s financial interests.[40]

</quote>

* TPM and DRM

These are technical methods of preventing access to material and can include firewalls, encryption, specific tools, and possibly Captcha. We have bought legal advice and the result is not clear about whether Hargreaves allows us to circumvent them. The rule for all of us is that if there is any technical barrier to mining we should identify it and alert the librarians and possibly computer officers. Deliberately breaking this law could have serious consequences. Rest assured that I will publicize and comment on publishers who impose TPM.

* national security. It is very unlikely that we shall trigger this very serious offence. However, overzealous prosecutors or government departments – particularly in the US – have used such provisions.

There is a simplistic tendency of some companies and government departments to demonize all “hacking” as security violations. My laptop carries “Wget is not a crime” https://ttdphx.com/2014/10/23/digital-rights-wget-is-not-a-crime/ , after

was jailed for its use. See Slashdot for the link to Snowden and hackerbabble:
https://yro.slashdot.org/story/14/02/09/0210252/snowden-used-software-scraper-say-nsa-officials

* scraping

ContentMine is in the business of scraping websites – scholarly publishers, academic departments, etc. Is this legal? People have been prosecuted for scraping (https://devcentral.f5.com/articles/web-scraping-data-collection-or-illegal-activity from a company selling anti-scraping software). Wiley and Elsevier caused Tilburg to cut off Chris Hartgerink for downloading (“stealing”) material to which he had legal access. Their accusations have not been made public and it seems most unlikely he had done anything illegal. However I have scraped publishers for 12 years (for legally accessible materials) with no complaints and I do not expect any.

* incitement to commit a crime

In general it is a serious offence to encourage others to break the law. See http://www.cps.gov.uk/legal/h_to_k/inchoate_offences/#a01 for the official (and complex) UK law. For example I believe that any formal contact with Sci-hub or recommendation to use it could be interpreted as a crime. Whether the same applies to breaking contract law is less clear, but ContentMine will not, knowingly, break this either.
Please let me know whether I have omitted an important item or have misrepresented one.

OpenForum Europe publishes High Level Policy Paper on text and data mining

On 4 May 2016, OpenForum Europe (OFE) published an extremely significant policy paper on text and data mining.

OFE paper

From the OFE website:-

For the past months, OFE has been involved in an intensive research process regarding the various arguments and approaches relating to text and data mining (TDM) in Europe, which culminated with the paper published today, titled “An analytical review of text and data mining practices and approaches in Europe”.

The Commission should aim to achieve coherence in the legal provisions which it seeks to apply to TDM, with no consideration of ‘commercial’ versus ‘non-commercial’ purposes. Europe needs a regime which enables any researcher, citizen, company or other entity to engage in TDM activities, using material to which they have lawful access. The exact commercial rewards can be managed at subsequent stages, depending on the implementation of the mining outcome. The protection could be considered at the point at which some clearly commercially beneficial project, product, service, business or company has emerged.

From the report itself, mention is made that Peter Murray-Rust contributed to it:-

This paper is based on extensive desk research, including most of the benchmark reports, such as the European Commission funded Expert Group Report (2014), the study by De Wolf and Partners (2014), the UK IPO’s ‘Exceptions to Copyright’ brief (October 2014), as well as numerous other reports, position papers, articles and blog posts [1]. The initial findings have been discussed at the Round Table that OFE organised in October 2015, the conclusions of which are available in the follow-up White Paper. The desk research and Round Table discussion have been complemented by a series of interviews with academics, researchers, start-ups, and more established companies (including publishers and infrastructure providers) [2].

[1] A comprehensive list can be provided upon request.
[2] The interviews were conducted between September 2015 and February 2016, with the following experts (in alphabetical order): Geoffrey Bilder (CrossRef), Vivian Chan (Sparrho), Elizabeth Crossick (RELX), Lucie Guibault (IViR), Prof. Ian Hargreaves, Rachael Lammey (CrossRef), Thomas Margoni (Openminted), Peter Murray-Rust (Content Mine), Cameron Neylon (Public Library of Science), Julia Reda (MEP), Tim Stok (RELX), Kalliopi Spyridaki (SAS).
From the conclusions of the paper:-
Even if TDM is to be allowed through a generalised exception, APIs will still be needed to do the actual mining. Trusted third party platforms which make APIs available should be encouraged. Having a trusted third party in the mining process could provide a middle ground where publishers feel more confident that their content is not about to be misappropriated, and where miners feel they can engage in TDM without their project being put at risk of plagiarism or other sharp practice.
Bringing all stakeholders around a table would appear to be the most advisable solution, not least because there remains a degree of mistrust between some publishers and some researchers. Sometimes the presence of diverging interests can motivate such tension, but in other cases there can indeed be factors or aspects to which one category of stakeholder rightfully points, but which are not always foreseeable or even obvious for other categories of stakeholder.
In order to be sustainable and to avoid the need for future legislative updates, the provision should be drafted in neutral terms, sufficient to withstand the passage of time and likely evolution of the associated technology.
OFE are a great organization and you can also follow their work on Twitter via @OpenForumEurope

I urge my MEPs to reform European Copyright – please do the same

I have written to my members of The European Parliament to argue for reform of Copyright to allow Text and Data Mining (TDM, “ContentMining”) for commercial and non-commercial purposes. This issue has been very high-profile this year and Commissioner Oettinger will present his recommendations soon, so it’s important that we let him and MEPs know immediately that we need a change in the law.

I urge you also to write to your MEPs. It’s easy – just use writetothem.org and it will work out who you should write to. You can use some of my letter, but personalise it to represent your own views and goals. MEPs take these letters seriously – and they are critical evidence against all the lobbying that they get from vested interests.

Dear Geoffrey Van Orden, Stuart Agnew, Vicky Ford, Tim Aker, Richard Howitt, Patrick O’Flynn and David Campbell Bannerman,

Reform of European Copyright to allow Text and Data Mining (TDM)

I am a scientist at the University of Cambridge and write to urge you to promote the reform of European laws and directives relating to Copyright; and particularly the current restrictions on Text and Data Mining (“ContentMining”). The reforms [1] that MEP Reda promoted to the European Parliament earlier this year are sensible, pragmatic and beneficial and I urge you to represent them to Commissioner Oettinger before he produces the policy document on the Digital Single Market (expected in early December 2015).

Science and medicine publish over 2 million papers a year and billions of euros’ worth of publicly funded research lie unused, since no human can read the current literature. That’s an opportunity cost (at worst people die) and potentially a huge new industry. I and colleagues have been working for many years to develop the technology and practice of mining (especially in bio- and chemical sciences). I am convinced that Europe is falling badly behind the US. “Fair use” (see the recent “Google” [2] and “Hathi” books cases) is now often held to allow people in the US, but not Europeans (with only “fair dealing” at best), to mine science and publish results.

Over several years I and others have tried to find practical ways forward, but the rightsholders (mainly mega publishers such as Elsevier/RELX, Springer, Wiley, Nature) have been unwilling to engage. The key issue is “Licences”, where rightsholders require readers to apply for further permissions (and maybe additional payments) just to allow machines to read and process the literature. The EC’s initiative “Licences for Europe” failed in 2013, with institutions such as LIBER, RLUK, and British Library effectively walking out [3]. Nonetheless there has been massive industry lobbying this year to try to convince MEPs, and Commissioners, that Licences are the way forward [4].

The issue is simply encapsulated in my phrase “The Right to Read is the Right to Mine”; if a human has the right to read a document, she should be allowed to use her machines to help her. We have found scientists who have to read 10,000 papers to make useful judgments (for example in systematic reviews of clinical trials, animal testing, and other critical evaluations of the literature). This can take 10-20 days of a highly skilled scientist’s time, whereas a machine can filter out perhaps 90%, saving thousands of Euros. This type of activity is carried out in many European laboratories, so the total waste is very significant.

Unfortunately the rightsholders are confusing and frightening the scientific and library community. Two weeks ago a NL statistician [5] was analysing the scientific literature on a large scale to detect important errors in the conclusions reached by statistical methods. After he had downloaded 30,000 papers, the publisher Elsevier demanded that the University (Tilburg) stop him doing his research, and the University complied. This is against natural justice and is also effectively killing innovation – it is often said that Google and other industries could not start in Europe because of restrictive copyright.

In summary, European knowledge workers require the legal assurance that they can mine and republish anything they can read, for commercial as well as non-commercial purposes. This will create a new community and industry of mining which will bring major benefits to Europe. See [6].

Peter Murray-Rust
[1] “Reda Report draft – explained”; “EU parliament defends Freedom of Panorama & calls for copyright reform”

[2] http://fortune.com/2015/10/16/google-fair-use/
[3] https://edri.org/failure-of-licenses-for-europe/, http://ipkitten.blogspot.co.uk/2013/11/licences-for-europe-insiders-report.html
[4] The use of “API”s is now being promoted by rightsholders as a solution to the impasse. APIs are irrelevant; it is the additional licences (Terms and Conditions) which are almost invariably added.
[5] “Elsevier stopped me doing my research” http://onsnetwork.org/chartgerink/2015/11/16/elsevier-stopped-me-doing-my-research/
[6] http://contentmine.org/2015/11/contentmining-in-the-uk-a-contentmine-perspective/
Yours sincerely,

Peter Murray-Rust