[I published a general blog post about the impasse between digital scholars and the Toll-Access publishers: http://blogs.ch.cam.ac.uk/pmr/2015/11/22/content-mining-rights-versus-licences/ . It is followed by a series of posts which look at the details and consequences.
This is the second.]
If you have read these earlier posts you will know that the issue is whether I and others are allowed to use machines to read publications we have legal access to read with our eyes.
The (simplified) paradigm for Content-mining scholarly articles consists of:
finding links to papers (articles) we may be interested in (“crawling”). The papers may be on publishers’ web sites (visible or behind a paywall) or in repositories (visible). Most of this discussion relates to paywalled articles.
downloading these papers from (publisher) servers onto local machines (clients) (“scraping”). If the articles are paywalled, this requires paid access (a subscription), which is only available to members of the subscribing institution. Thus I can read thousands of articles to which Cambridge University has a subscription.
running software to extract useful information from the papers (“mining”). This information can be chunks of the original or reworked material.
publishing the results in full (for responsible scientists, including me).
This is technically possible. Messy, if you start from scratch, but we and others have created Open Source tools and services to help.
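To make the four steps concrete, here is a minimal sketch in Python. It is not our actual toolchain: the function names (`crawl`, `scrape`, `mine`, `publish`), the in-memory "site" and the term-counting "mining" are all hypothetical, chosen only for illustration. The `fetch` argument stands in for whatever legitimate, subscription-based download a real client would perform; no network access happens here.

```python
import json
import re
from html.parser import HTMLParser
from typing import Callable, Dict, List


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags in an index page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(index_html: str) -> List[str]:
    """Step 1 ("crawling"): find links to papers in an index page."""
    collector = LinkCollector()
    collector.feed(index_html)
    return collector.links


def scrape(links: List[str], fetch: Callable[[str], str]) -> Dict[str, str]:
    """Step 2 ("scraping"): download each paper using the supplied fetch function."""
    return {link: fetch(link) for link in links}


def mine(papers: Dict[str, str], terms: List[str]) -> Dict[str, Dict[str, int]]:
    """Step 3 ("mining"): count whole-word, case-insensitive occurrences of each term."""
    results: Dict[str, Dict[str, int]] = {}
    for link, text in papers.items():
        results[link] = {
            term: len(re.findall(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE))
            for term in terms
        }
    return results


def publish(results: Dict[str, Dict[str, int]]) -> str:
    """Step 4: serialise the full results so that they can be published."""
    return json.dumps(results, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Hypothetical stand-in for a publisher site: an index page and two "papers".
    fake_site = {
        "/paper/1": "Aspirin inhibits COX-1. Aspirin is widely used.",
        "/paper/2": "Caffeine and aspirin were compared.",
    }
    index = '<a href="/paper/1">One</a> <a href="/paper/2">Two</a>'

    links = crawl(index)
    papers = scrape(links, fake_site.__getitem__)
    print(publish(mine(papers, ["aspirin", "caffeine"])))
```

Real mining is much messier than counting words (PDFs, tables, chemical structures), but the shape of the pipeline is the same, and every step is something a human reader with a subscription could do by hand.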
The problem is that Toll-Access publishers don’t want us to do it (or only under unworkable restrictions). So what stops us?
THE LAW STOPS US
What follows is simplistic and IANAL (I am not a lawyer) though I talk with people who are. I am happy to be corrected by people more knowledgeable than me.
There are two main types of law relevant here:
Copyright law. https://www.copyrightservice.co.uk/copyright/p01_uk_copyright_law . TL;DR any copying may infringe copyright and allow the “rights-holder” to sue. The burden of proof is lower than in a criminal case: “However, in a civil case, the plaintiff must simply convince the court or tribunal that their claim is valid, and that on balance of probability it is likely that the defendant is guilty”. Copyright law varies between countries and can be extraordinarily complex, and it is difficult to get clear answers. The simple, and sad, default assumed by many people and promoted by many vendors is that readers have no rights. (The primary method of removing these restrictions is to add a licence (such as CC-BY) which is compatible with copyright law and explicitly gives rights to the reader/user.)
Contract law. Here the purchasers of goods and services (e.g. Universities) may agree a contract with the vendors (Publishers) that gives rights and responsibilities to both. In general these contracts are not publicised to users like me and may even be secret. Therefore some of what follows is guesswork. There are also hundreds of vendors and a wide variation in practice. However we believe that the main STM publishers have roughly similar contracts.
In general these contracts are heavily weighted in favour of the publisher. They are written by the publisher and offered to the purchaser to sign. If the University doesn’t like the conditions they have to “negotiate” with the publisher. Because there is no substitutability of goods (you can’t swap Nature with J. Amer. Chem. Soc.) the publisher often seems to have an advantage.
The contracts contain phrases such as “you may not crawl our site, index it, spider it, mine it, etc.” These are introduced by the publisher to stop mining. (There is already copyright law to prevent the republishing of material without permission, so the new clauses are not required.) I queried a number of UK Universities as to what they had signed; some were constructive in their replies but many, unfortunately, were unhelpful.
However there is no legal reason why a University has to sign the contract put in front of them. But they do, and they have signed clauses which restrict what I and Chris Hartgerink and other scientists can do. And they do it without apparent internal or external consultation.
And this was understood by the UK’s Hargreaves reform, which specifically says that text-miners can ignore any contract terms which stop them doing it. Presumably its authors reasoned that vendors pressure Universities into signing our rights away, and this law protects us. And indeed, it’s critically important for letting us proceed.
But this law doesn’t (yet) apply to the Netherlands and so can’t help Chris (except when he comes to the UK). We want it changed, and library organizations such as LIBER, RLUK, BL etc. want it changed.
So this mail is to ask Universities – and I expect their libraries will answer:
PLEASE REFUSE TO SIGN ANY CONTRACTS WHICH CONTAIN CLAUSES FORBIDDING CONTENT-MINING.
EXPLAIN WHY YOU HAVE TO SIGN OUR RIGHTS AWAY.
And then we’ll work out how to help.