Legal Issues

Copying a substantial amount from a work is potentially copyright infringement. However, thanks to an important change to UK copyright law in 2014, there is a new exception to copyright (i.e., the law allows you to make copies) for copying works for text and data mining (TDM). This is subject to certain limitations. Firstly, the user must have lawful access to the work. Secondly, the TDM must be for the purpose of “non-commercial research”, something that is difficult to define precisely. Thirdly, any copy made for TDM should be accompanied by sufficient acknowledgment of the source, unless this is practically impossible for some bona fide reason. In addition, copyright is infringed if the copy made is transferred to another person, or if it is used for purposes other than those permitted by the exception. Furthermore, copies made for TDM cannot be sold or let for hire. Note that all these references to “copies” mean copies made for TDM purposes; the RESULTS of such TDM, e.g., correlation analyses, scatter plots, etc., are not subject to such restrictions and might in fact enjoy copyright or other protection in their own right, though in the case of ContentMine, all its results are made available under Creative Commons principles.

Crucially, the provision states that the activities covered by the exception cannot be prevented by contract, i.e., contractual terms that purport to restrict or prevent TDM are unenforceable. The exception covers all categories of copyright works, and a parallel exception applies to recordings of performances.

Thus, UK-based researchers who have lawful access to works in electronic format (for instance, through a library) can freely make further copies of those works to carry out computational analysis of their content for non-commercial research, without having to ask for permission. Incidentally, there is no equivalent exception in other EU member states, so UK researchers are alone in the EU in enjoying this exception. There is talk of introducing such an exception throughout the EU, but the scholarly publishing industry is lobbying hard against it. This web page will be updated if the legal position in the rest of the EU changes.

There are two further important things to note. Firstly, the new law allows the owners of collections of materials to impose reasonable measures to maintain their network security or stability, but these measures should not prevent or unreasonably restrict researchers’ ability to carry out text and data mining. This is clearly a grey area that copyright owners might be keen to exploit. It is, however, unlikely to be legal for a copyright owner to insist that a user uses its particular API or software to undertake TDM on its collection unless it can clearly demonstrate that use of a third party’s API or software is having a deleterious effect on its system performance.

Secondly, the new exception does NOT apply to collections that are subject to database right, a right quite separate from copyright. This separate right protects any collection of data, information or works that required substantial investment in obtaining, verifying or presenting its contents. However, recent European Court cases have severely limited the applicability of database right, and it is unlikely that any of the major database collections offered by scholarly publishers qualify for it.

See also responsible-content-mining by Maximilian Haeussler, Jennifer Molloy, Peter Murray-Rust and Charles Oppenheim.

