ContentMine was founded in 2016 as a UK non-profit company limited by guarantee. Our mission is to establish content mining for research and for education as widespread philosophy and practice through:
creating computer programs, protocols, practices, standards and educational materials that enable content mining,
training researchers and others in content mining,
encouraging research institutions and funders of research to support establishing freedom for anyone to engage in computational analysis of books, journals, databases and other knowledge sources for the purposes of education and research.
We develop open source software for mining the scientific literature and engage directly in supporting researchers to use mining, saving valuable time and opening up new research avenues.
We are seeking an Operations Manager to take overall operational responsibility for ContentMine’s development and execution of its mission, reporting to the Board of Directors and working closely with the ContentMine Founder, Dr Peter Murray-Rust. The successful candidate will develop deep knowledge of our core focus, operations, and business development opportunities and manage the transition of the organisation from a project to a sustainable non-profit with oversight of all major business areas from fundraising to communications and HR.
£40-45k pro rata, negotiable.
Time and Location
4 days per week on a fixed-term contract for four months in the first instance, with renewal subject to funding. The candidate should be a UK or EU national. Remote working is possible, but candidates within easy travelling distance of Cambridge are preferred.
Leadership and Management:
Ensure ongoing excellence in delivery of the ContentMine mission, including program evaluation, and consistent quality of finance and administration. Manage fundraising, communications, and systems; recommend timelines and resources needed to achieve the strategic goals.
Actively engage and energize ContentMine board members, contractors, collaborators, Fellows, volunteers and funders.
Ensure effective systems to track progress, evaluate program components and report to the Board and funders.
Fundraising and Communications:
Expand revenue generating and fundraising activities to support existing program operations and planned developments.
Oversee and refine all aspects of communications—from web presence to external relations, with the goal of creating a stronger brand based on a recent graphical design exercise.
Use external presence and relationships to garner new opportunities.
Planning and New Business:
Build partnerships with research-oriented organisations including groups and institutes, scholarly societies and NGOs.
Establish relationships with potential collaborators and philanthropic funders.
Write grant applications and tender for client contracts.
Manage relationships and work allocations with partner organisations and contractors who bring new skills and capabilities to projects.
The Operations Manager will be thoroughly committed to ContentMine’s mission. All candidates should have proven leadership and relationship management experience. Concrete demonstrable experience and other qualifications include:
At least 5 years of management experience; track record of effectively leading an outcomes-based organization.
Ability to point to specific examples of having developed and actioned strategies that have taken an organization to the next stage of growth.
Commitment to delivering quality programs and data-driven program evaluation.
Excellence in organisational management including developing high-performance teams, setting and achieving strategic objectives, and managing a budget.
Fundraising experience with the ability to engage a wide range of stakeholders, particularly in the academic, non-profit, research and publishing sectors.
Strong written and verbal communication skills; a persuasive and passionate communicator with excellent interpersonal and multidisciplinary project skills.
Action-oriented, entrepreneurial, adaptable approach to business planning.
Ability to work effectively in collaboration with diverse groups of people.
Passion, integrity, positive attitude, mission-driven and self-directed focus are all desirable.
Please submit a cover letter and CV to email@example.com by 2 Dec 2016. Interviews will be held by 9 Dec. Informal enquiries should be directed to Dr Peter Murray-Rust (firstname.lastname@example.org).
The document itself (PDF) is 182 pages long, and the section relating to text and data mining (TDM) can be found on pages 93–108.
4.3. TEXT AND DATA MINING
4.3.1. What is the problem and why is it a problem?
Problem: Researchers are faced with legal uncertainty with regard to whether and under which conditions they can carry out TDM on content they have lawful access to.
Description of the problem: Text and Data Mining (TDM) is a term commonly used to describe the automated processing (“machine reading”) of large volumes of text and data to uncover new knowledge or insights. TDM can be a powerful scientific research tool to analyse big corpuses of text and data such as scientific publications or research datasets.
It has been calculated that the overall amount of scientific papers published worldwide may be increasing by 8 to 9% every year and doubling every 9 years. In some instances, more than 90% of research libraries’ collections in the EU are composed of digital content. This trend is bound to continue; however, without intervention at EU level, the legal uncertainty and fragmentation surrounding the use of TDM, notably by research organisations, will persist. Market developments, in particular the fact that publishers may increasingly include TDM in subscription licences and develop model clauses and practical tools (such as the Cross-Ref text and data mining service), including as a result of the commitments taken in the 2013 Licences for Europe process to facilitate it, may partly mitigate the problem. However, fragmentation of the Single Market is likely to increase over time as a result of MS adopting TDM exceptions at national level which could be based on different conditions.
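As a quick sanity check on the growth figures quoted above, the doubling time for steady annual growth at rate r is ln(2) / ln(1 + r). The sketch below is our own illustration, not part of the Commission's document; the function name `doubling_time` is hypothetical, and it assumes the growth rate stays constant from year to year.

```python
import math


def doubling_time(annual_growth_rate: float) -> float:
    """Years needed to double at a constant annual growth rate (e.g. 0.08 for 8%)."""
    return math.log(2) / math.log(1 + annual_growth_rate)


# Growth rates quoted in the text: 8% and 9% per year.
for rate in (0.08, 0.09):
    print(f"{rate:.0%} per year -> doubles in about {doubling_time(rate):.1f} years")
```

At 8% a year the doubling time comes out at roughly nine years, and at 9% at roughly eight, so the "doubling every 9 years" figure matches the lower end of the quoted range.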
Four options are suggested on TDM reform.
Option 1 – Fostering industry self-regulation initiatives without changes to the EU legal framework.
Option 2 – Mandatory exception covering text and data mining for non-commercial scientific research purposes.
Option 3 – Mandatory exception applicable to public interest research organisations covering text and data mining for the purposes of both non-commercial and commercial scientific research.
Option 4 – Mandatory exception applicable to anybody who has lawful access (including both public interest research organisations and businesses) covering text and data mining for any scientific research purposes.
The recommendation is for Option 3, which allows public interest research organisations (universities and research institutes) to mine for both non-commercial and commercial purposes. This appears mainly to support industrially funded research. On the whole it seems to be slight progress.
Option 3 is the preferred option. This option would create a high level of legal certainty and reduce transaction costs for researchers with a limited impact on right holders’ licensing market and limited compliance costs. In comparison, Option 1 would be significantly less effective and Option 2 would not achieve sufficient legal certainty for researchers, in particular as regards partnerships with private operators (PPPs). Option 3 allows reaching the policy objectives in a more proportionate manner than Option 4, which would entail significant foregone costs for rightholders, notably as regards licences with corporate researchers. In particular, Option 3 would intervene where there is a specific evidence of a problem (legal uncertainty for public interest organisations) without affecting the purely commercial market for TDM where intervention does not seem to be justified. In all, Option 3 has the best costs-benefits trade off as it would bring higher benefits (including in terms of reducing transaction costs) to researchers without additional foregone costs for rightholders as compared to Option 2 (Option 3 would have similar impacts on right holders but through a different legal technique i.e. scope of the exception defined through the identification of specific categories of beneficiaries rather than through the “non-commercial” purpose condition). The preferred option is also coherent with the EU open access policy and would achieve a good balance between copyright as a property right and the freedom of art and science.
This post is based upon an important report released at 08:00 CET on 31 May 2016.
The Lisbon Council launches Text and Data Mining for Research and Innovation: What Europe Must Do Next, an interactive policy brief which looks at the challenge and opportunity of text and data mining in a European context. Building on the Lisbon Council’s highly successful 2014 paper, which served as an important and early source of evidence on the uptake and interest in text and data mining among academics worldwide, the paper revisits the data two years later and finds that recent trends have only accelerated. Concretely, Asian and U.S. scholars continue to show a huge interest in text and data mining as measured by academic research on the topic. And Europe’s position is falling relative to the rest of the world. The paper looks at the legal complexity and red tape facing European scholars in the area, and calls for wholesale reform. The paper was prepared for and formally submitted as part of the European Commission’s Public Consultation on the Role of Publishers in the Copyright Value Chain and on the ‘Panorama Exception.’ Source.
Text and Data Mining for Research and Innovation looks at the transformative role of text and data mining in academic research, benchmarks the role of European research against global competitors and reflects on the prospects for an enabling policy in the text and data mining field within the broader European political and economic context.
Asia leapfrogs EU in research on text and data mining. Over the last decade, Asia has replaced the European Union as the world’s leading centre for academic research on text and data mining as judged by number of publications. From 2011 to 2016, Asian scholars’ share of academic publications in the field rose to 32.4% of all global publications, up from 31.1% in 2000. The EU’s global share fell to 28.2%, down from 38.9% in 2000. North America remained in third place at 20.9% due to the relatively small size of the three-country region.
China ranks No.1 within Asia. As recently as 2000, Japan and Taiwan led Asia with 12.6% and 7% of all global text-and-data-mining-based publications. After a steady rise in interest, China now leads. On its own, it accounted for 11.7% of all global publications in 2015, up from zero in 2000. This gave China a No. 2 finish in the country rankings, second only to the United States. China’s ranking within Asia is now No. 1.
Chinese patents on data mining see unprecedented growth. China also led the global growth in the number of patents pertaining to data mining. While the number of patents granted by the U.S. Patent and Trademark Office (USPTO) remained relatively stable over the past decade, the number of patents granted for data-mining-related products by the State Intellectual Property Office of the People’s Republic of China (SIPO) rose to 149 in 2015, up from just one in 2005.
Chinese researchers are champions in patenting TDM procedures. Chinese researchers and organisations are patenting text-and-data-mining procedures at a faster rate than any other country in the world. This suggests that Chinese researchers attach a growing priority to the potential use of this new technique for stimulating scientific breakthroughs, disseminating technical knowledge and improving productivity throughout the scientific and technical community.
Middle East entering the game too. Some of the fastest growth and greatest interest was seen in relative newcomers: India, Iran and Turkey. Having shown virtually no interest in text and data mining as recently as 2000, the Middle East is now the world’s fourth largest region for research on text and data mining, led by Iran and Turkey.
Europe remains slow. Large European scientific, technical and medical publishers have added text-and-data-mining functionality to some dataset licences, but the overall framework in Europe remains slow and full of uncertainty. Many smaller publishers do not yet offer access of this type. And scholars complain that existing licences are too restrictive and do not allow for generating the advanced “big data” insights that come from detecting patterns across multiple datasets stored in different places or held by different owners.
Legal clarity also matters. Some countries apply the “fair-use” doctrine, which allows “exceptions” to existing copyright law, including for text and data mining. Israel, the Republic of Korea, Singapore, Taiwan and the United States are in this group. Others have created a new copyright “exception” for text and data mining – Japan, for instance, which adopted a blanket text-and-data-mining exception in 2009, and more recently the United Kingdom, where text and data mining was declared fully legal for non-commercial research purposes in 2014.
What Europe Must Do. New technologies make analysis of large volumes of text and other media potentially routine. But this can only happen if researchers have clearly established rights to use the relevant techniques, supported by the necessary skills and experience. Broadly speaking, the European ecosystem for engaging in text and data mining remains highly problematic, with researchers hesitant to perform valuable analysis that may or may not be legal. The end result: Europe is being leapfrogged by rising interest in other regions, notably Asia. European scholars are even forced, on occasion, to outsource their text and data mining needs to researchers elsewhere in the world, as has been reported repeatedly in past European Commission consultations. Anecdotally, we hear stories of university and research bureaux deliberately adding researchers in North America or Asia to consortia because those researchers will be able to do basic text and data mining so much more easily than in the EU.
ContentMine is a scholarly assistant for the 21st Century – we’re a team of researchers and developers building an open source pipeline for mining facts from the scientific literature: from genes to species, diseases to chemicals. We’re looking for early adopters who want to fast forward their literature-based research and extract information from thousands of papers for collation and analysis. If you have a research project that involves manually searching through thousands of documents, we could help!
In this first round, we will fund up to five fellows for six months to work on research projects related to the life sciences.
Anyone is welcome to apply with an interesting idea to explore using the scientific literature and a basic set of programming skills. We welcome and encourage applicants from outside academia. Because the software is still in an alpha state, we require applicants to have basic knowledge of:
UNIX command line
Version control using git and github
What ContentMine Fellows can expect from us:
£1000 and some ContentMine merchandise!
A one-day webinar workshop to help you get started with the software.
Fortnightly support calls with the ContentMine developers.
Access to support via ContentMine Slack (chat app).
Priority support and bug fixes to help keep your research up and running.
Excitement and enthusiasm for your project! We are researchers and we love science.
What we expect from ContentMine Fellows:
We promote open notebook science! You will record your progress via github and the ContentMine discussion forum.
Three blog posts summarising progress over the course of six months, one of which will be the final report.
Willingness to explore new methods of research and research communication.
Attendance at fortnightly calls with members of the ContentMine team to help each other and discuss bugs or features for the software.
Detailed bug reporting and feedback on our software.
Please submit the following as email attachments to Jenny Molloy (via contact [at] contentmine [dot] org): a one-page summary of your research idea, a CV and a cover letter explaining your eligibility and why you would like to be a ContentMine Fellow. Shortlisted applicants will be asked to perform a simple task with the ContentMine software and attend a brief online interview. Applications close on 3 June 2016. Interviews will take place shortly afterwards, and the fellowships will run 1 July – 31 December 2016.
In the spirit of openness, we’re discussing the proposal on discuss.contentmine.org and collaboratively drafting via Google Docs. We’re appreciative of any volunteers who would like to help!
During the process of drafting the proposal, we thought of a name for our prize entry: amanuens.is, a scholarly assistant for open science. The deadline for phase I entries was 29 February. After entering, we submitted the proposal to the open access journal RIO, where it was published on 10 March:
Martone M, Murray-Rust P, Molloy J, Arrow T, MacGillivray M, Kittel C, Kasberger S, Steel G, Oppenheim C, Ranganathan A, Tennant J, Udell J (2016) ContentMine/Hypothes.is Proposal. Research Ideas and Outcomes 2: e8424. doi: 10.3897/rio.2.e8424
After judging has been completed, phase I prizes will be awarded by 30 Apr 2016.
We are pleased to announce that we’re teaming up with Hypothes.is to put forward a proposal to the Open Science Prize to mine and annotate the biomedical literature – using and producing loads of open data along the way.
A growing number of open data resources are either directly cited in the biomedical literature or have an indirect link to the content of articles or other research outputs. Unfortunately these links are often not visible to readers and if the article is behind a paywall they could be invisible to the vast majority of the population, including many researchers.
We plan to automatically mine and openly annotate the biomedical literature with intelligent identifiers for data such as genes, species and many dataset citations. ContentMine will extract the facts and Hypothes.is will display them on the online document. Through this, we’ll create an index of facts as open data that can be combined with manual annotations from the community of Hypothes.is and ContentMine users. This development and linking of two existing early-stage services will lead to a powerful and rich user opportunity to examine facts in context and look for connections and correlations centred around identifiers.