Update for last month: Shuttleworth, BL, MySociety and more

I have been silent for the last month because I have been very busy. I hope to blog more in the next few days.

The main activities have been:

  • The twice-yearly Gathering of Shuttleworth Fellows (this time in Malta). My first Gathering (in March) was great, but this one was fantastic. I have so much in common with so many of the past and current Fellows, and I have a huge to-do list of ways we can work together.
  • An application to the Shuttleworth Foundation for a further year of funding. I’ll take you through this in detail in the next few posts.
  • Running ContentMining workshops at EBI (2014-10-06) and, led by Jenny Molloy and Puneet Kishor, in Delhi last weekend (2014-11-02). We have now ironed out most of the problems and feel confident about delivering a range of workshops.
  • Preparing for the launch of our mining activities (RRSN, promise!). I’ll blog about this in the next day or two.
  • Preparing for a trip to the US, with workshops in Chicago and Washington (OpenCon) and a visit to Penn State.
  • Attending the Open Access Button launch on 2014-10-21. OA_Button is massive because at least there is some true energy and anger. I gave a 30-second tribute in which I said it was massive and that we hadn’t seen nothing yet. I think OA_Button will change the world. Young people are sick of the broken values of academia. (I shall expand on this at OpenCon.)
  • Cobi Smith (who is currently with Francois Grey at CERN) gave an excellent talk on crowdsourcing/crafting for disasters at Open Research Cambridge in the Panton Arms.
  • Met with all the Cambridge University librarians to say farewell to the retiring Peter Morgan, who has helped me get off the ground in library and informatics research. Peter has been a huge contributor of new ideas and practices and we’ll miss him. It was also a good opportunity to talk with the library and to learn that they are supportive of my ContentMining research.
  • An interesting meeting on Big Data in science in Cambridge, with a great talk by Florian Markowetz (Cancer Genomics) debunking the hype (Big Data is sliding down the curve into the trough of disillusionment). He demolished the “let’s throw all our data into Machine Learning and wonderful things will automatically emerge” syndrome. I completely agree: Machine Learning has its place (e.g. OCR), but it leads to no understanding and no transferability. I am largely renouncing it.
  • British Library Labs on “Big Data” in Arts and Humanities. Massively wonderful, with wonderful speakers and projects. The BL runs its Labs on a shoestring (mainly Mahendra and Ben) and IMO is a world leader in innovative use of library resources. I’m certainly planning to see if they’ll host a workshop on ContentMining.
  • MySociety ran a great meeting in Cambridge yesterday evening and asked me to present TheContentMine. I had some help: see http://instagram.com/p/u_NrqxIFTz/

More later…


July summary: an incredible month: ContentMine, OKFest, Shuttleworth, Hargreaves, Wikimania

I haven’t blogged for over a month because I have been busier than I have ever been in my life. This is because the opportunities and the challenges of the Digital Century appear daily. It’s also because our ContentMine (http://contentmine.org) project has progressed more rapidly, more broadly and more successfully than I could have imagined.

Shuttleworth fund me/us to change the world, and because of the incredible support that they give (meetings twice a week, advice, contacts, reassurance, wisdom) we are already changing the world. I have a wonderful team whom I trust to do the right thing almost by instinct, like a real soccer team in which each player anticipates what is required and when.

It’s getting very complex and hectic, as we are active on several fronts (details in later posts and at Wikimania):

  • workshops. We offer workshops on ContentMining, agree dates and places, and then have to deliver. Deadlines cannot slip. A workshop on new technology is a huge amount of effort. When we succeed, we know we have something that not only works but is wanted. It’s very close to Open Source and Open Notebook Science, where everything is made available to the whole world. That’s very ambitious and we are having to build the …
  • technology. This has developed very rapidly, but is also incredibly ambitious: the overall aim is to have Open technology for reading, understanding and reusing the factual scientific literature. This can only happen with a high-quality, generic, modular architecture and
  • community. Our project relies on a committed, democratic, meritocratic community of volunteers (like Wikipedia, OpenStreetMap, Mozilla, etc.). We haven’t invited them, but they are starting to approach us, and we have an excellent core in Richard Smith-Unna’s quickscrape (https://github.com/ContentMine/quickscrape/).
  • sociopoliticolegal. The STM publishers have increased their efforts to require licences for content mining. There is no justification for this and no benefit (except to publishers’ income and control). We have to challenge this, and we’ve written blogs and a seminal paper and…

Here’s a brief calendar …

  • 2014-06-04->06 FWF talk, workshop, OK hackday in Vienna
  • 2014-06-19->20 Workshop in Edinburgh oriented to libraries.
  • 2014-07-07->12 Software presented at BOSC (Boston)
  • 2014-07-14 Memorial for Jean-Claude Bradley and promotion of OpenNotebookScience
  • 2014-07-15 Presentation at CSVConf Berlin
  • 2014-07-16->19 OKFest at Berlin – 2 workshops and 2 presentations
  • 2014-07-22->23 Mozilla Sprint – Incredibly valuable for developing quickscrape and community
  • 2014-07-24 Plenary lecture to NDLTD (e-Theses and Dissertations) in Leicester
  • 2014-07-25->27 Crystallography and Chemistry hack at Cambridge (especially liberating crystallographic data and images)
  • 2014-07-28->29 Visit of Karien Bezuidenhout from Shuttleworth, a massive contribution
  • 2014-08-01 Development of PhyloTreeAnalyzer and visit to Bath to synchronise software
  • 2014-08-02 DNADigest hack Cambridge – great role that ContentMine can play in discovery of datasets
  • Preparing to be Featured Speaker at Wikimania on 2014-08-08, where I’ll present the idea that Wikipedia is central to understanding science. I’ll blog my initial thoughts later today.