#MozSprint – June 4th – 5th, 2015

Now in its second year, Mozilla Science Lab Global Sprint 2015 took place in over 30 cities across the globe.

You can read more about last year’s sprint here.
Coordination ahead of this year's sprint took place mainly via this MoPad, and we were delighted to be invited to participate.

Logistically, participating at the event in London worked best for us, so we gladly joined the list of ~18 people who signed up to take part.

Graham Steel spoke about ContentMine and its involvement in #MozSprint during the monthly Mozilla Science Lab call on 14th May, as detailed here.



The gathering started at around 09:45. Many thanks to Digital Science for hosting the event and to figshare for providing the excellent food and drink!

From ContentMine, Graham Steel, Mark MacGillivray and Peter Murray-Rust started the #mozsprint ContentMine group.

We set up this etherpad for the two-day sprint and were joined in person by Keren Limor, Jonathan Miller, Ben Pellegrini, Rob Knight and Frank Herman, and virtually by Neil Chue Hong (Software Sustainability Institute) and Ernest Walzel (Edinburgh Genomics), who were both at the Edinburgh part of the sprint.




The content of that pad from the two days is reproduced further below.






After lunch








We were joined by one of our advisory board members, Joe McArthur, and by Pamela Jones from UCL.





Peter, Graham and Joe (from OA Button)



MozScience – 4 to 5 June 2015
Link to the overall Mozilla Science Lab hack pad: https://etherpad.mozilla.org/sciencelab-2015globalsprint
Graham Steel
Jonathan Miller – London  (Symplectic)
Frank Herman – London
Rob Knight – London (Mendeley)
Peter Murray-Rust
Mark MacGillivray – Cottage Labs
Ben Pellegrini
Keren Limor
Neil Chue Hong – Edinburgh (Software Sustainability Institute)
Ernest Walzel – Edinburgh (Edinburgh Genomics)
Peter gave a subset of those of us in London (Jonathan, Robert) a brief overview of ContentMine itself.
Using this example paper:
    DOI: 10.1016/j.bbr.2011.09.015  ->  http://www.sciencedirect.com/science/article/pii/S0166432811006711
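The step from a DOI to a landing page, as in the example above, normally goes through the public DOI resolver. Here is a minimal Python sketch of that step. It assumes the resolver at https://doi.org/, and the helper name `doi_to_url` is made up for illustration. Neither comes from the pad.

```python
from urllib.parse import quote

# Public DOI resolver; an assumption here, not something named in the pad.
DOI_RESOLVER = "https://doi.org/"


def doi_to_url(raw: str) -> str:
    """Turn a DOI string such as 'doi:10.1016/...' into a resolver URL."""
    doi = raw.strip()
    # Drop an optional 'doi:' prefix, case-insensitively.
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    # Every DOI starts with the '10.' directory indicator.
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {raw!r}")
    # Keep '/' readable; percent-encode anything unsafe in a URL path.
    return DOI_RESOLVER + quote(doi, safe="/")


print(doi_to_url("doi:10.1016/j.bbr.2011.09.015"))
```

The sketch only builds the URL and makes no network request. Following the URL in a browser should redirect to the publisher's page for the paper.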
IDEA: LaTeX-in
    Seed for the idea sown by: consuming arXiv-sourced LaTeX files
    With or without the equations
    LaTeX to XML (in order to go on to generate scholarly XHTML)
Tools to research:
    LaTeXML – https://github.com/brucemiller/LaTeXML – developed for the Digital Library of Mathematical Functions (part of NIST in the USA). See this intro paper: http://www.mendeley.com/research/transforming-large-collections-scientific-publications-xml/
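For anyone wanting to try LaTeXML on a single arXiv source file, here is a minimal Python sketch of the two-step conversion (TeX to XML, then XML to XHTML). It assumes the `latexml` and `latexmlpost` commands from the LaTeXML distribution are installed and on the PATH. The file names and the helpers `latexml_commands` and `convert` are hypothetical, and this is not the pipeline that was built at the sprint.

```python
import shutil
import subprocess
from pathlib import Path


def latexml_commands(tex_file: str, out_dir: str = "out") -> list[list[str]]:
    """Build the two LaTeXML steps: TeX -> XML, then XML -> XHTML."""
    stem = Path(tex_file).stem
    xml_file = str(Path(out_dir) / f"{stem}.xml")
    xhtml_file = str(Path(out_dir) / f"{stem}.xhtml")
    return [
        # Step 1: parse the TeX source into LaTeXML's XML format.
        ["latexml", f"--destination={xml_file}", tex_file],
        # Step 2: post-process that XML into XHTML.
        ["latexmlpost", "--format=xhtml", f"--destination={xhtml_file}", xml_file],
    ]


def convert(tex_file: str, out_dir: str = "out") -> None:
    """Run both steps, stopping at the first failure."""
    if shutil.which("latexml") is None:
        raise RuntimeError("latexml not found on PATH")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in latexml_commands(tex_file, out_dir):
        subprocess.run(cmd, check=True)


# Show the commands without running them ("paper.tex" is a placeholder name).
for cmd in latexml_commands("paper.tex"):
    print(" ".join(cmd))
```

Keeping command construction separate from execution makes it easy to inspect or log the exact invocation before pointing it at a large batch of files.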
Examples with outputs from different tools:
    https://gist.github.com/robertknight/7d87a504df1fd7be672b – “Design of Automatically Adaptable Web Wrappers” (http://arxiv.org/abs/1103.1254)
    – TeX source
    – LaTeXML default output
VIRTUAL MACHINE downloads are at:
    TO DO ITEMS: add build instructions for command line build of Norma
    – how to get to a working command line as demonstrated in the TUTORIAL/md
– Collaborators
– Grant number
Proposed structure of a paper (headings)
– Introduction
– Methodology
– Results
– References
– Evaluation
– Proof
– Computation / Code
– Related Work
– Conclusions
– Discussion
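One way the proposed headings could be put to use is to tag the `\section` titles found in a LaTeX source with the closest heading from the list. The Python sketch below is only an illustration of that idea. The trigger words and the function name `tag_sections` are assumptions, not something agreed at the sprint.

```python
import re

# Headings from the proposed structure, each with a few trigger words.
# The trigger words are illustrative guesses, not an agreed vocabulary.
CANONICAL = {
    "Introduction": ("introduction", "background"),
    "Methodology": ("method", "materials"),
    "Results": ("result",),
    "References": ("reference", "bibliography"),
    "Evaluation": ("evaluation", "experiment"),
    "Proof": ("proof",),
    "Computation / Code": ("computation", "code", "implementation"),
    "Related Work": ("related work", "prior work"),
    "Conclusions": ("conclusion",),
    "Discussion": ("discussion",),
}

# Matches \section{...} and \section*{...}, but not \subsection{...}.
SECTION_RE = re.compile(r"\\section\*?\{([^{}]*)\}")


def tag_sections(tex_source: str) -> list[tuple[str, str]]:
    """Return (section title, proposed heading or 'Other') pairs."""
    tagged = []
    for title in SECTION_RE.findall(tex_source):
        lowered = title.lower()
        label = "Other"
        # The first heading whose trigger word appears in the title wins.
        for heading, keywords in CANONICAL.items():
            if any(word in lowered for word in keywords):
                label = heading
                break
        tagged.append((title, label))
    return tagged


sample = r"""
\section{Introduction}
\section{Materials and Methods}
\section*{Results and Discussion}
\section{Acknowledgements}
"""
for title, label in tag_sections(sample):
    print(f"{title} -> {label}")
```

A combined title such as "Results and Discussion" is given the first matching heading only, and titles with no match fall through to "Other". A real tagger would need a better rule for both cases.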
http://www-sop.inria.fr/marelle/tralics/ (INRIA is a world-class informatics institution). Not tested
  Have built this… (run make and we get ./tralics)
ContentMine Architecture:
    (esp slide 34/35) for results.xml
EXAMPLE INSTANCE from contentmine
Here is the arXiv FAQ about TeX submissions:
    Thanks. Thought: does arXiv have some stats on the level of .tex coverage in its archive, perhaps broken down roughly by year?
Mark's test URLs:
Question at end of Friday: so, how far did you guys get with the LaTeX -> XML/HTML scraper?











