#MozSprint – June 4th – 5th, 2015

Now in its second year, the Mozilla Science Lab Global Sprint 2015 took place in over 30 cities across the globe.

You can read more about last year’s sprint here.
Coordination ahead of this year's sprint took place mainly via this MoPad, and we were delighted to be invited to participate.

Logistically, the London event worked best for us, so we gladly joined the ~18 people who signed up to take part there.

Graham Steel spoke about ContentMine and its involvement in #MozSprint during the monthly Mozilla Science Lab call on 14th May, as detailed here.

The gathering started at around 09:45. Many thanks to Digital Science for hosting the event and to figshare for providing the excellent food and drink!

From ContentMine, Graham Steel, Mark MacGillivray and Peter Murray-Rust started the #mozsprint ContentMine group.

We set up this etherpad for the two-day sprint and were joined in person by Keren Limor, Jonathan Miller, Ben Pellegrini, Rob Knight and Frank Herman, and virtually by Neil Chue Hong (Software Sustainability Institute) and Ernest Walzel (Edinburgh Genomics), who were taking part in the Edinburgh leg of the sprint.

The contents of our pad from the two days are reproduced below.

 

After lunch

 

DAY TWO

HOT BREAKFAST BAPS

We were joined by one of our advisory board members, Joe McArthur, and also by Pamela Jones from UCL.

Peter, Graham and Joe (from the OA Button)

CONTENTMINE
MozScience – 4 to 5 June 2015
Link to the overall Mozilla Science Lab hack pad: https://etherpad.mozilla.org/sciencelab-2015globalsprint
Participants:
Graham Steel
Jonathan Miller – London  (Symplectic)
Frank Herman – London
Rob Knight – London (Mendeley)
Peter Murray-Rust
Mark MacGillivray – Cottage Labs
Ben Pellegrini
Keren Limor
Neil Chue Hong – Edinburgh (Software Sustainability Institute)
Ernest Walzel – Edinburgh (Edinburgh Genomics)
Peter gave some of us in London (Jonathan, Robert) a brief overview of ContentMine itself.
Using this example paper:
    DOI: 10.1016/j.bbr.2011.09.015 -> http://www.sciencedirect.com/science/article/pii/S0166432811006711
IDEA: LaTeX-in
    Seed for the idea: consuming arXiv-sourced LaTeX files
    With or without the equations
    LaTeX to XML (in order to go on to generate scholarly XHTML)
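To make the LaTeX-in idea a little more concrete, here is a minimal Python sketch, assuming a simple, well-behaved .tex file. It pulls out `\section`/`\subsection` headings and paragraph text and emits a skeletal XML tree, optionally dropping display equations. The function name `latex_to_xml` and the element names (`article`, `sec`, `title`, `p`) are hypothetical choices for illustration; real converters such as LaTeXML or Pandoc handle macros, environments and nesting that this toy ignores.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, deliberately tiny subset of LaTeX handled by this sketch.
SECTION_RE = re.compile(r"\\(section|subsection)\*?\{([^}]*)\}")
# Display maths: \[...\], $$...$$ and (non-nested) equation environments.
DISPLAY_MATH_RE = re.compile(
    r"\\\[.*?\\\]|\$\$.*?\$\$|\\begin\{equation\*?\}.*?\\end\{equation\*?\}",
    re.DOTALL,
)


def _add_paragraphs(parent: ET.Element, text: str) -> None:
    # Blank lines separate paragraphs in LaTeX source.
    for chunk in re.split(r"\n\s*\n", text):
        chunk = " ".join(chunk.split())
        if chunk:
            ET.SubElement(parent, "p").text = chunk


def latex_to_xml(tex: str, keep_equations: bool = True) -> ET.Element:
    """Turn a simple LaTeX body into a flat <article> XML tree (sketch only)."""
    # Keep only the document body when \begin{document} ... \end{document} exist.
    body = tex.split(r"\begin{document}", 1)[-1].split(r"\end{document}", 1)[0]
    if not keep_equations:
        body = DISPLAY_MATH_RE.sub("", body)
    article = ET.Element("article")
    current = article
    pos = 0
    for match in SECTION_RE.finditer(body):
        # Text before this heading belongs to the previous section.
        _add_paragraphs(current, body[pos:match.start()])
        level, title = match.groups()
        # Subsections are kept flat (siblings of sections) for simplicity.
        current = ET.SubElement(article, "sec", {"level": level})
        ET.SubElement(current, "title").text = title.strip()
        pos = match.end()
    _add_paragraphs(current, body[pos:])
    return article
```

Calling `latex_to_xml(source, keep_equations=False)` returns an `<article>` element whose `<sec>` children carry the section titles and paragraph text, and `ET.tostring(root, encoding="unicode")` serialises it; equations left in place simply appear as raw TeX inside `<p>` elements.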
Tools to research:
    Pandoc – http://pandoc.org/
    LaTeX2HTML – http://www.latex2html.org/
    LaTeXML – https://github.com/brucemiller/LaTeXML – See this intro paper:  http://www.mendeley.com/research/transforming-large-collections-scientific-publications-xml/ – Digital Library of Mathematical Functions (part of NIST in USA)
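One way to compare the tools on the same arXiv source is to script them side by side. The sketch below, assuming Pandoc and LaTeXML are installed and on PATH, only builds the command lines (Pandoc reading LaTeX and writing standalone HTML5; LaTeXML converting TeX to its XML and then `latexmlpost` turning that XML into HTML) and runs whichever tools are available. The helper names and the output file-naming scheme are hypothetical; the flags are worth double-checking against each tool's own documentation.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Dict, List


def conversion_commands(tex_file: str) -> Dict[str, List[List[str]]]:
    """Return the command(s) each tool would run on one .tex file."""
    stem = Path(tex_file).with_suffix("")  # hypothetical naming: paper.tex -> paper.*
    return {
        # Pandoc: read LaTeX, write a standalone HTML5 page.
        "pandoc": [["pandoc", "-f", "latex", "-t", "html5", "-s",
                    "-o", f"{stem}.pandoc.html", tex_file]],
        # LaTeXML: TeX -> LaTeXML XML, then post-process XML -> HTML.
        "latexml": [
            ["latexml", f"--destination={stem}.xml", tex_file],
            ["latexmlpost", f"--destination={stem}.latexml.html", f"{stem}.xml"],
        ],
    }


def run_available(tex_file: str) -> None:
    """Run each tool's pipeline, skipping tools that are not installed."""
    for tool, steps in conversion_commands(tex_file).items():
        if not all(shutil.which(step[0]) for step in steps):
            print(f"skipping {tool}: not installed")
            continue
        for step in steps:
            subprocess.run(step, check=True)  # stop on the first failing step
```

Keeping command construction separate from execution makes it easy to inspect or log exactly what was run for each example paper before comparing the outputs.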
Examples with outputs from different tools:
    https://gist.github.com/robertknight/7d87a504df1fd7be672b – “Design of Automatically Adaptable Web Wrappers” (http://arxiv.org/abs/1103.1254)
    – TeX source
    – LaTeXML default output
VIRTUAL MACHINE downloads are at:
    
    TO DO ITEMS: add build instructions for command line build of Norma
    – how to get to a working command line as demonstrated in the TUTORIAL/md
Acknowledgements
– Collaborators
– Grant number
Proposed structure of a paper (headings)
– Introduction
– Methodology
– Results
– References
– Evaluation
– Proof
– Computation / Code
– Related Work
– Conclusions
– Discussion
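Once section titles have been extracted from a converted paper, they could be mapped onto the proposed headings above. Here is a minimal Python sketch, assuming exact matching after stripping section numbers and normalising case and spacing; the synonym lists and the function names `normalise` and `classify_heading` are hypothetical and would need tuning against real arXiv papers.

```python
import re
from typing import Optional

# The headings proposed on the pad, each with a few hypothetical synonyms.
PROPOSED_HEADINGS = {
    "Introduction": ["introduction", "background", "motivation"],
    "Methodology": ["methodology", "methods", "materials and methods", "approach"],
    "Results": ["results", "experiments", "findings"],
    "References": ["references", "bibliography"],
    "Evaluation": ["evaluation"],
    "Proof": ["proof", "proofs"],
    "Computation / Code": ["computation", "code", "implementation", "software"],
    "Related Work": ["related work", "prior work", "previous work"],
    "Conclusions": ["conclusion", "conclusions", "summary"],
    "Discussion": ["discussion"],
}


def normalise(title: str) -> str:
    # Drop leading numbering such as "2." or "3.1 ", lower-case, collapse spaces.
    title = re.sub(r"^\s*\d+(?:\.\d+)*\.?\s+", "", title)
    return " ".join(title.lower().split())


def classify_heading(title: str) -> Optional[str]:
    """Map a section title onto one of the proposed headings, or None."""
    key = normalise(title)
    for heading, synonyms in PROPOSED_HEADINGS.items():
        if key in synonyms:
            return heading
    return None
```

Exact matching keeps the behaviour predictable; titles that return `None` (appendices, paper-specific headings) are a useful list to review by hand when deciding whether the proposed structure needs more categories.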
Software
Tralics – http://www-sop.inria.fr/marelle/tralics/ (INRIA is a world-class informatics institution). Not tested.
  Have built this… (run make and we get ./tralics)
  
ContentMine Architecture:
    (especially slides 34/35) for results.xml
EXAMPLE INSTANCE from ContentMine
EXAMPLE PAPERS (from arXiv)
Here is the arXiv FAQ about TeX submissions:
    Thanks. Thought: does arXiv have any statistics on the proportion of submissions with .tex source, perhaps broken down roughly by year?
    
    
    
Mark's test URLs:
Question at the end of Friday: so, how far did you guys get with the LaTeX -> XML/HTML scraper?

Published by

steelgraham

Scotland's (main, but not only) #OpenScience #OpenAccess #OpenData #OpenSource #OpenKnowledge & #PatientAdvocate. Loves blogging (http://figshare.com/blog). Glasgow, Scotland.
