What’s the first thing that you look at in a scientific paper?
The title? Probably.
The abstract? Maybe.
The figures?
For many readers, yes! Many disciplines have a very long tradition of figures rich in detail and information. Introductory and summary data is often found (only) in figures. Results are again often shown in figures. I’m hypothesising that for many readers a quick glance at the figures and their captions would give a very good indication of whether they want to read the paper.
Here’s a typical figure [from http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0111303]
So how can we find what the message of a figure is? Put yourself in the position of a blind human or a machine and pretend you can’t see the image. The answer is the figure caption. So here are some examples, taken at random from today’s PLoSONE papers [use the links as acknowledgments].
- Action units of open-mouth faces.
- Map of Palmyra Atoll showing different habitat types and the location of acoustic listening stations.
- Self-reported pain ratings for the initial and relived pain.
- Total ion chromatogram of the middle Miocene Zhangpu amber from GC-MS analysis. [see blog image]
- Effect of all-L and all-D Esc(1-21) peptides at different concentrations on the number of metabolically-active HaCaT cells
Can you get a feel for whether you would want to look at a given image? In specialist fields it’s even clearer; here are some captions from two Open Access articles from Elsevier’s Phytochemistry:
- Analysis of recombinant SvPAL by SDS–PAGE. (SDS-PAGE is a gel electrophoresis technique).
- Phylogenetic tree of the phenylalanine ammonia-lyase proteins of Arabidopsis. (Phylogenetic analysis).
- qRT-PCR analysis of the level of expression of SvPAL1, SvPAL2, SvPAL3 and SvPAL4 in A) willow young leaves, stem, phloem, xylem, mature leaves and root tissue and B). (Molecular biology of plants: “level of expression”)
- Subcellular localisation of SvPAL2. (A)–(F) An overview of the YFP fusion constructs is shown on the left, with the corresponding transient expression in tobacco epidermal cells is shown on the right. (Image very likely to be a photomicrograph).
- List of primer sequences used for cloning and semi-qRT-PCR. (Molecular biology).
- The main maca glucosinolate and metabolites analysed in this study. (Chemical structures).
- Amide and fatty acid HPLC profiles of fresh and dry maca. (HPLC is a chromatographic technique).
- Profiles of selected storage and secondary metabolites during traditional open-field drying of whole maca hypocotyls. (Metabolism)
If you are interested in phytochemistry (chemistry of plants) – and I am, see later posts – these papers are worth reading. It’s also possible to subclassify the first paper as molecular biology and the second as chemical metabolism.
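The bracketed notes on the captions above are the kind of label a simple vocabulary lookup might produce. Here is a minimal sketch in Python, assuming a tiny hand-made vocabulary: the category names, the trigger terms and the function names are all illustrative assumptions, not an existing ContentMine vocabulary or tool.

```python
import re

# Hypothetical vocabulary: category -> trigger terms (illustrative only).
VOCABULARY = {
    "electrophoresis": ["sds-page"],
    "phylogenetics": ["phylogenetic tree"],
    "molecular biology": ["qrt-pcr", "primer sequences", "level of expression"],
    "chromatography": ["hplc", "gc-ms", "chromatogram"],
}


def normalise(text):
    """Lowercase the caption and map typographic dashes to plain hyphens."""
    return re.sub(r"[\u2010-\u2015]", "-", text).lower()


def classify_caption(caption):
    """Return the sorted categories whose trigger terms occur in the caption."""
    text = normalise(caption)
    return sorted(
        category
        for category, terms in VOCABULARY.items()
        if any(term in text for term in terms)
    )


if __name__ == "__main__":
    captions = [
        "Analysis of recombinant SvPAL by SDS\u2013PAGE.",
        "Amide and fatty acid HPLC profiles of fresh and dry maca.",
        "Subcellular localisation of SvPAL2.",
    ]
    for caption in captions:
        print(classify_caption(caption), "<-", caption)
```

Plain substring matching like this is crude: a caption with no trigger term (the third one here) simply gets no label, which is exactly where a richer vocabulary or machine learning would have to take over.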
CONTENTMINE REQUEST: I WANT TO CLASSIFY FIGURES BY THE TEXT IN THEIR CAPTIONS
We’ve agreed that new software functionality needs communal agreement and that we should therefore blog our suggestions. As Mark MacGillivray puts it “if it’s not worth blogging about it’s not worth doing”. So I think it would be valuable for:
- all figures and captions to be extracted from all papers. This is technically reasonably easy: most publishers use the word “Figure”, and some enlightened ones tag the figure and its caption explicitly, with <fig> and <caption> in JATS-XML or <figure> and <figcaption> in HTML5. Each figure and its caption will be grouped as a special category in Norma (figure-caption pair). This is already well under way with our “sectioning” programme.
- classification of figures by machine-learning and/or humans. This is harder. We need vocabularies before we can classify things, and there is no universal scientific vocabulary (Wikimedia is the best). This is what I’d like us to explore.
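To make the extraction step concrete, here is a minimal sketch in Python using only the standard library. It assumes the paper is available as JATS-XML, where figures are `<fig>` elements containing a `<label>` and a `<caption>`. The function name, the sample document and its "Fig 1" labels are illustrative assumptions, and this is not how Norma itself does it.

```python
import xml.etree.ElementTree as ET


def extract_figure_captions(jats_xml):
    """Return (label, caption text) pairs for every <fig> in a JATS string."""
    root = ET.fromstring(jats_xml)
    pairs = []
    for fig in root.iter("fig"):
        label = (fig.findtext("label") or "").strip()
        caption = fig.find("caption")
        if caption is None:
            text = ""
        else:
            # Join all nested text (title, paragraphs) and collapse whitespace.
            text = " ".join("".join(caption.itertext()).split())
        pairs.append((label, text))
    return pairs


# Made-up sample document; the labels are placeholders.
SAMPLE = """<article><body>
<fig id="f1"><label>Fig 1</label>
<caption><title>Action units of open-mouth faces.</title></caption></fig>
<fig id="f2"><label>Fig 2</label>
<caption><p>Self-reported pain ratings for the initial and relived pain.</p></caption></fig>
</body></article>"""

if __name__ == "__main__":
    for label, caption in extract_figure_captions(SAMPLE):
        print(label, "|", caption)
```

Papers that only exist as PDF or untagged HTML would need a different approach, such as searching for the word “Figure”, which is much less reliable.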
How might we do that? This needs to be a communal discussion – not just my guess. And we’d love input from everyone – not just ContentMine colleagues. So:
WOULD YOU LIKE US TO CLASSIFY FIGURES BY THEIR CAPTIONS SO YOU KNOW WHAT PAPERS TO READ?
DO YOU KNOW OF PREVIOUS WORK IN THIS AREA THAT WE COULD BUILD ON?
WOULD YOU LIKE TO HELP?
And readers are everywhere, NOT just in universities: in hospitals, patient groups and small businesses, and among policy makers and curious minds, young and old. And if you have to pay 35 USD to read a paper, like Jack Andraka’s parents, and then find it’s not relevant, that’s terrible.