I’ve just been trying to mine publicly visible scientific publications from scholarly publishers. (That’s right – “publicly visible” – Hargreaves comes later).
AND THE TECHNICAL QUALITY IS AWFUL. PUBLISHERS DESTROY SCIENCE THROUGH THEIR TECHNICAL INCOMPETENCE AND INDIFFERENCE.
They destroy the text. They destroy the images and diagrams. And we pay them money for this – usually more than a thousand dollars. Sometimes many thousands. And when I talk to them – which I do regularly – they all say something like:
“Oh, we can’t change our workflow – it would take years” (or something similar). As if this was a law of the universe.
Unfortunately it’s a law of publishing arrogance. They don’t give a stuff about the reader. There are no market forces – the only thing that the PublisherAcademic complex worries about is the shh-don’t-mention-the-Impact-Factor.
And it’s not just the TollAccess ones but also the OpenAccess ones. So today’s destruction of quality comes from BMC. (I shall be even-handed in my criticism).
I’m trying to get my machines to read HTML from BMC’s site. Why HTML? Well, publishers’ PDF is awful (I’ll come to that tomorrow or sometime), whereas HTML is a standard of many years and so it’s straightforward to parse. Yes, unless it comes from a Scholarly publisher…
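To show what “straightforward to parse” means for a page that declares itself XHTML, here is a minimal sketch in Python using only the standard library. It is not the code my machines actually run; the function name `check_well_formed` and the tiny `GOOD` and `BAD` documents are invented for illustration, and the damage in `BAD` (an unclosed tag) is deliberately a different fault from the one in today’s puzzle.

```python
import xml.etree.ElementTree as ET

# The XHTML namespace; elements parsed from XHTML carry it in their tag names.
XHTML_NS = "http://www.w3.org/1999/xhtml"


def check_well_formed(text):
    """Return (root, None) if text parses as XML, else (None, error message)."""
    try:
        return ET.fromstring(text), None
    except ET.ParseError as err:
        return None, str(err)


# Hypothetical miniature documents, not taken from any publisher's site.
GOOD = (
    '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-GB">'
    "<head><title>Example</title></head>"
    "<body><p>Hello</p></body></html>"
)
BAD = GOOD.replace("</p>", "")  # simulated damage: the <p> is never closed

root, error = check_well_formed(GOOD)
# Namespaced lookup: {namespace}head/{namespace}title
title = root.find(f"{{{XHTML_NS}}}head/{{{XHTML_NS}}}title").text

bad_root, bad_error = check_well_formed(BAD)
```

With a well-formed document you get a tree you can query in a couple of lines. With a damaged one an XML parser gives up at the first fault and hands you nothing but an error message, which is why a single slip by the publisher can stop a machine reading the whole paper.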
PUZZLE TODAY. What’s (seriously) wrong with the following? [Kaveh, you will spot it, but give the others a chance to puzzle!]. It’s verbatim from http://www.biomedcentral.com/1471-2229/14/106 (I have added some CR’s to make it readable):
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> <html id="nojs" xmlns="http://www.w3.org/1999/xhtml" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:og="http://ogp.me/ns#" xml:lang="en-GB" lang="en-GB" xmlns:wb=“http://open.weibo.com/wb”> <head> ... [rest of document snipped]
When you see it you’ll be as horrified as I was. There is no excuse for this rubbish. Why do we put up with this?