I think of getpapers as a handy command-line tool for search & retrieval of relevant research. However, a variety of circumstances can prevent getpapers from returning the full text of some relevant papers. This is where quickscrape becomes very useful.
quickscrape is a command-line tool purely for retrieving known research that you want to download, with more powerful and flexible download techniques than getpapers. In theory, it can fetch, in bulk, anything and everything you have legal access to. Now that’s what I mean by POWER!
Q: Is there a situation in which I might use both getpapers and quickscrape?
A: Yes! getpapers has functionality specifically designed to feed into quickscrape. This is very useful when getpapers finds relevant closed-access papers that publisher-imposed restrictions don’t allow EPMC to make available for full-text download.
A worked example: I want to mine the last 3 months of papers published in PNAS. PNAS typically imposes a 6-month embargo on research published in it, so EPMC cannot offer full-text download of recent PNAS research. Instead, you have to go via the PNAS journal website to get recent PNAS articles.
# Use getpapers to get a list of all recent PNAS articles
getpapers --query 'JOURNAL:"PNAS" AND FIRST_PDATE:[2015-04-01 TO 2015-07-01]' --all --outdir recentpnas

# Use quickscrape to download recent PNAS articles output by getpapers
quickscrape --urllist recentpnas/fulltext_html_urls.txt --scraper journal-scrapers/scrapers/pnas.json --output recentpnasfull --outformat bibjson
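Before handing the URL list to quickscrape, it can be worth checking that it contains no blank lines or repeated URLs, so that nothing is requested from the publisher twice. The sketch below is only an illustration: clean_urllist is a hypothetical helper of my own, not part of getpapers or quickscrape, and it assumes the list holds one URL per line.

```shell
#!/bin/sh
# Hypothetical helper (not part of getpapers or quickscrape):
# print the unique, non-blank lines of a URL list, keeping their original order.
clean_urllist() {
    # NF skips blank lines; seen[] lets only the first copy of each line through
    awk 'NF && !seen[$0]++' "$1"
}

# Example usage, assuming the getpapers run above has finished:
#   clean_urllist recentpnas/fulltext_html_urls.txt > recentpnas/urls_clean.txt
#   wc -l < recentpnas/urls_clean.txt    # how many articles will be requested
```

The cleaned file (urls_clean.txt is just an illustrative name) could then be passed to --urllist in place of the original list.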
Perfect synergy, eh?
Q: What’s a real use case in which someone would use quickscrape instead of getpapers?
Incidentally, there are two Acta Palaeontologica Polonica articles in EPMC and I have no idea why they are in EPMC to be honest! It would certainly make my life easier if EPMC / PMC were more widely scoped in terms of subjects/journals allowed in.
I’m not a biomedical researcher myself, so unfortunately this is a common problem for me. There is no central aggregation of evolution, ecology or palaeontology journal content – if you want to do full-text mining on them, you have to aggregate the content yourself, with quickscrape!