Anyone used headless browsers for scraping bibtex from webpages?

Mike Marchywka marchywka at hotmail.com
Sun May 17 21:12:46 CEST 2020


I've been using my scripts pretty consistently now, with special-case handlers for most
domains, including those that use canned bibtex code, plus doi scraping from html and pdf
for crossref lookup. However, sometimes publishers get fancy with javascript, at least
temporarily, probably to block robots, and probably only temporarily because people complain.
This just happened to me on another publisher site, so I finally decided to get a headless
browser, assuming I can script it to download the pages and finally get the bibtex
or at least a doi. It looks like "chrome --headless" will work once I figure out how
to use it, but I'm curious whether people have explored this before. Yes, I know there are
things like EndNote etc., but I like just putting a url on the clipboard and downloading
bibtex along with metadata and integrating it into my other workflow.
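
For concreteness, here is a minimal sketch of what I have in mind, not a tested
workflow. It assumes a Chrome or Chromium binary is on the PATH under one of the
names guessed below, and that the binary accepts the --dump-dom flag to print the
page after javascript has run. The function names (find_chrome, dump_dom,
extract_dois) are just illustrative, and the doi pattern is the commonly
recommended one for modern dois, so it may miss older or unusual ones.

```python
import re
import shutil
import subprocess

# Commonly recommended pattern for modern DOIs; an assumption, not exhaustive.
DOI_RE = re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def find_chrome():
    """Return the first Chrome/Chromium binary found on PATH, or None.

    The candidate names are guesses and vary by platform.
    """
    for name in ("google-chrome", "chromium", "chromium-browser", "chrome"):
        path = shutil.which(name)
        if path:
            return path
    return None


def dump_dom(url, chrome=None, timeout=60):
    """Render url in headless Chrome and return the resulting DOM as text."""
    chrome = chrome or find_chrome()
    if chrome is None:
        raise RuntimeError("no Chrome/Chromium binary found on PATH")
    result = subprocess.run(
        [chrome, "--headless", "--disable-gpu", "--dump-dom", url],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout


def extract_dois(html):
    """Return the distinct DOI-like strings in html, in order of appearance."""
    seen = []
    for match in DOI_RE.finditer(html):
        # Drop sentence punctuation that the pattern may have swallowed.
        doi = match.group(0).rstrip(".,;)")
        if doi not in seen:
            seen.append(doi)
    return seen
```

Usage would be something like extract_dois(dump_dom(url)) with the url taken from
the clipboard; any doi found could then go to the existing crossref handler.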

Thanks.




