Anyone used headless browsers for scraping BibTeX from webpages?

John Scott jscott at
Wed May 20 23:51:00 CEST 2020

I don't know about BibTeX specifically, but for web scripting or filling in 
basic forms, cURL is pretty handy. For activating elements on a web page, 
you'll probably want to look at saving and reusing cookies with --cookie-jar 
and --cookie, and at how to send POST requests.

For example, I recently wrote a script that lets me fill in a form and complete 
a CAPTCHA entirely from the CLI. First I ran
    curl --cookie-jar jar.txt
to save the cookie for my session. Then I'd reuse this cookie to 
fetch my CAPTCHA image:
    curl --cookie jar.txt -o image.png
and lastly, after reading it, send the request (you can figure out the field 
names with Inspect Element in your browser):
    curl --cookie jar.txt -X POST -F 'captcha_code=FfFfFf'
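
Put together, the three steps might look something like the sketch below. The
example.org base URL and the /form, /captcha and /submit paths are placeholders
I made up for illustration, as is the DRY_RUN switch; only jar.txt, image.png
and the captcha_code field name come from the commands above. Substitute
whatever the real site uses.

```shell
#!/bin/sh
# Sketch of the cookie workflow: open a session, fetch the CAPTCHA, submit.
# BASE_URL and the three paths are hypothetical placeholders, not a real site.
BASE_URL="${BASE_URL:-https://example.org}"
JAR="${JAR:-jar.txt}"

# Run a command, or just print it when DRY_RUN=1 (handy for checking the
# curl invocations before touching the network).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Step 1: request the form page and save the session cookie to the jar.
start_session() {
    run curl --silent --cookie-jar "$JAR" --output /dev/null "$BASE_URL/form"
}

# Step 2: reuse the saved cookie to download the CAPTCHA image.
fetch_captcha() {
    run curl --silent --cookie "$JAR" --output image.png "$BASE_URL/captcha"
}

# Step 3: send the answer back; --form makes curl issue a multipart POST.
# $1 is the CAPTCHA text you read from image.png.
submit_form() {
    run curl --silent --cookie "$JAR" --form "captcha_code=$1" "$BASE_URL/submit"
}
```

To use it, call start_session and fetch_captcha, look at image.png, then call
submit_form with the text you read. Setting DRY_RUN=1 first prints each curl
command instead of running it.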

For help with particular sites, please feel free to share details on or off-list.