How do beginners learn or how are they taught to do "research" with Bibtex?

Mike Marchywka marchywka at
Fri Mar 5 02:31:21 CET 2021

On Thu, Mar 04, 2021 at 04:16:31PM -0800, Paulo Ney de Souza wrote:
>    On Thu, Mar 4, 2021 at 11:17 AM Mike Marchywka <marchywka at> wrote:
>      I'm putting together another set of notes that is almost
>      a manuscript and using this as a time to debug my new tool
>      "toobib", which finds and downloads BibTeX entries for a given URL.
>      This is a more maintainable version of my old "med2bib" script.
>    Is this the med2bib that is part of cb2Bib?
>    med2bib: []
>    .1.en.html
>    cb2Bib: []

Thanks. I never bothered to check the name, and my script is entirely different,
but it includes some of that functionality. By "PubMed format" they may mean
what I call MEDLINE format, which I do convert internally.
My C++ code handles only the logic and data structures; it invokes
a lot of command-line utilities (wget, sed, awk, headless Chrome) and scripts,
including more custom scripts for automated PubMed access through E-utilities.
However, the new PubMed pages show DOIs, which may work just as well.
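For reference, MEDLINE format is tagged plain text: each field starts with a tag padded to four characters followed by "- ", and continuation lines are indented six spaces. PubMed records can be fetched in this format through the E-utilities efetch endpoint (db=pubmed, rettype=medline, retmode=text). Below is a minimal sketch of that kind of conversion, assuming only a handful of tags (PMID, TI, FAU/AU, JT, DP, VI, IP, PG, AID) and a made-up surname+year citation key. It is not the toobib code, and the sample record is invented.

```python
import re
from collections import defaultdict
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def efetch_medline_url(pmid):
    """URL for one PubMed record in MEDLINE text format via E-utilities efetch."""
    query = urlencode({"db": "pubmed", "id": pmid, "rettype": "medline", "retmode": "text"})
    return f"{EFETCH}?{query}"

def parse_medline(text):
    """Parse one MEDLINE record into {tag: [values]}, joining continuation lines."""
    fields = defaultdict(list)
    last_tag = None
    for line in text.splitlines():
        if line.startswith("      ") and last_tag:
            # Continuation line: six spaces of indentation.
            fields[last_tag][-1] += " " + line.strip()
        elif len(line) > 6 and line[4:6] == "- ":
            # Tag line: tag padded to four characters, then "- ".
            last_tag = line[:4].strip()
            fields[last_tag].append(line[6:].strip())
    return fields

def first(rec, tag):
    """First value of a tag, or an empty string if the tag is absent."""
    values = rec.get(tag)
    return values[0] if values else ""

def medline_to_bibtex(text):
    rec = parse_medline(text)
    # Prefer full names ("Smith, John"), which BibTeX reads as "Last, First".
    authors = rec.get("FAU") or rec.get("AU") or []
    year_match = re.match(r"\d{4}", first(rec, "DP"))
    year = year_match.group(0) if year_match else ""
    surname = authors[0].split(",")[0].split()[0].lower() if authors else "anon"
    key = f"{surname}{year}"  # hypothetical key scheme
    doi = next((v.split()[0] for v in rec.get("AID", []) if v.endswith("[doi]")), "")
    entries = [
        ("author", " and ".join(authors)),
        ("title", first(rec, "TI")),
        ("journal", first(rec, "JT")),
        ("year", year),
        ("volume", first(rec, "VI")),
        ("number", first(rec, "IP")),
        ("pages", first(rec, "PG")),
        ("doi", doi),
        ("pmid", first(rec, "PMID")),
    ]
    # Skip empty fields; BibTeX ignores unknown fields such as pmid.
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in entries if value)
    return f"@article{{{key},\n{body}\n}}"

# Invented sample record for illustration.
SAMPLE = """PMID- 12345678
TI  - A hypothetical title that is long enough to
      wrap onto a second line.
FAU - Smith, John
AU  - Smith J
FAU - Doe, Jane
AU  - Doe J
JT  - Journal of Examples
DP  - 2020 Aug 28
VI  - 12
IP  - 3
PG  - 100-110
AID - 10.1234/example.5678 [doi]
"""

print(medline_to_bibtex(SAMPLE))
```

Preferring the FAU tag over AU is a deliberate choice: "Smith J" would be misread by BibTeX as first name "Smith", last name "J".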

Another interesting issue is the quality of the BibTeX entries,
which can vary a lot between sources for a given work.
I now have an option to get every entry I can discover, and
it is interesting to see the quirks and quality issues.
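One way to see those quirks side by side is to parse each source's entry into a dictionary of fields and list the fields whose values differ after light normalization. Below is a minimal sketch, assuming the entries are already parsed (BibTeX parsing itself is out of scope here). The source names and field values are invented for illustration.

```python
import re

def normalize(value):
    """Lower-case, drop braces, collapse whitespace, so trivial formatting
    differences (including BibTeX case-protecting braces) are ignored."""
    value = value.replace("{", "").replace("}", "")
    return re.sub(r"\s+", " ", value).strip().lower()

def compare_entries(entries):
    """entries maps a source name to a dict of BibTeX fields.

    Returns {field: {source: value or None}} for each field whose normalized
    value is not the same in every source; a missing field counts as a difference.
    """
    all_fields = sorted({name for fields in entries.values() for name in fields})
    report = {}
    for name in all_fields:
        values = {src: fields.get(name) for src, fields in entries.items()}
        distinct = {None if v is None else normalize(v) for v in values.values()}
        if len(distinct) > 1:
            report[name] = values
    return report

# Invented entries for the same work from two hypothetical sources.
ENTRIES = {
    "publisher": {"title": "Reduced {Vitamin K} Status", "year": "2020", "doi": "10.1234/abc"},
    "aggregator": {"title": "Reduced vitamin K status", "year": "2021"},
}

for field, values in compare_entries(ENTRIES).items():
    print(field, values)
```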

>      I like it and wanted to document it for possible release. So, it may
>      be helpful if I knew how "real people" would collect citations
>      for scholarly or other research with the intent of writing a
>      paper with a complete bibliography (or a DIY project using
>      BomTex as outlined before).
>    I would believe each group has its own tools. For example, mathematicians
>    use MRef:
>         []
>    and I am sure other groups use similar tools.
I'll have to look at that, as it is not immediately clear what it does.
I've tried various technical fields (math, physics, mostly medicine),
and most places that publish abstracts or articles have some similar
means of getting the BibTeX, although some are quite quirky; they
may have been picked by the web people, not by the "field."
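Many publisher pages also embed citation metadata as HTML meta tags (citation_title, citation_author, citation_doi and similar, the Highwire Press convention that Google Scholar's inclusion guidelines describe), which is a fairly general hook when no BibTeX button exists. Below is a minimal sketch using Python's standard html.parser, assuming the page HTML has already been downloaded. Not every site provides these tags, and the sample page is invented.

```python
from html.parser import HTMLParser

class CitationMetaParser(HTMLParser):
    """Collect <meta name="citation_..." content="..."> tags from a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = attributes.get("content")
        if name.startswith("citation_") and content:
            # Tags such as citation_author repeat, so keep every value in order.
            self.meta.setdefault(name, []).append(content.strip())

def citation_meta(html):
    parser = CitationMetaParser()
    parser.feed(html)
    parser.close()
    return parser.meta

# Invented page fragment for illustration.
PAGE = """<html><head>
<meta name="citation_title" content="A hypothetical paper">
<meta name="citation_author" content="Smith, John">
<meta name="citation_author" content="Doe, Jane" />
<meta name="citation_doi" content="10.1234/example.5678">
</head><body></body></html>"""

print(citation_meta(PAGE))
```

Self-closing tags such as `<meta ... />` are handled too, because HTMLParser's default handle_startendtag forwards to handle_starttag.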

>      AFAICT, every time you find a paper or abstract you want to cite,
>      you have to find a button or DOI or something and then maybe download
>      or copy/paste BibTeX into your collection; I don't even think Google Scholar
>      makes this easy to automate. My workflow now is just: copy whatever link
>      you have to the clipboard, run my "toobib" program, which finds the BibTeX for you,
>      and then add it to your collection.
>      For example, if you wanted to cite these works, how would you go about it?
>      []
>      [fiable_risk_factor_of_severe_COVID-19/links/5f48bab6299bf13c504629dd/Reduced-vitamin-K-status-as-a-potentially-modifiable-risk-factor-of-severe-COVID-19.pdf]
>      tamin_K_status_as_a_potentially_modifiable_risk_factor_of_severe_COVID-19/links/5f48bab6299bf13c504629dd/Reduced-vitamin-K-status-as-a-potentially-modifiable-risk-factor-of-severe-COVID-19.pdf
>      If I can figure out how to get headless Chrome to download PDF files, then it will even be
>      able to read from private sites for which you have credentials. In theory, right now I can
>      render to PDF or dump the DOM for scraping if a user provides the right cookie file, but I have not
>      integrated that yet.
>      I've got a lot of quirky code specialized to particular publishers, but enough patterns
>      have emerged that it often works on novel sites. I'm still adding things and
>      trying to generalize more.
>    Is this available anywhere?
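For the headless Chrome step in the quoted text, Chrome's own command-line switches cover both modes: --print-to-pdf=FILE renders the page to a PDF and --dump-dom prints the serialized DOM to stdout. Below is a minimal sketch that builds those command lines, assuming the binary is called google-chrome (it may be chromium, chrome, or a full path elsewhere). The optional profile directory (--user-data-dir) is a different way to reuse logged-in cookies than the cookie file described above, and is best pointed at a copy of a profile rather than one in use by a running browser.

```python
import subprocess

def chrome_command(url, mode, out_pdf="page.pdf", binary="google-chrome", profile_dir=None):
    """Build a headless Chrome command line.

    mode "pdf" renders the page to out_pdf; mode "dom" prints the
    serialized DOM to stdout. The binary name is an assumption.
    """
    argv = [binary, "--headless", "--disable-gpu"]
    if profile_dir:
        # Reuse cookies/logins stored in this profile directory (use a copy).
        argv.append(f"--user-data-dir={profile_dir}")
    if mode == "pdf":
        argv.append(f"--print-to-pdf={out_pdf}")
    elif mode == "dom":
        argv.append("--dump-dom")
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    argv.append(url)
    return argv

def dump_dom(url, timeout=60, **kwargs):
    """Run Chrome and return the rendered DOM as text (needs Chrome installed)."""
    result = subprocess.run(chrome_command(url, "dom", **kwargs),
                            capture_output=True, text=True, check=True, timeout=timeout)
    return result.stdout

print(" ".join(chrome_command("https://example.org/paper", "pdf", out_pdf="paper.pdf")))
```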

I could put it somewhere, but it would help to have minimal documentation that
compares it to related products. If anyone wants to post a few links, I'll
see if I can find the corresponding BibTeX entries, time permitting.
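When a posted link already contains a DOI, one shortcut is to pull the DOI out and ask doi.org for BibTeX through content negotiation (an "Accept: application/x-bibtex" header, supported by registration agencies such as Crossref and DataCite). Below is a minimal sketch. The regular expression is a loose variant of the pattern Crossref recommends, the example links are made up, and links without a DOI need some other strategy.

```python
import re
import urllib.parse
import urllib.request

# Loose DOI pattern, adapted from the regular expression Crossref recommends.
DOI_RE = re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")

def find_doi(link):
    """Return the first DOI-like substring of a pasted link, or None."""
    text = urllib.parse.unquote(link)  # e.g. 10.1234%2Fabc -> 10.1234/abc
    match = DOI_RE.search(text)
    return match.group(0).rstrip(".,;") if match else None

def bibtex_request(doi):
    """Build a doi.org request that asks for BibTeX via content negotiation."""
    return urllib.request.Request(
        "https://doi.org/" + urllib.parse.quote(doi, safe="/:;()"),
        headers={"Accept": "application/x-bibtex"},
    )

def fetch_bibtex(doi, timeout=30):
    """Network call: follow doi.org's redirect and return the BibTeX text."""
    with urllib.request.urlopen(bibtex_request(doi), timeout=timeout) as resp:
        return resp.read().decode("utf-8")

for link in ["https://doi.org/10.1234/example.5678",
             "https://www.example.com/doi/10.1234%2Fexample.5678?src=share"]:
    print(link, "->", find_doi(link))
```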
I'm also working on an email-based server and could integrate this with it.
Then you could email a robot a bunch of links and get back a BibTeX
entry for each one. Obviously it could not invoke a browser
with your current cookies, but for public content it would be OK.
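The first step for such an email robot would be pulling the links out of an incoming message. Below is a minimal sketch using Python's standard email module, assuming plain-text messages. The function name and sample message are hypothetical, and this is not the server mentioned above.

```python
import email
import re

URL_RE = re.compile(r"https?://[^\s<>\"']+")

def links_from_email(raw_message):
    """Return the unique http(s) links in the text/plain parts of a message, in order."""
    msg = email.message_from_string(raw_message)
    seen, links = set(), []
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True)  # bytes, transfer-decoded
        if payload is None:
            continue
        text = payload.decode(part.get_content_charset() or "utf-8", errors="replace")
        for url in URL_RE.findall(text):
            # Drop sentence punctuation that often trails a pasted link.
            url = url.rstrip(".,;)")
            if url not in seen:
                seen.add(url)
                links.append(url)
    return links

# Invented sample message for illustration.
RAW = """From: someone@example.org
To: robot@example.org
Subject: please find BibTeX

https://doi.org/10.1234/example.5678
https://example.org/paper?id=1.
https://doi.org/10.1234/example.5678
"""

print(links_from_email(RAW))
```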

Thinking well into the future, mass deployment would be an
interesting thing. I can take out all the hardcoded bash and move it
into configuration files, making it more adaptable and general.
Publisher web sites change all the time, and it would likely need
to change often too. However, the build process is very simple,
and you could almost have it check for and install a new version
on any failed BibTeX discovery :) Some publishers, for whatever reason,
also don't seem automation-friendly and may not even like this.

The architecture is becoming comparatively "systematic": it just
tries a big collection of hacks, but I think the approach can be generalized.
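A generalized form of "try a big collection of hacks" is a fallback chain. It runs each strategy in order, keeps going past failures, and records why each one failed, so the log shows which publisher hack needs updating. Below is a minimal sketch, assuming each strategy is a function from URL to BibTeX text (or None). The demo strategies are invented.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Strategy = Callable[[str], Optional[str]]

def first_bibtex(url: str, strategies: Sequence[Tuple[str, Strategy]]):
    """Try each (name, strategy) in order.

    Returns (name, entry, failures): the first strategy whose result looks like
    a BibTeX entry, plus (name, reason) for every strategy that failed before it.
    If none succeeds, name and entry are None.
    """
    failures: List[Tuple[str, str]] = []
    for name, strategy in strategies:
        try:
            result = strategy(url)
        except Exception as exc:  # one broken scraper should not stop the chain
            failures.append((name, repr(exc)))
            continue
        if result and result.lstrip().startswith("@"):
            return name, result, failures
        failures.append((name, "no entry"))
    return None, None, failures

# Invented demo strategies.
def broken_scraper(url):
    raise RuntimeError("layout changed")

def empty_result(url):
    return ""

def doi_lookup(url):
    return "@misc{demo, title = {Demo}}"

DEMO = [("scraper", broken_scraper), ("meta_tags", empty_result), ("doi", doi_lookup)]
print(first_bibtex("https://example.org/paper", DEMO))
```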


>    Paulo Ney


mike marchywka
306 charles cox
canton GA 30115
USA, Earth 
marchywka at
ORCID: 0000-0001-9237-455X

More information about the texhax mailing list.