www type bibtex entries - generating bibtex for webpages + prior theme.

Sat Sep 14 23:51:05 CEST 2019

In a prior thread I described some reasons to prefer latex-like
document "source" over things like html or explicit xml. Someone
offered the CELT site below as an example of an experiment related
to this topic. The page cited in the sample bibtex below links
to xml described as the "source document".

%2019-09-14:17:16:49
%autogenerated by toobib 
@www{CELTprojectBriefeucc,
authors = {},
title = {CELT project: A Briefe description of Ireland: made in this year, 1589, By Robert Payne | University College Cork},
url = {http://research.ucc.ie/celt/document/E590001-007},
urldate = {2019-09-14:17:16:49},
year = {}
}

The so-called "source document":

http://research.ucc.ie/celt/document/E590001-007.xml

While it is quite true that this xml provides good explicit
structure and is "human readable", it does not
quite "flow" like simple latex source code. That is, you can read
most latex source as if it were meant to be read, unlike html
or this xml. The latex just provides logical structure without a lot
of verbosity and leaves the layout details to the renderer.

Anyway, the point in posting this time is to ask about citing web pages.
For most articles intended to be cited, I have ways to scrape bibtex off
the pages containing an abstract: if the link is on the clipboard,
the script can usually find a bibtex entry or a doi and call crossref.
However, I need to make some arguments that contrast with "popular" or
news sites, or cite commercial products that were mentioned in a work.
Few of these provide bibtex for their pages, although plenty have
"share" features. AFAICT, even the CELT site does not provide
much in the way of "how to cite", which is odd for an academic
work and confusing, since you want to credit their work in
presenting some other classic work. Is there some obvious way
anyone here would create a bibtex entry for the page above,

url = {http://research.ucc.ie/celt/document/E590001-007},

and, as an example of a commercial site, for this one,

%2019-09-14:17:45:00
%autogenerated by toobib 
@www{ZincCapsHighPotencylifeextension,
authors = {},
title = {Zinc Caps High Potency, 50 mg 90 capsules | Life Extension    },
url = {https://www.lifeextension.com/vitamins-supplements/item01813/zinc-caps-high-potency},
urldate = {2019-09-14:17:45:00},
year = {}
}

?

The bibtex above is what I could scrape using some code I wrote to build
an entry automatically from the link itself, html fields like "title", and any
"meta" tags it can find. Eventually I could chase down DOIs or other cues, which
is why I went from bash to c++, but hopefully it does not become that big a mess.
If this worked well, it would be nice to let
publishers or site owners use a similar tool to provide bibtex in a "how to cite"
button next to all the sharing stuff.
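
For anyone who wants to try the same idea, here is a minimal sketch of that
kind of scrape. It is written in Python rather than c++ and is not the toobib
code. It assumes the page text has already been fetched, copies the field
names from the entries above, and reads "title" plus whatever "meta" tags it
finds. The names HeadScraper, make_key and www_entry are made up for this
example, the key scheme is only loosely modeled on the keys above, and the
"author" and "citation_author" meta names are assumptions about what a site
might provide.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse
import re


class HeadScraper(HTMLParser):
    """Collect the <title> text and <meta> name/property -> content pairs."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = attrs.get("name") or attrs.get("property")
            if name and attrs.get("content"):
                self.meta[name.lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def make_key(title, url):
    """Hypothetical key scheme: first three title words plus one host label."""
    words = re.findall(r"[A-Za-z0-9]+", title)[:3]
    labels = (urlparse(url).hostname or "").split(".")
    site = labels[-2] if len(labels) >= 2 else labels[0]
    return "".join(words) + site


def www_entry(html, url, urldate):
    """Format an @www entry from already-fetched HTML text."""
    scraper = HeadScraper()
    scraper.feed(html)
    scraper.close()  # flush any buffered text
    title = " ".join(scraper.title.split())  # collapse stray whitespace
    # Assumed meta names; many pages will have neither, leaving this empty.
    author = scraper.meta.get("citation_author") or scraper.meta.get("author", "")
    fields = [
        ("authors", author),
        ("title", title),
        ("url", url),
        ("urldate", urldate),
        ("year", ""),  # left empty, as in the entries above
    ]
    body = ",\n".join(f"{k} = {{{v}}}" for k, v in fields)
    return f"@www{{{make_key(title, url)},\n{body}\n}}"
```

Calling www_entry(page_text, url, "2019-09-14") returns the entry as a string.
Nothing here escapes braces or other special characters in the title, so a
real tool would need to handle that before handing the result to bibtex.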

Google Scholar probably did something like this to create their bibtex, but I was
not sure if any of that is public or if other mechanisms exist, so I wrote
my own code. It could get quite involved, though, and I'm not even sure how
to use some of the fields. Is there a style guide that covers this somewhere?

Thanks. 

-- 

mike marchywka
306 charles cox
canton GA 30115
USA, Earth 
marchywka at hotmail.com
404-788-1216
ORCID: 0000-0001-9237-455X