Generating bibtex entry from a URL: zotero, zbib and TooBib.

Mike Marchywka marchywka at hotmail.com
Sun May 30 02:34:23 CEST 2021


I tried this on Zotero,

https://www.npr.org/sections/coronavirus-live-updates/2021/05/29/1001590855/vietnam-detects-new-highly-transmissible-coronavirus-variant

And it came up with this, 

@misc{noauthor_vietnam_nodate,
	title = {Vietnam detects new highly transmissible coronavirus variant},
	url = {https://www.npr.org/sections/coronavirus-live-updates/2021/05/29/1001590855/vietnam-detects-new-highly-transmissible-coronavirus-variant},
	abstract = {Vietnam's health ministry announced the discovery of the new variant on Saturday that has characteristics of two other strains. The country is currently dealing with a recent spike in infections.},
	language = {en},
	urldate = {2021-05-30},
	journal = {NPR.org},
}

My code does not handle this at all yet. However, until LinkedIn and Amazon started blocking headless Chrome
(and I can get Puppeteer to work, but first I need a more recent Node.js than the one in the Ubuntu repo),
I found that the HTML parser isolated JSON blocks containing most of the fields a BibTeX entry needs. The same appears
true for the NPR page. It has this, for example, which when I get around to it will probably
let me make pretty much full citations (author, date, etc.) similar to those for a printed article:

 0 html  1 head  2 script  3 (null)    {"@type":"NewsArticle","publisher":{"@type":"Organization","name":"NPR","logo":{"@type":"ImageObject","url":"https:\/\/media.npr.org\/chrome\/npr-logo.jpg"}},"headline":"Vietnam Detects New Highly Transmissible Coronavirus Variant","mainEntityOfPage":{"@type":"WebPage","@id":"https:\/\/www.npr.org\/sections\/coronavirus-live-updates\/2021\/05\/29\/1001590855\/vietnam-detects-new-highly-transmissible-coronavirus-variant"},"datePublished":"2021-05-29T17:05:40-04:00","dateModified":"2021-05-29T17:05:40-04:00","author":{"@type":"Person","name":["Wynne Davis"]},"description":"Vietnam's health ministry announced the discovery of the new variant on Saturday that has characteristics of two other strains. The country is currently dealing with a recent spike in infections.","image":{"@type":"ImageObject","url":"https:\/\/media.npr.org\/assets\/img\/2021\/05\/29\/gettyimages-1232802621-61c88b4e62098fbb10d3c07a5f35903bb2fb74b1.jpg"},"@context":"http:\/\/schema.org"}
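
A minimal Python sketch of how a block like the one above could be turned into a fuller entry. It is not the C++ code discussed in this thread. It assumes the JSON sits in a script element of type application/ld+json (the dump above does not show the type attribute) and that a biblatex-style date field is acceptable. The names JsonLdExtractor, to_bibtex and the citation key are made up for illustration, and the sample page is a trimmed copy of the NPR data.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect parsed JSON from <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._chunks = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        # Assumption: the metadata lives in a JSON-LD script element.
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._chunks)))
            except json.JSONDecodeError:
                pass  # skip a malformed block rather than abort the page


def as_list(value):
    """schema.org values may be a single item or a list; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def author_names(article):
    names = []
    for person in as_list(article.get("author")):
        if isinstance(person, dict):
            names.extend(as_list(person.get("name")))
        else:
            names.append(str(person))
    return names


def to_bibtex(article, key):
    """Build an @misc entry; values are not escaped for TeX special characters."""
    page = article.get("mainEntityOfPage")
    publisher = article.get("publisher")
    fields = {
        "title": article.get("headline"),
        "author": " and ".join(author_names(article)),
        "date": (article.get("datePublished") or "")[:10],  # YYYY-MM-DD part only
        "url": page.get("@id") if isinstance(page, dict) else page,
        "journal": publisher.get("name") if isinstance(publisher, dict) else publisher,
        "abstract": article.get("description"),
    }
    body = "".join(f"\t{name} = {{{value}}},\n" for name, value in fields.items() if value)
    return f"@misc{{{key},\n{body}}}"


# Trimmed copy of the NPR metadata shown above, wrapped in a tiny page.
SAMPLE_HTML = """<html><head><script type="application/ld+json">
{"@type":"NewsArticle","publisher":{"@type":"Organization","name":"NPR"},
 "headline":"Vietnam Detects New Highly Transmissible Coronavirus Variant",
 "datePublished":"2021-05-29T17:05:40-04:00",
 "author":{"@type":"Person","name":["Wynne Davis"]}}
</script></head><body></body></html>"""

if __name__ == "__main__":
    parser = JsonLdExtractor()
    parser.feed(SAMPLE_HTML)
    for block in parser.blocks:
        print(to_bibtex(block, "davis_vietnam_2021"))
```

Unlike the Zotero result above, an entry built this way would carry the author and publication date, because both are present in the JSON.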

Before LinkedIn stopped responding, I had what I thought was a cool way to make a bib entry for a person
that allowed a reader to become familiar with pertinent public info and disambiguate the name. I am not sure
the Amazon pages will let me implement "bomtex", or bill-of-materials entries, however.

If you were talking to Zotero you may want to mention this to them. There is probably a lot more
they can do if they get a json parser :) 



note new address
 Mike Marchywka 306 Charles Cox Drive Canton, GA 30115
470-758-0799
404-788-1216



________________________________________
From: texhax <texhax-bounces+marchywka=hotmail.com at tug.org> on behalf of Mike Marchywka <marchywka at hotmail.com>
Sent: Wednesday, May 19, 2021 8:31 AM
To: texhax at tug.org
Subject: Re: Generating bibtex entry from a URL: zotero, zbib and TooBib.

On Wed, May 19, 2021 at 12:05:44PM +0100, Jonathan Fine wrote:
>    Hi Mike
>    You wrote:
>
>      I went back to this,
>      https://zbib.org/
>      and pasted this link into the blank,
>      https://www.researchgate.net/publication/7305589_Menadione_is_a_metabolite_of_oral_vitamin_K
>      There was a big red error notice on the page.
>      This link is useful but the bib entry is a skeleton,
>      https://zbib.org/0e5ac456090d4969af45b8b33b4e3619
>
>    Well, I repeated the experiment and reproduced your result. It's only my successful generation of the associated
>    zbib.org bibliography that prevents me doubting my previous success.
>    I'm grateful to you, Mike, for joining me in these investigations. At this point I think the next step for me is to raise
>    an issue with zotero, and see what they say.
>    with best regards

Well, I know ResearchGate changed their HTML not long ago and seems a bit adversarial. I was going to
include a comment on the relationship in the entry: intended usage, scraped, adversarial, synthesized, etc.
I've actually even got a flag in some code, admonish_webmaster, that adds a header to the request if their
BibTeX is too inaccessible. This is a concern I have had for a while, that is, updating in response
to capricious sites. An "update on fail" or automated failure report would probably be good. I
would have thought Zotero would routinely look at failed lookups and start fixing them
right away. Having all the logic local is great though, and the template is probably
reusable: any time you are hacking something that you expect to be a one-off you can just
add it to the hack list in case you need it again :)
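
A small Python sketch of the two ideas above, an optional request header and a local failure report. It is not the code referred to in this message. The header name X-Admonish-Webmaster, its wording, the User-Agent string and the JSON-lines log format are all assumptions made for illustration; only the flag name admonish_webmaster comes from the text.

```python
import json
import time
import urllib.request

# Hypothetical header name; the message only mentions a flag called admonish_webmaster.
ADMONISH_HEADER = "X-Admonish-Webmaster"


def build_request(url, admonish_webmaster=False):
    """Return a urllib Request, optionally carrying a note for the site operator."""
    headers = {"User-Agent": "bib-scraper-sketch/0.1"}  # placeholder identifier
    if admonish_webmaster:
        headers[ADMONISH_HEADER] = "please expose machine-readable citation metadata"
    return urllib.request.Request(url, headers=headers)


def record_failure(log_path, url, reason):
    """Append one JSON line per failed lookup so failures can be reviewed later."""
    entry = {
        "url": url,
        "reason": reason,
        "when": time.strftime("%Y-%m-%dT%H:%M:%S"),
    }
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
    return entry
```

A caller would pass the request to urllib.request.urlopen and call record_failure whenever the fetch or the metadata extraction comes back empty, which gives a list of sites whose handlers need updating.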

Again, I like this C++ code for data structures and logic, with the fully accessible bash
utilities and possible dynamic library loads etc., although right now
it is probably of most interest to developer-authors.

Personally I'd like to see all sites that can make good citations, including news sites, include bib info.
My other interest right now is citing a person with enough of a bib entry to disambiguate the
name and provide some terse relevant biographical information (city and work or education or notable
publications or whatever). You may be talking about some other "Bill Gates" lol.
Perhaps a larger commercial interest for everyone though is BomTex, or creating a bill of materials
for the topic of the work that could be the basis for author impact assessment or payments.





>    Jonathan

--

mike marchywka
306 charles cox
canton GA 30115
USA, Earth
marchywka at hotmail.com
404-788-1216
ORCID: 0000-0001-9237-455X


