really odd problem extracting bibtex from a science publisher lol,

Mike Marchywka marchywka at hotmail.com
Sun Oct 3 20:53:08 CEST 2021


I never went to this site before and it looked pretty normal
in the web browser,

https://cdnsciencepub.com/doi/abs/10.1139/o59-099

but when I ran "TooBib" to extract the citation info it looked
like it was stuck in an infinite loop. However, the stack trace was
in the html parsing, which was supposed to work. It turned out the
html was over 20MB and rendered in lynx with
89k links lol,

lynx -dump -force-html  junk/xxx | tail -n 100


89554. file://localhost/home/documents/cpp/proj/toobib/junk/xxx
89555. https://www.facebook.com/cdnsciencepub
89556. https://twitter.com/cdnsciencepub
89557. https://www.linkedin.com/company/canadian-science-publishing
89558. https://www.youtube.com/user/cdnsciencepub?feature=results_main
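
A cheap check before parsing would flag a page like this one instead of letting the parser grind through it. The following is only a rough sketch in Python, not what TooBib does: the function name looks_pathological and both cutoffs are made up for illustration, and counting "<a " openings is a crude stand-in for the link count lynx reports.

```python
import os
import re

# Hypothetical cutoffs, chosen only for illustration; tune to taste.
MAX_HTML_BYTES = 5 * 1024 * 1024
MAX_LINKS = 5000


def looks_pathological(path, max_bytes=MAX_HTML_BYTES, max_links=MAX_LINKS):
    """Return a reason string if the file is too big or too link-heavy, else None."""
    size = os.path.getsize(path)
    if size > max_bytes:
        return f"size {size} bytes exceeds {max_bytes}"
    links = 0
    with open(path, "rb") as fh:
        for line in fh:
            # Count opening anchor tags line by line so the scan can stop early.
            links += len(re.findall(rb"<a\s", line, flags=re.IGNORECASE))
            if links > max_links:
                return f"more than {max_links} anchor tags"
    return None
```

Something like looks_pathological("junk/xxx") could then decide whether to parse the html at all or go straight to a fallback handler.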

I'm not sure if my code would have eventually worked; the Zotero web
form returned an error after a little while. But the doi is in
the link, so I just redirect that site to an existing handler
for that case lol.
 ( this uses the crossref x-bibtex facility, which seems to
consistently drop the journal info. I'm switching to parsing
their json output, but right now it just uses the publisher as the journal,
which works sometimes... )
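
The redirect described above amounts to pulling the doi out of the page url and building the crossref transform url from it. A minimal sketch in Python, assuming a loose doi pattern; the names DOI_RE, doi_from_url and crossref_bibtex_url are hypothetical and are not TooBib's actual code. The url template is the one that appears as citeurl in the output.

```python
import re
from urllib.parse import urlparse, unquote

# Loose DOI pattern (an assumption, not a full validator):
# "10.", a 4-9 digit registrant code, "/", then a non-blank suffix.
DOI_RE = re.compile(r"10\.\d{4,9}/\S+")


def doi_from_url(url):
    """Return the first DOI-looking substring of the url path, or None."""
    # unquote() turns an escaped slash (%2F) back into "/".
    path = unquote(urlparse(url).path)
    match = DOI_RE.search(path)
    return match.group(0) if match else None


def crossref_bibtex_url(doi):
    """Build the crossref x-bibtex transform url for a doi."""
    return f"http://api.crossref.org/works/{doi}/transform/application/x-bibtex"
```

For example, doi_from_url("https://cdnsciencepub.com/doi/abs/10.1139/o59-099") gives "10.1139/o59-099", and passing that to crossref_bibtex_url reproduces the citeurl shown in the output.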

% mjmhandler: toobib guesscdnscience<-handledoilink
% date 2021-10-03:11:04:30 Sun Oct 3 11:04:30 EDT 2021
% srcurl: https://cdnsciencepub.com/doi/abs/10.1139/o59-099
% citeurl: http://api.crossref.org/works/10.1139/o59-099/transform/application/x-bibtex
@article{1959_Bligh_Dyer_RAPID_METHOD_TOTAL_LIPID,
X_TooBib = {journal: ReWriteParse be.get(s)=Canadian Science Publishing be.get(dest)=},
author = {E. G. Bligh and W. J. Dyer},
doi = {10.1139/o59-099},
journal = {Canadian Science Publishing},
month = {aug},
number = {8},
pages = {911--917},
publisher = {Canadian Science Publishing},
title = {A {RAPID} {METHOD} {OF} {TOTAL} {LIPID} {EXTRACTION} {AND} {PURIFICATION}},
url = {https://doi.org/10.1139%2Fo59-099},
volume = {37},
year = {1959},
srcurl={https://cdnsciencepub.com/doi/abs/10.1139/o59-099},
xsrcurl={https://cdnsciencepub.com/doi/abs/10.1139/o59-099},
citeurl={http://api.crossref.org/works/10.1139/o59-099/transform/application/x-bibtex}

}
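
The journal field in the entry above is just the publisher name. Crossref's json records carry the journal title in a "container-title" list, so a json-based handler could read it from there. A hedged sketch in Python: journal_from_crossref is a hypothetical name, and SAMPLE is a hand-written fragment for illustration, not a captured api response.

```python
import json


def journal_from_crossref(payload):
    """Pick the journal title out of a crossref works record, or None if absent."""
    record = json.loads(payload) if isinstance(payload, (str, bytes)) else payload
    # The api wraps the record in a "message" envelope; accept a bare record too.
    message = record.get("message", record)
    titles = message.get("container-title") or []
    if isinstance(titles, str):
        titles = [titles]
    return titles[0] if titles else None


# Hand-written illustrative fragment, NOT real api output.
SAMPLE = """
{"status": "ok",
 "message": {"DOI": "10.1139/o59-099",
             "publisher": "Canadian Science Publishing",
             "container-title": ["Canadian Journal of Biochemistry and Physiology"]}}
"""

if __name__ == "__main__":
    print(journal_from_crossref(SAMPLE))
```

When "container-title" is missing or empty the function returns None, so the caller can decide whether falling back to the publisher is acceptable.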

-- 

mike marchywka
306 charles cox
canton GA 30115
USA, Earth 
marchywka at hotmail.com
404-788-1216
ORCID: 0000-0001-9237-455X


More information about the texhax mailing list.