problem with foreign letters in names apparently from crossref,

Mike Marchywka marchywka at hotmail.com
Tue Feb 8 09:59:25 CET 2022


On Mon, Feb 07, 2022 at 10:38:55PM +0000, David Carlisle wrote:
>    the c is c cedilla Unicode U+00E7 but it seems to have mangled all the names.
>    Can you use the jats xml which has pretty much all the information you need in machine readable form
>    https://d197for5662m48.cloudfront.net/documents/publicationstatus/38127/preprint_jats/ec977818c5c0ae0420e3501ce604b48e.xml

The JATS XML does not appear to be a problem for my existing HTML parser, so I can
get it into the internal common format and ignore the XSLT stuff, I guess.
I'm generally trying to find ways to include miscellaneous fields in the
BibTeX so that unexpected additions are kept and an interested user can
modify them as needed. If I happen to find problem cases I can modify the code
as they come up.
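
To illustrate the "keep everything" idea, here is a minimal Python sketch. It is not the actual TooBib code: the names `flatten` and `to_bibtex` and the scheme of joining nested keys with "-" are assumptions made up for this example. It flattens whatever JSON comes back into BibTeX-style fields so that keys nobody anticipated still end up in the entry.

```python
import json

def flatten(value, prefix=""):
    """Yield (key, text) pairs for every leaf of a decoded JSON value."""
    if isinstance(value, dict):
        for k, v in value.items():
            yield from flatten(v, f"{prefix}-{k}" if prefix else k)
    elif isinstance(value, list):
        for i, v in enumerate(value):
            # Index list items so repeated entries (e.g. authors) stay distinct.
            yield from flatten(v, f"{prefix}-{i}" if prefix else str(i))
    else:
        yield prefix, str(value)

def to_bibtex(entry_type, key, record):
    """Render every leaf as a field; nothing is dropped for being unexpected."""
    fields = ",\n".join(f"  {k} = {{{v}}}" for k, v in flatten(record))
    return f"@{entry_type}{{{key},\n{fields}\n}}"

# A small hand-made record using a few keys seen in the Crossref output.
sample = json.loads('{"DOI": "10.22541/au.159103680.00306295", '
                    '"group-title": "Preprints", '
                    '"posted": {"date-parts": [[2020, 6, 1]]}}')
entry = to_bibtex("article", "example_key", sample)
print(entry)
```

A real tool would probably rename or merge some of these keys (year, month, day), but the unknown ones can pass through unchanged for the user to edit.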


I don't know if the c with the descender will come through in the email OK, but the parser
had no problem:

1 html 3 body 4 article 15 front 22 article-meta 30 contrib-group 46 contrib 51 name 52 surname 53 text = Gonçalves
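
For anyone who wants to try the same thing without my parser, a rough Python sketch using the standard library's xml.etree.ElementTree follows. The XML fragment is a hypothetical stand-in shaped like the element path above, not the real file, and `jats_authors` is a name invented for the example. It assumes the usual JATS `surname` and `given-names` children of `name`.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment shaped like the element path printed above.
JATS = """<article><front><article-meta><contrib-group>
<contrib contrib-type="author"><name><surname>Gonçalves</surname><given-names>Fernanda</given-names></name></contrib>
<contrib contrib-type="author"><name><surname>Del Fava</surname><given-names>Claudia</given-names></name></contrib>
</contrib-group></article-meta></front></article>"""

def jats_authors(xml_text):
    """Build a BibTeX author string in "Last, First and Last, First" form."""
    root = ET.fromstring(xml_text)
    authors = []
    for name in root.iter("name"):
        surname = (name.findtext("surname") or "").strip()
        given = (name.findtext("given-names") or "").strip()
        # Keep the surname whole; the comma marks where it ends.
        authors.append(f"{surname}, {given}" if given else surname)
    return " and ".join(authors)

print(jats_authors(JATS))
```

Because the surname and given names arrive as separate elements, the non-ASCII character stays inside the surname, and in the "Last, First" form a two-word surname such as "Del Fava" needs no concatenation.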

Now I just need to scrape JATS links, as I do with BibTeX or DOI info.

Thanks.



>    google suggests several existing jats to bibtex convertors eg
>    https://github.com/PeerJ/jats-conversion/blob/master/src/data/xsl/jats-to-bibtex.xsl
>    David
> 
>    On Mon, 7 Feb 2022 at 22:13, Mike Marchywka <marchywka at hotmail.com> wrote:
> 
>      I thought this was a simple URL to cite but it failed on Zotero webform,
>      https://www.authorea.com/doi/full/10.22541/au.159103680.00306295
>      Outbreak of abomasal bloat in goat kids due to Clostridium ventriculi and Clostridium perfringens type A in Brazil
>      BACTERIAL PATHOGENS, DIAGNOSTICS, DISEASE CONTROL, THERAPEUTIC
>      Mário Felipe Balaro,Fernanda Gonçalves,Felipe Seabra Leal,Isabel Cosentino,Júlia Vignoli,Nathalia Silva,Felipe
>      Brandão,Alessandra Figueiredo Nassar,Simone Miyashiro,Nathalie Cunha,Claudia Del Fava
>      Further, the names are mangled: the one with a "c" with a weird descender
>      is hacked up even in the crossref output. I would blame my handling
>      of non-ASCII except that the JSON seems to split the "Gon" name at the offending
>      character,
>      {"status":"ok","message-type":"work","message-version":"1.0.0","message":{"institution":[{"name":"Authorea, Inc."}],
>      "indexed":{"date-parts":[[2021,12,17]],"date-time":"2021-12-17T19:18:05Z","timestamp":1639768685631},
>      "posted":{"date-parts":[[2020,6,1]]},"group-title":"Preprints","reference-count":0,"publisher":"Authorea, Inc.",
>      "content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],
>      "accepted":{"date-parts":[[2020,6,1]]},"DOI":"10.22541\/au.159103680.00306295","type":"posted-content",
>      "created":{"date-parts":[[2020,6,1]],"date-time":"2020-06-01T18:40:05Z","timestamp":1591036805000},
>      "source":"Crossref","is-referenced-by-count":0,
>      "title":["Outbreak of abomasal bloat in goat kids due to Clostridium ventriculi and Clostridium perfringens type A in Brazil"],
>      "prefix":"10.22541","author":[{"given":"M rio Felipe","family":"Balaro","sequence":"first",
>      "affiliation":[{"name":"Universidade Federal Fluminense"}]},{"given":"Fernanda Gon","family":"alves",
>      "sequence":"additional","affiliation":[{"name":"Universidade Federal Fluminense"}]},
>      Has anyone had a problem with crossref results and foreign chars?
>      This is what I came up with, and you can see the author_orig and "author" entries
>      after I tried to concat "DelFava" as I thought it should be. Is that not
>      right? Normally I ignore stuff like this, as most users would, but since
>      I was cleaning it up I thought I would try to make it more conventional :)
>      toobib handledoilink
>      % date 2022-02-07:17:04:51 Mon Feb 7 17:04:51 EST 2022
>      % srcurl: https://www.authorea.com/doi/full/10.22541/au.159103680.00306295
>      % citeurl: http://api.crossref.org/works/10.22541/au.159103680.00306295
>      @article{2020_rio_Felipe_Balaro_Fernanda_Gon_alves,
>      X_TooBib = {year: ReWriteKvp dn=year sn=date flags=4},
>      X_TooBib = {month: ReWriteKvp dn=month sn=date flags=7},
>      X_TooBib = {day: ReWriteKvp dn=day sn=date flags=7},
>      X_TooBib = {journal: ReWriteParse be.get(s)=Authorea, Inc. be.get(dest)=},
>      X_TooBib = {urldate: FixBeKvp s= cmd=date "+%Y-%m-%d" d=2022-02-07 dn=urldate},
>      X_TooBib = {author: FeliperioBalaro , M and Gonalves , Fernanda and Leal , Felipe Seabra and Cosentino , Isabel and
>      liaVignoli , J and Silva , Nathalia and Felipe Brand and Nassar , Alessandra Figueiredo and Miyashiro , Simone and Cunha
>      , Nathalie and DelFava , Claudia},
>      affiliation = {Universidade Federal Fluminense and Universidade Federal Fluminense and Universidade Federal Fluminense
>      and Universidade Federal Fluminense and Universidade Federal Fluminense and Universidade Federal Fluminense and
>      Universidade Federal Fluminense and Instituto Biologico and Instituto Biologico and Universidade Federal Fluminense and
>      Instituto Biologico},
>      author = {FeliperioBalaro , M and Gonalves , Fernanda and Leal , Felipe Seabra and Cosentino , Isabel and liaVignoli , J
>      and Silva , Nathalia and Felipe Brand and Nassar , Alessandra Figueiredo and Miyashiro , Simone and Cunha , Nathalie and
>      DelFava , Claudia},
>      author_orig = {M rio Felipe Balaro and Fernanda Gon alves and Felipe Seabra Leal and Isabel Cosentino and J lia Vignoli
>      and Nathalia Silva and Felipe Brand o and Alessandra Figueiredo Nassar and Simone Miyashiro and Nathalie Cunha and
>      Claudia Del Fava},
>      bib-source = {Crossref},
>      content-domain = {false},
>      date = {2020-06-01},
>      date-accepted = {2020-06-01},
>      date-created = {2020-06-01T18:40:05Z},
>      date-deposited = {2020-06-012020-06-01T18:40:05Z},
>      date-indexed = {2021-12-17T19:18:05Z},
>      date-issued = {2020-06-01},
>      date-posted = {2020-06-01},
>      day = {01},
>      deposited = {1591036805000},
>      doi = {10.22541/au.159103680.00306295},
>      group-title = {Preprints},
>      institution = {Authorea, Inc.},
>      is-referenced-by-count = {0},
>      journal = {Authorea, Inc.},
>      member = {9829},
>      month = {06},
>      prefix = {10.22541},
>      publisher = {Authorea, Inc.},
>      reference-count = {0},
>      references-count = {0},
>      score = {1},
>      subtype = {preprint},
>      title = {Outbreak of abomasal bloat in goat kids due to Clostridium ventriculi and Clostridium perfringens type A in
>      Brazil},
>      type = {posted-content},
>      url = {http://dx.doi.org/10.22541/au.159103680.00306295},
>      urldate = {2022-02-07},
>      year = {2020},
>      srcurl={https://www.authorea.com/doi/full/10.22541/au.159103680.00306295},
>      xsrcurl={https://www.authorea.com/doi/full/10.22541/au.159103680.00306295},
>      citeurl={http://api.crossref.org/works/10.22541/au.159103680.00306295}
>      }
>      --
>      mike marchywka
>      306 charles cox
>      canton GA 30115
>      USA, Earth
>      marchywka at hotmail.com
>      404-788-1216
>      ORCID: 0000-0001-9237-455X

-- 

mike marchywka
306 charles cox
canton GA 30115
USA, Earth 
marchywka at hotmail.com
404-788-1216
ORCID: 0000-0001-9237-455X


More information about the texhax mailing list.