Invalid unicode ranges in CMap beginbfrange operator

Ben JW b.weggenmann at gmail.com
Mon Feb 13 15:37:40 CET 2023


Hello,

I am trying to produce a PDF/A-2u compliant PDF file using pdflatex, version
pdfTeX 3.141592653-2.6-1.40.24 (TeX Live 2022/Debian).
The problem seems to be related to the Unicode CMap written to the
resulting PDF file, which does not always seem to adhere strictly to the PDF
specification, specifically regarding the beginbfrange operator.
The specification states:

*When defining ranges of this type, the value of the last byte in the
string shall be less than or equal to 255 − (srcCode2 − srcCode1). This
ensures that the last byte of the string shall not be incremented past 255;
otherwise, the result of mapping is undefined.*

(https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf,
page 295; thanks to @bdoubrov for pointing this out)
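
To make the quoted rule concrete, here is a small sketch of the check for a
single bfrange entry of the form <srcCode1> <srcCode2> <dstString>. The
function name and the sample values are my own, chosen for illustration; they
are not taken from the PDF in question.

```python
def bfrange_entry_is_valid(src_code1: int, src_code2: int, dst_hex: str) -> bool:
    """Check the last-byte rule for one bfrange entry.

    dst_hex is the destination string as hex digits without the angle
    brackets, e.g. "00FD" (hypothetical input format for this sketch).
    """
    # Only the last byte of the destination string is incremented
    # while walking from src_code1 to src_code2.
    last_byte = int(dst_hex[-2:], 16)
    return last_byte <= 255 - (src_code2 - src_code1)


# <41> <43> <00FD> maps to 00FD, 00FE, 00FF: the last byte stays within 255.
print(bfrange_entry_is_valid(0x41, 0x43, "00FD"))  # True
# <41> <43> <00FE> would need 00FE, 00FF, 0100: the last byte overflows.
print(bfrange_entry_is_valid(0x41, 0x43, "00FE"))  # False
```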

In (do_)write_tounicode in pdfTeX
(https://github.com/TeX-Live/texlive-source/blob/4f771e41a6c3799e9d16e44633c7fa95dc41f1bc/texk/web2c/pdftexdir/tounicode.c#L382)
as well as in LuaTeX
(https://github.com/TeX-Live/texlive-source/blob/4f771e41a6c3799e9d16e44633c7fa95dc41f1bc/texk/web2c/luatexdir/font/tounicode.c#L394),
it seems that adjacent Unicode codes are merged into ranges, but I
don't see any check for an overflow (a value above 255) in the last
byte of the Unicode string.
Is it possible that the issue comes from this merging of adjacent codes
without the check for the additional format requirement?
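
To illustrate what I mean, here is a rough sketch of range merging that
closes a range before the last destination byte would pass 255. This is not
the logic of tounicode.c; build_bfranges and its input format (a dict from
character code to a single 16-bit Unicode value) are assumptions made only
for this example.

```python
def build_bfranges(mapping):
    """Merge adjacent codes into (src_lo, src_hi, dst_start) ranges.

    A range is extended only if the source code and the Unicode value are
    both consecutive AND the last byte of the destination stays <= 255.
    """
    ranges = []
    for src in sorted(mapping):
        dst = mapping[src]
        if ranges:
            lo, hi, start = ranges[-1]
            offset = src - lo
            extends = (
                src == hi + 1
                and dst == start + offset
                # The extra requirement from the specification:
                and (start & 0xFF) + offset <= 0xFF
            )
            if extends:
                ranges[-1] = (lo, src, start)
                continue
        # Otherwise start a new range at this code.
        ranges.append((src, src, dst))
    return ranges


# Made-up mapping whose Unicode values cross a 0x..FF boundary.
sample = {0x10: 0x01FE, 0x11: 0x01FF, 0x12: 0x0200, 0x13: 0x0201}
for lo, hi, start in build_bfranges(sample):
    print(f"<{lo:02X}> <{hi:02X}> <{start:04X}>")
```

With this sample the codes are split into <10> <11> <01FE> and
<12> <13> <0200>, instead of one range <10> <13> <01FE> whose last byte
would be incremented past 255.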

I originally reported this issue to veraPDF; please see my post there
for a minimal (non-)working example and the resulting PDF file, as well as
the Unicode CMap that @bdoubrov extracted from the PDF:
https://github.com/veraPDF/veraPDF-library/issues/1253#issuecomment-1420125850

Thanks and best,
Ben