[tex-live] Making texts externally replaceable in PDFs, e.g. with sed(1)
Osipov, Michael
michael.osipov at siemens.com
Fri Dec 14 16:50:17 CET 2018
Hi folks,
we are using XeTeX 3.14159265-2.6-0.99999 (TeX Live 2018) on Windows and
FreeBSD.
After studying the PDF specification [1] and how XeLaTeX and xdvipdfmx
work with Unicode (from PDF samples), I believe that my request is
(virtually) impossible.
I'd be happy if someone could either confirm this or prove me wrong.
Task: We produce PDFs on our server (from LaTeX source) for a client,
which uploads each PDF to another service. That service may replace
placeholders, e.g., %DOCID%, with the actual document ID in the target
system. The PDF therefore has to be uncompressed (xdvipdfmx -z 0) and
has to contain the literal strings "(%DOCID%)Tj" or "[(%DOCID%)]TJ"
as defined by the PDF spec.
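To make the requirement concrete, here is a minimal Python sketch of the
kind of replacement the target service is assumed to perform. The
function name and the sample ID "A-4711" are made up for illustration.
It only touches PDF literal strings; if the replacement has a different
length than the placeholder, the stream /Length and the xref offsets go
stale, which this sketch does not repair.

```python
import re

def replace_placeholder(pdf: bytes, placeholder: bytes, value: bytes) -> bytes:
    """Replace a placeholder that occurs as a PDF literal string operand."""
    # A literal string is delimited by parentheses, e.g. (%DOCID%).
    pattern = re.compile(rb"\(" + re.escape(placeholder) + rb"\)")
    # Backslash and parentheses must be escaped inside a literal string.
    escaped = (value.replace(b"\\", b"\\\\")
                    .replace(b"(", b"\\(")
                    .replace(b")", b"\\)"))
    # A function is used as replacement so that backslashes stay untouched.
    return pattern.sub(lambda match: b"(" + escaped + b")", pdf)

content = b"BT /F1 9.9626 Tf 0 -11.955 Td[(%DOCID%)]TJ ET"
print(replace_placeholder(content, b"%DOCID%", b"A-4711"))
```

A hex string operand such as [<0008...>]TJ is left unchanged by this,
which is exactly the problem described below.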
XeLaTeX produces the following:
> BT /F1 5.9776 Tf -40.819 -756.627 Td[<00270052004e00580050004800510057005100580050005000480055>]TJ /F1 9.9626 Tf 0 -11.955 Td[<0008002700320026002c00270008>]TJ ET
> begincmap
> /CMapName /C:-WINDOWS-fonts-siemens_global_roman.ttf,000-UTF16 def
> /CMapType 2 def
> /CIDSystemInfo <<
> /Registry (Adobe)
> /Ordering (UCS)
> /Supplement 0
> >> def
> 1 begincodespacerange
> <0000> <FFFF>
> endcodespacerange
> 13 beginbfchar
> <0008> <0025>
> <0017> <0034>
> <001B> <0038>
> <002A> <0047>
> <002C> <0049>
> <002E> <004B>
> <0032> <004F>
> <0033> <0050>
> <0039> <0056>
> <005C> <0079>
> <005D> <007A>
> <008B> <00A9>
> <00B3> <2014>
> endbfchar
> 5 beginbfrange
> <0010> <0015> <002D>
> <0024> <0028> <0041>
> <0035> <0037> <0052>
> <0044> <0053> <0061>
> <0055> <0059> <0072>
> endbfrange
> endcmap
So it writes two-byte character codes as hexadecimal strings, and only
the /ToUnicode CMap maps them to Unicode code points for our TrueType
font Siemens Global.
For a sed(1)-based postprocessor it is therefore virtually impossible to
map "<0008002700320026002c00270008>" to "%DOCID%" without analyzing the
PDF objects.
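For illustration, this is roughly what such an analysis involves: a
small Python sketch that decodes the hex strings with the bfchar and
bfrange entries quoted above. The tables are typed in by hand here; a
real tool would have to locate and parse the CMap object of each font,
and the function names are my own.

```python
# ToUnicode entries copied from the CMap above: character code -> Unicode.
BFCHAR = {
    0x0008: 0x0025, 0x0017: 0x0034, 0x001B: 0x0038, 0x002A: 0x0047,
    0x002C: 0x0049, 0x002E: 0x004B, 0x0032: 0x004F, 0x0033: 0x0050,
    0x0039: 0x0056, 0x005C: 0x0079, 0x005D: 0x007A, 0x008B: 0x00A9,
    0x00B3: 0x2014,
}
# (first code, last code, Unicode code point assigned to the first code)
BFRANGE = [
    (0x0010, 0x0015, 0x002D), (0x0024, 0x0028, 0x0041),
    (0x0035, 0x0037, 0x0052), (0x0044, 0x0053, 0x0061),
    (0x0055, 0x0059, 0x0072),
]

def code_to_char(code: int) -> str:
    """Map one two-byte character code to its Unicode character."""
    if code in BFCHAR:
        return chr(BFCHAR[code])
    for first, last, start in BFRANGE:
        if first <= code <= last:
            return chr(start + code - first)
    raise KeyError(f"code {code:04X} has no ToUnicode entry")

def decode_hex_string(hex_string: str) -> str:
    """Decode a PDF hex string such as <00080027...> into text."""
    digits = hex_string.strip("<>")
    codes = (int(digits[i:i + 4], 16) for i in range(0, len(digits), 4))
    return "".join(code_to_char(code) for code in codes)

print(decode_hex_string("<0008002700320026002c00270008>"))  # %DOCID%
```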
Requesting XeLaTeX to produce
> BT /F1 5.9776 Tf -40.819 -756.627 Td[<00270052004e00580050004800510057005100580050005000480055>]TJ /F1 9.9626 Tf 0 -11.955 Td[(%DOCID%)]TJ ET
will not work because the /ToUnicode cmap does not have a character
mapping from the literal "%" (etc.) to the corresponding Unicode point.
On top of that, every character of the real document ID that replaces
the placeholder would also need to be in the bfchar/bfrange listing.
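The coverage problem can be sketched in Python by inverting the
/ToUnicode entries quoted above. This assumes that the CMap lists
exactly the characters available in the embedded subset, which I have
not verified; the function names and the sample ID "A-4711" are
hypothetical.

```python
# Inverse of the ToUnicode CMap above: Unicode code point -> character code.
UNICODE_TO_CODE = {
    0x0025: 0x0008, 0x0034: 0x0017, 0x0038: 0x001B, 0x0047: 0x002A,
    0x0049: 0x002C, 0x004B: 0x002E, 0x004F: 0x0032, 0x0050: 0x0033,
    0x0056: 0x0039, 0x0079: 0x005C, 0x007A: 0x005D, 0x00A9: 0x008B,
    0x2014: 0x00B3,
}
# Expand the bfrange entries: (first code, last code, first Unicode point).
for first, last, start in [
    (0x0010, 0x0015, 0x002D), (0x0024, 0x0028, 0x0041),
    (0x0035, 0x0037, 0x0052), (0x0044, 0x0053, 0x0061),
    (0x0055, 0x0059, 0x0072),
]:
    for offset in range(last - first + 1):
        UNICODE_TO_CODE[start + offset] = first + offset

def missing_chars(text: str) -> list:
    """Characters of text that the subset (as listed above) cannot show."""
    return sorted({ch for ch in text if ord(ch) not in UNICODE_TO_CODE})

def encode_hex_string(text: str) -> str:
    """Encode text as a PDF hex string of two-byte character codes."""
    missing = missing_chars(text)
    if missing:
        raise ValueError(f"not in font subset: {missing}")
    return "<" + "".join(f"{UNICODE_TO_CODE[ord(ch)]:04x}" for ch in text) + ">"

print(encode_hex_string("%DOCID%"))
print(missing_chars("A-4711"))
```

With the listing above, the placeholder itself can be encoded, but an ID
containing the digit 7 cannot, because that character is not part of the
subset.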
I did manage to produce a suitable PDF with the PDF XChange printer
driver. It embedded Siemens Global twice: once with Identity-H encoding
(as a subset) and once with WinAnsiEncoding (the complete font). Without
the char code to glyph mapping it seems to be possible. So the approach
has to be an 8-bit font encoding:
> /Type /Font
> /Subtype /TrueType
> /BaseFont /SiemensSansGlobal-Regular
> /FirstChar 32
> /LastChar 220
> /Encoding /WinAnsiEncoding
This is impossible because of XeLaTeX's Unicode nature. It will always
use CID fonts with Identity-H encoding and UCS ordering.
This will get even more complicated if glyph spacing is involved.
I'd be happy if someone could drop a comment or two on the issue.
Regards,
Michael
PS: I haven't yet looked into whether the pdfx package could solve the
issue with XeLaTeX. Also, my knowledge of the PDF spec and LaTeX is very
limited.
[1]
https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf