[texhax] unicode

pierre.mackay pierre.mackay at comcast.net
Fri Aug 5 18:34:59 CEST 2005


Alexander Grahn wrote:

>On Fri, Aug 05, 2005 at 03:13:25PM +0200, Karl Berry wrote:
>
>>Sorry, but I've lost track of the original question.  What are you
>>trying to accomplish? 
>
>I'm trying to expand my LaTeX-Package for generating PDF with embedded
>multimedia. I want to add some hack written in JavaScript which might
>workaround a bug of AdobeReader.
>
>I want to write out a name tree into the document Catalog, mapping names
>to object refs. The PDF-Spec says that name strings must be Unicode
>formatted (actually Big Endian UTF16).
>
>As the strings that I want to write out consist of [a-zA-Z0-9_] only,
>every UTF16 representation of the character in question to be written
>out is formed from a Zero byte x00 followed by the byte with the ASCII
>code of the character, e. g.
>
>A --> x0041
>B --> x0042
>etc.
>
When in doubt, go straight to the primitives. \char places an eight-bit 
value directly into the output, and works for NUL (character code 0) as 
well as everything else.

From the input:

cdef\char0 A\char0 Bghijk

From the DVI file:

cdef^@A^@Bghijk

From the output of dvips:

(cdef\000A\000Bghijk)

which appears to be a string with two unmistakable 16-bit wide chars in 
it, where the octal \000 provides the leading null byte of each. The ggv 
default font appears to have an uppercase Gamma there.

What ps2pdf will do with that I don't quite know, because I find it 
nearly impossible to read PDF binaries.

So, inserting \char0 will work, and this can be done relatively 
painlessly with a tail-recursion macro.
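
A minimal sketch of such a tail-recursive macro in plain TeX follows. The 
names \widestring, \widenext and \wideend are hypothetical, not taken from 
any package, and the sketch assumes the argument holds only single 
character tokens that are safe to typeset (an underscore would need its 
category code changed first, since it is normally the math subscript 
character).

```tex
% Hypothetical helper names; none of these come from an existing package.
\def\wideend{\wideend}% sentinel that marks the end of the string
\def\widestring#1{\widenext#1\wideend}% entry point: \widestring{AB}
\def\widenext#1{%
  \ifx#1\wideend
    \let\next\relax     % sentinel reached: stop
  \else
    \char0 #1%          % null byte, then the character itself
    \let\next\widenext  % carry on with the next token
  \fi
  \next}% called after \fi, so the recursion is a true tail call
```

With these definitions, \widestring{AB} should have the same effect as 
typing \char0 A\char0 B by hand.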

You could also \def\0{\char0} to make the string more readable and 
easier to type.
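
The shorthand in use might look like the following sketch. The trailing 
space inside the definition is an assumption added here, not part of the 
suggestion above: it ends the number that \char reads, so that a digit 
directly after \0 is not absorbed into it (without the space, \01 would 
be read as \char01).

```tex
\def\0{\char0 }% the space after the 0 terminates the number
cdef\0A\0Bghijk   % same effect as cdef\char0 A\char0 Bghijk
x\01y             % \char0 followed by the digit 1, not \char01
```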

This is really neat, because it shows a way, admittedly complex, of 
inserting multibyte UTF-8 and wide chars without altering TeX to 
enable set2 and set3.

Pierre MacKay


