[texhax] Blank first page problem (how to remove?)

Pierre MacKay pierre.mackay at comcast.net
Tue Jun 7 02:01:07 CEST 2011


 >>>>  No, there is certainly NOT a font character corresponding with 
this byte sequence.

Clarification:  the three-byte UTF-8 <EF BB BF>  sequence resolves to 
U+FEFF, which is treated (and deplored) as a zero-width no-break space 
if it simply can't be avoided.  But in UTF-8 it can always be avoided.
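
As a minimal sketch of "it can always be avoided": a file can be checked for the leading <EF BB BF> bytes and have them dropped before TeX ever sees it. The helper name strip_utf8_bom is hypothetical (it is not from this thread); codecs.BOM_UTF8 is the standard-library constant for those three bytes.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 byte order mark, if present.

    Hypothetical helper for illustration only.
    """
    # codecs.BOM_UTF8 is the three-byte sequence EF BB BF.
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# The three bytes decode to the single code point U+FEFF.
bom_as_text = codecs.BOM_UTF8.decode("utf-8")
cleaned = strip_utf8_bom(codecs.BOM_UTF8 + b"\\documentclass{article}")
```

Only a BOM at the very start of the data is removed; a U+FEFF that occurs later in the file is left alone.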

The sequence 0XFEFF indicates that the source 16- or 32-bit stream was 
Bigendian, and the corollary is that 0XFFFE indicates that the  stream 
was Littleendian.  (I would see that as one colossal reason to avoid the 
use of 16- or 32-bit streams, even when CJK text might suggest a 
specious efficiency.)  FFFE is specifically identifiable as 
"not-a-character" in Unicode 5.0, so that if it appears, it gives you 
only the historic information that the stream being processed began as a 
littleendian stream, an old horror wished on us by the Intel 80xx series 
chips.  "eHplI a  mrtpaep dnia  nI MBP .C"  0XFFFE must be removed 
altogether, and it might have been better if 0XFEFF had been subjected 
to the same fate.
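
To illustrate the point about byte order, here is a rough sketch that looks only at the first two bytes of a 16-bit stream. The function name utf16_byte_order is hypothetical, and the sketch assumes the data really is 16-bit: a 32-bit little-endian stream also begins with FF FE and would need a separate check.

```python
import codecs


def utf16_byte_order(data: bytes) -> str:
    """Guess the byte order of a 16-bit stream from its first two bytes.

    Hypothetical helper for illustration; assumes 16-bit data.
    """
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "big"
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "little"
    return "unknown"


# Reading the little-endian mark in big-endian order yields 0xFFFE,
# the value described above as "not-a-character".
swapped = int.from_bytes(codecs.BOM_UTF16_LE, "big")
```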

UTF-8 avoids that problem entirely.  Long live UTF-8.

Pierre MacKay

