[tex-k] epstopdf: using eps libraries

Tue Nov 13 00:37:28 CET 2012

On 2012-11-12 at 17:45:08 +0100, Heiko Oberdiek wrote:

 > On Mon, Nov 12, 2012 at 10:21:00PM +0900, Norbert Preining wrote:
 > 
 > > On Mo, 12 Nov 2012, Norbert Preining wrote:
 > > > > IMHO it is better to add an new option --gsopt
 > > > 
 > > > I'll send a patch tomorrow or tonight.
 > > 
 > > Here it is. Again gsopts is split at white space.
 > > Otherwise ... well I can if it is necessary, but then one has to do
 > > 	--gsopt -dSAFER --gsopt -dFOO --gsopt BLA ...
 > > instead of
 > > 	--gsopts "-dSAFER -dFOO BLA..."
 > > 
 > > Is that ok to split at white space?
 > 
 > For example, options -I, -sFONTPATH=..., might have values with spaces.
 > 
 > Thus I think, --gsopt without splitting at white space is necessary
 > and --gsopts with splitting at white space is useful for convenience.

Please excuse me for hijacking the thread, but there is a long-standing
issue with the removal of "binary junk".  Two years ago I created
three test files which cause epstopdf to crash.  You'll find the files

  test-TN5002-cr.eps
  test-TN5002-crlf.eps
  test-TN5002-lf.eps

under Build/source/extra/epstopdf .

The binary headers are described in Adobe TN5002.  I padded the
PostScript section with comment characters so that the part of the
header denoting the length of this section contains the bytes "\r\n".

The regular expression which tries to remove the "binary junk" doesn't
work in this case because "." does not match the "\n" in the header.
The patch below uses substr() in order to remove the header if the
magic number defined in TN5002 is recognized and falls back to the
current behaviour otherwise.
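
A minimal sketch of the failure, assuming a hand-built header rather
than one of the test files.  The PostScript length is set to 0x0A0D so
that the length field holds the bytes "\r\n".  All values are invented
for illustration.

```perl
use strict;
use warnings;

# Hypothetical 30-byte TN5002 header (values invented for this sketch):
# magic, PS start 30, PS length 0x0A0D (stored as "\r\n\0\0"),
# no metafile, no TIFF preview, checksum field 0xFFFF.
my $header = pack("H8VVVVVVH4", "c5d0d3c6", 30, 0x0A0D, 0, 0, 0, 0, "ffff");
my $ps     = "%!PS-Adobe-3.0 EPSF-3.0\n";
my $buf    = $header . $ps;

# Current behaviour: "." does not match "\n", so the match can only
# start behind the "\n" in the length field.
my $by_regex = $buf;
$by_regex =~ s/(.*?)%!/%!/;
my $removed = length($1);

# Patched behaviour: cut the fixed-size header off by position.
my $by_substr = substr($buf, 30);

printf "regex removed %d of %d header bytes\n", $removed, length($header);
printf "regex result starts with %%!:  %s\n",
    ($by_regex  =~ /^%!/ ? "yes" : "no");
printf "substr result starts with %%!: %s\n",
    ($by_substr =~ /^%!/ ? "yes" : "no");
```

With this input the regex removes only the 20 bytes behind the "\n" and
leaves the first 10 header bytes in place, while substr() yields a
buffer which starts with %!.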

At first glance not much is gained, because gs now crashes with

  Error: /undefined in II*

II* is the beginning of a little-endian TIFF image ("II" is the byte
order mark).  Curiously, despite the error message, ghostscript
created valid and correct PDF files for all three test files.  So I
think there is at least a little gain.

The TN5002 header is written to the global variable $TN5002_header and
can be decoded with

   my ($magic, $psstart, $pslen, $metastart, $metalen,
       $tiffstart, $tifflen, $checksum)
       = unpack("H8VVVVVVH4", $TN5002_header);
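
For illustration, the same template can be fed a header built with
pack().  It consumes 4 + 6*4 + 2 = 30 bytes.  The offsets and lengths
below are invented and not taken from the test files.

```perl
use strict;
use warnings;

# Hypothetical header: PostScript at offset 30 with 1000 bytes, no
# metafile, TIFF preview at offset 1030 with 500 bytes, and 0xFFFF as
# a placeholder in the checksum field.
my $TN5002_header = pack("H8VVVVVVH4", "c5d0d3c6",
                         30, 1000, 0, 0, 1030, 500, "ffff");

my ($magic, $psstart, $pslen, $metastart, $metalen,
    $tiffstart, $tifflen, $checksum)
    = unpack("H8VVVVVVH4", $TN5002_header);

# H8 and H4 yield hex strings, V yields little-endian 32-bit integers.
printf "magic=%s checksum=%s\n", $magic, $checksum;
printf "ps:   start=%d len=%d\n", $psstart, $pslen;
printf "meta: start=%d len=%d\n", $metastart, $metalen;
printf "tiff: start=%d len=%d\n", $tiffstart, $tifflen;
```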

The proper way would be to send only the bytes between $psstart and
$psstart+$pslen to gs, but these values have to be corrected because
stuff is inserted by epstopdf.  Admittedly, this is harder than it
sounds, and my motivation is quite limited because Siep's epspdf.rb
script works like a charm.
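
Leaving the offset correction aside, the idea can be sketched on the
raw file contents.  extract_ps_section() is a hypothetical helper, not
part of epstopdf, and the sample data is invented.

```perl
use strict;
use warnings;

# Hypothetical helper: return only the PostScript section of an EPS
# file which starts with a TN5002 binary header.  Any other input is
# returned unchanged.
sub extract_ps_section {
    my ($data) = @_;
    return $data
        unless length($data) >= 30
            && substr($data, 0, 4) eq "\xC5\xD0\xD3\xC6";
    # Skip the 4-byte magic number, then read PS start and PS length.
    my ($psstart, $pslen) = unpack("x4VV", $data);
    return substr($data, $psstart, $pslen);
}

# Invented sample: header, PostScript section, then a fake TIFF preview.
my $ps   = "%!PS-Adobe-3.0 EPSF-3.0\n%%BoundingBox: 0 0 10 10\nshowpage\n";
my $tiff = "II*\0" . ("\0" x 16);
my $file = pack("H8VVVVVVH4", "c5d0d3c6",
                30, length($ps),                   # PostScript section
                0, 0,                              # no metafile
                30 + length($ps), length($tiff),   # TIFF preview
                "ffff")
         . $ps . $tiff;

my $section = extract_ps_section($file);
print $section;
```

In this sketch the trailing TIFF preview never reaches the output, so
only the PostScript section would be handed to gs.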

Regards,
  Reinhard

--- epstopdf-orig	2012-05-23 01:07:56.000000000 +0200
+++ epstopdf	2012-11-12 23:43:23.000000000 +0100
@@ -441,6 +441,7 @@
 my $buflen;
 my @bufarray;
 my $inputpos;
+my $TN5002_header;
 
 # We assume 2048 is big enough.
 my $EOLSCANBUFSIZE = 2048;
@@ -459,8 +460,15 @@
   # entire file
   if ($buf =~ /%!/) {
     # throw away binary junk before %!
-    $buf =~ s/(.*?)%!/%!/o;
-    $inputpos = length($1);
+    if ($buf =~ /^\xC5\xD0\xD3\xC6/) { # binary header 
+                                       # according to Adobe TN-5002
+      $TN5002_header = substr($buf, 0, 30);
+      $buf = substr($buf, 30);
+      $inputpos = 30;
+    } else {
+      $buf =~ s/(.*?)%!/%!/o;
+      $inputpos = length($1);
+    }
   }
   $lfpos = index($buf, "\n");
   $crpos = index($buf, "\r");


-- 
----------------------------------------------------------------------------
Reinhard Kotucha                                      Phone: +49-511-3373112
Marschnerstr. 25
D-30167 Hannover                              mailto:reinhard.kotucha at web.de
----------------------------------------------------------------------------
Microsoft isn't the answer. Microsoft is the question, and the answer is NO.
----------------------------------------------------------------------------