[tex-live] Better ways to find packages and documentation

Reinhard Kotucha reinhard.kotucha at web.de
Thu Jul 5 02:51:19 CEST 2007

Florent Rougon writes:

 >>> 1) Can a given CTAN package be split among several TEXMF trees (in TL,
 >>>    in MiKTeX, etc.)? Or rather, do we want to support that?
 >> No.

 > Good. Then we can have the metadata be self-contained in each TEXMF
 > tree, with relative paths from the base of the TEXMF tree. This is nice
 > from a theoretical POV and allows natural extension for TEXMFLOCAL data.

A package can be split among a TEXMF tree and several bin/<platform>
trees.  There will certainly be one database for the whole system.

When you mention TEXMFLOCAL I suppose that you have documentation in
mind.  TeXLive itself will definitely not touch anything in TEXMFLOCAL
or TEXMFHOME.  It would be nice if programs like texdoctk looked for
databases in TEXMFLOCAL and TEXMFHOME, too.  But IMO these databases
have to be maintained by the users.

Regarding the format of texlive.tlpdb:

I'm already able to parse texlive.tlpdb, retrieve the information I
currently need, and sort it into hashes and arrays with very few
lines of Perl code.  It should be pretty easy to parse it with Python
as well.
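
To illustrate how little code such a parser needs, here is a minimal
Python sketch.  It is not the Perl script mentioned above, and it rests
on assumptions: that the database consists of records separated by
blank lines, and that each line has the form "<key> <value>".  The
field names in the sample (name, category, depend) and the function
parse_records are placeholders chosen for illustration only.

```python
from collections import defaultdict


def parse_records(text):
    """Parse blank-line-separated records made of '<key> <value>' lines.

    Assumed layout (not taken from any specification): the first
    space-delimited token on a line is the key, the rest is the value.
    A key may occur several times in one record, so values are
    collected in lists.
    """
    records = []
    current = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current record, if it has fields.
            if current:
                records.append(dict(current))
                current = defaultdict(list)
            continue
        key, _, value = line.strip().partition(" ")
        current[key].append(value.strip())
    if current:
        # The last record may not be followed by a blank line.
        records.append(dict(current))
    return records


# Hypothetical sample data, for illustration only.
SAMPLE = """\
name foo
category Package
depend bar
depend baz

name bar
category Package
"""

records = parse_records(SAMPLE)
for record in records:
    print(record["name"][0], record.get("depend", []))
```

Only the standard library is used, which matches the goal of not
depending on external parsers or modules.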

I'm not against using a standard file format, but I think it shouldn't
be much more verbose than Norbert's format.  XML is much too verbose
and too difficult to parse.

texlive.tlpdb is needed by the installer.  Writing an installer which
works on all platforms is difficult enough, and I'm happy at least
that Norbert's database can be parsed so easily without any extra
tools.  And I don't want to depend too much on external tools.  There
was a nice Perl module for FTP access on CPAN a few years ago.  But
the author found severe bugs in it, and instead of fixing them he
simply removed it from CPAN.  If we use a simple file format as
proposed by Norbert, we can maintain the tools we need ourselves and
avoid a lot of trouble.

I do not see any advantage in using RFC 2822.  Parts of it are even
completely braindead:

   Messages are divided into lines of characters.  A line is a series of
   characters that is delimited with the two characters carriage-return
   and line-feed; that is, the carriage return (CR) character (ASCII
   value 13) followed immediately by the line feed (LF) character (ASCII
   value 10).  (The carriage-return/line-feed pair is usually written in
   this document as "CRLF".)

   There are two limits that this standard places on the number of
   characters in a line. Each line of characters MUST be no more than
   998 characters, and SHOULD be no more than 78 characters, excluding
   the CRLF.

In which world are we living?  CRLF is required by mechanical teletype
machines.  You must be quite old if you have ever seen such a beast.

   Header fields are lines composed of a field name, followed by a colon
   (":"), followed by a field body, and terminated by CRLF.

This means replacing

     <key> <value>

with

     <key>: <value>

but I don't see the advantage.  And there is absolutely no good reason
to limit the length of a string to such a ridiculous value.

Whatever you decide, I think that texlive.tlpdb is quite good and easy
to parse.  Minor changes are not a big problem; I can adapt the script
easily.  But I definitely refuse to make the installer dependent on
external parsers, modules, libraries, tools, and so on.


Reinhard Kotucha			              Phone: +49-511-4592165
Marschnerstr. 25
D-30167 Hannover	                      mailto:reinhard.kotucha at web.de
Microsoft isn't the answer. Microsoft is the question, and the answer is NO.
