[tex-live] Better ways to find packages and documentation

Norbert Preining preining at logic.at
Wed Jul 4 22:33:29 CEST 2007

Hi Florent,

On Wed, 04 Jul 2007, Florent Rougon wrote:
> BTW: the emphasis here is to remind you that language='de' in a
>      <documentation> element of the Catalogue is *not* what I call a tag
>      here: it is an XML _attribute_ of the <documentation>.

Ok, see below for a solution to this.

> >> I thought that in texlive.tlpdb:
> >>   - either the 'name' field indicates the CTAN package name;
> >>   - or there is a 'catalogue' field indicating the CTAN package name.
> >
> > In theory, yes. But nobody ever checked this!!!
> Then make reality fit theory. :)

This is what we assume! If problems occur we fix them.

> Why not assume that the tlpdb-to-CTAN package name mapping is correct
> (using "catalogue" fields when they exist) and just fix any problem that
> arises?

Yes, that is our approach.
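
A minimal Python sketch of that mapping rule, for illustration only. It
assumes that records in texlive.tlpdb are separated by blank lines and
that each field is a "key value" line; the sample data and the helper
names parse_records and ctan_name are hypothetical.

```python
def parse_records(text):
    """Split tlpdb-like text into records (dicts of single-valued fields).

    Assumption: records are separated by one blank line.
    """
    records = []
    for block in text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            if line.startswith(" ") or not line.strip():
                continue  # lines starting with a space are file entries
            key, _, value = line.partition(" ")
            fields.setdefault(key, value)
        if fields:
            records.append(fields)
    return records


def ctan_name(record):
    """Use the 'catalogue' field when it exists, else the 'name' field."""
    return record.get("catalogue", record["name"])


# Hypothetical sample data, not taken from a real texlive.tlpdb.
SAMPLE = """\
name foo
catalogue foo-ctan
runfiles size=1
 texmf-dist/tex/latex/foo/foo.sty

name bar
runfiles size=1
 texmf-dist/tex/latex/bar/bar.sty
"""

names = [ctan_name(r) for r in parse_records(SAMPLE)]
print(names)
```

Any package for which this rule yields a wrong CTAN name would then be
fixed in the tlpdb itself, as described above.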

> Ah, thanks. Somehow most of the infrastructure stuff seemed to be in
> Master, so I didn't look closely enough in Build. I'm still wondering
> about the guidelines that define the contents of each of these dirs...

Well, this is a LONG story. Everyone, starting from Sebastian, wrote
bits and pieces, some for management, some for cd building, some for
inclusion. They were saved sometimes here, sometimes there.

Generally, what is in Master should make sense to put on the shipped DVD.
shipped. The perl modules definitely make sense there, because
(hopefully soon) the installer will make use of them.

The stuff outside is used for updating the subversion repository, making
DVD/CDs, etc. I guess you see the difference.

> >   The TLPOBJ files *CAN* (and hopefully will) be enriched with additional
> >   information from the catalogue
> Ah. You should know what you want. :)

Well, as I said, this is WIP.

> >  but first I have to write a catalogue access Perl module (and read
> >  xml, grrrr ;-).
> Are you trying to parse it manually or what???

No, XML::Parser or XML::DOM or whatever it is called. Thanks to Robin I
have example code to write the Perl module as I like it ;-)

> With the appropriate Python module (for instance xml.etree.ElementTree,

The only problem is that I am genetically disabled for understanding
Python ;-)

> I have to say, if you find XML difficult to read by a program, then
> maybe, just maybe, you should look on the side of the language in use.
> ;-)

No, the difficulty is in the documentation ... and my laziness, as

> >   We want to include at least
> >   - title/long description
> >   - some version/license information
> >   - (tagging information?)
> Easy to add:
> Tags: foo, bar, ...

Yes, that was the
	tags ...

> >> you prefer XML or RFC-2822 format? I saw you have some grief about XML,
> Maybe I have changed your mind now. ;-)

No. There is one big advantage:

Question: Is any file included in more than 1 TLPOBJ?

	current format:
		grep '^ ' texlive.tlpdb | sort | uniq --repeated
	xml format:
		shoot yourself ...

If you need more examples ...
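
For comparison, the same check as a small Python sketch over the current
line-based format. It relies only on what the grep above relies on,
namely that file entries are the lines starting with a space; the sample
data and the function name duplicated_files are hypothetical.

```python
from collections import Counter


def duplicated_files(tlpdb_text):
    """Return the file entries that occur more than once, sorted.

    Like `grep '^ ' | sort | uniq --repeated`, whole lines are compared.
    """
    counts = Counter(
        line for line in tlpdb_text.splitlines() if line.startswith(" ")
    )
    return sorted(path.strip() for path, n in counts.items() if n > 1)


# Hypothetical sample data, not taken from a real texlive.tlpdb.
SAMPLE = """\
name foo
runfiles size=2
 tex/foo/foo.sty
 tex/shared/common.tex

name bar
runfiles size=1
 tex/shared/common.tex
"""

dups = duplicated_files(SAMPLE)
print(dups)
```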

Another advantage was that I could write an (extremely dumb, but working)
shell library to access stuff in the tlpdb quite fast. Now, how to do
this in XML???

The question is *WHAT* we win by using XML instead of some
structured text. I see nothing.

> Easy... but it's probably better to do that:
>   - when the syntax is more or less settled;
>   - or when I actually need it for the tool we're discussing.
> Or does someone already need Python modules for this stuff??

No, and it was more a joke. But still, later on that would be nice.

> > I could even imagine that, if we do the tagging on a per package level,
> > that we add the tagging to the TLPDB, and then we have the TLPDB of
> > installed stuff, the TLPDB of available stuff of the TeX Live
> > installation media, and the additional .xml/whatever files dropped into
> Exactly.

See below for the attributes for doc files.

> > - tagging is done on a per package level, not per file level
> OK.
> Hum, well, does everyone agree? :)

Irrelevant, it is only you and me, since I will implement this on the
TeX Live side and the net win will be for everyone. But see below,

> > - tags are taken from the catalogue when generating the to be shipped 
> >   texlive.tlpdb and stored there

Furthermore I propose that we could extend the format of the docfiles
lines as follows:
docfiles size=*****
 file1 attrib1=value1 attrib2=value2 ...
 file2 attrib1=value1 ...

That is an easy extension of the syntax, and we could carry over the
file attributes contained in the Catalogue to the texlive.tlpdb.

Furthermore, we add (optionally) for every TLPOBJ a line
	tags <tag1> <tag2> <tag3> ...
to get the per-package tagging.
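
A rough Python sketch of how a reader of this proposed syntax could look.
Nothing here is implemented yet: the function name parse_tlpobj, the
sample record, and the assumption that file names and attribute values
contain no whitespace are all hypothetical.

```python
def parse_tlpobj(block):
    """Parse one TLPOBJ record written in the proposed extended syntax.

    Assumptions (not settled in the proposal): file names and attribute
    values contain no whitespace, and tags are separated by spaces.
    """
    obj = {"tags": [], "docfiles": {}}
    section = None
    for line in block.splitlines():
        if not line.strip():
            continue
        if line.startswith(" "):
            # File entry; only the docfiles section carries attributes here.
            if section == "docfiles":
                path, *attrs = line.split()
                obj["docfiles"][path] = {
                    key: value
                    for key, _, value in (a.partition("=") for a in attrs)
                }
            continue
        key, _, value = line.partition(" ")
        section = key
        if key == "tags":
            obj["tags"] = value.split()
        elif key == "name":
            obj["name"] = value
    return obj


# Hypothetical sample record in the proposed format.
SAMPLE = """\
name foo
tags fonts math
docfiles size=12
 doc/foo/foo-de.pdf language=de
 doc/foo/README
runfiles size=1
 tex/foo/foo.sty
"""

obj = parse_tlpobj(SAMPLE)
print(obj["tags"])
print(obj["docfiles"])
```

The format stays line based, so the grep-style tools keep working on the
file names as long as they look only at the first word of each line.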

Now the only problem is that we have to get this information from the
Catalogue into the texlive.tlpdb at its generation time. This is a
problem only for me I guess, but this I can handle.

This way you don't have to have access to anything else but the
texlive.tlpdb, local.tlpdb, local added .xml/whatever files for

Does this sound reasonable?

> We're almost there. :)

If you agree on that, we should start (in private email) to write a
decent proposal with:
- rationale
- format changes to the infrastructure of TeX Live
- changes necessary for the Catalogue
  . DTD changes
  . upload/handling changes
- specification of a file format for the upload specification

This we will bring back to attention here, to the CTAN guys.

From my side as TL guy I see no problem in adding those tags/attributes.
It will not blow up the tlpdb too much.

For the CTAN it actually depends on the changes, but AFAIS now we only
need one more XML whatever entity for the tags.

Best wishes


Dr. Norbert Preining <preining at logic.at>        Vienna University of Technology
Debian Developer <preining at debian.org>                         Debian TeX Group
gpg DSA: 0x09C5B094      fp: 14DF 2E6C 0307 BE6D AD76  A9C0 D2BF 4AA3 09C5 B094
The combination of little helpful grunts, nodding movements of the
head, considerate smiles, upward frowns and serious pauses that a
group of people join in making in trying to elicit the next
pronouncement of somebody with a dreadful stutter.
			--- Douglas Adams, The Meaning of Liff
