[tex-live] [LONG] Improving TeX package classification and the associated documentation

Florent Rougon f.rougon at free.fr
Mon Jul 2 12:39:34 CEST 2007


Thanks for the reply.

Norbert Preining <preining at logic.at> wrote:

> On Mon, 02 Jul 2007, Florent Rougon wrote:
>>   Each package (in the sense of CTAN package, not Debian) contains an
>>   XML file that specifies the following:
> How will we ever get at this?

Well, probably never for every package, but I don't think it's a big problem:
  - if package maintainers write their own metadata, well, that's great
  - the most important packages (geometry, graphicx, etc.) will get
    tagged in a reasonable timeframe by either their maintainer or by
    volunteers (yeah, could be me, but preferably some LaTeX guru :).
  - each package that is not tagged is simply not accessible through the
    tag-based search or browsing facilities in the tool I'm thinking
    about. We can still have an alphabetic classification and a poor
    man's search that simply looks at package names. These packages are
    just (much) harder to find. If someone likes one of them and finds it
    too hard to locate, then he can tag it, dammit!

Basically, the database need not be complete to be useful. Having only
the "important" packages tagged would already be very helpful to users,
I think.
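To make the "incomplete database is still useful" point concrete, here is a minimal Python sketch, assuming each package simply carries a (possibly empty) set of tags. The `Package` class, both search functions and the `facet::tag`-style tag names are hypothetical, loosely modelled on debtags; they are not an existing TeX Live or CTAN API.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    """A CTAN package and its (possibly empty) set of tags.
    The tag vocabulary below is hypothetical."""
    name: str
    tags: set[str] = field(default_factory=set)


def search_by_tags(packages, required):
    """Tag-based search: only packages carrying every required tag match.
    Untagged packages can never match, but they do not break anything."""
    return [p.name for p in packages if required <= p.tags]


def search_by_name(packages, fragment):
    """Poor man's search: case-insensitive substring match on the name.
    Works for every package, tagged or not."""
    fragment = fragment.lower()
    return [p.name for p in packages if fragment in p.name.lower()]


catalogue = [
    Package("geometry", {"use::page-layout", "interface::latex"}),
    Package("graphicx", {"use::graphics", "interface::latex"}),
    Package("obscurepkg"),  # hypothetical, not tagged yet
]

print(search_by_tags(catalogue, {"use::page-layout"}))  # ['geometry']
print(search_by_name(catalogue, "obscure"))             # ['obscurepkg']
```

The untagged package is invisible to the tag search but still reachable by name, which is exactly the degraded-but-working behaviour described above.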

> Furthermore I would prefer *not* to have new stuff on CTAN.

I'm not sure what the problem with "new stuff" is, but...

> I would suggest to add this information in some way
> to the TeX Catalogue

That can be done. But then the burden of tagging the packages rests on
the shoulders of the catalogue maintainers alone, and I fear that, this
way, we will never get any significant part of CTAN tagged (I'm not
implying that the catalogue maintainers are lazy, rather that CTAN is
very large).

That's mainly why I wanted the metadata to be part of CTAN packages:
careful package authors will tag their packages well. That would make
their packages easier to find without causing any additional work for
the catalogue maintainers.

If you really want the metadata to be part of the catalogue, there's
still a way to have it updated "collectively": using a web interface or
a custom network client. But doing it properly requires someone devoted
to validating the tagging, so this is complex and requires quite a lot
of additional work. Besides, web interfaces are not my area; I am not
volunteering to write anything like that. A custom client can easily be
made portable to current systems, yes, but this is real work (well,
actually, something like that already exists in debtags, where the
client sends "tag patches", but I'm not sure the client is portable
enough or that it could be reused for cataloguing CTAN instead of
Debian).

> (where it belongs)

Hmmm, depends on the POV. :)

Yes, if you look at the current state of the catalogue implementation,
that is where it belongs. But you can't say it's not natural to have the
metadata embedded in each package (basically, you're saying that we
should upload our Debian packages without debian/control and then copy
debian/control ourselves for each upload somewhere on
ftp-master.debian.org. Ugh! :).

> Another problem is that sometimes packages on CTAN don't directly ship
> documentation files, but they have to be created.

Ah, that is indeed a problem I hadn't thought about. In these cases,
what happens on the catalogue side?

  (1) No doc is listed on the web interface.

  (2) The catalogue or CTAN maintainers build the doc themselves, store
      it in CTAN and point to it from the catalogue.

If (1), then we (TeX Live) are on our own and have to build the doc
ourselves. I won't develop this case for now because it gets a bit
messy, and I have the impression that a better solution would be to enforce
that each CTAN upload has the full documentation built. But if there are
good reasons against this, I can devise solutions.

If (2), then I think the CTAN package should be stored in what I'll call
"definitive form" *with* its documentation, and the metadata (be it in
the package or the catalogue) could then point to the various doc files
present in the package. Then we're back to square one and can follow my
original proposal.

This has an important implication: that TeX Live adopts the same format
for documentation as CTAN. Yes, I know this won't make everyone happy, but:

  - I believe this is the simplest and cleanest way from the POV of
    information structure (package/documentation/metadata);

  - In many cases, there is an optimal format for a given documentation:
    if there are figures, DVI is ruled out and we need either PS or PDF.
    If there are links from one file to another, I believe PS is ruled
    out too. Moreover, for documents that are not very short, PDF files
    with the navigation table (hierarchical bookmarks) on the left are far
    more convenient than PS files without such a table, IMHO. So, you
    should be able to guess my preferred format for most documentation. :)
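The two elimination rules in the list above can be written down as a tiny decision function. This is only an illustrative Python sketch; the function name and its boolean parameters are hypothetical and simply encode the rules as stated (figures rule out DVI, cross-file links rule out PS).

```python
def acceptable_formats(has_figures: bool, has_cross_file_links: bool) -> set[str]:
    """Apply the two elimination rules and return the documentation
    formats that remain usable. PDF is never eliminated."""
    formats = {"DVI", "PS", "PDF"}
    if has_figures:
        formats.discard("DVI")  # figures rule out DVI
    if has_cross_file_links:
        formats.discard("PS")   # links from one file to another rule out PS
    return formats


print(sorted(acceptable_formats(True, False)))  # ['PDF', 'PS']
print(sorted(acceptable_formats(True, True)))   # ['PDF']
```

Since PDF survives every rule, and its bookmarks help with longer documents, it ends up being the safe default.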

> If we restrict ourselves to TeX Live we can use the
> 	docfiles
> entry in the TeX Live database, but there are no tags on these files in
> any way (and currently no way to tag them, but this can be changed).

I believe it would be a shame to restrict ourselves to TeX Live, but if
we don't adopt the same doc format as on CTAN, I think this will have to
be the case.

> If we have to tag all the stuff that would be impossible. OTOH we cannot
> urge the package writers to write some tag specification.

As said previously, the database need not be complete to be useful...

> Furthermore there are those packages which are not supported anymore,
> i.e., no one is responsible for them.

These can be tagged! You can then trivially avoid cluttering your search
with obsolete packages.

Unmaintained packages are a slightly different case. Often, you will
prefer a maintained package to an unmaintained one providing the same
functionality, but that is not always the case. So here, the tag need
not be used as a binary filter; it can serve as one hint among others to
help the user make his choice.
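The difference between the two cases (obsolete as a hard filter, unmaintained as a soft hint) can be sketched in a few lines of Python. The `status::obsolete` and `status::unmaintained` tag names are hypothetical, and treating a package with no status tag as maintained is a simplifying assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    name: str
    tags: set[str] = field(default_factory=set)


# Hypothetical status tags; the real vocabulary would have to be agreed on.
OBSOLETE = "status::obsolete"
UNMAINTAINED = "status::unmaintained"


def rank_candidates(packages):
    """Drop obsolete packages outright (binary filter), then list
    maintained packages before unmaintained ones (a hint, not a filter).
    A package without a status tag is assumed to be maintained."""
    current = [p for p in packages if OBSOLETE not in p.tags]
    # sorted() is stable and False sorts before True, so maintained
    # packages come first and the original order is kept within each group.
    return [p.name for p in sorted(current, key=lambda p: UNMAINTAINED in p.tags)]


candidates = [
    Package("oldpkg", {OBSOLETE}),
    Package("orphanpkg", {UNMAINTAINED}),
    Package("activepkg"),
]
print(rank_candidates(candidates))  # ['activepkg', 'orphanpkg']
```

The unmaintained package is still offered to the user, just further down the list, so he can still pick it if it suits him better.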

> One way out of this dilemma I see is:
> - we migrate the CTAN upload procedure to the experimental one currently
>   in testing phase

Sorry, I don't know what this experimental procedure consists of.

> - we encourage package writers at upload time to add some tags (drop
>   down lists, checkboxes, whatever)

Yes, that can be done. But I have the impression you're picturing it as a
web interface, and, I repeat, I'm not the one who will code it. That said,
I'm not opposed to such an interface, if done correctly.

> - documentation files could be gathered automatically from the ctan2tl
>   script which would be executed in the background

I suppose this is the magic script that will be able to tell me where
each doc file is installed. OK. What we need is:

  - installation path for each file;

  - the CTAN package it belongs to;

  - ideally, some way to link each file to its metadata, so that we can
    know the language each doc file is written in and can tell when a
    file is an index (entry point) in the list of doc files installed by
    a package.

    As said above, this should be easy to do if the files installed are
    the same as in the (definitive) CTAN packages, because then the
    metadata can be part of the catalogue or of the CTAN package;
    otherwise (e.g., if CTAN has no doc or has the doc in other formats),
    matching the metadata from CTAN with the files installed in TL
    becomes messy IMO.
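A minimal Python sketch of the per-file record described above (installation path, owning CTAN package, and the metadata link giving language and entry-point status). The `DocFile` class, its field names, the `entry_points` helper, the package `foo` and its paths are all hypothetical, chosen only for illustration; they are not what the ctan2tl script actually produces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocFile:
    """One installed documentation file. Field names are hypothetical."""
    path: str          # installation path
    ctan_package: str  # CTAN package the file belongs to
    language: str      # e.g. "en", "fr"; taken from the package metadata
    is_index: bool     # True if the file is an entry point


def entry_points(docs, package, language=None):
    """Return the entry-point doc files of a package, optionally
    restricted to one language."""
    return [
        d.path for d in docs
        if d.ctan_package == package and d.is_index
        and (language is None or d.language == language)
    ]


docs = [
    DocFile("texmf-dist/doc/latex/foo/foo.pdf", "foo", "en", True),
    DocFile("texmf-dist/doc/latex/foo/foo-fr.pdf", "foo", "fr", True),
    DocFile("texmf-dist/doc/latex/foo/example.tex", "foo", "en", False),
]
print(entry_points(docs, "foo", "fr"))  # ['texmf-dist/doc/latex/foo/foo-fr.pdf']
```

Filling in `language` and `is_index` is the easy part only when the installed files are the same as in the CTAN package, since the metadata can then be matched file by file.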

> This way slowly stuff would get tagged, and for old packages one of us
> would have to take the work to go through them.

Well, I don't think a single volunteer will ever tag the whole CTAN. Too
much boring work. A set of volunteers probably could, yes, but it is
more likely that old unmaintained packages will remain untagged. That
said, it is not a regression from the current state, and I don't think
it's a real problem. Obsolete/obscure packages are more difficult to
find[1], so what?...

As far as TL (as opposed to CTAN) is concerned, that might be a bit
different, as the number of packages is, I think, more manageable. But
probably the same will happen (obsolete/obscure packages untagged, at
least during the first years), just on a smaller scale.


  [1] Unless, in the case of an obscure package, its maintainer makes the
      effort of tagging it, of course.

