[tex-live] [LONG] Improving TeX package classification and the associated documentation

Florent Rougon f.rougon at free.fr
Mon Jul 2 16:54:10 CEST 2007


Norbert Preining <preining at logic.at> wrote:

> The problem is that if one of us writes a meta data file for a package
> because the maintainer does not do it, and we put it on
> CTAN/macros/foobar, and there is a new release, then normally all the
> files are replaced there, so it might happen quite easy that the tagging
> gets lost.

One way out of this is to write the metadata in the "override files" on
a tug server. Then, the data cannot get lost and you get the additional
benefit that you know it was written by "one of us", so could reasonably
be trusted.

For those not familiar with the concept of "override file" (which we use
in Debian), it is a file (for CTAN, maybe one file per package would be
better) that sits in a well-known place on the central servers (e.g.,
tug.org) and always overrides what the package maintainer provided in
his upload.

When a package is uploaded for the first time, there is no override for
this package yet: the ftp-masters (here, CTAN team) have to check the
metadata for correctness. When done, they approve the info by putting it
into the override file. When another version of the package is uploaded,
there are two possibilities:

  - either the metadata didn't change -> nothing to do;

  - or it did and the ftp-masters have to approve it before the upload
    ends up in the archive; if they agree, they update the override
    file; otherwise, they tell the maintainer about the disagreement and
    are supposed at this point to contact him privately in order to come
    to an agreement (another possibility being to ignore the
    maintainer...).
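
The upload workflow above can be sketched in a few lines. This is only an
illustration, not how any CTAN or Debian tool actually works: the
OverrideStore class, its process_upload method and the approve callback
(standing in for the ftp-masters' manual review) are all hypothetical
names, and the override files are modelled as a plain in-memory dict.

```python
from dataclasses import dataclass, field


@dataclass
class OverrideStore:
    """Hypothetical stand-in for the per-package override files."""

    approved: dict = field(default_factory=dict)

    def process_upload(self, package, metadata, approve):
        """Return the metadata that ends up in the archive.

        Returns None when the upload is held back because the reviewers
        (the 'approve' callback, an assumption of this sketch) disagree.
        """
        current = self.approved.get(package)
        if current is not None and current == metadata:
            # Metadata didn't change: nothing to do.
            return current
        # First upload, or changed metadata: needs approval.
        if approve(package, current, metadata):
            self.approved[package] = dict(metadata)
            return self.approved[package]
        # Disagreement: the maintainer has to be contacted.
        return None
```

With a blind-trust policy for known maintainers, the approve callback
would simply always return True.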

Of course, this can be adapted for TL, in particular if the new upload
infrastructure allows you to identify that the person uploading the
package is its current maintainer, or someone from a trusted group. In
this case, you could decide to blindly trust the metadata and update the
override file accordingly. This is only a matter of policy and I'm not
the right person to define this policy.

> You are right, I see your point, and I agree that this would be the best
> solution. But this is something we should discuss with:
> - the CTAN team
> - the TeX Live team
> - probably on the c.t.t. mailing list?

I have the impression that this ML is the right place for TeX Live and
that some people from the CTAN team are already participating, but maybe
that's not enough.

> Anyone here has an opinion for that?

We could invite people on c.t.t. to join the discussion, but that could
become messy, dunno. Another possibility is to devise a proposal here
and then post to c.t.t. for comments, objections, etc.

>> If (1), then we (TeX Live) are on our own and have to build the doc
>> ourselves. I won't develop this case for now because this becomes a bit
>
> This is already done for many packages, see ctan2tl script, and below
> ;-)

Grrrmpf. OK. I can propose a solution anyway.

With most of our current documentation formats, there is a trivial one
to one mapping between the files:

  foo.dvi <-> foo.ps <-> foo.pdf

There are two tricky exceptions AFAICS: Info and HTML formats. Here,
there isn't always a one to one mapping between foo.dvi and the various
HTML or Info files. *But* there is always an index file in these
formats. So, all I need is that your TL-installation provided data
points to the index file. The rest will be done by the Info or HTML
browser.

Example
~~~~~~~

CTAN ships package foo with author-provided HTML user guide like
this:

    index.html
    foo-1.html
    foo-2.html
    foo-3.html

and additionally a FAQ in text format, foo-faq.txt.

TL or MiKTeX decides that HTML documentation sucks and instead wants to
build and ship foo.pdf. OK, no problem. The metadata on CTAN should look
like that:

  <entry>

    [...]

    <tag>
      field::mathematics
    </tag>

    <tag>
      macropackage::latex
    </tag>

    <documentation details='Foo User Guide' language='en'
                   href='ctan:/macros/latex/contrib/foo/doc/index.html'/>
  
    <documentation details='Foo Frequently Asked Questions' language='en'
                   href='ctan:/macros/latex/contrib/foo/doc/foo-faq.txt'/>

  </entry>

From this metadata, your ctan2tl script would generate for me a file
containing something like that:

  <package name="foo">

    <tag>
      field::mathematics
    </tag>

    <tag>
      macropackage::latex
    </tag>

    <documentation details='Foo User Guide' language='en'
                   path='/usr/share/texmf-texlive/doc/foo/foo.pdf'/>
    <documentation details='Foo Frequently Asked Questions'
                   language='en'
                   path='/usr/share/texmf-texlive/doc/foo/foo-faq.txt'/>
  </package>

  <package name="geometry">

  [...]

  </package>

  etc.

i.e., the main difference is that your file provides me with the
installation path for each doc file (and also that it compiles the
information for all packages, but that's a minor detail and not
necessary).
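
A minimal sketch of that transformation, assuming the element and
attribute names from the two examples above. The function name
ctan_entry_to_package and the installed_paths mapping (CTAN href ->
installed path, which only the TL side can know) are hypothetical; the
real ctan2tl is a Perl script and I am not describing its internals.

```python
import xml.etree.ElementTree as ET


def ctan_entry_to_package(entry_xml, name, installed_paths):
    """Build a <package> element from a CTAN <entry> (illustrative only)."""
    entry = ET.fromstring(entry_xml)
    package = ET.Element("package", {"name": name})
    # Tags are copied unchanged.
    for tag in entry.findall("tag"):
        ET.SubElement(package, "tag").text = (tag.text or "").strip()
    # Each documentation element keeps its attributes, except that the
    # CTAN href is replaced by the local installation path.
    for doc in entry.findall("documentation"):
        href = doc.get("href")
        if href not in installed_paths:
            continue  # this document is not installed locally
        attrs = {k: v for k, v in doc.attrib.items() if k != "href"}
        attrs["path"] = installed_paths[href]
        ET.SubElement(package, "documentation", attrs)
    return package


CTAN_ENTRY = """<entry>
  <tag>field::mathematics</tag>
  <tag>macropackage::latex</tag>
  <documentation details='Foo User Guide' language='en'
                 href='ctan:/macros/latex/contrib/foo/doc/index.html'/>
  <documentation details='Foo Frequently Asked Questions' language='en'
                 href='ctan:/macros/latex/contrib/foo/doc/foo-faq.txt'/>
</entry>"""

INSTALLED = {
    "ctan:/macros/latex/contrib/foo/doc/index.html":
        "/usr/share/texmf-texlive/doc/foo/foo.pdf",
    "ctan:/macros/latex/contrib/foo/doc/foo-faq.txt":
        "/usr/share/texmf-texlive/doc/foo/foo-faq.txt",
}

package = ctan_entry_to_package(CTAN_ENTRY, "foo", INSTALLED)
print(ET.tostring(package, encoding="unicode"))
```

Note that the HTML index is mapped to foo.pdf here only because the
mapping says so; the decision itself stays with whoever builds the
distribution.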

With this, I can do everything: I have the set of tags for each package:
therefore, I can offer a nice way to find a package given a set of
criteria. When the package is chosen, I can list its documentation files
with a nice short description for each of them, filter the accepted
languages, fire the appropriate viewer, etc.
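
To make this concrete, here is a rough sketch of the lookup side. It
assumes the generated file wraps the package elements in a single root
element (called packages here, which is my invention), and the function
names are hypothetical as well. The "bar" package and its tag are made up
for the example.

```python
import xml.etree.ElementTree as ET


def load_packages(xml_text):
    """Map package name -> (set of tags, list of documentation attributes)."""
    root = ET.fromstring(xml_text)
    packages = {}
    for pkg in root.findall("package"):
        tags = {(t.text or "").strip() for t in pkg.findall("tag")}
        docs = [dict(d.attrib) for d in pkg.findall("documentation")]
        packages[pkg.get("name")] = (tags, docs)
    return packages


def find_packages(packages, wanted_tags):
    """Names of the packages carrying all of the wanted tags, sorted."""
    wanted = set(wanted_tags)
    return sorted(n for n, (tags, _) in packages.items() if wanted <= tags)


def docs_for(packages, name, languages):
    """Documentation entries of a package, filtered by accepted languages."""
    _, docs = packages[name]
    return [d for d in docs if d.get("language") in languages]


PACKAGES_XML = """<packages>
  <package name="foo">
    <tag>field::mathematics</tag>
    <tag>macropackage::latex</tag>
    <documentation details='Foo User Guide' language='en'
                   path='/usr/share/texmf-texlive/doc/foo/foo.pdf'/>
    <documentation details='Foo Frequently Asked Questions' language='en'
                   path='/usr/share/texmf-texlive/doc/foo/foo-faq.txt'/>
  </package>
  <package name="bar">
    <tag>macropackage::latex</tag>
    <documentation details='Bar Handbuch' language='de'
                   path='/usr/share/texmf-texlive/doc/bar/bar.pdf'/>
  </package>
</packages>"""

pkgs = load_packages(PACKAGES_XML)
for doc in docs_for(pkgs, "foo", {"en"}):
    print(doc["details"], "->", doc["path"])
```

Firing the viewer is then just a matter of looking at the extension of
the chosen path.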

At first, I had added an id='foo-guide' attribute in each "documentation"
element, but it seems it isn't even needed. That would be nice from a
theoretical POV, but if it's a problem for CTAN maintainers, we don't
really need it.

[ But maybe it would make your ctan2tl script easier. In fact, at first,
  I wanted to use such an id to make the link from the TL-installed
  provided file to the CTAN metadata in order to find the language
  attribute for each document, for instance. This is because I didn't
  intend to have all the metadata in the TL-installed provided file, but
  only the id and the path for each installed document.

  But it seems easier if your ctan2tl script just copies all the metadata
  as shown in the example; then, it is trivially equivalent to get the
  metadata from locally-installed packages in /usr/local/share/texmf,
  etc., which should answer a question from George N. White III. ]

> Ok, Jim Hefferon from the CTAN team presented on the EuroBachoTeX
> conference an experimental CTAN upload process via a web interface. Main
> points of this was that authors not only upload stuff, but also can
> edit the metadata as present in the catalogue. 

Very good.

> Most important we try to build a TDS package of what is uploaded, so
> that people can download a TDS ready package from CTAN. This is done in
> a horrible perl script called ctan2tl which was developed for TeX Live
> inclusion. We are working on this to update it for new packages, but
> of course it still does not work for all.

A lot of work, for sure...

> Of course this script *has* knowledge of documentation files, because
> everything what is left in texmf-dist/doc/... is a documentation file
> then. 

Good for me. :)

> But nothing of tagging etc.

This can be derived from the CTAN metadata.

> Actually no, everything which is on CTAN and free is (with exceptions)
> included in TL. So TL *is* CTAN in this sense, or better 
> 	TL = ctan2tl(CTAN) + some magic;
> (would be nice if it once will work ;-)

I didn't want to believe TL was so big. :)

-- 
Florent

