[tex-live] Better ways to find packages and documentation

Norbert Preining preining at logic.at
Thu Jul 5 08:42:33 CEST 2007

Hi Florent,

On Thu, 05 Jul 2007, Florent Rougon wrote:
> > The only problem is that I am genetically disabled for understanding
> > Python ;-)
> Tsssk tsssk, did you even *try*?

No ;-)

>                    < Python is executable pseudo-code. >

Perl, too.

> > 		grep '^ ' texlive.tlpdb | sort | uniq --repeated
> Nope, this only works if two packages have the exact same file *paths*.

This is *what*I*want*to*check*! I don't care whether a file README
occurs 200 times in our database, only that a specific file is included
in more than 1 package (imagine the funny things in the Debian packages
if we have the same file included 2 times ...)
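A minimal sketch of the same check in Python, which also reports which packages share a path. It assumes that each package stanza opens with a "name <package>" line and that file lines start with a single space, as the grep pattern above implies. The function name and the sample data are made up for illustration.

```python
from collections import defaultdict


def duplicated_paths(tlpdb_text):
    """Map each file path listed by more than one package to those packages."""
    owners = defaultdict(set)
    package = None
    for line in tlpdb_text.splitlines():
        if line.startswith("name "):
            # Assumption: a stanza opens with "name <package>".
            package = line.split(None, 1)[1].strip()
        elif line.startswith(" ") and line.strip() and package is not None:
            # File lines start with a space; the path is the first token.
            path = line.split()[0]
            owners[path].add(package)
    return {p: sorted(pkgs) for p, pkgs in owners.items() if len(pkgs) > 1}


# Hypothetical two-package database used only to exercise the function.
SAMPLE = """\
name foo
runfiles size=1
 texmf-dist/tex/latex/foo/foo.sty
 texmf-dist/doc/latex/foo/README

name bar
runfiles size=1
 texmf-dist/tex/latex/foo/foo.sty
 texmf-dist/doc/latex/bar/README
"""

if __name__ == "__main__":
    for path, pkgs in duplicated_paths(SAMPLE).items():
        print(path, pkgs)
```

Unlike the grep pipeline, this ignores a path listed twice inside one package and only reports paths claimed by two or more different packages.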

> The more useful thing would be to detect files with the same basename

Why??? This is completely useless, no? Ok, it is useful for something
else: for checking that we didn't make a packaging error and put files
in multiple places.
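For that second kind of check, a rough sketch that groups paths by basename could look like the following. The function name and the ignore list are hypothetical; README is skipped by default only because it is expected to occur many times.

```python
import posixpath
from collections import defaultdict


def shared_basenames(paths, ignore=("README",)):
    """Return basenames that occur under more than one distinct path."""
    groups = defaultdict(set)
    for path in paths:
        name = posixpath.basename(path)
        if name not in ignore:
            groups[name].add(path)
    return {name: sorted(found) for name, found in groups.items() if len(found) > 1}
```

The input is any iterable of path strings, for example the file lines of the database with the leading space removed.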

> Anyway, your shell library is only doable in full shell if the format is
> very simple. Otherwise, it becomes a *real pain* to parse correctly. For
> instance, you'll tell me how to parse this in pure shell:
>  path/to/manual.pdf details="pdfTeX User Manual" language="en"

This will be hard. True. But I have my perl modules ...

> > Furthermore I propose that we could extend the format of the docfiles
> > lines as follows:
> > docfiles size=*****
> >  file1 attrib1=value1 attrib2=value2 ...
> >  file2 attrib1=value1 ...
> This is more or less OK, but looks more and more like XML. :)

Yes, but we are growing from below, not above. That makes the

>   <documentation details='Manual, PDF version:'  language='en'
>                  href='ctan:/macros/latex/contrib/hyperref/doc/manual.pdf'/>
>   <documentation details='Summary of options:'  language='en'
>                  href='ctan:/macros/latex/contrib/hyperref/doc/options.pdf'/>

No problem. We just require that no " is embedded, then my parsing in
perl is trivial.

And don't tell me that in the language attribute and the details we need
embedded ".

>  file1 attrib1="value1" attrib2="value2" ...
>  file2 attrib1="value1" ...

Ok, no problem.
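To illustrate how simple the parsing stays under the "no embedded double quote" rule, here is a sketch in Python rather than perl. It assumes the path itself contains no spaces and that every attribute has the form key="value"; the function name is hypothetical.

```python
import re

# key="value", where the value may contain spaces but never a double quote.
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_docfile_line(line):
    """Split ' path key="value" ...' into (path, {key: value})."""
    stripped = line.strip()
    # Assumption: the path has no spaces, so it ends at the first space.
    path, _, rest = stripped.partition(" ")
    attrs = dict(ATTR_RE.findall(rest))
    return path, attrs
```

Applied to the example line quoted earlier, this yields the path path/to/manual.pdf and the two attributes details and language. A value with an embedded double quote would be cut short, which is exactly the case being excluded.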

> and you'll have to make up yet another quoting scheme for the cases
> where we need a double quote in an attribute value... See how you're

No, no quotes within quotes.

> ,----
> | Theorem (F. Rougon, 2007)
> | 
> | Any custom text file format tends to become a degraded version of XML
> | as adding features requires to extend it.
> `----

And that makes it easily parseable, the degradation ...

Anyway, we are not discussing this. 

> > That is an easy extension of the syntax, and we could carry over the
> > Catalogue contained attributes of files to the texlive.tlpdb. 
> Sure, but you'll have to explicitely carry over every needed attribute.
> Since the Catalogue DTD isn't likely to change every day, this won't be
> a problem in practice, though...

Ok, so what I see is that we have to carry over 
for the doc files.

> > - format changes to the infra structure of TeX Live
> I suppose this should be in the files describing the TL infra in SVN,
> no? I mean, the CTAN maintainers don't need to approve changes in the TL
> infra, only those to the Catalogue, do they?

Well, currently I consider the pod documentation in the perl modules the
definitive specification. There I wrote more details and specification,
but in general they agree.

And yes, it is only between you, me, and Karl. Ah, and Reinhard because
he plans to write / is writing a new installer based on this, so I guess
he needs this information, too.

> > - changes necessary for the Catalogue
> >   . DTD changes
> >   . upload/handling changes
> Well, for upload and things like that, we can propose several
> possibilities as already mentioned, but there are policy decisions that
> *the CTAN maintainers* have to make, such as whether to use override
> files, whether to blindly trust metadata from package official
> maintainers, how to authenticate them, etc. (yes, I don't think they'll
> want to require PGP-signed uploads, so authentication is probably
> impossible to achieve...)

True, but we can propose something keeping in mind that we don't add too
much overhead for them.

> > - specification of a file format for the upload specification
> Really, we shouldn't make this up without their input. There are many
> possibilities, with XML, RFC-2822, etc. Well, we can always propose
> something, but chances are good it will end up in the trashcan, so...

Right. Maybe we separate this out and do this AFTERWARDS.

> Surely, long descriptions will blow it up much more. But you *have* to
> be able to cope with spaces in attribute values, in order to store the
> short description for each doc file.

Ok, but I will ignore the problem of embedded quotes.

> Question:
>   Where will I find the various tlpdb files on the installed system?

This is not specified yet; we have not agreed on it so far.

I would say that on the DVD it will be just in the root of the DVD and
named texlive.tlpdb.

Everything else is still unclear. I would say that the installer creates
something like
	local.tlpdb, or installed.tlpdb, or whatever.

What is not clear is:
- should we copy the DVD texlive.tlpdb to the hard disk?
- what about stuff available from the network?

>   Currently, there is only one such file, but there are several TEXMF
>   trees, so it is either in only one of the them (ugh), or preferably

It will be one of them, because we have the texmf-dist/texmf-doc/texmf
names embedded. So all filenames in the texlive.tlpdb are relative to
TEXLIVEROOT, not to the TEXMF tree.

After Taco voiced some criticism, I proposed on the ConTeXt list that
we could add another tag
	relocatable yes
in which case all paths are relative to a TEXMF tree, i.e. *without* the
texmf-dist/texmf/texmf-doc prefix.
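As a sketch of what a relocatable path would mean in practice, the following strips the leading tree directory from a path. The prefix list comes from the tree names mentioned above; the function name is hypothetical and nothing here is an agreed format.

```python
# Tree directories named in the discussion; each ends with "/" so that
# "texmf/" does not match paths under "texmf-dist/".
TREE_PREFIXES = ("texmf-dist/", "texmf-doc/", "texmf/")


def make_relocatable(path):
    """Drop a leading tree directory so the path is relative to a TEXMF tree."""
    for prefix in TREE_PREFIXES:
        if path.startswith(prefix):
            return path[len(prefix):]
    # Paths outside the trees (for example under bin/) are left unchanged.
    return path
```

Paths under bin/ come back untouched, since they do not live inside any TEXMF tree.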

The problem for texlive with making *everything* relocatable is that the
binaries are in bin/... and the respective packages also contain files
in texmf/.... And we don't want to treat these trees specifically.

What one COULD imagine is that for category Package and category
Documentation we remove the texmf-dist and texmf-doc, resp., prefix.
I don't see an immediate advantage, but it is doable.

OTOH adding this relocatable bit would allow package writers to provide
tlpobj files for their package, and make it easy to install, add to the
tlpdb etc.

Details need to be worked out.

> Also, I don't know much how you make the Debian packages, but will you
> be able to easily adapt all this for Debian, since we don't have the

I am schizophrenic, I am not thinking about this NOW. Here my primary
obligations are with TeX Live for now. Adaptions for OS packaging should

> > For the CTAN it actually depends on the changes, but AFAIS now we only
> > need one more XML whatever entity for the tags.
>                     ^^^^^^^^^^^^^^^
>                         element
> An entity is something like "&foobar;".
> (yes, I know you may be doing that intentionally to make XML look
> complicated :)

No, I am not doing this in any way to make XML look bad. I just
have NO NO NO idea about XML, and how to best parse it, etc etc. And I
was the one who wrote the stuff. Bad luck for the others ;-) Don't take
what I say as an affront, I am just trying to make the best for TeX
Live. And it is just simpler for those actually doing the programming to
use structured text files than xml. 

Even if it is in principle better to use xml, it is still some practical
decision about who is programming this stuff. And that is Karl,
Reinhard, and me.

Best wishes


Dr. Norbert Preining <preining at logic.at>        Vienna University of Technology
Debian Developer <preining at debian.org>                         Debian TeX Group
gpg DSA: 0x09C5B094      fp: 14DF 2E6C 0307 BE6D AD76  A9C0 D2BF 4AA3 09C5 B094
One who is employed to stand about all day browsing through the
magazine racks in the newsagent.
			--- Douglas Adams, The Meaning of Liff
