[tex-live] General test suite for TeX-Live

Zdenek Wagner zdenek.wagner at gmail.com
Mon Jun 20 11:08:05 CEST 2016


2016-06-20 0:45 GMT+02:00 Ken Moffat <zarniwhoop at ntlworld.com>:
> On Sun, Jun 19, 2016 at 09:33:41PM +0000, Karl Berry wrote:
>
> (replying to Uwe, I think)
>
>> All I have to add is that my previous attempt at automated testing
>> (years and years ago) is in ./Build/tests.  The directory names will
>> give the idea:
>>
>> ./Build/tests/
>> Makefile       dvi-latex0-small2e/   largefile/
>> README                 dvi-latex1-sample2e/  pdf-context0-hello/
>> asytestlibs.asy  dvi-latex2-pdfprim/   pdf-context4-select/
>> checkdvi.pl*   dvi-latex5-tugboat/   pdf-latex0-small2e/
>> checkpdf.pl*   dvi-tex0-story/       pdf-tex0-story/
>> common.mak     dvi-tex5-tugboat/     tryconcat/
>>
>> My approach for PDF (DVI is easy) was to make images and compare them.
>> Turned out to be impractical, not surprisingly.
>
> Which has an impact on the points you've copied in below (thanks for
> doing that - it always helps if links can be read at some point in
> the future).
>>
>> The advent of l3build and "reproducible builds" for PDF should make the
>> idea much more viable.  -k
>>
>> P.S.
>> > https://piratenpad.de/p/TeXLiveTesting
>>
>> I don't understand why you put ideas in some read-only temporary url.
>> Here is the text you wrote, for reading/archival on the mailing list
>> like everything else in the thread.
>>
>> Thoughts regarding automated TeX Live Testing
>>
>> Requirements
>>
>> * must run on at least Mac OS, Windows and Linux, more platforms are appreciated
>> * shall not require manual checks
>
> Nice if you can achieve it.
>
>> * shall be self contained
>
>
> I don't think I understand what you mean - e.g. you mention using
> ImageMagick, which is a separate program, and probably there will be
> other external programs.
>
I hope I understand this idea. It replaces the need for a visual
check. Suppose you have the output from TeX Live "year-1" and the
output from the latest version. Both have an equal number of pages.
You can now convert the pages to images and use ImageMagick to
subtract them. Ideally the result should be blank pages. You can then
devise a test to distinguish genuinely different pages from mere
rounding errors.
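
For illustration, a minimal sketch of that comparison in python. The
file names old.pdf/new.pdf are hypothetical, and it assumes
ImageMagick's convert and composite tools are on the PATH:

    import subprocess

    def rasterize(pdf, prefix, dpi=150):
        # Render every page of the PDF to a PNG at a fixed resolution;
        # ImageMagick delegates the actual rendering to Ghostscript.
        subprocess.run(["convert", "-density", str(dpi), pdf,
                        prefix + "-%03d.png"], check=True)

    def difference_image(old_png, new_png, out_png):
        # Pixel-wise subtraction: identical pages yield an all-black
        # image (negate it if you prefer white); any real difference
        # shows up as non-black pixels.
        subprocess.run(["composite", "-compose", "difference",
                        old_png, new_png, out_png], check=True)

    rasterize("old.pdf", "old")
    rasterize("new.pdf", "new")
    difference_image("old-000.png", "new-000.png", "diff-000.png")

A real test would loop over all pages and fail if the two PDFs
produce different page counts before any image work starts.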

>> * shall test a majority of user scenarios (letters, reports, books, etc.)
>
> In my limited experience (and to be honest, I'm dubious about
> posting on this because my latex skills are so limited) the fun
> comes from various things within the texsphere - letters, reports,
> etc should be tested but they may be tangential to likely problems.
>
> I have my own tests for Beyond Linuxfromscratch which I mentioned
> the other day, but they are fairly minimal and do require manual
> review (I adapted my xindy tests to make sure the index did get
> created - but only after some changes in how _I_ built it caused
> failures).  And they are definitely not useful in general although
> they should work on any 'nix-ish system with bash and a PDF viewer.
>
There is a philosophy behind the trip and trap tests made by D. E.
Knuth. They test a lot of things that can be verified automatically
but omit tests that require a visual check, because users will spot
such problems during their normal work.

>> * many small tests may be helpful, but bigger test scenarios must not be omitted (I, for example, ran a 600-page dissertation as part of my testing)
>
> I know that CPU power continues to increase, and tests can be run in
> parallel if you can keep the details separate, but for repeated
> testing smaller is probably better.
>
Minimal tests are much better not because they run faster but because
they clearly say what went wrong.

>> * cannot test each and every scenario users may come across
>>
> Indeed, that is why people have to test for themselves.
>
>>
>> Design Ideas
>>
>> * use python (as I know it best)
>> * make use of its unit test facilities
>
> For a proof of concept, whatever you are familiar with.
>
>> * run a testcase by calling the TeX engine n times on a specific file
>
> This I do not understand.  I understand using the engine the
> required number of times - and interspersing that with calls to
> other parts of TeX such as xindy or asymptote - but I'm not at all
> clear what you are proposing.
>
>> * check if the process ended properly
>
> I don't know about your preferred python (I try to avoid scripting
> languages where whitespace is important ;) but when my own tests run
> from a Makefile hang in TeX I have to key 'x' to stop.  And
> sometimes TeX reports normal status but part of the test did not
> work.
>
I have the same feeling: I do not like a language where white space
is an important element. I do not need python for my work (I just
have it installed), but as you could see in another thread, even perl
scripts distributed with TL do not work on Windows. I would not be
surprised if a Windows version of python were incomplete. The
existence of lua in TL motivates me to learn lua, not python.

The status code is not the best indicator for such tests. It will be
necessary to analyze the log file. The test files can be designed in
such a way that analyzing the log is not difficult.
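
A sketch of what such a check could look like, assuming pdflatex and
python's subprocess timeout. The "!" and "Output written on" patterns
are the conventional markers pdfTeX writes to the log, so this is a
crude heuristic, not a complete log analysis:

    import pathlib
    import subprocess

    def run_and_check(texfile, engine="pdflatex", timeout=60):
        # nonstopmode keeps TeX from waiting for keyboard input, and
        # the timeout catches the hangs mentioned above.
        subprocess.run([engine, "-interaction=nonstopmode", texfile],
                       capture_output=True, timeout=timeout)
        log = pathlib.Path(texfile).with_suffix(".log")
        text = log.read_text(errors="replace")
        # TeX starts error messages with "!" and reports success
        # with "Output written on ...".
        errors = [l for l in text.splitlines() if l.startswith("!")]
        return "Output written on" in text and not errors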

>> * proper process ending does not mean that the file is correct
>> * Gather statistics on file: Size, number of pages, ...
>> * Convert generated file to image and run imagemagick to collect statistical data.
>
> This goes back to what Karl wrote above - what guarantee is there
> that the version of ImageMagick you use today will give the same
> results as the next version you happen to use?
>
There are basic operations that must always work. First, the number
of pages must match and the page size must match; if not, the output
is not equivalent. Unless there is a bug in ImageMagick, subtraction
must work and counting non-white pixels must work. It is not
necessary to use sophisticated filters that may (and probably will)
depend on the version of ImageMagick. In any case, this test need not
be the most important one.
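
Those two basic checks do not even need ImageMagick. A sketch
assuming poppler's pdfinfo is available (old.pdf/new.pdf again
hypothetical file names):

    import subprocess

    def pdf_summary(pdf):
        # pdfinfo prints lines such as "Pages: 12" and
        # "Page size: 595.276 x 841.89 pts (A4)". Note that
        # "Page size" refers to the first page; per-page sizes
        # would need the -f/-l options.
        out = subprocess.run(["pdfinfo", pdf], capture_output=True,
                             text=True, check=True).stdout
        info = dict(line.split(":", 1) for line in out.splitlines()
                    if ":" in line)
        return info["Pages"].strip(), info["Page size"].strip()

    assert pdf_summary("old.pdf") == pdf_summary("new.pdf"), \
        "page count or page size differs"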

> Also, I have no experience in using ImageMagick to split a
> multi-page PDF into individual page images, but I suspect there
> might be a large overhead and perhaps other tools such as qpdf may
> be better for that.
>
>> * Compare statistical data to known results. If threshold > level ==> testcase failed.
>
> I'm not a statistician, but I suspect that the degree of acceptable
> variation will probably differ for each test.
>
Not for the text but for the font. If you use the same resolution,
the round-off errors in both documents will be the same, but if the
font is updated and the glyphs are modified, you will get a different
output even if the metrics of the font remain unchanged. It might be
useful to subtract the images and, before counting the non-white
pixels, apply a filter based on algorithms of mathematical gnostics.
It would remove noise but retain patterns that signal real
differences.
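
Lacking such a filter, a rough sketch of the threshold test with
ImageMagick's built-in fuzz factor standing in for it. The 5% fuzz
and the 100-pixel threshold are assumed values for illustration, not
recommendations:

    import subprocess

    THRESHOLD = 100  # assumed tolerance: differing pixels per page

    def page_ok(old_png, new_png):
        # "-fuzz 5%" ignores small per-pixel deviations (a crude
        # stand-in for a real noise filter); "-metric AE" then
        # prints the count of remaining differing pixels on stderr.
        proc = subprocess.run(["compare", "-fuzz", "5%",
                               "-metric", "AE",
                               old_png, new_png, "null:"],
                              capture_output=True, text=True)
        return int(float(proc.stderr.split()[0])) <= THRESHOLD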

> Interesting ideas, and don't let me discourage you.
>
> ĸen


Zdeněk Wagner
http://ttsm.icpf.cas.cz/team/wagner.shtml
http://icebearsoft.euweb.cz


