[tex-live] "fmtutil-sys --all" may not return error exit code under low free space condition always.

Zephyrus C zephyrus8080 at gmail.com
Wed May 28 17:15:15 CEST 2014


Hi list,

Thank you for offering this great distribution, which brings so many
packages together under one umbrella.

I would like to report a problem and propose a patch.

I have run into a rare and hard-to-diagnose problem, presumably caused
by low free disk space during the installation of TeX-related
packages, especially the texlive package.

This happened under Debian, but the problem can happen elsewhere, too.

PROBLEM:

After a seemingly successful installation, trying to run TeX-related
commands from that installation failed because they could neither find
nor produce .fmt files.

A discussion in the Debian bug tracking system indicates that the
cause is very likely a low free-space condition.
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=748962

But there was no smoking gun back then.

What irked me most is that the Debian GNU/Linux installer did not report
any abnormal condition during the installation, and it printed that it
had successfully installed the TeX-related packages.
But the end result was that the installation was not usable afterwards,
since the .fmt files had not been created properly. This confused me to
no end.

DEBUGGING:

So I started debugging.

I started from the assumption that some commands executed during the
installation of TeX-related packages, including TeX Live, do not return
an error exit code properly when they encounter errors due to a
low-space condition (errno == ENOSPC).  So I looked for potential
issues of this kind.

Naturally, the creation of .fmt files, which was what got broken, was
my prime suspect.

FILE: fmtutil.sh is the most likely culprit.

(Others may remain, but I have identified serious bugs in fmtutil.sh
and am proposing a patch.)

After a few days of reading and testing the code, here are my
observations and possible improvements that can make a script used in
TeX Live, fmtutil.sh, more robust in the face of a low free-space
condition, and hopefully make it report errors through its exit code to
the invoking process more reliably.

============================================
To make a VERY LONG story short, here is a summary:
============================================

Most compelling bug:

An "mv" command was executed as the condition of an "if" command, but
there was NO "else" part to handle its failure!
I think this is the culprit: the error code is never returned, so
the Debian package installer cannot properly fail the
installation.

My proposed patch adds a proper error handler.  I also noticed empty
(0-byte) .fmt files under the /var/lib/texmf/ hierarchy when this "mv"
command failed because the file system was full.
So I decided to remove the bogus partial file when "mv" fails.
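
The handling described above can be sketched roughly as follows. This
is only an illustration, not the attached patch: "install_fmt", the
"failures" variable and the scratch directory are hypothetical names,
and the failure is simulated by moving into a directory that does not
exist rather than by a full file system.

```sh
#!/bin/sh
# Hypothetical sketch: give an "mv" used as an "if" condition a failure branch.
failures=""          # accumulated failure messages (stand-in for log_failure)

install_fmt() {
  # $1 = freshly built format file, $2 = destination path
  if mv "$1" "$2" 2>/dev/null; then
    return 0
  else
    # Without this branch the failure is silently dropped: a command
    # tested by "if" never triggers "set -e".
    failures="${failures}'mv $1 $2' failed
"
    rm -f "$2"       # remove a possibly partial (0-byte) destination file
    return 1
  fi
}

work=$(mktemp -d)    # scratch directory for the demonstration
: > "$work/demo.fmt"
if install_fmt "$work/demo.fmt" "$work/no-such-dir/demo.fmt"; then
  status=0
else
  status=$?
fi
printf '%s' "$failures"
echo "status=$status"
rm -rf "$work"
```

The point of the sketch is only that the failure is both recorded and
turned into a non-zero status that a caller can act on.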

Also, the trap processing that goes with "set -e" may not be the
most appropriate, so I changed it a little.

So, to wrap up, I am attaching a patch.
Please find attached the diff of the modified fmtutil.sh.

fmtutil is actually /usr/share/texlive/texmf-dist/scripts/texlive/fmtutil.sh

The attached diff is between the modified copy in /tmp/fmtutil.sh
and /usr/bin/fmtutil, which is a symlink to the script under
/usr/share/texlive/...

To be exact, I looked at Debian version of the fmtutil.sh, and
so there may be a few Debian-specific changes in the original
fmtutil.sh with which I started.


Thank you in advance for your attention.

---

Here is the MUCH LONGER version.

===================
MUCH LONGER version
===================

Here are the gory details that are behind the proposed patch.

There are a few issues with the current code of fmtutil.sh. (I
downloaded the TeX Live 2013 source code from the TUG web site and am
describing the issues in it. If the 2014 version has improved, so much
the better.)

To be exact, I looked at Debian version of the fmtutil.sh, and
so there may be a few Debian-specific changes in the original
fmtutil.sh with which I started.


(1) (Unintended or intended?) extra execution of a command when it
   fails inside one user-defined function, "verbose".

(2) When an error in producing a format file is encountered (possibly
   due to a full file system) while processing "--all", the error
   seems to be sometimes ignored, and so no error code is returned
   as the exit code.

(3) The fmtutil.sh script in TeXLive 2013 does not seem to have
    "set -e" to exit as soon as an error occurs during the script
    execution. This flag is in Debian version. I think it is a good
    idea to have it in the script.
    If "set -e" is in TexLive 2014 version already, so much the better.

(4) Logged error/report messages are sometimes not flushed.

    Contrary to what one might expect, calling "log_failure" does not
    stop processing immediately when fmtutil.sh is trying to produce
    many .fmt files ("--all" flag).  All it does is accumulate the error
    messages as errors are encountered. At the end of the execution,
    when the user-defined function "byebye" is called, the accumulated
    error messages are printed and the script exits with an error.
    "log_warning" is similar, except that it does not cause the
    script to exit with an error code. (My proposed patch changes it
    to report an error.)

    That is the theory. Unfortunately, there seems to be a
    loophole: an error path that may not invoke "byebye", so we
    may not see the error messages flushed at all :-(

The second issue above could be the reason why the Debian installer
did not fail even though .fmt files were not produced
when free space was very low during installation.
(And I am more or less convinced that the missing handling of the
failure of an "mv" command executed as an "if" condition is the
cause of fmtutil.sh not reporting the error under the low-space
condition.)

So, I am proposing a patch here.  It consists of functional changes in
a few places and a set of added comments for better maintainability.

With this patch, fmtutil.sh is much more robust, and the installer
will catch more errors due to low free-space conditions during
installation, which should save support man-hours later on.

I know everyone is busy with the 2014 release. So once the dust
settles, I would appreciate it if someone could look into this matter,
without hurry.

VERY LENGTHY DESCRIPTION:

Sorry about this lengthy writing, but the code seems to have been
around for more than a dozen years, and so many things have accumulated
over the years that can introduce errors under a low-space
condition. So I am afraid that you have to read the long description
below to understand some subtle changes proposed in the patch.

"fmtutil-sys --all" may not return error exit code under low free space
condition.

(1) I found that fmtutil.sh has a slightly buggy function.
(fmtutil.sh is part of texlive-base and is called eventually when
fmtutil-sys --all is called during installation to create .fmt files.)

I am attaching a patch to re-define the problematic function and to
replace log_warning with log_failure to signal the calling program
that a failure occurred.

Summary regarding "verbose": Well, "verbose" alone is worth a single
post :-(, but I am including the whole description in one post.

There is one shell function, "verbose()", which invokes the command
passed to it and sends its standard output to stderr (">&2") when
$mktexfmtMode (which is either "true" or "false") is "true".

Here is the definition of "buggy" verbose()
from fmtutil.sh ( linked from /usr/bin/fmtutil to
/usr/share/texlive/texmf-dist/scripts/texlive/fmtutil.sh)

###############################################################################
# verbose (cmd)
#   execute cmd. Redirect output depending on $mktexfmtMode.
###############################################################################
verbose()
{
  $mktexfmtMode && ${1+"$@"} >&2 || ${1+"$@"}
}


This function as written today is buggy in the sense that if the
command fails when $mktexfmtMode is "true", the failing command is
executed TWICE (!). The first execution is with its output redirected
to stderr. The second execution is without that redirection.  I think
this is quite unintentional.

Or could it be intentional?

Maybe the thought was that if the redirected execution failed, let us
re-execute the command without error redirection so that the error is
visible on invoking console.

But if this was the original intention, it has a ramification for the
exit code.
When the command fails on the left-hand side (LHS) of "||", the exit
code of "verbose other command options ..." becomes the exit code of
the command executed (without redirection) on the right-hand side
(RHS) of "||".

Under very space-tight conditions on a local file system, the failure
may happen while the output is redirected (LHS), but may not happen
when the command is re-executed without redirection on the RHS (!)
(However, if the command only tries to produce a .fmt file, then
this may not be relevant?)
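
A minimal sketch of that masking effect, using the verbose() definition
quoted above. "flaky" and the "counter" scratch file are hypothetical
stand-ins for a command that fails on its first run and succeeds on the
second; nothing here reproduces a real ENOSPC failure.

```sh
#!/bin/sh
# Sketch: the one-line verbose() can hide a first-run failure.
mktexfmtMode=true
counter=$(mktemp)    # hypothetical scratch file counting invocations
: > "$counter"

count() { wc -l < "$counter" | tr -d ' '; }

verbose() {          # as quoted from fmtutil.sh
  $mktexfmtMode && ${1+"$@"} >&2 || ${1+"$@"}
}

flaky() {
  # Hypothetical command: fails on its first run, succeeds afterwards.
  echo x >> "$counter"
  [ "$(count)" -gt 1 ]
}

if verbose flaky; then status=0; else status=$?; fi
runs=$(count)
echo "runs=$runs status=$status"   # prints: runs=2 status=0
rm -f "$counter"
```

The command ran twice and the caller sees status 0, so the first
failure is invisible to anything that only checks the exit code.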

Also, more to the point: I am not entirely sure what exit code is
returned when the RHS of "||" fails; it may depend on the version of
the shell. We may not get the error condition that triggers 'trap' at
all (!?)
That is one reason I prefer the mundane if/then/else/fi proposed in
the patch.

cf. Shell version dependency. Background:

[1] Here is a quote regarding the condition of error handled by trap from
"bash" man page.

About "trap [-lp] [arg] [sigspec ..]
    ...
     If a sigspec is ERR, the command arg is executed whenever a simple
     command has a non-zero exit status, subject to the following
     conditions. The ERR trap is not executed if the failed command is part
     of the command list immediately following an until or while keyword,
*    part of the test following the if or elif reserved words, part of a
*    command executed in a && or || list, or if the command return
     status is being inverted using !. These are the same conditions obeyed
     by the errexit option.

     => I am not entirely sure if the failure in the RHS of "||" in
     |verbose()| is trapped because of the above description.
     In the current version, maybe.

[2] A quote from the "dash" man page (under Debian GNU/Linux, /bin/sh
seems to be a link to /bin/dash):

    -e errexit' If not interactive, exit immediately if any untested
     command fails. The exit status of a command is considered to be
     explicitly tested if the command is used to control an if, elif,
     while, or until; or if the command is the left hand operand of an
     ''&&'' or ''||'' operator.

 => OK, so a failure on the left-hand side does not cause the
    trap to take effect, and only a failure on the
    RHS is noticed. Hmm...

[3] The venerable Single UNIX Specification (The Open Group / POSIX)
document.
http://pubs.opengroup.org/onlinepubs/7908799/xcu/chap2.html

This is POSIX!

-e
    When this option is on, if a simple command fails for any of the
    reasons listed in Consequences of Shell Errors or returns an exit
    status value >0, and is not part of the compound list following a
    while, until or if keyword, and is not a part of an AND or OR
    list, and is not a pipeline preceded by the "!" reserved word,
    then the shell will immediately exit.

     => So it doesn't say explicitly that a failure on the right-hand
     side of "||" immediately triggers the error processing defined by
     trap. Thus, POSIX accommodates the positions of both
     "dash" and "bash" beautifully :-)

    cf. Here "AND" list or "OR" list is defined in POSIX.
     AND Lists
     The control operator && denotes an AND list. The format is:

     command1 [ && command2] ...

     First command1 will be executed. If its exit status is zero,
     command2 will be executed, and so on until a command has a
     non-zero exit status or there are no more commands left to
     execute. The commands will be expanded only if they are executed.

     Exit Status: The exit status of an AND list will be the exit
     status of the last command that is executed in the list.

     OR Lists
     The control operator || denotes an OR List. The format is:

     command1 [ || command2] ...

     First, command1 will be executed. If its exit status is
     non-zero, command2 will be executed, and so on until a command
     has a zero exit status or there are no more commands left to
     execute.

     Exit Status: The exit status of an OR list will be the exit
     status of the last command that is executed in the list.
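
Rather than relying on my reading of the man pages, the behavior can be
probed directly on the shell at hand. The following is a small sketch,
assuming only that "sh" is on the PATH; the variable names are made up
for the demonstration, and the results of the two "set -e" probes
depend on whichever shell "sh" happens to be.

```sh
#!/bin/sh
# Sketch: exit status of AND/OR lists, and which failures "set -e" acts on.

false && true;  and_status=$?   # command2 is skipped; status is that of "false"
false || true;  or_status=$?    # command2 runs; status is that of "true"

# A failure on the left of "||" counts as tested, so the child shell carries on.
lhs=$(sh -c 'set -e; false || true; echo survived')

# A failure of the last command of the list is not exempt from "set -e".
rhs=$(sh -c 'set -e; true && false; echo survived' || true)

echo "and=$and_status or=$or_status lhs=${lhs:-exited} rhs=${rhs:-exited}"
```

With bash or dash as "sh" I would expect "lhs=survived" and
"rhs=exited", which matches the man page quotes above, but this is
exactly the kind of thing worth checking on the shell actually in use.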


Also, in fmtutil.sh, the trap handler originally invoked only "cleanup",
and the message buffer was not flushed. Now the trap calls "byebye" and
the messages are flushed.  BASH allows 'ERR' as a trap condition, which
fires when a command in the script fails. When "set -e" is used and a
command fails, the 'ERR' trap would be taken.  But this is only
available in BASH.  In the end, I inserted "if" checks in a few places
instead of relying on "set -e".


FURTHER DETAILS:

DETECTIVE WORK

My bet was that when "fmtutil-sys --all" was called to create all
required .fmt files on my PC, some invocations of it failed due to low
free-space conditions but did not return the exit error code
correctly.

So I checked the calling sequence that would happen when "--all" is
passed to fmtutil, and found that the user-defined functions are
called in this order.

main
 ->
  recreate_all
  ->
   recreate_loop
   ->
    run_initex
    ->
     verbose

In "run_initex", the following piece of code is where the meat of
processing takes place.

--- quote
  verboseMsg "$progname: running \`$engine -ini  $tcxflag $jobswitch $prgswitch $texargs' ..."

  # run in a subshell to get a local effect of TEXPOOL manipulation:
  (
    # If necessary, set TEXPOOL. Use absolute path, because of KPSE_DOT.
    $localpool && { TEXPOOL="`pwd`:$TEXPOOL"; export TEXPOOL; }
    verbose $engine -ini $tcxflag $jobswitch $prgswitch $texargs
  ) </dev/null
--- end quote

I realized that "verboseMsg" prints what is going on to the log,
and that the user-defined "verbose" function is used to invoke the
command "$engine -ini $tcxflag $jobswitch $prgswitch $texargs".

So this slightly buggy function is called within "run_initex"
(yes, that is where the .fmt files are eventually created!)

This "verbose" function should be rewritten to avoid re-executing
the command "${1+"$@"}" when it fails while "$mktexfmtMode" is true.

Correction-1:

verbose()
{
    if $mktexfmtMode
    then
        ${1+"$@"} >&2
    else
        ${1+"$@"}
    fi
}

If you are in doubt, you can see the repeated execution of command in
the following simplified test script.

# cat /tmp/t-buggy.sh
#!/bin/sh
#

# This script shows that a failing command run through
# verbose may be executed twice.

# false is not a good failure command sample

set -e

rc=0

verbose()  # original. This is buggy!
{
  $mktexfmtMode && ${1+"$@"} >&2 || ${1+"$@"}
}

#
run_initex()
{
  # run in a subshell to see if this changes anything.
  (
      echo "run_initex called"
      # DO SOMETHING THAT SHOULD FAIL
      cp /dev/null /etc
  ) </dev/null
}

# should succeed
verbose echo "hello, world"
echo "rc = $?"

echo before cp /dev/null /etc
# should fail
verbose cp /dev/null /etc
echo "rc = $?"

echo before subshell execution
# should fail
# run in a subshell ...
  (
    verbose cp /dev/null /etc
  ) </dev/null
echo "rc = $?"

echo before run_initex

mktexfmtMode=true
rc=1
# should fail since run_initex runs a failing cp inside.
verbose run_initex $rc
echo "rc = $?"

echo before cp /dev/null /etc
# should fail under ordinary user account.
verbose cp /dev/null /etc
echo "rc = $?"

exit 0


The above is the content of the script.
Now let us run it.

$ /tmp/t-buggy.sh || echo failure
hello, world
rc = 0
before cp /dev/null /etc
cp: cannot create regular file '/etc/null': Permission denied
cp: cannot create regular file '/etc/null': Permission denied
failure
$

Please notice the repeated execution of the failed command by "verbose()".
That is because when "$mktexfmtMode" is true the passed command is
first executed with redirection.
In this case, the passed command "cp /dev/null /etc" should fail for an
ordinary user.
But it is then re-executed because of "||". The shell saw that the
left-hand side (LHS) failed and so executed the right-hand side (RHS), too.

Now, when I rewrite verbose() in the suggested manner, what happens?
With fixed verbose (as in correction-1 above):

$ /tmp/t-OK.sh || echo fail
hello, world
rc = 0
before cp /dev/null /etc
cp: cannot create regular file '/etc/null': Permission denied
fail
$

Great. It is executed only once and I think this is the intended behavior.

BTW, suppose the intention of "verbose" was to send the command's
output to stderr (">&2" redirects file descriptor 1, stdout, to file
descriptor 2, stderr) so that it ends up in a saved file, and later to
check for an error condition by searching for a certain string pattern
in that saved output. Then we may encounter a subtle problem *IF and
ONLY IF* the command fails on an (almost) full file system
while its output is redirected (LHS), AND it succeeds without such
redirection (RHS).

Now, if the original intention was to run the command without
redirection when it fails with redirection, an explicit
test using if/then/fi is still better, since each individual command is
then more likely to make the shell trap when an error occurs
and return the exit code as it should (and we don't have to worry about
the obscure trap handling of different shell versions to boot!)

Correction-1':

verbose()
{
    _v_repeat=false
    if $mktexfmtMode
    then
        ${1+"$@"} >&2   # failure should trigger trap in this form.
        if test $? = 0
        then
            :
        else
            _v_repeat=true
        fi
    else
        _v_repeat=true  # not in mktexfmt mode: run once, unredirected
    fi
    if $_v_repeat
    then
        ${1+"$@"}
    fi
}

(2) log_warning

 Near line 765, we have the following code snippet in fmtutil.sh:

--- quote
  if test -f "$fmtfile"; then
    grep '^! ' $format.log >/dev/null 2>&1 &&
      log_warning "\`$engine -ini $tcxflag $jobswitch $prgswitch $texargs' possibly failed."
--- end quote

As I said in (1) above, $format.log may have failed to capture the error
message under a very space-tight condition in "verbose()". That is tough.

Anyway, I wanted to see this "log_warning" changed to a
"log_failure" instead. The reason is that log_warning in its
current form prints a one-line warning and then merrily keeps on
producing other .fmt files to the end.
And it seems to forget about returning an error
code for the "failure" detected by '^! '.

Why? As I re-read the code: when the user-defined function "byebye" is
called to exit the script, if "log_warning" has been called
before, the accumulated warning messages are printed, but no
error processing is done.  Only when "log_failure" has been called at
least once does fmtutil.sh as a whole return an error exit code of 1.
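
To make the scheme concrete, here is a hypothetical miniature of it.
The function names are borrowed from fmtutil.sh, but the bodies are my
own simplification and not the real code; in particular this "byebye"
returns instead of exiting so that both cases can be shown in one run.

```sh
#!/bin/sh
# Hypothetical miniature of the accumulate-then-report scheme.
warnings=""; failures=""

log_warning() { warnings="${warnings}$*
"; }
log_failure() { failures="${failures}$*
"; }

byebye() {
  # Print everything collected so far; signal an error only if a
  # failure (not merely a warning) was logged.
  if [ -n "$warnings" ]; then printf 'warnings:\n%s' "$warnings" >&2; fi
  if [ -n "$failures" ]; then printf 'failed:\n%s' "$failures" >&2; return 1; fi
  return 0
}

log_warning "demo.log contains error lines"
if byebye; then after_warning=0; else after_warning=$?; fi

log_failure "'mv demo.fmt /some/dest/demo.fmt' failed"
if byebye; then after_failure=0; else after_failure=$?; fi

echo "after_warning=$after_warning after_failure=$after_failure"
# prints: after_warning=0 after_failure=1
```

A warning alone leaves the status at 0, which is why a condition logged
only through "log_warning" never reaches the invoking process.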

I should note that I was watching the console when the TeX-related
packages were installed (the installation eventually turned out to be
broken), and I did not notice any strange output (I may have missed a
line or two, to be sure).  Could there be an exit path that does not
call "byebye" and thus omits printing the accumulated error and warning
messages?

I bet there is!

If "set -e" is set, as in the Debian version of fmtutil.sh, then any
error caught, such as, say, the failure of

    cp "$poolfile" $engine.pool

near line 729 on a nearly full file system,
would cause the shell to stop and go through exit processing.

Then the setting of

  trap 'cleanup 1' 1 2 3 7 13 15

near line 164, inside the user-defined function "setupTmpDir()",
will cause the script to exit WITHOUT dumping the accumulated
messages, which is what "byebye" does.

"cleanup" referred to in the trap statement above is a user-defined
function.

(Well, at least it does return exit code 1 when 'cleanup 1' is
executed. So why didn't we see an error that would have caused the
Debian installer to report the installation failure?  I now know
why: the unpatched fmtutil.sh failed to handle the error of an
"mv" command executed as an "if" condition.)

Maybe we could change the above trap into the following so that
message buffer is flushed.

  trap 'byebye' 1 2 3 7 13 15
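
As a side note, a trap on signal numbers only fires when one of those
signals arrives. A shell that stops because of "set -e" is not
signalled, so such a trap is not run; a trap on 0 (EXIT) is. The sketch
below illustrates this with a child shell. It is an assumption-laden
illustration and not a description of the attached patch: "flush" is a
hypothetical stand-in for "byebye", and I left signal 7 out of the list
only to keep the example portable.

```sh
#!/bin/sh
# Sketch: an EXIT trap (0) runs when "set -e" aborts a script;
# traps on signals such as 1 2 3 13 15 do not.
out=$(sh -c '
  set -e
  flush() { echo "flushing accumulated messages"; }   # stand-in for byebye
  trap flush 0
  trap "echo signal-trap" 1 2 3 13 15
  false            # untested failure: set -e terminates the shell here
  echo "not reached"
') || child_status=$?
echo "$out"
echo "child_status=${child_status:-0}"
```

With bash or dash as "sh" I would expect only the "flushing" line to be
printed and the child to exit with a non-zero status.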

Norbert suggested we should not stop right there as soon as the first
error is encountered, and should instead try to produce as many of the
other .fmt files as we can. I concur. So to make this intention clear,
I agree to continue using "log_warning" in the "grep" case.

The code in fmtutil.sh tries to handle errors using "set -e" (in the
Debian version), "trap ...", etc.  But I am afraid that this error
handling is not well structured.

So I am suggesting the patch to make it a little more
organized and straightforward, and to make sure that "byebye" is called
even in the face of an unexpected error, though I am not sure that I
succeeded perfectly.
But it fixes a few serious bugs that prevented errors from being
reported to the invoking process, and it is definitely better than the
original version.

(I tested this script under full /var partition condition, and the
modified script printed out error messages at the end while the
original failed to do so.)

I attach such output at the end.


Also, I modified the code to exit with error code when "log_warning" is
called (in addition to the case of "log_failure").

There is another usage of "log_warning" near line 796.

--- quote
          if cp "$destfile" "$mplib_mem_file" </dev/null; then
            mktexupd "$fulldestdir" "$mplib_mem_name"
          else
            log_warning "cp $destfile $mplib_mem_file failed."
          fi
--- end quote

The warning should indeed be produced when the file system is full,
since "cp" can fail then!

This really should have shown up in my Debian installation log.
But I am not sure whether the accumulated warning was printed, because
of the possible short-cut exit without calling "byebye" mentioned
earlier.

(This particular use of "log_warning" has been changed to
"log_failure" since "log_failure" does not stop execution
right away. Even when "--all" is given, the execution goes up to the
last .fmt creation, and only after that the error messages are printed
in one chunk after warning messages.)


====

OK, that's it.

As a user of a Linux system with not-so-large partitions, I really need
to make sure that "log_warning" works, and that "fmtutil.sh" as a whole
returns an error exit code at the end, with its error handlers set up
properly.

TIA



---
PS:
Now I notice that, if fmtutil.sh is passed "--no-error-if-no-format",
it does not abort when it probably should.

We can't get a proper error exit if this option is used.

Such call with --no-error-if-no-format was introduced
in TeXLive 2013 in

./texlive-20130530-source/texk/texlive/linked_scripts/texlive/tlmgr.pl:
$errors += do_cmd_and_check("fmtutil$sysmode --no-error-if-no-format --byengine $e");

./texlive-20130530-source/texk/texlive/tl_scripts/ChangeLog:    * fmtutil:
add --no-error-if-no-format, don't abort in some cases of

  full quote is:

  2010-07-04  Norbert Preining  <preining at logic.at>

    * fmtutil: add --no-error-if-no-format, don't abort in some cases of
    missing formats

Oh well :-)

I may have to figure out if this possible use of
"--no-error-if-no-format" was relevant on my PC that runs under Debian
GNU/Linux during installation of texlive-related packages.


PS:

NOTE: CAVEAT EMPTOR

A full file system error manifests itself in various ways.

An almost full file system (here, a full /var partition)
comes in several varieties.

In some computer installations, /tmp and /var share the same partition.
Then creating temporary files during shell execution (such as "here
document" of shell script) will fail in addition to the failure to
create .fmt files.
Whereas if /tmp is not full and /var is full, there is a pattern
of errors which will not be seen when /tmp and /var are both full.
Whether /usr and /var share the same partition also changes the
error behavior.
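
For anyone who wants to check which of these situations applies on a
given machine before testing, here is a rough sketch. "fs_of" and
"same_fs" are hypothetical helper names; the sketch relies on the POSIX
"df -P" output format and compares the last field (the mount point), so
it will misbehave for mount points that contain spaces.

```sh
#!/bin/sh
# Sketch: report whether two directories live on the same file system.
fs_of() {
  # "df -P" prints one header line, then one line for the file system
  # holding the given path; the last field is its mount point.
  df -P "$1" 2>/dev/null | awk 'NR == 2 { print $NF }'
}

same_fs() { [ "$(fs_of "$1")" = "$(fs_of "$2")" ]; }

for pair in "/tmp /var" "/usr /var"; do
  set -- $pair       # split the pair into $1 and $2
  if same_fs "$1" "$2"; then
    echo "$1 and $2 share a file system"
  else
    echo "$1 and $2 are on different file systems"
  fi
done
```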

In my tested environment, these partitions are all different.

So your mileage may vary if you test the installation of texlive under
a low-space condition using this script.

I am sure that there are OTHER commands that fail and may
fail to report that an error occurred.

But this does not change the fact that the failure to implement the
error handling of an "mv" command in if-command syntax was the primary
cause of my Debian installer's failure to notice the partial
incomplete installation of texlive package, and failure to generate
.fmt files properly.

Sample OUTPUT from the modified fmtutil.sh

This is an excerpt from the log of the hardened fmtutil.sh on an almost
full file system, such that even the superuser fails to write to the
/var partition in the middle of fmtutil.sh execution.
(For this test, I only installed the texlive-base Debian package.
If I had installed more TeX-related packages, I would have gotten a
dozen or two .fmt creation failures.)

--- quote ----
fmtutil: running `luatex -ini   -jobname=luatex -progname=luatex
luatex.ini' ...
This is LuaTeX, Version beta-0.76.0-2013070106 (rev 4627)  (INITEX)
 restricted \write18 enabled.

  [  omission ]

50 preloaded fonts
0 words of pdf memory
0 indirect objects
No pages of output.
Transcript written on dviluatex.log.
mv: error writing '/var/lib/texmf/web2c/luatex/dviluatex.fmt': No space
left on device
mv: failed to extend '/var/lib/texmf/web2c/luatex/dviluatex.fmt': No space
left on device
Error: 'mv dviluatex.fmt /var/lib/texmf/web2c/luatex/dviluatex.fmt' failed
Warning: Removing the possibly partial file
/var/lib/texmf/web2c/luatex/dviluatex.fmt left by mv failure ...

###############################################################################
fmtutil: Warning! Some warnings have been issued.
Visit the log files in directory
  /var/lib/texmf/web2c
for details.
###############################################################################

This is a summary of all `warning' messages:
Removing the possibly partial file /var/lib/texmf/web2c/luatex/luatex.fmt
left by mv failure ...
Removing the possibly partial file /var/lib/texmf/web2c/pdftex/etex.fmt
left by mv failure ...
Removing the possibly partial file /var/lib/texmf/web2c/pdftex/pdftex.fmt
left by mv failure ...
Removing the possibly partial file /var/lib/texmf/web2c/pdftex/pdfetex.fmt
left by mv failure ...
Removing the possibly partial file /var/lib/texmf/web2c/tex/tex.fmt left by
mv failure ...
Removing the possibly partial file
/var/lib/texmf/web2c/luatex/dviluatex.fmt left by mv failure ...

###############################################################################
fmtutil: Error! Not all formats have been built successfully.
Visit the log files in directory
  /var/lib/texmf/web2c
for details.
###############################################################################

This is a summary of all `failed' messages:
'mv luatex.fmt /var/lib/texmf/web2c/luatex/luatex.fmt' failed
'mv etex.fmt /var/lib/texmf/web2c/pdftex/etex.fmt' failed
'mv pdftex.fmt /var/lib/texmf/web2c/pdftex/pdftex.fmt' failed
'mv pdfetex.fmt /var/lib/texmf/web2c/pdftex/pdfetex.fmt' failed
'mv tex.fmt /var/lib/texmf/web2c/tex/tex.fmt' failed
'mv dviluatex.fmt /var/lib/texmf/web2c/luatex/dviluatex.fmt' failed
byebye  called
We had error(s).

--- end quote ---

[end of memo]
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fmtutil-diff.patch
Type: application/octet-stream
Size: 20627 bytes
Desc: not available
URL: <http://tug.org/pipermail/tex-live/attachments/20140529/213cc58f/attachment-0001.obj>

