[texhax] Low-level TeX question: string substitution macro

Uwe Lück uwe.lueck at web.de
Tue Apr 21 15:55:06 CEST 2009

[Last thread entry by Toby Cubitt, 2007-05-31, copied further below]

Hi Toby,

so you have based cleveref.sty on sed? Traitor !-)

There are already datatool, stringstrings, ted, and xstring for string 
substitution using TeX. Moreover, Scott Pakin's perltex provides 
("exported") text-processing macros that work independently of the Perl 
installation they were created with (right?).

I have now provided a basic setup for *expandable* chains of string 
substitutions that process (TeX) files containing essentially only 
catcode 11 and 12 characters, so you can, e.g.,


in /macros/latex/contrib/nicetext. So far, I have only used it for 
replacing `...' by $\dots$, `etc. ' by `etc.\ ' etc.
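As a much smaller illustration of string substitution done in TeX itself, here is a hypothetical sketch that replaces every `...' in a macro's replacement text by \dots, using a delimited-parameter loop. It is not nicetext's code or method and, unlike Uwe's chains, it is not expandable (it collects the result in a token register). The names \ReplaceDots, \RD@loop, \RD@finish, \RD@toks and the marker tokens \RD@stop and \RD@end are made up for this example.

```latex
\makeatletter
% Hypothetical sketch: replace every "..." (catcode-12 periods) in the
% replacement text of macro #1 by \dots; the result goes to \RD@toks.
% Limitations: "..." inside braces is not found, and a segment that is
% exactly one {...} group loses its outer braces.
\newtoks\RD@toks
\long\def\ReplaceDots#1{%
  \RD@toks={}%
  % Append a stop marker plus one guaranteed "..." delimiter.
  \expandafter\RD@loop#1\RD@stop...\RD@end}
\long\def\RD@loop#1...#2\RD@end{%
  \def\RD@rest{#2}%
  \ifx\RD@rest\@empty
    % Only the guard "..." was left: #1 is the tail followed by \RD@stop.
    \def\RD@next{\RD@finish#1}%
  \else
    % Keep the text before this "..." and put \dots in its place.
    \RD@toks=\expandafter{\the\RD@toks#1\dots}%
    \def\RD@next{\RD@loop#2\RD@end}%
  \fi
  \RD@next}
% Append the tail, dropping the \RD@stop marker.
\long\def\RD@finish#1\RD@stop{\RD@toks=\expandafter{\the\RD@toks#1}}
\makeatother
```

For example, after \def\@tmpa{Wait... what...}, the call \ReplaceDots\@tmpa leaves the text with each `...' replaced by \dots in \RD@toks, and \edef\@tmpb{\the\RD@toks} copies it into a macro. The separate \RD@stop marker keeps trailing periods in the text from merging with the guard `...'.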

By the way, in (My Code Blog)


Steve Hicks reports on controlling a Mars rover using TeX. (The 
discussion then considers Metafont as well.)

TeX forever!



This was in the thread `enviroment/ifthen':
At 15:32 11.02.09, Toby Cubitt wrote:
>Uwe Lück wrote:
> > You might consider whether your actual task is worth such effort. If the
> > task arises all the time with new projects and with a large number of
> > different strings, this may be the case. I have made such a thing, yet I
> > can't release it as it is. This works on entire files, not environments.
>If you do want to embark on this foolish quest :-), perhaps the xstring
>package might help?
>Even if the task arises frequently, a simple sed script (or similar)
>will be far quicker to write, easier to maintain, more robust...and
>generally better in almost every way. In my experience, string
>substitution is just not a task that LaTeX is well suited to. It's not
>difficult to integrate running your source through a sed script into
>your LaTeX build procedure (you could even write a quick Makefile).


Last thread entry by Toby Cubitt, 2007-05-31:

Thanks to some very helpful comments from Barbara Beeton (off-list) and
to the on-list replies, I've now more or less got this working.

Since catcodes are fixed when the characters are first read (apart from
special commands like \string and \meaning), it seems there's no way to
do what I want directly. So instead, I first write the unescaped text to
a temporary file, then modify the appropriate catcodes and re-read this
temporary file, writing it out again to the final destination file. The
modified catcodes are in effect when the file is re-read, so the
characters get expanded to their escaped form when they're re-written.

The only thing holding me back from dispensing with the temporary file
is that I can't figure out how to write a newline character to an
external file. None of the following seem to work:

{\lccode`|=13 \lowercase{\write\@stream{|}}}

Is there some way to write out an explicit newline? Please don't just
tell me I could do it by writing the file one line at a time. That's
what I'm doing at the moment, but it requires the temporary file. I have
to loop through the temporary file, reading a line from it and
immediately re-writing it (with escapes expanded) to the final file. I
could store the text to be written in a macro that is extended on each
iteration, and write it all out to the file at the very end. But then I need
to insert the newlines manually into the macro so that they appear in
the file when it's written out, hence my question.
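The standard TeX answer to this question is the \newlinechar parameter: whatever character code it holds is output as a line break wherever that character occurs in \write text. Below is a minimal sketch, assuming a write stream \@stream (as in the question) opened on a hypothetical file \jobname.sed, and a hypothetical accumulator macro \@sedscript whose lines are separated by ^^J (character code 10).

```latex
\makeatletter
% Hypothetical stream and file name, following the \@stream above.
\newwrite\@stream
\immediate\openout\@stream=\jobname.sed
% Hypothetical accumulator: ^^J (character code 10) separates lines.
\def\@sedscript{s/foo/bar/g^^Js/baz/qux/g}
\begingroup
  % While \newlinechar is 10, every ^^J in the expanded \write text
  % is written out as a line break.
  \newlinechar=`\^^J
  \immediate\write\@stream{\@sedscript}%
\endgroup
\immediate\closeout\@stream
\makeatother
```

The \write is \immediate here because \newlinechar is consulted when the write is actually performed; for a non-immediate \write, the value in force at shipout time is what counts.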

In answer to Donald Arseneau's comments: I realise TeX's file
input/output features aren't designed for dealing with anything other
than files containing TeX source. But the file I'm writing *is* mostly
TeX code. The sed script contains rules for replacing one sequence of
LaTeX commands with another. The LaTeX commands to be replaced aren't
known until the LaTeX source file is processed, so I *have* to write out
at least some of the information from within TeX. Given that I have to
write something from TeX, I might as well write the entire sed script
from TeX if I can.

Finally, in reply to Michael Doob: I now think that writing a Perl
script instead of sed would only make things slightly simpler. I would
still need to escape the "\" character inside Perl strings when writing
the script file from TeX, and we're back to my original problem :) By
the way, awk can also be made to escape special characters in a string
prior to using it as a computed regexp, though not in quite so simple a
way as Perl. But I seem to have it working with sed now, anyway.

Thanks for everyone's help, and I hope someone can shed similar light on
my final dilemma.


Toby Cubitt wrote:
 > I'm trying to write an internal macro that does string substitution, in
 > order to escape certain characters in the string before writing it to a
 > file. (The package is supposed to be writing a sed script, so I need to
 > escape characters that have a special meaning in regular expressions.)
 > If this was a user-level macro to be used in the LaTeX source itself, I
 > think I can see how it could be done, by changing the catcodes of the
 > characters to be escaped to 13 (active character), then defining these
 > active characters to expand to escaped versions of themselves. (I
 > suppose this would be somewhat akin to LaTeX's \verb command). The
 > trouble is, this macro is to be used in a LaTeX package, and I need
 > something like the following to work:
 > \begingroup%
 > \catcode`|=0
 > |catcode`.=13 |catcode`[=13 |catcode`]=13
 > |catcode`^=13 |catcode`$=13 %$
 > \catcode`\\=13
 > |gdef|@escapechars#1{%
 >    |begingroup
 >    |catcode`|=0
 >    |catcode`.=13 |catcode`[=13 |catcode`]=13
 >    |catcode`^=13 |catcode`$=13 %$
 >    |catcode`\=13
 >    |def\{|string\|string\}%
 >    |def^{|string\|string^}%
 >    |def${|string\|string$}%
 >    |def.{|string\|string.}%
 >    |def[{|string\|string[}%
 >    |def]{|string\|string]}%
 >    #1|endgroup%
 > }
 > |endgroup%
 > \def\@tmpa{\foobar}
 > \expandafter\@escapechars\expandafter{\@tmpa}%
 > It seems I need those \catcode changes outside the macro definition, as
 > well as inside, otherwise the |endgroup and |catcode changes inside the
 > macro aren't recognized properly, though I don't entirely understand the
 > reason behind this. In reality, the \@tmpa macro is of course defined by
 > a much more complicated process than a simple \def (otherwise the whole
 > exercise becomes trivial!), but the above serves to illustrate the scenario.
 > This code is supposed to change the "\foobar" into "\\foobar", but
 > instead it fails with an "Undefined control sequence \foobar" error. If
 > I understand this correctly (unlikely!), the problem is that the "\" in
 > "\foobar" already has catcode 0 (escape character) before it's absorbed
 > by \@escapechars, so TeX expands #1 into "\foobar" with the catcodes
 > already assigned, the catcode changes inside the \@escapechars macro
 > have no effect, and TeX tries to interpret "\foobar" as a command
 > sequence. Is this at all correct?
 > Is there any way to do what I want? If my above analysis is correct,
 > what I guess I need is a command to change the catcodes of tokens, but
 > TeX's abilities in this respect seem to be limited. The \string and
 > \meaning commands can only change tokens to catcode 12 (other), and the
 > \lowercase command changes charcodes rather than catcodes. Maybe there's
 > a completely different way of achieving what I want?
 > I've tried to reduce this question to its bare essentials, but if it's
 > not clear what I'm trying to do, I can go into more detail.
 > Thanks very much,
 > Toby
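Two hypothetical sketches for the quoted question. Neither is taken from cleveref, and all names (\@tmpb, \@escbackslash, \@escapewrite, \EscapeWrite, the file \jobname.sed) are assumptions. (a) When, as in Toby's test case, the stored text is a single control sequence, \string already yields its name as catcode-12 characters, so prepending LaTeX's \@backslashchar doubles the backslash without any catcode changes. (b) For text typed directly in the source (the user-level case Toby mentions), the catcode change must happen *before* the argument is tokenized, as with \verb: the outer macro makes \ active and only then hands over to a macro that reads the argument. This is consistent with his analysis: once #1 has been tokenized, later \catcode changes cannot affect it, so (b) does not help for text that arrives already tokenized inside \@tmpa. Only the backslash is handled; the other regex characters would need similar treatment.

```latex
\makeatletter
% (a) Stored single control sequence: \string yields the characters
% "\foobar"; \@backslashchar is LaTeX's catcode-12 backslash.
\def\@tmpa{\foobar}
\edef\@tmpb{\@backslashchar\expandafter\string\@tmpa}% now \\foobar

% (b) User-level macro: make \ active *before* the argument is read.
\newwrite\@stream
\immediate\openout\@stream=\jobname.sed
\begingroup
  \catcode`\|=0 \catcode`\\=13
  % Inside this group | is the escape character and \ is active.
  |gdef|@escbackslash{%
    |catcode`|\=13
    |def\{|@backslashchar|@backslashchar}}
  |gdef|@escapewrite#1{|immediate|write|@stream{#1}|endgroup}
|endgroup
\def\EscapeWrite{\begingroup\@escbackslash\@escapewrite}
\EscapeWrite{s/\foobar/x/}% writes the line s/\\foobar/x/
\immediate\closeout\@stream
\makeatother
```

Like \verb, \EscapeWrite fails if it is used inside another macro's argument, because the characters are then already tokenized; that is exactly the situation of the package code above.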
