[texhax] OT: NP-completeness [was Re: positioning of floats]

Sun Jan 4 19:04:34 CET 2009

This is way off-topic, but just for general interest...

Tom Schneider wrote:
> I suppose that if you know the sizes of the bits and the sizes of
> pages, you could write some smart code to look for solutions, but I
> suspect that since there are n! solutions the problem is exponential
> in n (ie NP-complete if I understand the notation).

Not quite. NP stands for "non-deterministic polynomial", and is the
class of problems that can be solved in polynomial time on a
non-deterministic Turing machine. An equivalent, and more intuitive, way
of defining NP is as the class of problems for which it is easy to
verify a solution, but it places no constraints on how difficult it
might be to *find* a solution. I.e. if I claim to have a solution to the
problem, you can check whether my solution is correct in polynomial
time, but there's no limit on how long it might take me to find that
solution in the first place.

A good example is factoring (ignoring technicalities to do with NP being
a decision class). If I claim to have found the factors of a large
integer, it's very easy to check whether they're correct: just multiply
them together and see whether you get the original integer back. But
it's an altogether different problem if I give you a large integer and
ask you to find its factors. (If you find a polynomial-time algorithm
for this, you'll be able to break all of the common public-key crypto
systems, such as the one used to encrypt credit card details when
shopping on the web.)

The class of problems which require exponential time to solve is called
EXP, whereas the class of problems which require polynomial time is
called P. It is not known whether P=NP, or NP=EXP, or neither. (Proving
any of these will win you a million dollar prize. What is known is that
P!=EXP.)

An NP-hard problem is a problem which at least as hard as any problem in
NP; solving an NP-hard problem would allow you to solve *every* NP
problem. Finally, an NP-complete problem is a problem that is NP-hard
and is also itself in NP. The canonical example of an NP-complete
problem is the famous 3SAT problem.

Just because there are many possible answers to search through doesn't
necessarily mean that a problem is hard. There are many examples of
problems for which the naive approach of checking every possibility
would take exponential time, but where there's a cleverer way of finding
the solution that only takes polynomial time (e.g. TeX's line-breaking
algorithm). However, the original problem of finding the optimal
arrangement of figures to minimize the number of pages is exactly the
bin packing problem (at least in the case of there being no text, only
figures). And the bin packing problem is known to be NP-hard. So, after
all that, you were right all along :)

Not that this means one should completely give up on solving it. There
are algorithms that produce good solutions to the bin packing problem,
even if not necessarily the optimal one. Implementing them would
probably require a major rewrite of TeX's page-breaking and
float-positioning algorithms, however.

Hope that clears things up somewhat,

Toby