On the Complexity of Optimal Grammar-Based Compression
Jan Arpe and Rüdiger Reischuk
The task of grammar-based compression is to find a small context-free
grammar generating exactly one given string.
We investigate the relationship between grammar-based compression of
strings over unbounded and bounded alphabets.
Specifically, we show how to transform a grammar for a string over
an unbounded alphabet into a grammar for a block
coding of that string over a fixed bounded
alphabet and vice versa. From these constructions, we obtain
asymptotically
tight relationships between the minimum grammar sizes
for strings and their block codings. Finally, we exploit an
improved bound of our construction for
overlap-free block
codings to show that a polynomial time algorithm for
approximating the minimum grammar for
binary strings
within a factor of
yields a polynomial time
algorithm for
approximating the minimum grammar for strings over
arbitrary alphabets within a factor of
(for arbitrary
).
Since the latter problem is known to be
NP-hard to approximate within a factor of 8569/8568, we provide a
first step
towards solving the long standing open question whether
minimum grammar-based compression of
binary strings is
NP-complete.