A C++ implementation of a fast and space-efficient algorithm converting the LZ77 parsing into a straight-line grammar. The algorithm is based on AVL grammars introduced by Rytter [TCS, 2003]. Our main improvements are: delayed ("lazy") merging of nonterminals, and utilizing Karp-Rabin fingerprints.
For more detailed description of the algorithm and reports of experimental evaluation, refer to the following paper.
@inproceedings{lazyavl,
author = {Dominik Kempa and Ben Langmead},
title = {Fast and Space-Efficient Construction of {AVL}
Grammars from the {LZ77} Parsing},
booktitle = {29th Annual European Symposium on Algorithms
(ESA 2021)},
pages = {56:1--56:14},
year = {2021},
doi = {10.4230/LIPIcs.ESA.2021.56},
}
The above paper is available on arXiv at https://arxiv.org/abs/2105.11052.
The latest version of this code is available from https://github.com/dominikkempa/lz77-to-slp.
Lazy-AVLG is compiled by simply typing make in the directory
containing this README. This will build the executable called
lz_to_grammar. The simplest usage of Lazy-AVLG is as follows.
Suppose the input LZ77 parsing (which can be computed using the
program in tools/text_to_lz) is located in /data/input.txt.lz77.
Then, to compute the lazy AVL grammar of input.txt, type:
$ ./lz_to_grammar /data/input.txt.lz77
This will write the output grammar to /data/input.txt.lz77.slg. For
details of the encoding of the grammar, see the function
write_to_file in include/lazy_avl_grammar/lazy_avl_grammar.hpp.
If you use this code, please cite the paper mentioned above. Lazy-AVLG is released under the MIT/X11 license. See the file LICENCE for more details.