[libvoikko] HFST backend performance observations

Flammie Pirinen flammie at iki.fi
Thu Sep 29 20:43:19 EEST 2011


On 2011-09-29, Harri Pitkänen wrote:

> To see if there is something that could be done to improve the speed
> of spell checking I ran a list of 1000 words through voikkospell with
> Finnish HFST backend and took a callgrind profile from the session.
> The call graph is here:
> 
>   http://www.puimula.org/htp/tmp/voikko-hfst-fi.pdf

My work Mac cannot seem to open it and remote xpdf apparently just
segfaults.

> Now we can use the graph to look for things that seem to have higher
> than expected cost and concentrate on fixing them. What immediately
> jumps out is the cost of "operator new" near the bottom of the graph.
> Its inclusive cost is 606 MI which is about half of the entire cost
> of the program. Based on this we can say that voikkospell/HFST is
> spending significant part of its running time doing memory
> management. Looking at the inward arrows we see that allocations are
> done both during the initialization and during spell checking.

That's certainly curious. I'm not very familiar with the low-level
spelling code here either, but as far as I can grep, the only place in
the ospell library that uses new is the construction phase, where it
admittedly seems to call it at least arcs + nodes + alphabet size
times.
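
Just to illustrate the shape of the pattern I mean (the names below are
made up, not the actual hfst-ospell classes), the construction looks
roughly like this, one new per arc:

  // A hypothetical sketch of the pattern I mean -- not the actual
  // hfst-ospell classes, just the shape of "one new per arc and node".
  #include <cstddef>
  #include <cstdio>
  #include <vector>

  struct Arc  { unsigned int in, out, target; float weight; };
  struct Node { std::vector<Arc*> arcs; };

  void read_node(std::FILE* f, Node& node, std::size_t arc_count)
  {
      for (std::size_t i = 0; i < arc_count; ++i) {
          Arc* a = new Arc();                // one heap allocation per arc
          std::fread(a, sizeof(Arc), 1, f);  // assuming the on-disk layout matches
          node.arcs.push_back(a);
      }
  }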

> - Initialization of course needs memory, but would it be possible to
> allocate it in larger chunks? I have not read the HFST code very
> closely but I would assume that many of the basic data structures
> could be allocated in larger arrays instead of doing 2.5 million
> individual allocations as it happens now. This might even save some
> memory.

Possibly the compression scheme and such contributes to the requirement
of constructing one state or arc at a time.
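
If the format really does force building one state or arc at a time,
one thing that might still help is grabbing the memory in larger blocks
and handing out pieces from them. A minimal bump-allocator sketch of
that idea, with invented names and nothing taken from the current code:

  // Minimal bump-pointer arena: one big allocation per block instead of
  // one new per arc or node.  Nothing is freed individually; everything
  // goes away when the arena is destroyed (fine for plain structs only,
  // since no destructors are run).
  #include <cstddef>
  #include <new>
  #include <vector>

  class Arena {
  public:
      explicit Arena(std::size_t block_size = 1 << 20)
          : block_size_(block_size), used_(block_size) {}
      ~Arena() {
          for (std::size_t i = 0; i < blocks_.size(); ++i)
              delete[] blocks_[i];
      }
      // Assumes n <= block_size; rounds up to keep pointers aligned.
      void* allocate(std::size_t n) {
          n = (n + sizeof(void*) - 1) & ~(sizeof(void*) - 1);
          if (used_ + n > block_size_) {
              blocks_.push_back(new char[block_size_]);
              used_ = 0;
          }
          void* p = blocks_.back() + used_;
          used_ += n;
          return p;
      }
  private:
      std::size_t block_size_;
      std::size_t used_;
      std::vector<char*> blocks_;
  };

  // e.g.  Arc* a = new (arena.allocate(sizeof(Arc))) Arc;

Objects would then be created with placement new into the arena and all
released at once when the transducer goes away, which should also save
the per-allocation bookkeeping overhead.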

> - There are about one million allocations made during spell checking.
> That is 1000 allocations for each word. This seems to be way too much
> since the only things for which memory needs to be allocated during
> lookup should be output strings and some sort of configuration
> history for backtracking (if depth-first search is used). I have not
> read the code yet to see how the lookup actually works. I have
> however written such mostly-allocation-free lookup code once so I
> know it is doable:
> 
>   http://www.puimula.org/htp/tmp/TransducerRunner.cpp

I don't see any memory allocations in the lookup code at first sight
though. But I don't know, maybe I'm missing something here.

> That file is part of a program I wrote for my master's thesis. As I
> understand the only difference to HFST is that we need to handle flag
> diacritics instead of the register operations used in my sample code
> but that hopefully should not be very large difference.

Well, I suppose it also differs from a trivial transducer runner in
that it runs two transducers in parallel, since the error model is also
just a transducer, and you cannot run them serially one after the
other, because even for really simple things like edit distance 2 the
intermediate results would be huge. I suppose the combination of these
two features contributes to the memory usage somehow.
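
To make that a bit more concrete, the lookup is essentially a search
over pairs of states, one from the error model and one from the
lexicon, where an error-model arc consumes the next input symbol and
its output symbol has to be matched by a lexicon arc's input symbol.
A stripped-down sketch of that idea, with invented data structures and
with epsilons, flag diacritics and weights all left out:

  // Invented minimal representation, just to show the shape of running
  // the error model and the lexicon in parallel.
  #include <cstddef>
  #include <stack>
  #include <vector>

  struct Arc   { int in, out, target; };
  struct State { std::vector<Arc> arcs; bool final_; };
  typedef std::vector<State> Fst;

  struct Conf { int e, l; std::size_t pos; };  // error state, lexicon state, input position

  // Depth-first search over pairs of states: follow an error-model arc
  // that consumes the next input symbol, and require its output symbol
  // to be matched by the input symbol of some lexicon arc.
  bool accepts(const Fst& err, const Fst& lex, const std::vector<int>& word)
  {
      std::stack<Conf> todo;           // the backtracking stack; this is one
      Conf start = { 0, 0, 0 };        // place where memory per word can go
      todo.push(start);
      while (!todo.empty()) {
          Conf c = todo.top();
          todo.pop();
          if (c.pos == word.size() && err[c.e].final_ && lex[c.l].final_)
              return true;
          if (c.pos >= word.size())
              continue;
          for (std::size_t i = 0; i < err[c.e].arcs.size(); ++i) {
              const Arc& ea = err[c.e].arcs[i];
              if (ea.in != word[c.pos])
                  continue;
              for (std::size_t j = 0; j < lex[c.l].arcs.size(); ++j) {
                  const Arc& la = lex[c.l].arcs[j];
                  if (la.in != ea.out)
                      continue;
                  Conf next = { ea.target, la.target, c.pos + 1 };
                  todo.push(next);
              }
          }
      }
      return false;
  }

The backtracking stack, plus the output strings and flag state that the
real code has to carry along, is presumably where the per-word
allocations come from.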

> I will now need to go offline for a week or so. But this starts to
> look very promising...

That's very nice to hear. I think there are still some rather
unpolished odds and ends in the system; for one, I haven't really done
anything about the memory use of the zipping and XML-handling parts of
the new code. Granted, it can only leak a few strings after reading the
XML headers, and perhaps some file names and whatnot.


-- 
Flammie, computer scientist bachelor, linguist master, free software
Finnish localiser, and more! <http://www.iki.fi/flammie/>
