[libvoikko] HFST backend performance observations

Sam Hardwick sam.hardwick at gmail.com
Fri Sep 30 10:25:17 EEST 2011


I'm resending this message because it appears not to have gotten through 
- apologies if it ends up being a duplicate.

On 09/29/2011 07:16 PM, Harri Pitkänen wrote:
> - Initialization of course needs memory, but would it be possible to
> allocate it in larger chunks? I have not read the HFST code very closely
> but I would assume that many of the basic data structures could be
> allocated in larger arrays instead of doing 2.5 million individual
> allocations as it happens now. This might even save some memory.

The memory allocations are not really due to initialization, but to the
way states are handled. The speller's state is always a triple
(error-state, lexicon-state, lexicon-flag-state); these triples are
generated and placed on a queue, and removed once they have been
processed. This causes a certain amount of allocating and deallocating
of small blocks of memory.

This allocation churn is likely to be a bottleneck, and could (I suppose)
be remedied by writing our own memory handling for the state queue.
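
To make the idea concrete, here is a minimal sketch of what such memory
handling might look like: a pool that allocates state triples in large
chunks and recycles removed states through a free list. This is not HFST
code; SearchState, StatePool, the field types and the chunk size are all
hypothetical names and assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for the (error-state, lexicon-state,
// lexicon-flag-state) triple; the field types are assumptions.
struct SearchState {
    unsigned int error_state;
    unsigned int lexicon_state;
    unsigned int flag_state;
};

// Hands out SearchState slots from large chunks and recycles released
// slots through a free list, so the heap is hit once per chunk instead
// of once per state.
class StatePool {
public:
    explicit StatePool(std::size_t chunk_size = 4096)
        : chunk_size_(chunk_size), used_in_last_(chunk_size) {
        assert(chunk_size > 0);
    }

    // Returns a slot initialized with the given triple.
    SearchState* acquire(unsigned int e, unsigned int l, unsigned int f) {
        SearchState* s;
        if (!free_.empty()) {
            // Reuse a slot that was released earlier.
            s = free_.back();
            free_.pop_back();
        } else {
            if (used_in_last_ == chunk_size_) {
                // Current chunk is full (or none exists yet): add one.
                chunks_.push_back(std::unique_ptr<SearchState[]>(
                    new SearchState[chunk_size_]));
                used_in_last_ = 0;
            }
            s = &chunks_.back()[used_in_last_++];
        }
        s->error_state = e;
        s->lexicon_state = l;
        s->flag_state = f;
        return s;
    }

    // Marks a slot as reusable; the memory stays owned by the pool.
    void release(SearchState* s) {
        assert(s != nullptr);
        free_.push_back(s);
    }

    std::size_t chunk_count() const { return chunks_.size(); }

private:
    std::size_t chunk_size_;
    std::size_t used_in_last_;  // slots handed out from the newest chunk
    std::vector<std::unique_ptr<SearchState[]>> chunks_;
    std::vector<SearchState*> free_;
};
```

With something like this, the queue would hold SearchState pointers
obtained from acquire() and hand them back with release() after
processing. All chunks are freed together when the pool is destroyed.
The free list is itself a vector and so still allocates occasionally,
but only when it grows, not once per state.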

Sam Hardwick


