Free Pascal hash maps

This is an exhausive benchmark of string-key based (hash)maps available for Free Pascal.

Maps to compare:
built-in:
contnrs. TFPHashList TFPDataHashTable
ghashmap. THashMap THashMap.shortstring
gmap. TMap TMap.shortstring
fgl. TFPGMap (sorted)
classes. TStringList (sorted) TStringList (CompareStr)
lazfglhash. TLazFPGHashTable
laz2 xmlutils. THashTable
rtl-generic's linear (TDictionary) quadratic double cuckoo2 cuckoo4 cuckoo6
linear.shortstring quadratic.shortstring double.shortstring cuckoo2.shortstring cuckoo4.shortstring cuckoo6.shortstring
3rd party:
Bero's TFLRECacheHashMap TPasMPHashTable
Yamer's TGenHashMap TGenHashMap.shortstring
Barry Kelly's THashList (fixed size)
CL4L's TStrHashMap (fixed size)
fundamental's TPointerDictionaryA
marcov's lightcontainers generic lightcontainers
hovadur's DeCAL
JUHA's StringHashMap
BeniBela's HAMT mutable HAMT immutable TXQHashmapStr
terrylao's CodaMinaHashMap
avk959's GHashMapLP GHashMapLPT GHashMapQP GChainHashMap GOrderedHashMap GThreadHashMapFG GLiteHashMapLP GLiteChainHashMap

non-Pascal:
libstdc++'s std::unordered map

Datasets: Dictionary words (~16 characters) Short random keys (8 bytes) Long random keys (200 bytes)

Model: ^{(writes,succeeding lookups,failed lookups)} Only writes ^(1,0,0) Balanced ^(1,3,3) Many Reads ^(1,20,2) Many Failed Lookups ^(1,2,20) Custom prediction: , ,

Metric: absolute time (ms) absolute memory (bytes) keys / time keys / memory time, memory memory, time time / memory memory / time time / keys memory / keys
bubbles: absolute time (abs. memory) abs. memory (abs. time) keys/time (keys/memory) memory/ keys (keys/time) keys/time (memory/keys) memory/keys (time/keys)
ranking: time memory

x-axis: linear logarithmic y-axis: linear logarithmic keys limit: to

General observations:

Good built-in maps:

For small keys, TFPHashList is the fastest map of all maps provided by FreePascal. However, for longer keys or dozens of lookups the performances degrades drastically. It uses shortstrings, so there is a hard-limit of 256 byte keys (which can also explain the lookup slowdown, as the benchmark uses ansistrings), and it seems to copy all strings, as its memory usage is very low for small keys, but also increases drastically with the key length.
The extreme performance drop at higher key counts might occur, because the computer ran out of memory and used the swap file.
The maps of the rtl-generic.collections package can be classified in two groups: non-cuckoo maps (linear, quadratic, double hashing) and cuckoo maps.
- With the parameters actually used in this benchmark, the non-cuckoo maps are always faster than the cuckoo map or other maps. Slightly slower than TFPHashList for small keys, but faster for long keys. The linear map appears to be slightly faster than the double hashing one (at least when the keys are truly random). The quadratic map is often faster than the linear map, but it was broken in older fpc versions, so it should only be used with fpc 3.2 or newer. The linear map is also called TDictionary.
- When the number of reads increases to a few dozen queries/key, the cuckoo maps will be faster than the non-cuckoo maps (shown by the difference between writing and reading speed. Cuckoo-2 is predicted to become faster than any other built-in map). They also use less memory and especially cuckoo-6 is the most memory efficient hash map of them all, but also the slowest. Cuckoo-2 is the fastest cuckoo-map, but uses nearly the same memory as the non-cuckoo maps. Cuckoo-4 is in-between.

Good 3rd-party maps:

Generally, all 3rd-party maps perform well, but some are especially notable as they are faster than the fpc provided maps in most situations:

avk959's LGenerics are overall the fastest maps in the benchmark. For insertions only, GLiteChainHashMap is the fastest, followed by GOrderedHashMap and GChainHashMap. However, for lookups there is a turning point at around 100 000 keys, where the GHashMapLP, GHashMapLPT, GHashMapQP and GLiteHashMapLP maps become the fastest maps. They are faster than rtl generics; and faster than TFPHashList, except in the case of only writing short keys. Unfortunately, these maps require fpc 3.2.
Yamer's TGenHashMap in the gcontnrs package is fast and memory efficient. It is the fastest map for lookups of short and dictionary keys. It faster than the rtl generics maps for short keys, and in-between for long keys. Similarly, its memory usage is between the best and worst maps of rtl generics. For short keys, it behaves similar to the maps of avk959 with better memory usage; for long keys, it is slower.
Bero's FLRE cache map is an internally used map in the FLRE-package, so it is not the easiest map to use. It is similar to TFPHashList, avk959's LGenerics and Yamer's map for short keys, and used to be the fastest map for long keys in older versions of the benchmark. However, it has slightly higher memory usage. A closer examination of its code shows that it uses a very fast hash function, so the performance might come from the hash function not the map construction. It does not handle deletions well and keeps memory unfreed till the next resize. Currently it keeps items in insertion-order, but there is no guarantee of that.
BeniBela's Hashmap is my own map used in the internet tools. It is just a modification of Bero's FLRE cache map to support generic types, so it behaves the same. However, it does not use Bero's fast hashfunction, so it is worse on long keys. On the other hand, it uses less memory.

Other built-in maps:

The array based TStringList and TFPGMap are the most memory efficient, using less memory than even the cuckoo maps. However, as they are not hash maps, their usage is extremely slow, even if the maps are sorted, so they are nearly useless as maps. Their default setting of unsorted and (in case of TStringList) case-insensitive matching are quite dangerous.
ghashmap is as fast as the rtl-generics maps for short keys, but similar to TFPHashList its performance degrades with longer keys. Unlike TFPHashList it is not the fastest before the slow-down afterwards, so it becomes one of the slowest. It also has a high memory usage.
gmap.TMap is a slow map, because it is implemented as tree.
TFPDataHashTable (as well as TLazFPGHashTable) is an interesting map. At first it appears very fast, but with more elements it becomes very slow. The explanation is that it has a high-default capacity of around 200 000 items and does not grow. Colliding items are stored in a tree-structure, which becomes inefficient as soon as the initial capacity is exceeded.
laz2_xmlutils.THashTable is also very peculiar. It begins slowly; but around 100 to 10 000 items, it is the fastest built-in map; then it becomes slower than TFPHashList; but, for more than 100 000 items, it is again often the fastest built-in map. However, there is a big difference between read and write speed, so it is predicted to become one of the slowest map for a lot of lookups. Generally, it behaves similarly to TFPHashList, although with slower insertion speed, but faster lookup speed. It also has high memory usage.
Its most important feature is that it supports lookups of pchars, e.g. to lookup individual words of a sentence, without making copies of each word.

How to choose a hashmap

This benchmark can be used to select the ideal map for a specific use case:

Enter the minimal and maximal count of expected items at "keys limit".
Select the model "custom prediction" and enter how often each item will be inserted, read, and not found.
Select the dataset that corresponds to your key length.
Select the maps you consider using.

Then you can choose a metric:

The "ranking" metrics might be the easiest metric to understand.
The rankings for time or memory sort all selected maps according to their time and memory performance. Then the best map is shown at the top of the list.
The ranking is only calculated for the upper limit entered in the "keys limit" field.
The bubble metric "keys/time (memory/keys)" shows speed and memory usage for varying key counts.
The fastest map will be shown at the top of the plot. The most memory efficient map has the smallest bubbles (only the bubbles in the same column can be compared to each other).
You can easily see how the performance of the maps changes when the key count increases. Hover the mouse over a bubble to see the exact performance stats in a tooltip.
The metric "time, memory" shows how time and memory performance relate to each other.
The points of fastest map will be on the left side, the maps with the lowest memory usage will be at the bottom.
Therefore, if you think of the plot as split into quarters, the bottom left will have optimal maps, the top left will have fast maps with huge memory usage, the bottom right will have memory efficient slow maps, and the top right will have useless maps.
This might be the most important metric, but also the hardest to understand. If you set minimal and maximal keys limit to the same value, each map is shown as single point, which makes it easier to see which map is the fastest and which one is the most lightweight.

If this page is too slow, you can uncheck all checkboxes in a category, then no plots are shown and selections are instantanously. You can also disable maps by clicking their name on the legend without affecting other plots. Some maps have no datapoints for high key count, then only points with low key counts are shown and the map might appear more efficient than it is in the scatter plot. Move the mouse over the data points to check, the tooltip shows the actual number of keys, time and memory usage.

This benchmark was compiled with freepascal 3.3.1 (r44777), -O4, and run on a 64-bit Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz openSUSE 12.2 system. Older versions: fpc 3.3.1 (r44126), -O3 fpc 3.1.1 (r35800), fpc 3.1.1 (r34557)