performance - Python: List vs Dict for look up table

ID : 20415

viewed : 14

Tags : pythonperformancepython

Top 5 Answer for performance - Python: List vs Dict for look up table

vote vote

95

Speed

Lookups in lists are O(n), lookups in dictionaries are amortized O(1), with regard to the number of items in the data structure. If you don't need to associate values, use sets.

Memory

Both dictionaries and sets use hashing and they use much more memory than only for object storage. According to A.M. Kuchling in Beautiful Code, the implementation tries to keep the hash 2/3 full, so you might waste quite some memory.

If you do not add new entries on the fly (which you do, based on your updated question), it might be worthwhile to sort the list and use binary search. This is O(log n), and is likely to be slower for strings, impossible for objects which do not have a natural ordering.

vote vote

80

A dict is a hash table, so it is really fast to find the keys. So between dict and list, dict would be faster. But if you don't have a value to associate, it is even better to use a set. It is a hash table, without the "table" part.


EDIT: for your new question, YES, a set would be better. Just create 2 sets, one for sequences ended in 1 and other for the sequences ended in 89. I have sucessfully solved this problem using sets.

vote vote

76

set() is exactly what you want. O(1) lookups, and smaller than a dict.

vote vote

63

I did some benchmarking and it turns out that dict is faster than both list and set for large data sets, running python 2.7.3 on an i7 CPU on linux:

  • python -mtimeit -s 'd=range(10**7)' '5*10**6 in d'

    10 loops, best of 3: 64.2 msec per loop

  • python -mtimeit -s 'd=dict.fromkeys(range(10**7))' '5*10**6 in d'

    10000000 loops, best of 3: 0.0759 usec per loop

  • python -mtimeit -s 'from sets import Set; d=Set(range(10**7))' '5*10**6 in d'

    1000000 loops, best of 3: 0.262 usec per loop

As you can see, dict is considerably faster than list and about 3 times faster than set. In some applications you might still want to choose set for the beauty of it, though. And if the data sets are really small (< 1000 elements) lists perform pretty well.

vote vote

60

You want a dict.

For (unsorted) lists in Python, the "in" operation requires O(n) time---not good when you have a large amount of data. A dict, on the other hand, is a hash table, so you can expect O(1) lookup time.

As others have noted, you might choose a set (a special type of dict) instead, if you only have keys rather than key/value pairs.

Related:

  • Python wiki: information on the time complexity of Python container operations.
  • SO: Python container operation time and memory complexities

Top 3 video Explaining performance - Python: List vs Dict for look up table

Related QUESTION?