How To Create a List With a Specific Size in Python

Preallocating storage for lists or arrays is a typical pattern among programmers when they know the number of elements ahead of time.

Unlike in C++ and Java, in Python you have to initialize all of your preallocated storage with some value. Developers usually use falsy values for that purpose, such as None, '', False, or 0.
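
As a minimal sketch of that pattern, the snippet below preallocates a list with None placeholders and then fills each slot by index. The names `size` and `squares` are illustrative only and do not come from the article.

```python
size = 5  # number of elements known ahead of time (illustrative value)

# Preallocate: every slot must hold some value, so use None as a placeholder.
squares = [None] * size

# Fill the slots by index; an index past the end would raise IndexError.
for i in range(size):
    squares[i] = i * i

print(squares)  # [0, 1, 4, 9, 16]
```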

Python offers several ways to create a list of a fixed size, each with different performance characteristics.

To compare the performance of the different approaches, we will use Python’s standard timeit module. It provides a handy way to measure the run time of small pieces of Python code.
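
A minimal sketch of how such a measurement is typically made: the `timeit()` function takes the statement as a string (or a callable), an optional `setup` string, and a `number` of repetitions that defaults to one million. The statement and the reduced `number` used here are illustrative choices to keep the run short.

```python
from timeit import timeit

runs = 100_000  # illustrative; timeit's default is 1,000,000

# The result is the total time for all runs, in seconds, as a float.
elapsed = timeit("sum(range(10))", number=runs)

# Divide by the run count to get an approximate per-run cost.
per_run_us = elapsed / runs * 1e6
print(f"total: {elapsed:.4f} s, per run: {per_run_us:.3f} microseconds")
```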

Preallocate Storage for Lists

The first and fastest way is to use the * operator, which repeats a list a specified number of times.

>>> [None] * 10
[None, None, None, None, None, None, None, None, None, None]

A million iterations (the default number of iterations in timeit) take approximately 117 ms.

>>> from timeit import timeit
>>> timeit("[None] * 10")
0.11655918900214601

Another approach is to use the built-in range() function with a list comprehension.

>>> [None for _ in range(10)]
[None, None, None, None, None, None, None, None, None, None]

It’s more than five times slower and takes about 612 ms per million iterations.

>>> timeit("[None for _ in range(10)]")
0.6115895550028654
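
Although the comprehension is slower, it behaves differently from the * operator when the placeholder is mutable. The * operator repeats references rather than copying objects, which is harmless for None or 0 but means every slot shares one object if the element is, say, a list. The sketch below illustrates the difference; the variable names are illustrative.

```python
# Repeating a list that holds a mutable object copies the reference, not the object.
shared = [[]] * 3
shared[0].append("x")
print(shared)  # [['x'], ['x'], ['x']]

# A comprehension evaluates the expression once per slot, giving distinct lists.
independent = [[] for _ in range(3)]
independent[0].append("x")
print(independent)  # [['x'], [], []]
```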

The third approach is to use a simple for loop together with the list’s append() method.

>>> a = []
>>> for _ in range(10):
...     a.append(None)
...
>>> a
[None, None, None, None, None, None, None, None, None, None]

Using loops is the slowest method and takes 842 ms to complete a million iterations.

>>> timeit("for _ in range(10): a.append(None)", setup="a=[]")
0.8420009529945673
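
Note that the benchmark above creates the list once in setup, so the same list keeps growing across all iterations. A variant that builds a fresh list on every run may be a fairer comparison with the other two approaches. The following is a minimal sketch with a hypothetical helper, `build_with_append`, and a reduced iteration count; no timing result is claimed, and the function call adds some overhead of its own.

```python
from timeit import timeit

def build_with_append(n):
    """Build a list of n None values with a loop and append()."""
    result = []
    for _ in range(n):
        result.append(None)
    return result

# timeit also accepts a callable; each run builds a new 10-element list.
elapsed = timeit(lambda: build_with_append(10), number=100_000)
print(f"{elapsed:.4f} s per 100,000 runs")
```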

Preallocate Storage for Other Sequential Data Structures

Since you’re preallocating storage for a sequential data structure, it may make a lot of sense to use the array type from the standard library’s array module instead of a list.

>>> from array import array
>>> array('i',(0,)*10)
array('i', [0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

As we see below, this approach is the second fastest, after [None] * 10.

>>> timeit("array('i',(0,)*10)", setup="from array import array")
0.4557597979946877
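
The array type also supports the * operator directly, which avoids building an intermediate tuple. The sketch below shows that form, along with the fact that an array only accepts values matching its type code, so None cannot serve as a placeholder. No timing is claimed here, since that would need to be measured.

```python
from array import array

# Repeat a one-element array; 'i' is the type code for signed int.
zeros = array('i', [0]) * 10
print(zeros)  # array('i', [0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

# Unlike a list, an array rejects values that do not match its type code.
try:
    zeros[0] = None
except TypeError as exc:
    print("rejected:", exc)
```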

Let’s compare the above pure Python approaches to NumPy, the Python package for scientific computing.

>>> from numpy import empty
>>> empty(10)
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

The NumPy way takes 589 ms per million iterations.

>>> timeit("empty(10)", setup="from numpy import empty")
0.5890094790011062
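
Keep in mind that empty() allocates memory without initializing it, so the zeros shown in the output above are not guaranteed. When defined initial values matter, NumPy’s zeros() or full() can be used instead. A minimal sketch, assuming NumPy is installed:

```python
import numpy as np

# empty() only allocates; the contents are whatever happened to be in memory.
buf = np.empty(10)

# zeros() and full() allocate and also initialize every element.
z = np.zeros(10)
nans = np.full(10, np.nan)

print(buf.shape, buf.dtype)  # (10,) float64
print(z)
print(nans)
```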

However, the NumPy way is much faster for larger sequences.

>>> timeit("[None]*10000")
16.059584009999526
>>> timeit("empty(10000)", setup="from numpy import empty")
1.1065983309963485

The conclusion is that it’s best to stick to [None] * 10 for small lists, but switch to NumPy’s empty() when dealing with larger sequential data, bearing in mind that empty() leaves the contents uninitialized.
