python - How do you split a list into evenly sized chunks?

Tags: python, list, split, chunks

Top 5 Answers for python - How do you split a list into evenly sized chunks?

92 votes

Here's a generator that yields the chunks you want:

def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

import pprint
# list() is needed on Python 3, where slicing a range returns another range
pprint.pprint(list(chunks(list(range(10, 75)), 10)))
[[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
 [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
 [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
 [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
 [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
 [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
 [70, 71, 72, 73, 74]]

If you're using Python 2, you should use xrange() instead of range():

def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in xrange(0, len(lst), n):
        yield lst[i:i + n]

You can also use a list comprehension instead of writing a function, though it's a good idea to encapsulate operations like this in named functions so that your code is easier to understand. Python 3 version:

[lst[i:i + n] for i in range(0, len(lst), n)] 

Python 2 version:

[lst[i:i + n] for i in xrange(0, len(lst), n)]

88 votes

If you want something super simple:

def chunks(l, n):
    n = max(1, n)
    return (l[i:i+n] for i in range(0, len(l), n))

On Python 2.x, use xrange() instead of range().
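As a quick illustration of what the `max(1, n)` guard buys you, here is a small sketch that reuses the same function on a made-up list (`data` is only an example value). Without the clamp, a chunk size of 0 would make `range()` raise a `ValueError` because its step cannot be zero.

```python
def chunks(l, n):
    # Same function as above: clamp n so the range() step is never zero or negative.
    n = max(1, n)
    return (l[i:i + n] for i in range(0, len(l), n))

data = [1, 2, 3, 4, 5, 6, 7]  # example input, not from the original answer

print(list(chunks(data, 3)))   # [[1, 2, 3], [4, 5, 6], [7]]
print(list(chunks(data, 0)))   # n is clamped to 1: [[1], [2], [3], [4], [5], [6], [7]]
print(list(chunks(data, 10)))  # [[1, 2, 3, 4, 5, 6, 7]]
```

The last chunk is simply shorter when the length is not a multiple of n; nothing is padded.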

76 votes

I know this is kind of old, but nobody has mentioned numpy.array_split yet. Note that its second argument is the number of parts to produce, not the size of each chunk:

import numpy as np

lst = range(50)
np.array_split(lst, 5)
# [array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
#  array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19]),
#  array([20, 21, 22, 23, 24, 25, 26, 27, 28, 29]),
#  array([30, 31, 32, 33, 34, 35, 36, 37, 38, 39]),
#  array([40, 41, 42, 43, 44, 45, 46, 47, 48, 49])]
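Because numpy.array_split takes a number of parts rather than a chunk size, it answers a slightly different question from the slicing answers above. The sketch below is a pure-Python approximation of that "k nearly equal parts" behaviour for lists; `split_into_parts` is a hypothetical helper name, not a NumPy function. It assumes the same sizing rule NumPy documents: the first `len(seq) % k` parts get one extra item.

```python
def split_into_parts(seq, k):
    """Split seq into k contiguous parts whose lengths differ by at most one."""
    base, extra = divmod(len(seq), k)
    parts = []
    start = 0
    for i in range(k):
        # The first `extra` parts each take one additional item.
        size = base + (1 if i < extra else 0)
        parts.append(seq[start:start + size])
        start += size
    return parts

print(split_into_parts(list(range(10)), 3))
# [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

Compare this with chunking by size 3, which would give four chunks, the last one holding a single item.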

67 votes

Directly from the (old) Python documentation (recipes for itertools):

from itertools import izip, chain, repeat  # Python 2 only: izip was removed in Python 3

def grouper(n, iterable, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return izip(*[chain(iterable, repeat(padvalue, n-1))]*n)

The current version, as suggested by J.F.Sebastian:

#from itertools import izip_longest as zip_longest # for Python 2.x
from itertools import zip_longest # for Python 3.x
#from six.moves import zip_longest # for both (uses the six compat library)

def grouper(n, iterable, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return zip_longest(*[iter(iterable)]*n, fillvalue=padvalue)

I guess Guido's time machine works—worked—will work—will have worked—was working again.

These solutions work because [iter(iterable)]*n (or the equivalent in the earlier version) creates one iterator, repeated n times in the list. izip_longest then effectively performs a round-robin of "each" iterator; because this is the same iterator, it is advanced by each such call, resulting in each such zip-roundrobin generating one tuple of n items.
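The shared-iterator trick described above is easy to check by hand. The following sketch (Python 3, with made-up variable names) shows that `[it] * n` holds n references to one iterator, and contrasts it with n independent iterators, which would just repeat each item.

```python
from itertools import zip_longest

letters = iter("abcdefg")
same = [letters] * 3  # three references to ONE iterator object
print(same[0] is same[1] is same[2])  # True

# Each tuple slot pulls the next item from the same iterator.
grouped = list(zip_longest(*same, fillvalue="x"))
print(grouped)
# [('a', 'b', 'c'), ('d', 'e', 'f'), ('g', 'x', 'x')]

# Contrast: three independent iterators advance in lockstep instead.
independent = [iter("abcdefg") for _ in range(3)]
lockstep = list(zip_longest(*independent, fillvalue="x"))
print(lockstep[:2])
# [('a', 'a', 'a'), ('b', 'b', 'b')]
```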

57 votes

I'm surprised nobody has thought of using iter's two-argument form:

from itertools import islice

def chunk(it, size):
    it = iter(it)
    return iter(lambda: tuple(islice(it, size)), ())

Demo:

>>> list(chunk(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13)]
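For readers who haven't met the two-argument form before: `iter(callable, sentinel)` keeps calling `callable()` and stops as soon as the result equals `sentinel`. Here the callable takes the next `size` items as a tuple, and the empty tuple signals exhaustion. The sketch below feeds it a generator, which has no length and cannot be sliced; `squares` is just an illustrative input.

```python
from itertools import islice

def chunk(it, size):
    it = iter(it)
    # iter(callable, sentinel) calls the lambda until it returns ().
    return iter(lambda: tuple(islice(it, size)), ())

def squares():
    # A generator: no len(), no slicing, items produced on demand.
    for i in range(7):
        yield i * i

pieces = chunk(squares(), 3)
print(next(pieces))  # (0, 1, 4)
print(list(pieces))  # [(9, 16, 25), (36,)]
```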

This works with any iterable and produces output lazily. It returns tuples rather than iterators, but I think it has a certain elegance nonetheless. It also doesn't pad; if you want padding, a simple variation on the above will suffice:

from itertools import islice, chain, repeat

def chunk_pad(it, size, padval=None):
    it = chain(iter(it), repeat(padval))
    return iter(lambda: tuple(islice(it, size)), (padval,) * size)

Demo:

>>> list(chunk_pad(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, None)]
>>> list(chunk_pad(range(14), 3, 'a'))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, 'a')]

Like the izip_longest-based solutions, the above always pads. As far as I know, there's no one- or two-line itertools recipe for a function that optionally pads. By combining the above two approaches, this one comes pretty close:

from itertools import islice, chain, repeat

_no_padding = object()

def chunk(it, size, padval=_no_padding):
    # Compare the sentinel by identity, not equality
    if padval is _no_padding:
        it = iter(it)
        sentinel = ()
    else:
        it = chain(iter(it), repeat(padval))
        sentinel = (padval,) * size
    return iter(lambda: tuple(islice(it, size)), sentinel)

Demo:

>>> list(chunk(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13)]
>>> list(chunk(range(14), 3, None))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, None)]
>>> list(chunk(range(14), 3, 'a'))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, 'a')]

I believe this is the shortest chunker proposed that offers optional padding.

As Tomasz Gandor observed, the two padding chunkers will stop unexpectedly if they encounter a long sequence of pad values. Here's a final variation that works around that problem in a reasonable way:

from itertools import islice

_no_padding = object()

def chunk(it, size, padval=_no_padding):
    it = iter(it)
    chunker = iter(lambda: tuple(islice(it, size)), ())
    # Compare the sentinel by identity, not equality
    if padval is _no_padding:
        yield from chunker
    else:
        for ch in chunker:
            yield ch if len(ch) == size else ch + (padval,) * (size - len(ch))

Demo:

>>> list(chunk([1, 2, (), (), 5], 2))
[(1, 2), ((), ()), (5,)]
>>> list(chunk([1, 2, None, None, 5], 2, None))
[(1, 2), (None, None), (5, None)]
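To make the failure mode Tomasz Gandor pointed out concrete, the sketch below runs the earlier sentinel-based padder and a simplified, always-padding form of the final variation on the same input. `chunk_fixed` is a hypothetical name used only for this comparison. The sentinel-based version stops as soon as a full chunk of pad values appears in the data, because that chunk equals its sentinel.

```python
from itertools import chain, islice, repeat

def chunk_pad(it, size, padval=None):
    # Sentinel-based padder from earlier in this answer.
    it = chain(iter(it), repeat(padval))
    return iter(lambda: tuple(islice(it, size)), (padval,) * size)

def chunk_fixed(it, size, padval=None):
    # Simplified form of the final variation: chunk first, then pad a short last chunk.
    it = iter(it)
    for ch in iter(lambda: tuple(islice(it, size)), ()):
        yield ch + (padval,) * (size - len(ch))

data = [1, 2, None, None, 5]
print(list(chunk_pad(data, 2)))    # [(1, 2)]  (stops early, silently dropping 5)
print(list(chunk_fixed(data, 2)))  # [(1, 2), (None, None), (5, None)]
```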
