Iterating in fixed-size chunks in Python
Here’s a fairly common problem I have: I have an iterable, and I want to go through it in “chunks”. Rather than looking at every item of the sequence one-by-one, I want to process multiple elements at once.
For example, when I’m using the bulk APIs in Elasticsearch, I can index many document with a single API call, which is more efficient than making a new API call for every document.
Here’s the sort of output I want:
for c in chunked_iterable(range(14), size=4):
print(c)
# (0, 1, 2, 3)
# (4, 5, 6, 7)
# (8, 9, 10, 11)
# (12, 13)
I have two requirements which are often missed in Stack Overflow answers or other snippets I’ve found:
-
It has to work with generators, where you don’t know the length upfront, and you can’t slice to a particular point in the generator. e.g. iterating over files in a directory
-
I don’t want “filler” values at the end – if it doesn’t line up neatly on a boundary, I’d rather have a truncated chunk than extra values.
So to save me having to find it again, this is what I usually use:
import itertools
def chunked_iterable(iterable, size):
it = iter(iterable)
while True:
chunk = tuple(itertools.islice(it, size))
if not chunk:
break
yield chunk
Most of the heavy lifting is done by itertools.islice(); I call that repeatedly until it returns an empty sequence. The itertools module has lots of useful functions for this sort of thing.
The it = iter(iterable)
line may be non-obvious – this ensures that the value it
is using the same iterator throughout. If you pass certain fixed iterables to islice(), it creates a new iterator each time – and then you only ever get the first handful of elements.
For example, trying to call chunked_iterable([1, 2, 3, 4, 5], size=2)
without this line would emit [1, 2]
forever.
I think it’s the difference between a container (for which iter(…)
returns a new object each time) and an iterator (for which iter(…)
returns itself). I forget the exact details, but I remember first reading about this in Brett Slatkin’s book Effective Python.