Last Updated: August 12, 2019
·
3.605K
· nvie

# Efficient sifting iterator for Python

Did you ever want to sift an iterator into two "sub" iterators? Now you can:

``````>>> even, odd = sift(xrange(10), lambda x: x % 2 == 0)
>>> list(even)
[0, 2, 4, 6, 8]
>>> list(odd)
[1, 3, 5, 7, 9]``````

Note that the input iterable is only lazily consumed:

``````>>> X = iter(xrange(10))
>>> even, odd = sift(X, lambda x: x % 2 == 0)
>>> X.next()
0
>>> X.next()
1
>>> X.next()
2
>>> list(even)
[4, 6, 8]
>>> list(odd)
[3, 5, 7, 9]``````

# A tiny warning

Beware that in order to consume one of the returned iterators fully, you might be consuming a big part of the the input iterable and keeping its values buffered in memory. It's best to consume the largest iterable first, whenever possible:

``````>>> small_ones, large_ones = sift(xrange(10000), lambda x: x < 9000)

# low memory peak (no buffering needed)
>>> list(small_ones)
[0, ..., 8999]
>>> list(large_ones)
[9000, ..., 9999]

# at its peak, 9000 items are buffered in memory
>>> list(large_ones)
[9000, ..., 9999]
>>> list(small_ones)
[0, ..., 8999]``````

The implementation: https://gist.github.com/nvie/5804375

337.4K
13

300.9K
24

### Emulate do-while loop in Python

253.5K
5

#### 2 Responses

Y U NO `partition` ?!

``````def partition(pred, iterable):
'Use a predicate to partition entries into false entries and true entries'
# partition(is_odd, range(10)) --> 0 2 4 6 8   and  1 3 5 7 9
t1, t2 = tee(iterable)
return filterfalse(pred, t1), filter(pred, t2)``````

http://docs.python.org/dev/library/itertools.html#itertools-recipes

over 1 year ago ·

(1) "Once tee() has made a split, the original iterable should not be used anywhere else; otherwise, the iterable could get advanced without the tee objects being informed."

(2) tee() actually duplicates all elements from the input, making it much less memory-efficient as you need to consume both returned iterators relatively simultaneously, in order to prevent storing the complete input in memory. My version only keeps the minimum amount of input in memory.

``````>>> truthy, falsey = sift(xrange(10**6), lambda x: True)
>>> for x in truthy: pass  # as long as you consume `truthy` first, this costs no memory at all``````

The tee() based version keeps the full input in memory until falsey is consumed, too.

over 1 year ago ·

#### Have a fresh tip? Share with Coderwall community!

Best #Python Authors
terrasea
335.8K
cheglastgnat
299.8K
saji89
253.4K
lotia
205.9K
Awesome Job