Python — Yield, Iterator and Generator Introduction

Occasionally I get confused about the exact differences between the following related concepts in Python:

  • a container
  • an iterable
  • an iterator
  • a generator
  • a generator expression
  • a {list, set, dict} comprehension

I'm writing this post as a pocket reference for later.

Python Containers

Containers are data structures that hold elements and support membership tests. They live in memory and typically hold all their values in memory, too. In Python, some well-known examples are:

  • list, deque, …
  • set, frozenset, …
  • dict, defaultdict, OrderedDict, Counter, …
  • tuple, namedtuple, …
  • str

Containers are easy to grasp, because you can think of them as real-life containers: a box, a cupboard, a house, a ship, etc.

Python Iterables

An iterable is an object that can return an iterator (to return all of its elements), for example:

mylist = ['a', 'b', 'c']
for i in mylist:
    print(i)

Any object that can be looped over with a for loop, such as this list, is an iterable object.

How Do Iterables Work?

Let's see how the Python interpreter handles iteration when it encounters an iteration operation, such as for ... in x:

  • It calls the iter(x) function.
  • iter() checks whether the object implements the __iter__ method and, if so, calls it to obtain an iterator.
  • If the __iter__ method is not implemented but the __getitem__ method is, Python creates an iterator that fetches the elements in order, starting at index 0, until an IndexError is raised.
  • If neither method is implemented, a TypeError exception is raised, indicating that the object cannot be iterated.

Therefore objects with __iter__() methods or __getitem__() methods are usually called iterable objects.
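
As a minimal sketch of the __getitem__ fallback described above, the hypothetical Squares class below defines only __getitem__, yet list() can still iterate over it. Both class names are made up for illustration.

```python
class Squares:
    """Hypothetical class that defines __getitem__ but not __iter__."""

    def __init__(self, count):
        self.count = count

    def __getitem__(self, index):
        # Python asks for index 0, 1, 2, ... until IndexError is raised.
        if index >= self.count:
            raise IndexError(index)
        return index * index


class NotIterable:
    """Hypothetical class with neither __iter__ nor __getitem__."""


squares = list(Squares(4))
print(squares)  # [0, 1, 4, 9]

try:
    iter(NotIterable())
    raised = False
except TypeError:
    raised = True
print(raised)  # True
```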

Let's check the mylist example:

print(dir(mylist))
# output:
[... '__getitem__', ... '__iter__', ...]

We can see that the mylist object has both __iter__ and __getitem__ implemented. If you want to check whether an object is iterable, you can use the Iterable type from the collections.abc module.

from collections.abc import Iterable
print(isinstance(mylist, Iterable))

# output:
True
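
One caveat worth sketching: isinstance(obj, Iterable) only detects objects that define __iter__. For an object that is iterable solely through __getitem__, calling iter() is the more reliable check. The Letters class and the is_iterable() helper below are hypothetical names used only for this illustration.

```python
from collections.abc import Iterable


class Letters:
    """Hypothetical class that is iterable only through __getitem__."""

    def __getitem__(self, index):
        # Indexing past the end of the string raises IndexError,
        # which ends the iteration.
        return "abc"[index]


def is_iterable(obj):
    """Return True if iter(obj) succeeds (hypothetical helper)."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True


print(isinstance(Letters(), Iterable))  # False
print(is_iterable(Letters()))           # True
print(is_iterable(42))                  # False
```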

Python Iterator

An iterator is an object that represents a stream of values. It hands out its values one at a time: each call to next() returns the next value, and a StopIteration exception is raised when no values remain. Let's first take a look at a simple example of iteration:

for i in range(5):
    print(i)

# output:
0
1
2
3
4

Printing elements one by one like this is iteration, and it is one of the most common operations in day-to-day code.

The lists, strings, and other containers mentioned above are iterables, not iterators (the same is true of range). However, you can use Python's built-in iter() function to obtain an iterator from them. Let us use the iterator pattern to rewrite a loop over a list:

mylist = [1,2,3]
it = iter(mylist)
while True:
    try:
        print(next(it))
    except StopIteration:
        print("Stop iteration!")
        break

# output:
1
2
3
Stop iteration!

In the above code, we first use the iterable object to construct the iterator it, and then repeatedly call the next() function on the iterator to get the next element. When there are no more elements, the iterator raises a StopIteration exception, which we catch to exit the loop.
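
To make the iterator protocol concrete, here is a minimal sketch of a hand-written iterator. The Countdown class is a hypothetical example, not something from the standard library; it implements __iter__ (returning itself) and __next__ (returning the next value or raising StopIteration).

```python
class Countdown:
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__.
        return self

    def __next__(self):
        if self.current <= 0:
            # Signal that there are no more values.
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


counter = Countdown(3)
values = list(counter)
print(values)  # [3, 2, 1]
```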

Python Generator

Python provides generators as a convenient way to create iterators. A generator function is a special type of function that, instead of returning a single value, returns an iterator object that produces a sequence of values. In a generator function, a yield statement is used rather than a return statement.

We now know the mechanism behind the for loop. But if the amount of data is large, such as a list of 1,000,000 numbers, storing all the values in memory takes up a lot of space, and if we only need the first few elements, most of that space is wasted. In this case, we can use a generator.

The idea of a Python generator is that we don't need to create the whole list at once. We only need to remember the rule for producing its elements and compute each one when it is needed. Let's take a look at an example that uses a generator expression:

my_generator = (x*x for x in range(10))
for i in my_generator:
    print(i)

# output:
0
1
4
9
16
25
36
49
64
81

But we can only execute for i in my_generator once. After the first loop the generator is exhausted, so a second for loop over the same generator prints nothing.
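
To see that a generator can be consumed only once, the short sketch below runs through the same generator expression twice and contrasts it with a list comprehension, which can be reused. The variable names are chosen for this illustration.

```python
squares_gen = (x * x for x in range(5))   # generator expression: lazy
squares_list = [x * x for x in range(5)]  # list comprehension: built immediately

first_pass = list(squares_gen)
second_pass = list(squares_gen)  # the generator is already exhausted

print(first_pass)   # [0, 1, 4, 9, 16]
print(second_pass)  # []

# A list can be iterated as many times as needed.
print(list(squares_list) == list(squares_list))  # True
```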

Python yield

In Python, yield is used to return a value from a function without destroying its local variables. In a sense, yield pauses the execution of the function. When the next value is requested, execution resumes right after the last yield statement.

[Figure: illustration of functions vs. generators]

A generator function returns a generator object, which is an iterator that produces one value at a time. It does not store the values it produces, which makes a generator memory-efficient. For example, you can use a generator to loop through a big group of numbers without storing all of them in memory.
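
As a rough illustration of the memory claim, the sketch below compares the size of a list with the size of a generator object using sys.getsizeof(). Exact byte counts depend on the Python version and platform, so none are shown; note also that getsizeof() reports only the list object itself, not the integers it refers to.

```python
import sys

n = 100_000  # arbitrary size chosen for this illustration

as_list = [x for x in range(n)]
as_generator = (x for x in range(n))

# The list stores a reference to every element; the generator object
# only stores its current state, whatever n is.
list_size = sys.getsizeof(as_list)
generator_size = sys.getsizeof(as_generator)

print(f"list: {list_size} bytes")
print(f"generator: {generator_size} bytes")
```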

Let's take a look at the following code snippet:

def test():
    print("First")
    yield 1
    print("Second")
    yield 2
    print("Third")
    yield 3

my_generator = test()
print(type(my_generator))

# output:
<class 'generator'>

Unlike a normal function, calling a generator function does not execute the function body immediately (nothing is printed after executing my_generator = test()); instead, a generator object is returned. As mentioned earlier, a generator is an iterator, and yield behaves like a return that pauses the function rather than ending it, so the result of looping over it is not hard to guess:

for item in my_generator:
    print(item)

# output:
First
1
Second
2
Third
3

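What happens if we call next()? The for loop above has already exhausted my_generator, so we first create a fresh generator:

my_generator = test()
next(my_generator)
next(my_generator)
next(my_generator)
next(my_generator)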

# output:
First
Second
Third
Traceback (most recent call last):
  File "<string>", line 15, in <module>
StopIteration

Every time next(my_generator) is called, the function runs only up to the next yield and pauses there; the following call resumes from where it stopped. The number of values a generator produces depends on how many times a yield statement is executed.
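
The number of values is determined by how often yield runs, not by how often it appears in the source. As a small sketch, the hypothetical count_up_to() generator below contains a single yield inside a loop, and its local variable current survives between calls to next().

```python
def count_up_to(limit):
    """Hypothetical generator yielding 1, 2, ..., limit."""
    current = 1
    while current <= limit:
        # Execution pauses here; `current` keeps its value until resumed.
        yield current
        current += 1


counter = count_up_to(4)
first = next(counter)
second = next(counter)
remaining = list(counter)

print(first, second)  # 1 2
print(remaining)      # [3, 4]
```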

How To Turn a Function Into a Generator

To turn a function into a generator, yield a value instead of returning it. This makes the function return a generator object. This is an iterator you can loop through like a list.

Let's create a square() function that squares an input list of numbers:

def square(numbers):
    result = []
    for n in numbers:
        result.append(n ** 2)
    return result
    
numbers = [1, 2, 3, 4, 5]
squared_numbers = square(numbers)

print(squared_numbers)

# output:
[1, 4, 9, 16, 25]

Let's turn this function into a generator. Instead of storing the squared numbers into a list, you can yield values one at a time without storing them:

def square(numbers):
    for n in numbers:
        yield n ** 2

numbers = [1, 2, 3, 4, 5]
squared_numbers = square(numbers)

print(squared_numbers)

# output:
<generator object square at 0x7f621175b510>

Now you no longer get the list of squared numbers. This is because the result squared_numbers is a generator object.

Use the next() function to get the values

A generator object doesn't hold numbers in memory. Instead, it computes and yields one result at a time. It does this only when you ask for the next value using the next() function.

print(next(squared_numbers))

# output:
1

Let's make it compute the rest of the numbers by calling next() four more times:

print(next(squared_numbers))
print(next(squared_numbers))
print(next(squared_numbers))
print(next(squared_numbers))

# output:
4
9
16
25

Now the generator has squared all the numbers. If you call next() one more time:

print(next(squared_numbers))

# An error occurs:
Traceback (most recent call last):
  File "<string>", line 13, in <module>
StopIteration

This StopIteration exception lets you know there are no more numbers to be squared. In other words, the generator is exhausted. Now you understand how a generator works and how to make it compute values.
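
If you would rather not handle StopIteration yourself, the built-in next() accepts an optional second argument that is returned once the iterator is exhausted. A minimal sketch, using None as an arbitrary default:

```python
def square(numbers):
    for n in numbers:
        yield n ** 2


squared_numbers = square([1, 2])

first = next(squared_numbers, None)
second = next(squared_numbers, None)
third = next(squared_numbers, None)  # exhausted: the default is returned

print(first, second, third)  # 1 4 None
```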

Do not use the next() function

Using the next() function demonstrates well how generators work. In practice, you rarely need to call next() yourself. Instead, you can use a for loop with the same syntax you would use if you looped through a list (the for loop calls the next() function under the hood for you and stops when StopIteration is raised).

For instance, let's repeat the generator example using a for loop:

def square(numbers):
    for n in numbers:
        yield n ** 2

numbers = [1, 2, 3, 4, 5]
squared_numbers = square(numbers)

for n in squared_numbers:
    print(n)

# output:
1
4
9
16
25
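
A for loop is not the only consumer. Built-in functions that accept any iterable, such as sum() and list(), also pull values from a generator one at a time. A short sketch reusing the square() generator:

```python
def square(numbers):
    for n in numbers:
        yield n ** 2


numbers = [1, 2, 3, 4, 5]

# Each call to square() creates a fresh generator.
total = sum(square(numbers))
as_list = list(square(numbers))

print(total)    # 55
print(as_list)  # [1, 4, 9, 16, 25]
```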

Generators vs. Lists—Runtime Comparison

Let's perform a runtime comparison between a generator function and a function that builds a list. In this example, there's a list of ten numbers and two functions:

  • A data_list() function randomly selects a number from the list n times.
  • A data_generator() generator function also randomly selects a number from the list n times.

This code compares how long it takes to call each function for 1 million randomly selected numbers:

import random
import timeit
from math import floor

numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

def data_list(n):
    result = []
    for i in range(n):
        result.append(random.choice(numbers))
    return result

def data_generator(n):
    for i in range(n):
        yield random.choice(numbers)

t_list_start = timeit.default_timer()
rand_list = data_list(1_000_000)
t_list_end = timeit.default_timer()

t_gen_start = timeit.default_timer()
rand_gen = data_generator(1_000_000)
t_gen_end = timeit.default_timer()

t_gen = t_gen_end - t_gen_start
t_list = t_list_end - t_list_start

print(f"List creation took {t_list} Seconds")
print(f"Generator creation took {t_gen} Seconds")

print(f"The generator is {floor(t_list / t_gen)} times faster")

# output:
List creation took 0.6045370370011369 Seconds
Generator creation took 3.48799949279055e-06 Seconds
The generator is 173319 times faster

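This shows that a generator is far faster to create. The reason is that calling data_generator() does not select any numbers at all: it only creates a generator object, and the work is deferred until the values are requested. Building the list, by contrast, selects and stores all 1 million numbers up front. Keep in mind that this measures creation only; iterating over the whole generator still has to perform the same random selections.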
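
The comparison above times only the creation of each object. As a rough sketch of a fuller comparison, the variant below also consumes every value from the generator, so both timings cover the same amount of work. Timings vary by machine, so no output is shown, and n is an arbitrary smaller value chosen to keep the run short.

```python
import random
import timeit

numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]


def data_list(n):
    result = []
    for _ in range(n):
        result.append(random.choice(numbers))
    return result


def data_generator(n):
    for _ in range(n):
        yield random.choice(numbers)


n = 200_000  # arbitrary size for this illustration

start = timeit.default_timer()
rand_list = data_list(n)
t_list = timeit.default_timer() - start

start = timeit.default_timer()
# Consume the generator completely, counting values without storing them.
consumed = sum(1 for _ in data_generator(n))
t_gen = timeit.default_timer() - start

print(f"Building the list took {t_list} seconds")
print(f"Consuming the generator took {t_gen} seconds")
```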

Summary

Python generators are an incredibly powerful programming construct. They allow you to write streaming code with fewer intermediate variables and data structures. Besides that, they are memory-efficient and avoid computing values that are never used. Finally, they tend to require fewer lines of code, too.
