How does asyncio work?
https://stackoverflow.com/questions/49005651/how-does-asyncio-actually-work/51116910#51116910
Before answering this question, we need to understand a few base terms. Skip any that you already know.
Generators
Generators are objects that allow us to suspend the execution of a Python function. User-defined generators are implemented using the yield keyword. By creating a normal function containing the yield keyword, we turn that function into a generator:
>>> def test():
...     yield 1
...     yield 2
...
>>> gen = test()
>>> next(gen)
1
>>> next(gen)
2
>>> next(gen)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
As you can see, calling next() on the generator causes the interpreter to load test's frame and return the yielded value. Calling next() again loads the frame back onto the interpreter stack, and execution continues until another value is yielded.
By the third call to next(), the generator has finished, so StopIteration is raised.
Communicating with a generator
A lesser-known feature of generators is that you can communicate with them using two methods: send() and throw().
>>> def test():
...     val = yield 1
...     print(val)
...     yield 2
...     yield 3
...
>>> gen = test()
>>> next(gen)
1
>>> gen.send("abc")
abc
2
>>> gen.throw(Exception())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in test
Exception
Upon calling gen.send(), the value is passed in as the result of the yield expression the generator is paused at.
gen.throw(), on the other hand, allows throwing exceptions inside generators, with the exception raised at the same spot where the generator was paused on yield.
Returning values from generators
Returning a value from a generator results in the value being put inside the StopIteration exception. We can later recover the value from the exception and use it as needed.
>>> def test():
...     yield 1
...     return "abc"
...
>>> gen = test()
>>> next(gen)
1
>>> try:
...     next(gen)
... except StopIteration as exc:
...     print(exc.value)
...
abc
Behold, a new keyword: yield from
Python 3.3 came with the addition of a new keyword: yield from. What that keyword allows us to do is pass on any next(), send() and throw() into an inner-most nested generator. If the inner generator returns a value, it is also the return value of yield from:
>>> def inner():
...     print('inner', (yield 2))
...     return 3
...
>>> def outer():
...     yield 1
...     val = yield from inner()
...     print('outer', val)
...     yield 4
...
>>> gen = outer()
>>> next(gen)
1
>>> next(gen)
2
>>> gen.send("abc")
inner abc
outer 3
4
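The session above exercises next() and send(). As a small sketch of the third path, here is throw() travelling through yield from; the inner and outer names are reused purely for illustration, and the ValueError is an arbitrary choice:

```python
def inner():
    try:
        yield 1
    except ValueError as exc:
        # The exception thrown into outer() surfaces here, at inner's paused yield.
        yield f"inner caught {exc}"
    return "done"

def outer():
    # yield from forwards next(), send() and throw() to inner().
    result = yield from inner()
    yield f"outer got {result}"

gen = outer()
first = next(gen)                        # 1
second = gen.throw(ValueError("boom"))   # "inner caught boom"
third = next(gen)                        # "outer got done"
```

The outer generator never sees the exception at all, because the inner one handled it; only the inner return value comes back out of yield from.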
Putting it all together
Upon introducing the new keyword yield from in Python 3.3, we were able to create generators inside generators that, just like a tunnel, pass the data back and forth from the inner-most to the outer-most generators. This has spawned a new meaning for generators: coroutines.
Coroutines are functions that can be stopped and resumed while being run. In Python, they are defined using the async def keyword. Much like generators, they too use their own form of yield from, which is await. Before async and await were introduced in Python 3.5, we created coroutines in the exact same way generators were created (with yield from instead of await).
async def inner():
    return 1

async def outer():
    await inner()
Like every iterator or generator that implements the __iter__() method, coroutines implement __await__(), which allows them to continue every time await coro is called.
There's a nice sequence diagram inside the Python docs that you should check out.
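To make the link between await and yield from more concrete, here is a minimal sketch that drives a coroutine by hand, the same way a generator is driven. The pause() helper is hypothetical (it is not part of asyncio); it uses types.coroutine to expose a bare yield to the await chain:

```python
import types

@types.coroutine
def pause(value):
    # Generator-based awaitable: hands `value` to whoever drives the coroutine
    # and evaluates to whatever is sent back in.
    received = yield value
    return received

async def inner():
    got = await pause("from inner")
    return got * 2

async def outer():
    return await inner()

coro = outer()
yielded = coro.send(None)   # start the coroutine; it runs until the inner-most yield
try:
    coro.send(21)           # resume; the value travels down the await chain
except StopIteration as exc:
    result = exc.value      # the coroutine's return value, as with generators
```

Here yielded ends up as "from inner" and result as 42, which is exactly the tunnel behaviour shown earlier with yield from.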
In asyncio, apart from coroutine functions, we have 2 important objects: tasks and futures.
Futures
Futures are objects that have the __await__() method implemented, and their job is to hold a certain state and result. The state can be one of the following:
- PENDING - future does not have any result or exception set.
- CANCELLED - future was cancelled using fut.cancel()
- FINISHED - future was finished, either by a result set using fut.set_result() or by an exception set using fut.set_exception()
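The states listed above can be observed on real asyncio futures. This is only a sketch: the variable names and the states dictionary are made up for illustration, and the futures are created through loop.create_future() inside a running loop:

```python
import asyncio

async def main():
    loop = asyncio.get_running_loop()
    states = {}

    pending = loop.create_future()            # PENDING: nothing set yet
    states["pending"] = (pending.done(), pending.cancelled())

    finished = loop.create_future()           # FINISHED with a result
    finished.set_result("payload")
    states["finished"] = (finished.done(), finished.result())

    failed = loop.create_future()             # FINISHED with an exception
    failed.set_exception(ValueError("boom"))
    states["failed"] = (failed.done(), type(failed.exception()).__name__)

    cancelled = loop.create_future()          # CANCELLED
    cancelled.cancel()
    states["cancelled"] = (cancelled.done(), cancelled.cancelled())
    return states

states = asyncio.run(main())
```

Note that done() is true for both finished and cancelled futures; only the pending one reports False.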
Another important feature of future objects is that they contain a method called add_done_callback(). This method allows functions to be called as soon as the future is done, whether it raised an exception or finished.
Tasks
Task objects are special futures, which wrap around coroutines and communicate with the inner-most and outer-most coroutines. Every time a coroutine awaits a future, the future is passed all the way back to the task (just like in yield from), and the task receives it.
Next, the task binds itself to the future. It does so by calling add_done_callback() on the future. From now on, if the future is ever done, by being cancelled, passed an exception or passed a Python object as a result, the task's callback will be called, and it will rise back up to existence.
Asyncio
The final burning question we must answer is: how is the IO implemented?
Deep inside asyncio, we have an event loop. An event loop of tasks. The event loop's job is to call tasks every time they are ready and coordinate all that effort into one single working machine.
The IO part of the event loop is built upon a single crucial function called select (in practice asyncio reaches it through the selectors module, which may pick a platform-specific equivalent such as epoll or kqueue).
Select is a blocking function, implemented by the operating system underneath, that allows waiting on sockets for incoming or outgoing data. When data is received it wakes up and returns the sockets which received data, or the sockets which are ready for writing.
When you try to receive or send data over a socket through asyncio, what actually happens below is that the socket is first checked for data that can be immediately read or sent. If its .send() buffer is full, or the .recv() buffer is empty, the socket is registered to the select function (by simply adding it to one of the lists, rlist for recv and wlist for send) and the appropriate function awaits a newly created future object, tied to that socket.
When all available tasks are waiting for futures, the event loop calls select and waits. When one of the sockets has incoming data, or its send buffer has drained, asyncio checks for the future object tied to that socket, and sets it to done.
Now all the magic happens. The future is set to done, the task that added itself before with add_done_callback() rises back to life, and calls .send() on the coroutine, which resumes the inner-most coroutine (because of the await chain), and you read the newly received data from a nearby buffer it was spilled onto.
Method chain again, in case of recv():
- select.select waits.
- A ready socket, with data, is returned.
- Data from the socket is moved into a buffer.
- future.set_result() is called.
- Task that added itself with add_done_callback() is now woken up.
- Task calls .send() on the coroutine, which goes all the way into the inner-most coroutine and wakes it up.
- Data is read from the buffer and returned to our humble user.
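The chain above can be sketched as a toy event loop. Everything here (Future, Task, recv, run_until_idle, handler, the readers dict) is a hypothetical, heavily simplified stand-in and not asyncio's actual implementation: the real loop schedules task steps instead of calling them directly, and handles errors, cancellation and writes as well.

```python
import select
import socket

class Future:
    """Toy future: holds a result plus the callbacks to run once it is set."""
    def __init__(self):
        self.result = None
        self.callbacks = []

    def add_done_callback(self, fn):
        self.callbacks.append(fn)

    def set_result(self, result):
        self.result = result
        for fn in self.callbacks:
            fn(self)

    def __await__(self):
        yield self              # handed up the await chain to the task
        return self.result      # becomes the value of `await future`

class Task:
    """Toy task: drives a coroutine and binds itself to each awaited future."""
    def __init__(self, coro):
        self.coro = coro
        self.finished = False
        self.result = None
        self.step()

    def step(self, _future=None):
        try:
            awaited = self.coro.send(None)   # resume the inner-most coroutine
        except StopIteration as exc:
            self.finished, self.result = True, exc.value
        else:
            awaited.add_done_callback(self.step)

readers = {}  # socket -> (future, nbytes); plays the role of select's rlist

async def recv(sock, nbytes):
    future = Future()
    readers[sock] = (future, nbytes)
    return await future

def run_until_idle():
    while readers:
        ready, _, _ = select.select(list(readers), [], [])   # blocks here
        for sock in ready:
            future, nbytes = readers.pop(sock)
            future.set_result(sock.recv(nbytes))   # wakes the bound task

async def handler(sock):
    data = await recv(sock, 1024)
    return data.upper()

left, right = socket.socketpair()
task = Task(handler(left))     # runs until handler awaits the future
right.sendall(b"hello")
run_until_idle()
left.close()
right.close()
```

After run_until_idle() returns, task.result holds b"HELLO": select reported the socket as readable, the future got its result, the callback stepped the task, and the data came back out of await.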
In summary, asyncio uses generator capabilities that allow pausing and resuming functions. It uses yield from capabilities that allow passing data back and forth from the inner-most generator to the outer-most. It uses all of those in order to halt function execution while it's waiting for IO to complete (by using the OS select function).
And the best of all? While one function is paused, another may run and interleave with the delicate fabric, which is asyncio.
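A quick way to see that interleaving is to run two coroutines side by side. This is a minimal sketch; worker, main and the log list are illustrative names, and asyncio.sleep(0) stands in for real IO by simply handing control back to the event loop:

```python
import asyncio

async def worker(name, log):
    for step in range(2):
        log.append(f"{name} step {step}")
        await asyncio.sleep(0)   # pause here and let the other task run

async def main():
    log = []
    await asyncio.gather(worker("a", log), worker("b", log))
    return log

log = asyncio.run(main())
# log == ["a step 0", "b step 0", "a step 1", "b step 1"]
```

Neither worker runs to completion before the other starts: each pause at await gives the loop a chance to resume the other task.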