
APPENDIX

Table of contents

  1. APPENDIX A: Hardware and Software for Beginning Programmers
    1. Hardware
      1. Caveman Computers
      2. Electricity
      3. Inventions
      4. An Idealized Computer
      5. The CPU
      6. Memory and Caches
      7. Storage
      8. Inputs
      9. Outputs
      10. Relative Access Times
    2. Software
      1. In the Beginning Was the Bit
      2. Machine Language
      3. Assembler
      4. Higher-Level Languages
      5. Operating Systems
      6. Virtual Machines
      7. Containers
      8. Distributed Computing and Networks
      9. The Cloud
      10. Kubernetes
  2. APPENDIX B: Install Python 3
    1. Check Your Python Version
    2. Install Standard Python
      1. macOS
      2. Windows
      3. Linux or Unix
    3. Install the pip Package Manager
    4. Install virtualenv
    5. Other Packaging Solutions
    6. Install Anaconda
      1. Install Anaconda’s Package Manager conda
  3. APPENDIX C: Something Completely Different: Async
    1. Coroutines and Event Loops
    2. Asyncio Alternatives
    3. Async Versus…
    4. Async Frameworks and Servers
  4. APPENDIX E: Cheat Sheets
    1. Operator Precedence
    2. String Methods
      1. Change Case
      2. Search
      3. Modify
      4. Format
      5. String Type
    3. String Module Attributes
    4. Coda

APPENDIX A

Hardware and Software for Beginning Programmers

Some things make intuitive sense. Some we see in nature, and others are human inventions such as the wheel or pizza.

Others require more of a leap of faith. How does a television convert some invisible wiggles in the air into sounds and moving images?

A computer is one of these hard-to-accept ideas. How can you type something and get a machine to do what you want?

When I was learning to program, it was hard to find answers to some basic questions. For example: some books explain computer memory with the analogy of books on a library shelf. I wondered: if you read from memory, the analogy implies you’re taking a book from the shelf. So, does that erase it from memory? Actually, no. It’s more like getting a copy of the book from the shelf.

This appendix is a short review of computer hardware and software for readers who are relatively new to programming. I try to explain the things that become “obvious” eventually but may be sticking points at the start.

Hardware

Caveman Computers

When the cavemen Og and Thog returned from hunting, they would each add a rock to their own pile for each mammoth they slew. But they couldn’t do much with the piles, other than gain bragging rights if one was noticeably larger than the other.

Distant descendants of Og (Thog got stomped by a mammoth one day, trying to add to his pile) would learn to count, and write, and use an abacus. But some leaps of imagination and technology were needed to get beyond these tools to the concept of a computer. The first necessary technology was electricity.

Electricity

Ben Franklin thought that electricity was a flow of some invisible fluid from a place with more fluid (positive) to a place with less (negative). He was right, but got the terms backwards. Electrons flow from his “negative” to “positive,” but electrons weren’t discovered until much later—too late to change the terminology. So, ever since, we’ve needed to remember that electrons flow one way and current is defined as flowing the other way.

We’re all familiar with natural electrical phenomena like static electricity and lightning. After people discovered how to push electrons through conducting wires to make electrical circuits, we got one step closer to making computers.

I used to think that electric current in a wire was caused by jazzed electrons doing laps around the track. It’s actually quite different. Electrons jump from one atom to another, behaving a little like ball bearings in a tube (or tapioca balls in a bubble tea straw). When you push a ball at one end, it pushes its neighbor, and so on, until the ball at the other end is pushed out. Although an average electron moves slowly (drift speed in a wire is only about three inches per hour), this almost-simultaneous bumping causes the generated electromagnetic wave to propagate very quickly: 50 to 99% of the speed of light, depending on the conductor.

Inventions

We still needed:

  • A way to remember things
  • A way to do stuff with the things that we remembered

One memory concept was a switch: something that’s either on or off, and stays as it is until something flips it to the other state. An electrical switch works by opening or closing a circuit, allowing electrons to flow or blocking them. We use switches all the time to control lights and other electrical devices. What was needed was a way to control the switch itself by electricity.

The earliest computers (and televisions) used vacuum tubes for this purpose, but these were big and often burned out. The single key invention that led to modern computers was the transistor: smaller, more efficient, and more reliable. The final key step was to make transistors much smaller and connect them in integrated circuits. For many years, computers got faster and ridiculously cheaper as these components became smaller and smaller. Signals move faster when the components are closer together.

But there’s a limit to how tightly we can stuff things together. All this electron friskiness encounters resistance, which generates heat. We reached that limit more than 10 years ago, and manufacturers have compensated by putting multiple “chips” on the same board. This has increased the demand for distributed computing, which I discuss in a bit.

Regardless of these details, with these inventions we have been able to construct computers: machines that can remember things and do something with them.

An Idealized Computer

Real computers have lots of complex features. Let’s focus on the essential parts.

A circuit “board” contains the CPU, memory, and wires connecting them to each other and to plugs for external devices.

The CPU

The CPU (Central Processing Unit), or “chip,” does the actual “computing”:

  • Mathematical tasks like addition
  • Comparing values

Memory and Caches

RAM (Random Access Memory) does the “remembering.” It’s fast, but volatile (it loses its data if power is lost).

CPUs have been getting ever faster than memory, so computer designers have been adding caches: smaller, faster memory between the CPU and main memory. When your CPU tries to read some bytes from memory, it first tries the closest cache (called an L1 cache), then the next (L2), and eventually main RAM.

Storage

Because main memory loses its data, we also need nonvolatile storage. Such devices are cheaper than memory and hold much more data, but they are also much slower.

The traditional storage method has been “spinning rust”: magnetic disks (also called hard drives, or HDDs) with movable read-write heads, a little like vinyl records and a stylus. A hybrid technology called the SSD (Solid State Drive) is made of semiconductors like RAM, but is nonvolatile like magnetic disks. Its price and speed fall between the two.


Inputs

How do you get data into the computer? For people, the main choices are keyboards, mice, and touchscreens.

Outputs

People generally see computer output with displays and printers.

Relative Access Times

The amount of time it takes to get data to and from any of these components varies tremendously. This has big practical implications. For example, software needs to run in memory and access data there, but it also needs to store data safely on nonvolatile devices like disks. The problem is that disks are thousands of times slower than memory, and networks are slower still. This means that programmers spend a lot of time trying to make the best trade-offs between speed and cost.

In Computer Latency at a Human Scale, David Jeppesen compares these access times. I’ve derived Table A-1 from his numbers and others. The last columns—Ratio, Relative Time (CPU = one second), and Relative Distance (CPU = one inch)—are easier for us to relate to than the raw timings.

Table A-1. Relative access times

Location           Time     Ratio        Relative Time  Relative Distance
CPU                0.4 ns   1            1 sec          1 in
L1 cache           0.9 ns   2            2 sec          2 in
L2 cache           2.8 ns   7            7 sec          7 in
L3 cache           28 ns    70           1 min          6 ft
RAM                100 ns   250          4 min          20 ft
SSD                100 μs   250,000      3 days         4 miles
Mag disk           10 ms    25,000,000   9 months       400 miles
Internet: SF→NY    65 ms    162,500,000  5 years        2,500 miles

It’s a good thing that a CPU instruction actually takes less than a nanosecond instead of a whole second, or else you could have a baby in the time it takes to access a magnetic disk. Because disk and network times are so much slower than CPU and RAM, it helps to do as much work in memory as you can. And since the CPU itself is so much faster than RAM, it makes sense to keep data contiguous, so the bytes can be served by the faster (but smaller) caches closer to the CPU.

Software

Given all this computer hardware, how would we control it? First, we have both instructions (stuff that tells the CPU what to do) and data (inputs and outputs for the instructions). In the stored-program computer, everything could be treated as data, which simplified the design. But how do you represent instructions and data? What is it that you save in one place and process in another? The far-flung descendants of caveman Og wanted to know.

In the Beginning Was the Bit

Let’s go back to the idea of a switch: something that maintains one of two values. These could be on or off, high or low voltage, positive or negative—just something that can be set, won’t forget, and can later provide its value to anyone who asks. Integrated circuits gave us a way to integrate and connect billions of little switches into small chips.

If a switch can have just two values, it can be used to represent a bit, or binary digit. This could be treated as the tiny integers 0 and 1, yes and no, true and false, or anything we want.

However, bits are too small for anything beyond 0 and 1. How can we convince bits to represent bigger things?

For an answer, look at your fingers. We use only 10 digits (0 through 9) in our daily lives, but we make numbers much bigger than 9 with positional notation. If I add 1 to the number 38, the 8 becomes a 9 and the whole value is now 39. If I add another 1, the 9 turns into a 0 and I carry the one to the left, incrementing the 3 to a 4 and getting the final number 40. The far-right digit is in the “ones column,” the one to its left is in the “tens column,” and so on to the left, multiplying by 10 each time. With three decimal digits, you can represent a thousand (10 * 10 * 10) numbers, from 000 to 999.

We can use positional notation with bits to make larger collections of them. A byte has eight bits, with 2^8 (256) possible bit combinations. You can use a byte to store, for example, the small integers 0 to 255 (you need to save room for zero in positional notation).

A byte looks like eight bits in a row, each bit with a value of either 0 (off, or false) or 1 (on, or true). The bit on the far right is the least significant, and the leftmost one is the most significant.
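If you’d like to check this arithmetic yourself once you have Python installed (Appendix B shows how), the interpreter understands binary notation directly; the 0b prefix writes an integer in base 2:

>>> 0b00000101    # binary notation for decimal 5
5
>>> bin(255)      # the largest value that fits in a single byte
'0b11111111'
>>> 2 ** 8        # the number of distinct values in 8 bits
256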

Machine Language

Each computer CPU is designed with an instruction set of bit patterns (also called opcodes) that it understands. Each opcode performs a certain function, with input values from one place and output values to another place. CPUs have special internal places called registers to store these opcodes and values.

Let’s use a simplified computer that works only with bytes and has four byte-sized registers called A, B, C, and D. Assume that:

  • The command opcode goes into register A
  • The command gets its byte inputs from registers B and C
  • The command stores its byte result in register D

(Adding two bytes could overflow a single-byte result, but I’m ignoring that here to show what happens where.)

Say that:

  • Register A contains the opcode for adding two integers: decimal 1 (binary 00000001).
  • Register B has the decimal value 5 (binary 00000101).
  • Register C has the decimal value 3 (binary 00000011).

The CPU sees that an instruction has arrived in register A. It decodes and runs that instruction, reading values from registers B and C and passing them to internal hardware circuits that can add bytes. When it’s done, we should see the decimal value 8 (binary 00001000) in register D.
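If it helps to see the moving parts, here is a rough Python sketch of that imaginary machine. The register names and the single ADD opcode are only the assumptions from this example, not a real instruction set:

# A toy simulation of the imaginary four-register byte machine above.
ADD = 0b00000001                 # opcode 1: add two integers

registers = {"A": ADD,           # the opcode
             "B": 0b00000101,    # first input: decimal 5
             "C": 0b00000011,    # second input: decimal 3
             "D": 0b00000000}    # the result goes here

def step(regs):
    """Decode the opcode in register A and run it, storing the result in D."""
    if regs["A"] == ADD:
        # Mask to one byte, ignoring overflow just as the text does
        regs["D"] = (regs["B"] + regs["C"]) & 0b11111111
    else:
        raise ValueError("unknown opcode")

step(registers)
print(bin(registers["D"]))       # 0b1000, which is decimal 8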

The CPU does addition, and other mathematical functions, using registers in this way. It decodes the opcode and directs control to specific circuits within the CPU. It can also compare things, such as “Is the value in B larger than the value in C?” Importantly, it also fetches values from memory to the CPU and stores values from the CPU back to memory.

The computer stores programs (machine-language instructions and data) in memory and handles feeding instructions and data to and from the CPU.

Assembler

It’s hard to program in machine language. You have to specify every bit perfectly, which is very time consuming. So, people came up with a slightly more readable level of language called assembly language, or just assembler. These languages are specific to a CPU design and let you use things like variable names to define your instruction flow and data.

Higher-Level Languages

Assembler is still a painstaking endeavor, so people designed higher-level languages that were even easier for people to use. These languages would be translated into assembler by a program called a compiler, or run directly by an interpreter. Among the oldest of these languages are FORTRAN, LISP, and C—wildly different in design and intended use, but similar in their place in computer architecture.

In real jobs, you tend to see distinct software “stacks”:

Mainframe

IBM, COBOL, FORTRAN, and others

Microsoft

Windows, ASP, C#, SQL Server

JVM

Java, Scala, Groovy

Open source

Linux, languages (Python, PHP, Perl, C, C++, Go), databases (MySQL, PostgreSQL), web (Apache, NGINX)

Programmers tend to stay in one of these worlds, using the languages and tools within it. Some technologies, such as TCP/IP and the web, allow intercommunication between stacks.

Operating Systems

Each innovation was built on those before it, and generally we don’t know or care how the lower levels even work. Tools build tools to build even more tools, and we take them for granted.

The major operating systems are:

Windows (Microsoft)

Commercial, many versions

macOS (Apple)

Commercial

Linux

Open source

Unix

Many commercial versions, largely replaced by Linux


An operating system contains:

A kernel

Schedules and controls programs and I/O

Device drivers

Used by the kernel to access RAM, disk, and other devices

Libraries

Source and binary files for use by developers

Applications

Standalone programs

The same computer hardware can support more than one operating system, but only one at a time. When an operating system starts up, it’s called booting,^1 so rebooting is restarting it. These terms have even appeared in movie marketing, as studios “reboot” previous unsuccessful attempts. You can dual-boot your computer by installing more than one operating system, side by side, but only one can be fired up and run at a time.

1 This refers to “lifting yourself by your own bootstraps,” which seems just as improbable as a computer.

If you see the phrase bare metal, it means a single computer running an operating system directly on the hardware. In the next few sections, we step up from bare metal.

Virtual Machines

An operating system is sort of a big program, so eventually someone figured out how to run foreign operating systems as virtual machines (guest programs) on host machines. So you could have Microsoft Windows running on your PC, but fire up a Linux virtual machine atop it at the same time, without having to buy a second computer or dual-boot it.

Containers

A more recent idea is the container—a way to run multiple operating systems at the same time, as long as they share the same kernel. This idea was popularized by Docker, which took some little-known Linux kernel features and added useful management features. Docker’s analogy to shipping containers (which revolutionized shipping and saved money for all of us) was clear and appealing. By releasing the code as open source, Docker enabled containers to be adopted very quickly throughout the computer industry.


Google and other cloud providers had been quietly adding the underlying kernel support to Linux for years, and using containers in their data centers. Containers use fewer resources than virtual machines, letting you pack more programs into each physical computer box.

Distributed Computing and Networks

When businesses first started using personal computers, they needed ways to make them talk to each other, as well as to devices like printers. Proprietary networking software, such as Novell’s, was originally used, but it was eventually replaced by TCP/IP as the internet emerged in the mid-to-late ’90s. Microsoft grabbed its TCP/IP stack from a free Unix variant called BSD.^2

2 You can still see the copyright notices for the University of California in some Microsoft files.

One effect of the internet boom was a demand for servers: machines and software to run all those web, chat, and email services. The old style of sysadmin (system administration) was to install and manage all the hardware and software manually. Before long, it became clear to everyone that automation was needed. In 2006, Bill Baker at Microsoft came up with the pets versus cattle analogy for server management, and it has since become an industry meme (sometimes phrased as pets versus livestock, to be more generic); see Table A-2.

Table A-2. Pets versus livestock

Pets                   Livestock
Individually named     Automatically numbered
Customized care        Standardized
Nurse back to health   Replace

You’ll often see, as a successor to “sysadmin,” the term DevOps: development plus operations, a mixture of techniques to support rapid changes to services without blowing them up. Cloud services are extremely large and complex, and even the big companies like Amazon and Google have outages now and then.

The Cloud

People had been building computer clusters for a number of years, using many technologies. One early concept was the Beowulf cluster: identical commodity computers (Dell or something similar, instead of workstations from Sun or HP), linked by a local network.


The term cloud computing means using the computers in data centers to perform computing jobs and store data—but not just for the company that owns those backend resources. The services are provided to anyone, with fees based on CPU time, disk storage amounts, and so on. Amazon, with its AWS (Amazon Web Services), is the most prominent, but Azure (Microsoft) and Google Cloud are also biggies.

Behind the scenes, these clouds use bare metal, virtual machines, and containers—all treated as livestock, not pets.

Kubernetes

Companies that needed to manage huge clusters of computers in many data centers—like Google, Amazon, and Facebook—have all borrowed or built solutions to help them scale:

Deployment

How do you make new computing hardware and software available? How do you replace them when they fail?

Configuration

How should these systems run? They need things like the names and addresses of other computers, passwords, and security settings.

Orchestration

How do you manage all these computers, virtual machines, and containers? Can you scale up or down to adjust to load changes?

Service Discovery

How do you find out who does what, and where it is?

Some competing solutions were built by Docker and others. But just in the past few years, it looks like the battle has been won by Kubernetes.

Google had developed large internal management frameworks, codenamed Borg and Omega. When employees brought up the idea of open sourcing these “crown jewels,” management had to think about it a bit, but they took the leap. Google released Kubernetes version 1.0 in 2015, and its ecosystem and influence have grown ever since.

APPENDIX B: Install Python 3

Most of the examples in this book were written and tested with Python 3.7, the most recent stable version at the time of writing. The What’s New in Python page presents what was added in each version. There are many sources of Python and many ways to install a new version. In this appendix, I describe a few of these ways:

  • A standard installation downloads Python from python.org, and adds the helper programs pip and virtualenv.
  • If your work is heavily scientific, you may prefer to get Python bundled with many scientific packages from Anaconda and use its package installer conda instead of pip.

Windows doesn’t have Python at all, and macOS, Linux, and Unix tend to have old versions. Until they catch up, you may need to install Python 3 yourself.

Check Your Python Version

In a terminal or terminal window, type python -V:

$ python -V
Python 3.7.2

Depending on your operating system, if you don’t have Python or the operating system can’t find it, you’ll get some error message like command not found.

If you do have Python and it’s version 2, you may want to install Python 3—either system wide, or just for yourself in a virtualenv (see “Use virtualenv” on page 412, or “Install virtualenv” on page 517). In this appendix, I show how to install Python 3 system wide.

Install Standard Python

Go to the official Python download page with your web browser. It tries to guess your operating system and present the appropriate choices, but if it guesses wrong, you can use these:

  • Python Releases for Windows
  • Python Releases for macOS
  • Python Source Releases (Linux and Unix)

You’ll see a page similar to that shown in Figure B-1.

Figure B-1. Sample download page

If you click the yellow Download Python 3.7.3 button, it will download that version for your operating system. If you’d like to learn a little about it first, click the blue link text Python 3.7.3 in the first column of the table at the bottom, under Release version. This takes you to an information page like the one shown in Figure B-2.

Figure B-2. Detail page for download

You need to scroll down the page to see the actual download links (Figure B-3).

Figure B-3. Bottom of page offering downloads

macOS

Click the macOS 64-bit/32-bit installer link to download a Mac .pkg file. Double-click it to see an introductory dialog box (Figure B-4).

Figure B-4. Mac install dialog 1

Click Continue. You’ll go through a succession of other dialog boxes. When it’s all done, you should see the dialog shown in Figure B-5.

Figure B-5. Mac install dialog 9

Python 3 will be installed as /usr/local/bin/python3, leaving any existing Python 2 on your computer unchanged.

Windows

Windows has never included Python, but it has recently become easier to install. The May 2019 update for Windows 10 includes python.exe and python3.exe files. These aren’t the Python interpreter, but links to a new Python 3.7 page at the Microsoft Store. You can use this link to download and install Python in the same way you get other Windows software.

Or you can download and install Python from the official Python site:

  • Windows x86 MSI installer (32-bit)
  • Windows x86-64 MSI installer (64-bit)

To determine whether you have a 32-bit or 64-bit version of Windows:

  • Click the Start button.
  • Right-click Computer.
  • Click Properties and find the bit value.

Click the appropriate installer (.msi file). After it’s downloaded, double-click it and follow the installer directions.

Linux or Unix

Linux and Unix users get a choice of compressed source formats:

  • XZ compressed source tarball
  • Gzipped source tarball

Download either one. Decompress it by using tar xJ (for the .xz file) or tar xz (for the .tgz file), and then run the configure shell script in the resulting directory to build and install Python.
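On most systems, the build boils down to a few commands like the following sketch; the version number here is just an example, so substitute the file you actually downloaded:

$ tar xJf Python-3.7.3.tar.xz    # or: tar xzf Python-3.7.3.tgz
$ cd Python-3.7.3
$ ./configure
$ make
$ sudo make install              # or "sudo make altinstall" to leave any existing python3 untouched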

Install the pip Package Manager

Beyond the standard Python installation, two tools are almost essential for Python development: pip and virtualenv.

The pip package is the most popular way to install third-party (nonstandard) Python packages. It had long been annoying that such a useful tool wasn’t part of standard Python and that you needed to download and install it yourself. As a friend of mine used to say, it’s a cruel hazing ritual. The good news is that pip has been a standard part of Python since the 3.4 release.

If you have Python 3 but only the Python 2 version of pip, here’s how to get the Python 3 version on Linux or macOS:

$ curl -O http://python-distribute.org/distribute_setup.py
$ sudo python3 distribute_setup.py
$ curl -O https://raw.github.com/pypa/pip/master/contrib/get-pip.py
$ sudo python3 get-pip.py

This installs pip-3.3 in the bin directory of your Python 3 installation. Then, use pip-3.3 to install third-party Python packages rather than Python 2’s pip.
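Everyday pip use then looks something like this; requests is only an example package, and your command may be spelled pip, pip3, or pip-3.x depending on how it was installed:

$ pip3 install requests            # install a package
$ pip3 list                        # show what's installed
$ pip3 install --upgrade requests  # upgrade to the latest version
$ pip3 uninstall requests          # remove it again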

Install virtualenv

Often used with pip, the virtualenv program is a way to install Python packages in a specified directory (folder) to avoid interactions with any preexisting system Python packages. This lets you use whatever Python goodies you want, even if you don’t have permission to change the existing installation.
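A minimal virtualenv session looks something like this sketch; the directory name venv is only a convention:

$ pip3 install virtualenv       # install the tool itself
$ virtualenv venv               # create an isolated environment in ./venv
$ source venv/bin/activate      # activate it (venv\Scripts\activate on Windows)
(venv) $ pip install requests   # packages now go into ./venv only
(venv) $ deactivate             # leave the environment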

Some good guides to pip and virtualenv are:

  • A Non-Magical Introduction to Pip and Virtualenv for Python Beginners
  • The Hitchhiker’s Guide to Packaging: Pip

Other Packaging Solutions

As you’ve seen, Python’s packaging techniques vary, and none works well for every problem. The PyPA (Python Packaging Authority) is a volunteer working group (not part of the official Python development core group) that’s trying to simplify Python packaging. The group wrote the Python Packaging User Guide, which discusses problems and solutions.

The most popular tools are pip and virtualenv, and I’ve used these throughout this book. If they fall short for you, or if you like trying new things, here are some alternatives:

  • pipenv combines pip and virtualenv and adds more features. See also some criticism and threaded discussion.
  • poetry is a rival that addresses some of the problems with pipenv.

But the most prominent packaging alternative, especially for scientific and data-heavy applications, is conda. You can get it as part of the Anaconda Python distribution, which I talk about next, or by itself (see “Install Anaconda’s Package Manager conda” on page 519).

Install Anaconda

Anaconda is an all-in-one distribution with an emphasis on science. The latest version, Anaconda3, includes Python 3.7 and its standard library, as well as the R language for data science. Other goodies include libraries that we’ve talked about in this book: beautifulsoup4, flask, ipython, matplotlib, nose, numpy, pandas, pillow, pip, scipy, tables, zmq, and many others. It also has a cross-platform installation program called conda, which I get to in the next section.

To install Anaconda3, go to the download page for the Python 3 versions. Click the appropriate link for your platform (version numbers might have changed since this was written, but you can figure it out):

  • The macOS installer will install everything to the anaconda directory under your home directory.
  • For Windows, double-click the .exe file after it downloads.
  • For Linux, choose the 32-bit version or the 64-bit version. After it has downloaded, execute it (it’s a big shell script).

Ensure that the name of the file you download starts with Anaconda3. If it starts with just Anaconda, that’s the Python 2 version.

Anaconda installs everything in its own directory (anaconda, under your home directory). This means that it won’t interfere with any versions of Python that might already be on your computer. It also means that you don’t need any special permission (account names like admin or root) to install it.

Anaconda now includes more than 1,500 open source packages. Visit the Anaconda docs page and click the link for your platform and Python version.

After installing Anaconda3, you can see what Santa put on your computer by typing the command conda list.

Install Anaconda’s Package Manager conda

The Anaconda developers built conda to address the problems they’ve seen with pip and other tools. pip is a Python package manager, but conda works with any software and language. conda also avoids the need for something like virtualenv to keep installations from stepping on one another.

If you installed the Anaconda distribution, you already have the conda program. If not, you can get Python 3 and conda from the miniconda page. As with Anaconda, make sure the file you download starts with Miniconda3; if it starts with Miniconda alone, it’s the Python 2 version.

conda works with pip. Although it has its own public package repository, commands like conda search will also search the PyPI repository. If you have problems with pip, conda might be a good alternative.
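To give you a feel for everyday conda use, here’s a short sketch; the environment name sci and the package choices are only examples:

$ conda create -n sci python=3.7    # make a new, isolated environment named "sci"
$ conda activate sci                # switch into it
(sci) $ conda install numpy pandas  # install packages from conda's repository
(sci) $ conda list                  # show everything installed in this environment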

APPENDIX C

Something Completely Different: Async

Our first two appendixes were for beginning programmers, but this one is for those who are a bit advanced.

Like most programming languages, Python has been synchronous. It runs through code linearly, a line at a time, from top to bottom. When you call a function, Python jumps into its code, and the caller waits until the function returns before resuming what it was doing.

Your CPU can do only one thing at a time, so synchronous execution makes perfect sense. But it turns out that often a program is not actually running any code, but waiting for something, like data from a file or a network service. This is like us staring at a browser screen while waiting for a site to load. If we could avoid this “busy waiting,” we might shorten the total time of our programs. This is also called improving throughput.

In Chapter 15, you saw that if you want some concurrency, your choices included threads, processes, or a third-party solution like gevent or twisted. But there is now a growing number of asynchronous answers, both built into Python and from third-party developers. These coexist with the usual synchronous Python code, but, to borrow a Ghostbusters warning, you can’t cross the streams. I’ll show you how to avoid any ectoplasmic side effects.

Coroutines and Event Loops

In Python 3.4, Python added a standard asynchronous module called asyncio. Python 3.5 then added the keywords async and await. These implement some new concepts:

  • Coroutines, which are functions that can pause at various points
  • An event loop, which schedules and runs coroutines

These let us write asynchronous code that looks something like the normal synchronous code that we’re used to. Otherwise, we’d need to use one of the methods mentioned in Chapter 15 and Chapter 17, and summarized later in “Async Versus…” on page 525.

Normal multitasking is what your operating system does to your processes. It decides what’s fair, who’s being a CPU hog, when to open the I/O spigots, and so on. The event loop, however, provides cooperative multitasking, in which coroutines indicate when they’re able to start and stop. They run in a single thread, so you don’t have the potential issues that I mentioned in “Threads” on page 287.

You define a coroutine by putting async before its initial def. You call a coroutine by:

  • Putting await before it, which quietly adds the coroutine to an existing event loop. You can do this only within another coroutine.
  • Or by using asyncio.run(), which explicitly starts an event loop.
  • Or by using asyncio.create_task() or asyncio.ensure_future().

This example uses the first two calling methods:

>>> import asyncio
>>>
>>> async def wicked():
...     print("Surrender,")
...     await asyncio.sleep(2)
...     print("Dorothy!")
...
>>> asyncio.run(wicked())
Surrender,
Dorothy!

There was a dramatic two-second wait in there that you can’t see on a printed page.

To prove that we didn’t cheat (see Chapter 19 for timeit details):

>>> from timeit import timeit
>>> timeit("asyncio.run(wicked())", globals=globals(), number=1)
Surrender,
Dorothy!
2.005701574998966

That asyncio.sleep(2) call was itself a coroutine, just an example here to fake something time consuming like an API call.

The line asyncio.run(wicked()) is a way of running a coroutine from synchronous Python code (here, the top level of the program).

The difference from a standard synchronous counterpart (using time.sleep()) is that the caller of wicked() is not blocked for two seconds while it runs.

The third way to run a coroutine is to create a task and await it. This example shows the task approach along with the previous two methods:

>>> import asyncio
>>>
>>> async def say(phrase, seconds):
...     print(phrase)
...     await asyncio.sleep(seconds)
...
>>> async def wicked():
...     task_1 = asyncio.create_task(say("Surrender,", 2))
...     task_2 = asyncio.create_task(say("Dorothy!", 0))
...     await task_1
...     await task_2
...
>>> asyncio.run(wicked())
Surrender,
Dorothy!

If you run this, you’ll see that there was no delay between the two lines printing this time. That’s because they were separate tasks. task_1 paused two seconds after printing Surrender, but that didn’t affect task_2.

An await is similar to a yield in a generator, but rather than returning a value, it marks a spot where the event loop can pause it if needed.

There’s lots more where this came from in the docs. Synchronous and asynchronous code can coexist in the same program. Just remember to put async before the def and await before the call of your asynchronous function.

Some more information:

  • A list of asyncio links.
  • Code for an asyncio web crawler.

Asyncio Alternatives

Although asyncio is a standard Python package, you can use async and await without it. Coroutines and the event loop are independent. The design of asyncio is sometimes criticized, and third-party alternatives have appeared:

  • curio
  • trio

Let’s show a real example using trio and asks (an async HTTP library, modeled on the requests API). Example C-1 shows a concurrent web-crawling example using trio and asks, adapted from a Stack Overflow answer. To run this, first pip install both trio and asks.

Example C-1. trio_asks_sites.py

import time

import asks
import trio

asks.init("trio")

urls = [
    'https://boredomtherapy.com/bad-taxidermy/',
    'http://www.badtaxidermy.com/',
    'https://crappytaxidermy.com/',
    'https://www.ranker.com/list/bad-taxidermy-pictures/ashley-reign',
]

async def get_one(url, t1):
    r = await asks.get(url)
    t2 = time.time()
    print(f"{(t2-t1):.04} \t {len(r.content)} \t {url}")

async def get_sites(sites):
    t1 = time.time()
    async with trio.open_nursery() as nursery:
        for url in sites:
            nursery.start_soon(get_one, url, t1)

if __name__ == "__main__":
    print("seconds \t bytes \t url")
    trio.run(get_sites, urls)

Here’s what I got:

$ python trio_asks_sites.py
seconds bytes url
0.1287 5735 https://boredomtherapy.com/bad-taxidermy/
0.2134 146082 https://www.ranker.com/list/bad-taxidermy-pictures/ashley-reign
0.215 11029 http://www.badtaxidermy.com/
0.3813 52385 https://crappytaxidermy.com/

You’ll notice that trio does not use asyncio.run(), but instead its own trio.run() and trio.open_nursery(). If you’re curious, you can read an essay and discussion of the design decisions behind trio.

A new package called AnyIO provides a single interface to asyncio, curio, and trio.

In the future, you can expect more async approaches, both in standard Python and from third-party developers.

Async Versus…

As you’ve seen in many places in this book, there are many techniques for concurrency. How does the async stuff compare with them?

Processes

This is a good solution if you want to use all the CPU cores on your machine, or multiple machines. But processes are heavy, take a while to start, and require serialization for interprocess communication.

Threads

Although threads were designed as a “lightweight” alternative to processes, each thread uses a good chunk of memory. Coroutines are much lighter than threads; you can create hundreds of thousands of coroutines on a machine that might only support a few thousand threads (see the sketch after this list).

Green threads

Green threads like gevent work well and look like synchronous code, but they require monkey-patching standard Python code, such as the socket library.

Callbacks

Libraries like twisted rely on callbacks: functions that are called when certain events occur. This is familiar to GUI and JavaScript programmers.

Queues

These tend to be a large-scale solution, when your data or processes really need more than one machine.
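To back up that claim about coroutine counts, here’s a minimal sketch that runs a hundred thousand coroutines in a single thread; the count is arbitrary:

import asyncio

async def tick(n):
    # each coroutine yields control once and returns its number
    await asyncio.sleep(0)
    return n

async def main():
    # one OS thread, 100,000 coroutines; that many OS threads would not fit
    results = await asyncio.gather(*(tick(n) for n in range(100_000)))
    print(len(results))   # 100000

asyncio.run(main())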

Async Frameworks and Servers

The async additions to Python are recent, and it’s taking time for developers to create async versions of frameworks like Flask.

The ASGI standard is an async version of WSGI, discussed further here.

Here are some ASGI web servers:

  • hypercorn
  • sanic
  • uvicorn
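Any of these servers can run a bare ASGI application. Here’s a minimal sketch, with no framework involved, just to show the shape of the interface (the filename hello.py and the greeting are made up for the example):

# hello.py -- a minimal ASGI application
async def app(scope, receive, send):
    # this sketch handles only plain HTTP requests
    assert scope["type"] == "http"
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": b"Hello, async world!"})

You could then serve it with one of the servers above, for example $ uvicorn hello:app.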

And some async web frameworks:

  • aiohttp—Client and server
  • api_hour
  • asks—Like requests
  • blacksheep
  • bocadillo
  • channels
  • fastapi—Uses type annotations
  • muffin
  • quart
  • responder
  • sanic
  • starlette
  • tornado
  • vibora

Finally, some async database interfaces:

  • aiomysql
  • aioredis
  • asyncpg

APPENDIX E

Cheat Sheets

I find myself looking up certain things a little too often. Here are some tables that I hope you’ll find useful.

Operator Precedence

This table is a remix of the official documentation on precedence in Python 3, with the highest precedence operators at the top.

Operator                                      Description and examples
[v, ...], {v1, ...}, {k1: v1, ...}, (...)     List/set/dict/generator creation or comprehension, parenthesized expression
seq[n], seq[n:m], func(args...), obj.attr     Index, slice, function call, attribute reference
**                                            Exponentiation
+n, -n, ~n                                    Positive, negative, bitwise not
*, /, //, %                                   Multiplication, float division, int division, remainder
+, -                                          Addition, subtraction
<<, >>                                        Bitwise left, right shifts
&                                             Bitwise and
|                                             Bitwise or
in, not in, is, is not, <, <=, >, >=, !=, ==  Membership, identity, and comparison tests
not x                                         Boolean (logical) not
and                                           Boolean and
or                                            Boolean or
if ... else                                   Conditional expression
lambda ...                                    lambda expression
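A few quick interpreter checks of these rules:

>>> 2 + 3 * 4      # * binds tighter than +
14
>>> (2 + 3) * 4    # parentheses override precedence
20
>>> -2 ** 2        # ** binds tighter than unary minus
-4
>>> not 1 == 2     # == is evaluated before not
True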

String Methods

Python offers both string methods (which can be used with any str object) and a string module with some useful definitions. Let’s use these test variables:

>>> s = "OH, my paws and whiskers!"
>>> t = "I'm late!"

In the following examples, the Python shell prints the result of the method call, but the original variables s and t are not changed.

Change Case

>>> s.capitalize()
'Oh, my paws and whiskers!'
>>> s.lower()
'oh, my paws and whiskers!'
>>> s.swapcase()
'oh, MY PAWS AND WHISKERS!'
>>> s.title()
'Oh, My Paws And Whiskers!'
>>> s.upper()
'OH, MY PAWS AND WHISKERS!'
Search

>>> s.count('w')
2
>>> s.find('w')
9
>>> s.index('w')
9
>>> s.rfind('w')
16
>>> s.rindex('w')
16
>>> s.startswith('OH')
True

Modify

>>> ''.join(s)
'OH, my paws and whiskers!'
>>> ' '.join(s)
'O H ,   m y   p a w s   a n d   w h i s k e r s !'
>>> ' '.join((s, t))
"OH, my paws and whiskers! I'm late!"
>>> s.lstrip('HO')
', my paws and whiskers!'
>>> s.replace('H', 'MG')
'OMG, my paws and whiskers!'
>>> s.rsplit()
['OH,', 'my', 'paws', 'and', 'whiskers!']
>>> s.rsplit(' ', 1)
['OH, my paws and', 'whiskers!']
>>> s.split(' ', 1)
['OH,', 'my paws and whiskers!']
>>> s.split(' ')
['OH,', 'my', 'paws', 'and', 'whiskers!']
>>> s.splitlines()
['OH, my paws and whiskers!']
>>> s.strip()
'OH, my paws and whiskers!'
>>> s.strip('s!')
'OH, my paws and whisker'

Format

>>> s.center(30)
'  OH, my paws and whiskers!   '
>>> s.expandtabs()
'OH, my paws and whiskers!'
>>> s.ljust(30)
'OH, my paws and whiskers!     '
>>> s.rjust(30)
'     OH, my paws and whiskers!'

String Type

>>> s.isalnum()
False
>>> s.isalpha()
False
>>> s.isprintable()
True
>>> s.istitle()
False
>>> s.isupper()
False
>>> s.isdecimal()
False
>>> s.isnumeric()
False

String Module Attributes

These are attributes of the string module, used as constant definitions.

Attribute Example
ascii_letters 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
ascii_lowercase 'abcdefghijklmnopqrstuvwxyz'
ascii_uppercase 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
digits '0123456789'
hexdigits '0123456789abcdefABCDEF'
octdigits '01234567'
punctuation '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
printable digits + ascii_letters + punctuation + whitespace
whitespace ' \t\n\r\x0b\x0c'
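You reach them by importing the string module:

>>> import string
>>> string.digits
'0123456789'
>>> string.ascii_uppercase
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'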

Coda

Chester wants to express his appreciation for your diligence. If you need him, he’s taking a nap…

Figure E-1. Chester^1

1 He’s moved about a foot to the right since Figure 3-1.

…but Lucy is available to answer any questions.

Figure E-2. Lucy