Chat with the Father of Python: Faster Python!


Note: At the Python Language Summit this May, Guido van Rossum gave a talk titled "Making CPython Faster", announcing that he had joined the exciting "Shannon Plan", which aims to make Python 5x faster within 4 years. Recently, Guido appeared on a 30-minute English-language podcast to talk about the performance work he is doing and to answer a few questions. The podcast's author put together a summary of the conversation, and this article is a translation of that summary.

**1. Why are you interested in studying the performance of Python?**

Guido: In a sense, it's a relatively comfortable topic for me, because it means working on the core of Python, which I'm quite familiar with. When I started at Microsoft I briefly looked at Azure, but I had realized from my time at Google and Dropbox that I didn't enjoy that kind of work. Then I considered machine learning, but it would have taken a lot of time doing things that were not Python-related, and even the Python-related parts were small.

**2. How are Mark Shannon's ideas about Python performance different, and what convinced you to implement them?**

Guido: I like the way he thinks about the problem. Most other projects focused on Python performance, such as PyPy and Cinder, are not suitable for all use cases because they are not backward compatible with extension modules. Mark has the perspective and experience of a CPython developer, and a workable approach to maintaining backward compatibility, which is the hardest problem to solve. Python's bytecode interpreter is already modified between minor versions (e.g., 3.8 → 3.9) for a number of reasons, such as new opcodes, so modifying it is a relatively safe route.

**3. Can you explain the concept of the Python interpreter's layered execution?**

Guido: When you execute a program, you don't know whether it will crash after a fraction of a millisecond or keep running for three weeks. Even for the same code, the very first run could trigger a bug. And if the program takes three weeks to run, maybe it's worth optimizing all the code up front so that it finishes half an hour earlier.

But obviously, especially in a dynamic language like Python, where we do as much as possible without asking the user to tell us exactly what they need, you just want to start executing code as quickly as possible. So if you have a small script, or a large program that happens to fail or exit early for some reason, you don't want to spend time optimizing all of that code first.

So what we do is keep the bytecode compiler simple, so we can start executing code as quickly as possible. If a particular function is executed many times, we call it a hot function. There are several definitions of "hot": in some cases a function is considered hot if it is called more than once, or more than twice, or more than 10 times; in more conservative cases, you might say it's only hot after it has been called 1,000 times.
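As a rough illustration of this idea, here is a minimal sketch in plain Python (CPython's real counters live in the interpreter's C code; the threshold value and the names `HOT_THRESHOLD`, `record_call`, and `run` are invented for illustration):

```python
# Minimal sketch of hot-function detection. Illustrative only:
# the threshold is an arbitrary choice, not CPython's actual value.
HOT_THRESHOLD = 10

call_counts: dict = {}

def record_call(func) -> bool:
    """Count a call to `func`; report True once it crosses the threshold."""
    call_counts[func] = call_counts.get(func, 0) + 1
    return call_counts[func] >= HOT_THRESHOLD

def run(func, *args):
    """Once a function is hot, it becomes a candidate for specialized
    (quickened) bytecode; until then, just execute it as-is."""
    if record_call(func):
        pass  # here a real interpreter would specialize func's bytecode
    return func(*args)
```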

Then the specializing adaptive interpreter (PEP 659, "Specializing Adaptive Interpreter") tries to replace certain bytecodes with faster ones when the arguments turn out to have specific types. A simple hypothetical example is Python's plus operator, which can add many kinds of objects: integers, strings, lists, even tuples. However, you cannot add an integer to a string.
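For instance, in a standard Python session:

```python
>>> 1 + 2              # integer addition
3
>>> "py" + "thon"      # string concatenation
'python'
>>> [1, 2] + [3]       # list concatenation
[1, 2, 3]
>>> 1 + "a"            # but integers and strings cannot be added
Traceback (most recent call last):
  ...
TypeError: unsupported operand type(s) for +: 'int' and 'str'
```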

Therefore, the optimization is to provide a separate "two integer addition" bytecode, a second tier of bytecode hidden from the user. (This kind of optimization is often called "quickening", but generally in our context we call it "specializing".) This opcode assumes that its two arguments are genuine Python integer objects, reads the values of those objects directly, adds them in machine registers, and pushes the result back onto the stack.

Adding two integers still requires a type check on the arguments. So it's not completely unconstrained, but such a type check is much faster to execute than the fully generalized object-oriented plus operator.

Finally, it's possible that a function is called millions of times with integer arguments, and then suddenly some piece of code calls it with floating-point arguments, or worse. At that point, the interpreter simply falls back to executing the original bytecode. This is an important part, so that you always get the full Python semantics.
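The guard-plus-fallback pattern described above might look like the following minimal sketch (plain Python standing in for CPython's C-level opcode handlers; the function names are invented):

```python
def generic_add(a, b):
    """The fully generalized '+': dispatches through Python's object
    protocol (__add__/__radd__), handling any compatible types."""
    return a + b

def specialized_int_add(a, b):
    """Sketch of the 'two integer addition' fast path.

    The guard is still a type check, but it is far cheaper than full
    operator dispatch. In real CPython the fast path reads the integer
    values directly in C; here int.__add__ stands in for it.
    """
    if type(a) is int and type(b) is int:   # the guard
        return int.__add__(a, b)            # fast path
    # Guard failed (floats, strings, ...): de-optimize and fall back
    # to the generic operation, preserving full Python semantics.
    return generic_add(a, b)
```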

Note: The ultimate goal of the "Shannon Plan" is to layer the interpreter's execution and apply custom optimizations to the different layers. For details, see the introduction on the GitHub project page ( ).

**4. We usually hear about these techniques in discussions of JIT (Just-In-Time) compilers, but official Python has not implemented one yet.**

Guido: Just-in-time compilation comes with a whole bunch of emotional baggage that we want to avoid. For example, we don't know exactly what gets compiled and when. Before the program starts executing, the interpreter compiles the source code into bytecode, and then it converts bytecode into specialized bytecode. That means everything happens at some moment during runtime, so which part is actually "just in time"?

Also, people often assume that a JIT automatically makes all code better. Unfortunately, you usually can't really predict your code's performance. We already get plenty of performance from modern CPUs and their magical branch prediction. For example, we once wrote a piece of code in a way we believed would significantly reduce the number of memory accesses. But when we benchmarked it, it ran exactly as fast as the old, unoptimized code, because the CPU had figured out the optimized access patterns without any help from us. I wish I knew what modern CPUs do with branch prediction and inline caching, because it's like magic.

Full content

That's it for the translation of the podcast summary. I have saved the fuller conversation as well as the audio. If you are interested, send the keyword "python faster" in the Python developer public account to get the download link.


Author: Software at Scale

Translator: Cat under the Pea Flower @Python Cat

Original:
