BGGP4 Writeup: Compiled Python (PYC) Dissection and Forgery across versions

The PYC file format is widely documented on the Internet, but it changed in Python 3.7. Good references for parsing PYC files are:

The coverage.py code, to understand the PYC format: github.com/nedbat/coveragepy @latest ./lab/show_pyc.py:21
The CPython marshal parser, to understand how objects are serialized: github.com/python/cpython @v3.12.0rc3 ./Tools/build/umarshal.py:193. Note that this is only a Python reimplementation of the real code, which lives here: github.com/python/cpython @v3.12.0rc3 ./Python/marshal.c:1013.
The CPython disassembler (the dis module), to understand how to decode Python bytecode: github.com/python/cpython @v3.12.0rc3 ./Lib/dis.py:454

For a list of PYC magic numbers, refer to Google's pycnite repository, or to the less exhaustive list in CPython.

The challenge

This year, the Binary Golf Grand Prix challenge is to make the smallest binary (polyglot files included) which has to:

Produce exactly 1 copy of itself;
Name the copy 4;
Not execute the copied file;
Print, return, or display the number 4.

Why PYC

A few weeks ago, during a CTF, I learned the hard way how PYC files are forged. So I decided to solve this challenge with this type of executable.

I know that a plain Python source file would be much shorter than a PYC file, and a better fit for a BGGP entry, but PYC was really a file format I wanted to manipulate.

PYC format

In PYC files, all multi-byte integers are little-endian (LE).

File layout schemas are in the Annexes below.

Because the minimum file size grows with the Python version, it is tempting to use a version before 3.7, or even before 2.0. But the choice also depends on the bytecode: each new version brings new opcodes and breaking changes.

In a PYC file, an item is encoded, when needed, using the TLV (Type, Length, Value) notation. The type is a 1-byte identifier. Whether a length is present, and how many bytes it takes, depends on the type. Whether a value is present depends on the type and the length.

Here are some example types:

73 07 00 00 00 41 42 43 44 45 46 47
│  │           └─ "ABCDEFG"
│  └ Length: 7
└ 's': String, the length is a 4-byte LE integer

28 00 00 00 00
│  └─ Length: 0: No value
└─ '(': Tuple, the length is a 4-byte LE integer

69 99 88 77 66
│  └─ Value: 0x66778899
└─ 'i': Integer on 32 bits

4E
└─ 'N': None, no length, no value, just the Python "None"
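These examples can be checked against the real decoder, since CPython's marshal module still understands these type codes. A minimal sketch, assuming a Python 3 interpreter (where the 's' type decodes to bytes; in Python 2 it decoded to a str):

```python
import marshal

# The four examples above, as raw bytes: type byte, then length and/or value
examples = {
    "string": bytes.fromhex("73 07 00 00 00") + b"ABCDEFG",
    "tuple": bytes.fromhex("28 00 00 00 00"),
    "int32": bytes.fromhex("69 99 88 77 66"),
    "none": bytes.fromhex("4E"),
}

for name, raw in examples.items():
    print(f"{name:6} -> {marshal.loads(raw)!r}")
```

The integer comes back as 0x66778899, confirming the little-endian order of the value.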

Note that, starting with Python 3.4, a flag (FLAG_REF, 0x80) may be ORed into the type identifier. A flagged item is appended to the refs table, which stores referenced items in their order of appearance in the file. Python 3.4 also introduced the Ref type ('r'), which tells the loader to reuse the nth item of that table (counting from 0).

Here are some examples of referenced items:

F3 07 00 00 00 41 42 43 44 45 46 47
│  │           └─ "ABCDEFG"
│  └ Length: 7
└ 's'|FLAG_REF: Referenced string, the length is a 4-byte LE integer

A8 00 00 00 00
│  └─ Length: 0: No value
└─ '('|FLAG_REF: Referenced tuple, the length is a 4-byte LE integer

E9 99 88 77 66
│  └─ Value: 0x66778899
└─ 'i'|FLAG_REF: Referenced 32-bit integer

CE
└─ 'N'|FLAG_REF: Referenced None, no length, no value, just the Python "None"

72 01 00 00 00
│  └─ Index 1 of the refs table (0-based): the 2nd item previously added
└─ 'r': Reference
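A small sketch, again assuming CPython 3's marshal module, showing the refs table at work: a string flagged with FLAG_REF gets registered, then a Ref to index 0 returns that very same object. It also shows that the table indices start at 0.

```python
import marshal

FLAG_REF = 0x80

# A small tuple ')' holding 2 items: a string flagged with FLAG_REF (so it is
# registered in the refs table), then a Ref ('r') to index 0 of that table.
payload = (
    b")" + bytes([2])
    + bytes([ord("s") | FLAG_REF]) + (7).to_bytes(4, "little") + b"ABCDEFG"
    + b"r" + (0).to_bytes(4, "little")
)

first, second = marshal.loads(payload)
print(first, second, first is second)  # b'ABCDEFG' b'ABCDEFG' True
```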

Code object format

The code object layouts are also in the Annexes below.

The serialized layout of a code object differs between Python versions; the variants I was able to identify are listed in the Annexes. Note that, from Python 3.4, any object field may be a reference to an already registered object.

To reduce the size of the file, I decided to use the oldest version I was able to find or compile. The oldest CPython version I managed to compile is 2.0: I built it and pushed the image to Docker Hub (ajabep/oldpython:2.0).

Our bytecode

Variable names and string contents have to be encoded as strings. The Python bytecode itself is also stored as a string, and the way to read it changes across versions.

From 0.9.8 (the earliest version available in the Git repository) to Python 3.5, an instruction is 1 byte long, extended to 3 bytes when the opcode takes an argument. That argument is a 16-bit little-endian integer.

Since Python 3.6, every instruction is exactly 2 bytes long: the 1st byte is the opcode and the 2nd its argument. The argument byte is ignored when the opcode does not take one.
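The two instruction layouts can be sketched as two small decoders. Assumptions: HAVE_ARGUMENT = 90 is the threshold CPython's opcode module uses for these versions (check it for the exact version you target), and EXTENDED_ARG is ignored. The byte fragments come from the listings in this writeup.

```python
from typing import Iterator, Optional, Tuple

# Assumption: opcodes >= HAVE_ARGUMENT take an argument (value from CPython's opcode module)
HAVE_ARGUMENT = 90


def decode_pre36(code: bytes) -> Iterator[Tuple[int, Optional[int]]]:
    """Up to Python 3.5: 1-byte opcode, plus a 2-byte LE argument if it takes one."""
    i = 0
    while i < len(code):
        op = code[i]
        if op >= HAVE_ARGUMENT:
            yield op, int.from_bytes(code[i + 1:i + 3], "little")
            i += 3
        else:
            yield op, None
            i += 1


def decode_wordcode(code: bytes) -> Iterator[Tuple[int, int]]:
    """Python 3.6+: every instruction is 2 bytes, the opcode then its argument byte."""
    for i in range(0, len(code), 2):
        yield code[i], code[i + 1]


# LOAD_CONST 0, IMPORT_NAME 0, BINARY_SUBSCR, RETURN_VALUE (Python 2.0 listing)
print(list(decode_pre36(bytes.fromhex("640000 6b0000 19 53"))))
# [(100, 0), (107, 0), (25, None), (83, None)]

# LOAD_CONST 0, IMPORT_NAME 0, RETURN_VALUE (Python 3.6 listing)
print(list(decode_wordcode(bytes.fromhex("6400 6c00 5300"))))
# [(100, 0), (108, 0), (83, 0)]
```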

Strings are always encoded with a 1-byte type identifier, followed by the length, then the value (TLV). A regular string uses the identifier s and a 4-byte length. Since Python 1.6 (a short-lived release, quickly superseded by 2.0), marshal also supports the unicode object, identified by u, which also stores its length on 4 bytes. Since Python 3.4, the following objects can also be used:

ASCII string, identified by a, storing its length on 4 bytes;
Short ASCII string, identified by z, storing its length on 1 byte (so at most 255 chars);
interned variants, which add the string to the interpreter's intern table so that equal strings share a single object in memory: A (interned ASCII string), Z (interned Short ASCII string) and t (interned unicode string, only a fallback; this type code actually predates 3.4).
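As a sketch of the trade-off, assuming Python 3.7+ (for str.isascii) and only the non-interned type codes, this helper picks the shortest encoding for a str and checks the result against CPython's own loader. The s type is left out because Python 3 decodes it as bytes, not str; it would cost a 5-byte header anyway, versus 2 bytes for z.

```python
import marshal


def encode_str(text: str) -> bytes:
    """Shortest non-interned marshal encoding of a str (Python 3.4+ type codes)."""
    data = text.encode("utf-8")
    if text.isascii() and len(data) < 256:
        return b"z" + bytes([len(data)]) + data               # Short ASCII: 1-byte length
    if text.isascii():
        return b"a" + len(data).to_bytes(4, "little") + data   # ASCII: 4-byte length
    return b"u" + len(data).to_bytes(4, "little") + data       # Unicode (UTF-8): 4-byte length


for text in ["os", "x" * 300, "é"]:
    raw = encode_str(text)
    assert marshal.loads(raw) == text  # round-trip through CPython's own loader
    print(chr(raw[0]), len(raw) - len(text.encode("utf-8")), "bytes of header")
```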

Because every string carries its own type and length header, and because all names have to be stored as strings, it is worth exploring whether evaluating source code could produce a smaller executable than writing everything as bytecode.

Naive approach

The naive approach is to write the following script:

import shutil, sys
shutil.copy(sys.argv[0],'4')
print(4)

Note that the module is named shutil (without a trailing s) in Python 2 as well as in Python 3. You might also be tempted to use the __file__ variable instead of sys.argv[0], but in Python 2.0 it did not exist for the main script.

Naively compiling this code with Python's built-in compileall module produces the following file:

87 C6           // Python 2.0.1
0D 0A           // Magic number
xx xx xx xx     // Timestamp
(               // Raw code ; type: PYC_object
    'c'             // Code object
    00 00           // co_argcount
    00 00           // co_nlocals
    03 00           // co_stacksize
    00 00           // co_flags
    (               // co_code
        's'             // String
        3E 00 00 00     // Length

        7f 00 00         // SET_LINENO               0

        7f 01 00         // SET_LINENO               1
        64 00 00         // LOAD_CONST               0 (None)
        6b 00 00         // IMPORT_NAME              0 (shutil)
        5a 00 00         // STORE_NAME               0 (shutil)
        64 00 00         // LOAD_CONST               0 (None)
        6b 01 00         // IMPORT_NAME              1 (sys)
        5a 01 00         // STORE_NAME               1 (sys)

        7f 02 00         // SET_LINENO               2
        65 00 00         // LOAD_NAME                0 (shutil)
        69 02 00         // LOAD_ATTR                2 (copy)
        65 01 00         // LOAD_NAME                1 (sys)
        69 03 00         // LOAD_ATTR                3 (argv)
        64 01 00         // LOAD_CONST               1 (0)
        19               // BINARY_SUBSCR
        64 02 00         // LOAD_CONST               2 ('4')
        83 02 00         // CALL_FUNCTION            2
        01               // POP_TOP

        7f 03 00         // SET_LINENO               3
        64 03 00         // LOAD_CONST               3 (4)
        47               // PRINT_ITEM
        48               // PRINT_NEWLINE
        64 00 00         // LOAD_CONST               0 (None)
        53               // RETURN_VALUE
    )
    (               // co_consts
        '('             // Tuple
        04 00 00 00     // Length
        (               // ID: 0
            'N'             // None
        )
        (               // ID: 1
            'i'             // Integer32
            00 00 00 00     // Value: 0
        )
        (               // ID: 2
            's'             // String
            01 00 00 00     // Length
            "4"
        )
        (               // ID: 3
            'i'             // Integer32
            04 00 00 00     // Value: 4
        )
    )
    (               // co_names
        '('             // Tuple
        04 00 00 00     // Length
        (               // ID: 0
            's'             // String
            06 00 00 00     // Length
            "shutil"
        )
        (               // ID: 1
            's'             // String
            03 00 00 00     // Length
            "sys"
        )
        (               // ID: 2
            's'             // String
            04 00 00 00     // Length
            "copy"
        )
        (               // ID: 3
            's'             // String
            04 00 00 00     // Length
            "argv"
        )
    )
    (               // co_varnames
        '('             // Tuple
        00 00 00 00     // Length
    )
    (               // co_filename
        's'             // String
        14 00 00 00     // Length
        "/app/naiveapproch.py"
    )
    (               // co_name
        's'             // String
        01 00 00 00     // Length
        "?"
    )
    01 00           // co_firstlineno
    (               // co_lnotab
        's'             // String
        04 00 00 00     // Length
        18 01 1A 01
    )
)

This 194-byte payload can be shrunk much further.

Here are the optimizations I applied:

Removed the debug info (SET_LINENO instructions and line table);
Set co_filename and co_name to empty strings (Python 2.0 does not accept another type for them);
Reused the '4' string constant for the print instruction;
Removed the PRINT_NEWLINE instruction;
Returned a value already on the stack, which removes a POP_TOP and a LOAD_CONST;
Replaced the None pushed on the stack with a constant that is needed anyway, which saves 1 byte;
Used DUP_TOP whenever the same value had to be pushed several times;
Avoided storing a value only to load it right back: instead, I kept it on top of the stack (see the import instructions);
Created the integer 0 (used to access argv[0]) with a comparison. This removes 5 bytes from the consts section and adds 2 bytes to the code.
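To double-check the stack juggling of the optimized listing below, here is a toy stack machine that replays it with stand-in objects. It is only a model: the opcode names mirror the listing, but the semantics are simplified (IMPORT_NAME just pops its fromlist operand, as the listing assumes Python 2.0 does), and the shutil/sys objects are hypothetical fakes.

```python
import operator
from types import SimpleNamespace

CMP_OPS = {2: operator.eq, 3: operator.ne}  # subset of CPython's cmp_op table

# Optimized sequence from the listing: (opcode name, argument)
PROGRAM = [
    ("LOAD_CONST", 0), ("IMPORT_NAME", 0), ("LOAD_ATTR", 2),
    ("DUP_TOP", None), ("IMPORT_NAME", 1), ("LOAD_ATTR", 3),
    ("DUP_TOP", None), ("DUP_TOP", None), ("COMPARE_OP", 3),
    ("BINARY_SUBSCR", None), ("LOAD_CONST", 0), ("CALL_FUNCTION", 2),
    ("LOAD_CONST", 0), ("PRINT_ITEM", None), ("RETURN_VALUE", None),
]


def run(program, consts, names, modules, printed):
    stack = []
    for op, arg in program:
        if op == "LOAD_CONST":
            stack.append(consts[arg])
        elif op == "IMPORT_NAME":
            stack.pop()  # fromlist operand: its value does not matter for a plain module
            stack.append(modules[names[arg]])
        elif op == "LOAD_ATTR":
            stack.append(getattr(stack.pop(), names[arg]))
        elif op == "DUP_TOP":
            stack.append(stack[-1])
        elif op == "COMPARE_OP":
            right, left = stack.pop(), stack.pop()
            stack.append(CMP_OPS[arg](left, right))  # argv != argv -> False, i.e. 0
        elif op == "BINARY_SUBSCR":
            index, container = stack.pop(), stack.pop()
            stack.append(container[index])
        elif op == "CALL_FUNCTION":
            args = [stack.pop() for _ in range(arg)][::-1]
            func = stack.pop()
            stack.append(func(*args))
        elif op == "PRINT_ITEM":
            printed.append(stack.pop())
        elif op == "RETURN_VALUE":
            return stack.pop()
        else:
            raise ValueError(f"unsupported opcode {op}")


copies, printed = [], []
modules = {  # hypothetical stand-ins for the real modules
    "shutil": SimpleNamespace(copy=lambda src, dst: copies.append((src, dst))),
    "sys": SimpleNamespace(argv=["ajabep.pyc"]),
}
run(PROGRAM, ["4"], ["shutil", "sys", "copy", "argv"], modules, printed)
print(copies, printed)  # [('ajabep.pyc', '4')] ['4']
```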

87 C6           // Python 2.0.1
0D 0A           // Magic number
xx xx xx xx     // Timestamp?
(               // Raw code ; type: PYC_object
    'c'             // Code object
    00 00           // co_argcount
    xx 7x           // co_nlocals
    03 xx           // co_stacksize
    47 53           // co_flags ; Bytecode for 'PRINT_ITEM ; RETURN_VALUE'
    (               // co_code
        's'             // String
        21 00 00 00     // Length

        64 00 00         // LOAD_CONST               0 ('4')
        6b 00 00         // IMPORT_NAME              0 (shutil)
        69 02 00         // LOAD_ATTR                2 (copy)

        04               // DUP_TOP
        6b 01 00         // IMPORT_NAME              1 (sys)
        69 03 00         // LOAD_ATTR                3 (argv)

        04               // DUP_TOP
        04               // DUP_TOP
        6A 03 00         // COMPARE_OP               3 (!=)
        19               // BINARY_SUBSCR
        64 00 00         // LOAD_CONST               0 ('4')
        83 02 00         // CALL_FUNCTION            2

        64 00 00         // LOAD_CONST               0 ('4')
        47               // PRINT_ITEM
        53               // RETURN_VALUE

    )
    (               // co_consts
        '('             // Tuple
        01 00 00 00     // Length
        (               // ID: 0
            's'             // String
            01 00 00 00     // Length
            "4"
        )
    )
    (               // co_names
        '('             // Tuple
        04 00 00 00     // Length
        (               // ID: 0
            's'             // String
            06 00 00 00     // Length
            "shutil"
        )
        (               // ID: 1
            's'             // String
            03 00 00 00     // Length
            "sys"
        )
        (               // ID: 2
            's'             // String
            04 00 00 00     // Length
            "copy"
        )
        (               // ID: 3
            's'             // String
            04 00 00 00     // Length
            "argv"
        )
    )
    (               // co_varnames
        '('             // Tuple
        00 00 00 00     // Length
    )
    (               // co_filename
        's'             // String
        00 00 00 00     // Length
    )
    (               // co_name
        's'             // String
        00 00 00 00     // Length
    )
    01 00           // co_firstlineno
    (               // co_lnotab
        's'             // String
        00 00 00 00     // Length
    )
)

This gives us a 131Bytes-length file.

I also tried, without success, to shrink the payload further by exploiting the VM (an old VM with plenty of overflows) or by jumping into one of the strings we already have.

I preferred to spend my time exploring other paths: newer PYC versions, or building a polyglot file.

Eval approach

Another simple idea was to create a PYC file calling the eval function. This would reduce the file size, because it would contain a single "long" string (probably still short enough to be stored as a Short ASCII one) instead of many small ones.

This may be a good idea, but since it would probably not be fun, nor produce a polyglot file, I decided not to follow this path.

System approach: how to turn it polyglot

Yet another simple idea was to create a PYC file calling a system function, which would also let me turn the file into a polyglot.

I started by compiling this Python 3 code:

import os
os.execl("/bin/sh", __file__, "-c", "cp -v $0 4")

I chose execl over system because its name is one character shorter, and because the command run by /bin/sh (started by execl) is a plain string constant, which can double as the shell part of the polyglot. Also, since __file__ is passed as the shell's argv[0], $0 expands to the path of the PYC file inside the -c command.
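The $0 trick can be checked from Python. This sketch assumes a POSIX system with /bin/sh and cp, and uses subprocess instead of os.execl so that the calling process survives: on POSIX, when executable is given, subprocess passes args[0] as the child's argv[0], like the second argument of execl. The file name is hypothetical, and -v is dropped to avoid depending on GNU cp's output format.

```python
import os
import subprocess
import tempfile

# Same argv as os.execl("/bin/sh", "./ajabep.pyc", "-c", "cp $0 4")
with tempfile.TemporaryDirectory() as workdir:
    script = os.path.join(workdir, "ajabep.pyc")  # hypothetical file name
    with open(script, "wb") as f:
        f.write(b"any content")
    subprocess.run(
        ["./ajabep.pyc", "-c", "cp $0 4"],  # argv[0] (becomes $0), then sh's own arguments
        executable="/bin/sh",
        cwd=workdir,
        check=True,
    )
    with open(os.path.join(workdir, "4"), "rb") as f:
        copied = f.read()

print(copied)  # b'any content'
```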

Because the file header is shorter before Python 3.7, and because I wanted a stable Python version, I chose Python 3.6.

This gave the following file:

33 0D       // Python 3.6
0D 0A       // Magic number

            // Bc Version <3.7
e2 88 fb 64 // Timestamp
3b 00 00 00 // File Size

(           // Raw code
    'c' | 0x80      // Code object & adding it in the reference table
    00 00 00 00     // co_argcount
    00 00 00 00     // co_kwonlyargcount
    00 00 00 00     // co_nlocals
    05 00 00 00     // co_stacksize
    40 00 00 00     // co_flags ; = CO_NOFREE

    (               // co_code
        's'             // string object
        1C 00 00 00     // string length

        64 00       //  LOAD_CONST               0 (0)
        64 01       //  LOAD_CONST               1 (None)
        6C 00       //  IMPORT_NAME              0 (os)
        5a 00       //  STORE_NAME               0 (os)
                    //
        65 00       //  LOAD_NAME                0 (os)
        6a 01       //  LOAD_ATTR                1 (execl)
        64 02       //  LOAD_CONST               2 ('/bin/sh')
        65 02       //  LOAD_NAME                2 (__file__)
        64 03       //  LOAD_CONST               3 ('-c')
        64 04       //  LOAD_CONST               4 ('cp -v $0 4')
        83 04       //  CALL_FUNCTION            4
        01 00       //  POP_TOP
        64 01       //  LOAD_CONST               1 (None)
        53 00       //  RETURN_VALUE
    )

    (               // co_consts
        ')'             // Small tuple
        05              // Length

        (               // ID: 0
            'i' | 0x80      // Int32 & adding it in the reference table
            00 00 00 00     // Content
        )
        (               // ID: 1
            'N'             // None
        )
        (               // ID: 2
            'z'             // Small ASCII
            07              // Length
            "/bin/sh"
        )
        (               // ID: 3
            'z'             // Small ASCII
            02              // Length
            "-c"
        )
        (               // ID: 4
            'z'             // Small ASCII
            0A              // Length
            "cp -v $0 4"
        )
    )

    (               // co_names
        ')'             // Small tuple
        03              // Length

        (               // ID: 0
            'Z' | 0x80      // Small ASCII & adding it in the reference table
            02              // Length
            "os"
        )
        (               // ID: 1
            'Z' | 0x80      // Small ASCII & adding it in the reference table
            05              // Length
            "execl"
        )
        (               // ID: 2
            'Z' | 0x80      // Small ASCII & adding it in the reference table
            08              // Length
            "__file__"
        )
    )

    (               // co_varnames
        ')' | 0x80      // Small tuple & adding it in the reference table
        00              // Length
    )

    (               // co_freevars
        'r'             // Reference
        05 00 00 00     // Pointer in the ref table
                        // It's the empty tuple defined in co_varnames
    )

    (               // co_cellvars
        'r'             // Reference
        05 00 00 00     // Pointer in the ref table
                        // It's the empty tuple defined in co_varnames
    )

    (               // co_filename
        'z' | 0x80      // Small ASCII & adding it in the reference table
        0B              // Length
        "./system.py"
    )

    (               // co_name
        'Z' | 0x80      // Small ASCII & adding it in the reference table
        08              // Length
        "<module>"
    )

    01 00 00 00     // co_firstlineno

    (               // co_lnotab
        's'                // String
        02 00 00 00        // Length
        "\x08\x01"
    )
)

After investigation, I identified the following constraints.

co_argcount and co_kwonlyargcount have to be 0 for a module code object.
co_nlocals and co_stacksize are signed integers used to size memory allocations, so their most significant bit has to be 0. The larger the last byte (the most significant one, since it is little-endian), the more likely the load is to fail by trying to allocate too much memory.
co_flags must not contain the bits 0x20 (CO_GENERATOR), 0x80 (CO_COROUTINE) and 0x200 (CO_ASYNC_GENERATOR). Apart from these three, the value can be anything we want.
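These constraints can be summed up in a small checker. It is a sketch assuming the Python 3.6 layout (five signed 4-byte LE fields right after the 'c' type byte); the flag names come from CPython's code.h, and it does not try to predict when a large but positive value would exhaust memory.

```python
import struct

# co_flags bits that make CPython build a generator/coroutine object
# instead of running the module body (names from CPython's code.h)
FORBIDDEN_FLAGS = {0x20: "CO_GENERATOR", 0x80: "CO_COROUTINE", 0x200: "CO_ASYNC_GENERATOR"}


def check_module_code_header(raw: bytes) -> list:
    """Check the five 4-byte fields following 'c' in a Python 3.6 code object."""
    argcount, kwonly, nlocals, stacksize, flags = struct.unpack("<5i", raw[:20])
    problems = []
    if argcount or kwonly:
        problems.append("argument counts must be 0 for a module")
    for name, value in (("co_nlocals", nlocals), ("co_stacksize", stacksize)):
        if value < 0:
            problems.append(f"{name} is negative (most significant bit set)")
    for bit, name in FORBIDDEN_FLAGS.items():
        if flags & bit:
            problems.append(f"co_flags contains {name}")
    return problems


ok = struct.pack("<5i", 0, 0, 0, 5, 0x40)            # values from the listing above
bad = struct.pack("<5I", 0, 0, 0x80000000, 5, 0x60)  # sign bit set, CO_GENERATOR set
print(check_module_code_header(ok))   # []
print(check_module_code_header(bad))
```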

This leaves the following fields free of any constraint:

The timestamp of the PYC file;
The source size field of the PYC file;
We can also store arbitrary values as strings in a tuple, such as the constants or the names.

I decided to make this payload a polyglot with shell and PHP: the former because it seemed easy, the latter because it is the first language I learned, and I practiced it for years.

I also tried to build the polyglot out of the opcodes themselves, but did not find a good way to do it.

Shell polyglot

When Dash, the well-known /bin/sh of Debian, meets a command it cannot execute, it prints an error but carries on with the next thing it identifies as a command.

So we can prefix the real command with a command separator (\n, ;, etc.).

To make sure Dash does not stumble on a problematic character, we can insert a comment character (#) as early as possible in the file. Dash then skips to the end of the line (\n) without executing content we cannot fully control (such as bytecode). This is possible by putting the comment character in the timestamp or in the file size field. Conveniently, the magic number ends with 0D 0A (\r\n), so the first line Dash reads is the magic number alone, which simply fails as a command, and the timestamp starts a fresh line.
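Here is how Dash splits the start of the final file into lines, using the header from the Result section (Python 3.6 magic number, then #<?p as the timestamp and hp # as the size). The first line is just the magic number, which Dash fails to run (hence the "not found" error in the shell trace), and the second line starts with #.

```python
# Start of the final file: Python 3.6 magic number (ends with CR LF), then
# "#<?p" as the timestamp and "hp #" as the source size.
header = bytes.fromhex("33 0d 0d 0a") + b"#<?p" + b"hp #"

lines = header.split(b"\n")
print(lines[0])  # b'3\r\r': the first "command", which Dash fails to run
print(lines[1])  # b'#<?php #': a comment, Dash skips to the next LF
```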

PHP polyglot

To make the file a PHP polyglot too, we have to change the payload. Indeed, if we reused the Dash payload, $0 would refer to the /bin/sh binary and not to the file to copy. I did not find a way to write a Dash payload that would be truly polyglot.

So it is tempting to store a string containing <?php copy(__file__, 4); ?> in the PYC file. But this can be optimized further, because PHP has two interesting properties:

A single-line comment can start with #, as in a shell script;
A lone \r (hex 0x0D) counts as a line break, so it closes such a comment.

Note that the PHP code has to start with <?php followed by whitespace (here, a space).

Thus, we can put <?php # anywhere in the file, in a space we can fill freely, and store a string containing the PHP payload (\rcopy(__file__, 4);?>) before the Dash payload. This works as long as no \n, \r or ?> appears between the tag and that string, since any of them would end the comment early.

The contiguous free space for the opening PHP tag can be found in the timestamp and file size fields. To make sure the < character does not bother Dash (it would be read as a redirection), there is also room to prefix the tag with another # that comments it out.
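A simplified model of the PHP side: a # comment stops at the first CR or LF, or right before a ?> tag. The function and the buffer below are hypothetical illustrations, not PHP's actual scanner.

```python
def php_comment_end(buf: bytes, start: int) -> int:
    """Index where a '#' comment starting at buf[start] stops (simplified model).

    The comment ends at the first CR (0x0D) or LF (0x0A), or right before '?>'.
    """
    if buf[start:start + 1] != b"#":
        raise ValueError("no '#' at the given position")
    for i in range(start + 1, len(buf)):
        if buf[i] in b"\r\n" or buf[i:i + 2] == b"?>":
            return i
    return len(buf)


# Hypothetical buffer: open tag and comment, bytecode-like junk, then the
# CR-prefixed PHP payload stored as a string constant.
buf = b"<?php #\x00c\x12\x00junk\rcopy(__file__, 4);?>"
end = php_comment_end(buf, buf.index(b"#"))
print(buf[end:])  # b'\rcopy(__file__, 4);?>'
```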

Result

These ideas led to the following file:

33 0D       // Python 3.6
0D 0A       // Magic number

            // Bc Version <3.7
23     // ┐
"<?p"  // ┴ Timestamp
"hp #" // File Size

(           // Raw code
    'c'             // Code object
    00 00 00 00     // co_argcount
    00 00 00 00     // co_kwonlyargcount
    xx xx xx 00     // co_nlocals
    41 xx xx xx     // co_stacksize
    40 x0 xx xx     // co_flags ; = CO_NOFREE

    (               // co_code
        's'             // string object
        12 00 00 00     // string length

        64 00       //  LOAD_CONST               0 (0)
        04 00       //  DUP_TOP
        6C 00       //  IMPORT_NAME              0 (os)
        6a 01       //  LOAD_ATTR                1 (execl)
        64 01       //  LOAD_CONST               1 ('/bin/sh')
        65 02       //  LOAD_NAME                2 (__file__)
        64 02       //  LOAD_CONST               2 ('-c')
        64 04       //  LOAD_CONST               4 (cp cmd string)
        83 04       //  CALL_FUNCTION            4
    )

    (               // co_consts
        ')'             // Small tuple
        05              // Length

        (               // ID: 0
            'i'             // Int32
            00 00 00 00     // Value: 0
        )
        (               // ID: 1
            'z'             // Small ASCII
            07              // Length
            "/bin/sh"
        )
        (               // ID: 2
            'z'             // Small ASCII
            02              // Length
            "-c"
        )
        (               // ID: 3
            'z'             // Small ASCII
            15              // Length
            0D
              "copy(__file__, 4);?>"
        )
        (               // ID: 4
            'z'             // Small ASCII
            14              // Length
            0A
            "echo `cp -v $0 4`;#"
        )
    )

    (               // co_names
        ')'             // Small tuple
        03              // Length

        (               // ID: 0
            'Z'             // Small ASCII
            02              // Length
            "os"
        )
        (               // ID: 1
            'Z'             // Small ASCII
            05              // Length
            "execl"
        )
        (               // ID: 2
            'Z'             // Small ASCII
            08              // Length
            "__file__"
        )
    )

    (               // co_varnames
        ')'             // Small tuple
        00              // Length
    )

    (               // co_freevars
        ')'             // Small tuple
        00              // Length
    )

    (               // co_cellvars
        ')'             // Small tuple
        00              // Length
    )

    (               // co_filename
        'z'             // Small ASCII
        00              // Length
    )

    (               // co_name
        'z'             // Small ASCII
        00              // Length
    )

    01 00 00 00     // co_firstlineno

    (               // co_lnotab
        's'                // String
        00 00 00 00        // Length
        ""
    )
)
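To double-check the byte count, here is a sketch that assembles the listing above in Python. Assumptions: every free xx byte or nibble is set to 00, and z, Z, small_tuple and s are hypothetical helpers named after the type codes. A modern CPython cannot load this 3.6 code object, but it can still decode the individual tuples with marshal.

```python
import marshal
import struct


def z(data: bytes) -> bytes:
    """Short ASCII ('z'): 1-byte length, then the characters."""
    return b"z" + bytes([len(data)]) + data


def Z(data: bytes) -> bytes:
    """Interned Short ASCII ('Z'): same layout as 'z'."""
    return b"Z" + bytes([len(data)]) + data


def small_tuple(*items: bytes) -> bytes:
    """Small tuple (')'): 1-byte item count, then the encoded items."""
    return b")" + bytes([len(items)]) + b"".join(items)


def s(data: bytes) -> bytes:
    """String ('s'): 4-byte little-endian length, then the bytes."""
    return b"s" + struct.pack("<I", len(data)) + data


# Magic number of Python 3.6, then the "timestamp" and "source size" fields
header = bytes.fromhex("33 0d 0d 0a") + b"#<?p" + b"hp #"

# LOAD_CONST 0, DUP_TOP, IMPORT_NAME 0, LOAD_ATTR 1, LOAD_CONST 1,
# LOAD_NAME 2, LOAD_CONST 2, LOAD_CONST 4, CALL_FUNCTION 4
co_code = bytes.fromhex("6400 0400 6c00 6a01 6401 6502 6402 6404 8304")

code_object = (
    b"c"
    # argcount, kwonlyargcount, nlocals, stacksize, flags: every free "xx"
    # of the listing is filled with 00 here (an arbitrary choice)
    + struct.pack("<5I", 0, 0, 0, 0x41, 0x40)
    + s(co_code)
    + small_tuple(                                        # co_consts
        b"i" + struct.pack("<i", 0),
        z(b"/bin/sh"),
        z(b"-c"),
        z(b"\rcopy(__file__, 4);?>"),
        z(b"\necho `cp -v $0 4`;#"),
    )
    + small_tuple(Z(b"os"), Z(b"execl"), Z(b"__file__"))  # co_names
    + small_tuple() * 3        # co_varnames, co_freevars, co_cellvars
    + z(b"") * 2               # co_filename, co_name
    + struct.pack("<I", 1)     # co_firstlineno
    + s(b"")                   # co_lnotab
)

pyc = header + code_object
print(len(pyc))  # 163
print(marshal.loads(small_tuple(Z(b"os"), Z(b"execl"), Z(b"__file__"))))
```

The length bytes 0x15 and 0x14 of the two payload strings in the listing fall out of the z helper.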

Note that this 163-byte file is a bit different from the one I submitted: while writing this explanation, I found a way to optimize the payload even further.

Shell Trace

➜  BGGP4 $ ls
ajabep.pyc
➜  BGGP4 $ /bin/sh ./ajabep.pyc
: not found 1: 3
'./ajabep.pyc' -> '4'
➜  BGGP4 $ ls
4  ajabep.pyc
➜  BGGP4 $ sha256sum *
8541e6187e6230b71e3fc4e83be4f35c22cebcba9264791e5f27adc3afdd5e08  4
8541e6187e6230b71e3fc4e83be4f35c22cebcba9264791e5f27adc3afdd5e08  ajabep.pyc
➜  BGGP4 $

PHP Trace

➜  BGGP4 $ ls
ajabep.pyc
➜  BGGP4 $ php -f ajabep.pyc
3
##z
echo `cp -v $0 4`;#)ZosZexecl__file__)))zzs%
➜  BGGP4 $ ls
4  ajabep.pyc
➜  BGGP4 $ sha256sum *
8541e6187e6230b71e3fc4e83be4f35c22cebcba9264791e5f27adc3afdd5e08  4
8541e6187e6230b71e3fc4e83be4f35c22cebcba9264791e5f27adc3afdd5e08  ajabep.pyc
➜  BGGP4 $

Python Trace

➜  BGGP4 $ ls
ajabep.pyc
➜  BGGP4 $ docker run --rm -it -v ${PWD}:/app -w /app python:3.6 python3 ./ajabep.pyc
'./ajabep.pyc' -> '4'
➜  BGGP4 $     ls
4  ajabep.pyc
➜  BGGP4 $ sha256sum *
8541e6187e6230b71e3fc4e83be4f35c22cebcba9264791e5f27adc3afdd5e08  4
8541e6187e6230b71e3fc4e83be4f35c22cebcba9264791e5f27adc3afdd5e08  ajabep.pyc
➜  BGGP4 $

Post BGGP4

After the end of the BGGP4, I discovered that CPython is able to load a module from a ZIP file.

Nevertheless, this challenge taught me a lot about a format I had never taken the time to investigate.

Thank you, Netspooky and all the teams of Binary Golf Grand Prix and tmpout.sh, for this awesome challenge! Take care of yourselves 🫂

Annexes

PYC file structure

Before Python 3.3

As far as I could tell from the Git repository (starting at version 0.9.8), versions before Python 3.3 use this PYC file format; the Python 2.0 file dissected above follows it.

ID	Length	Meaning
1	2 bytes	Magic number disclosing the version number of the interpreter required
2	2 bytes	Hex `0D0A` literal
3	4 bytes	Modification time of the source file, used to recompile this file when the source has been updated since. May be ignored.
4	N/A	The Code object of the python module.

Because it adds only 8 bytes to the code object, 4 of which are free of any constraint, it could be a prime choice for this golf challenge.

From Python 3.3 to 3.6

From Python 3.3, and before 3.7, the compiled Python file is formatted as follows:

ID	Length	Meaning
1	2 bytes	Magic number disclosing the version number of the interpreter required
2	2 bytes	Hex `0D0A` literal
3	4 bytes	Modification time of the source file, used to recompile this file when the source has been updated since. May be ignored.
4	4 bytes	Size of the source file. May be ignored.
5	N/A	The Code object of the python module.

Since Python 3.7

From Python 3.7 to 3.12 (the next stable version at the time of writing), the header has changed slightly (PEP 552): a 4-byte flags field has been inserted before the former item 3, shifting the index of the following elements. Also, when a flag is set, the meaning of the 8 bytes of the former elements 3 and 4 changes (see details after the table).

ID	Length	Meaning
1	2 bytes	Magic number disclosing the version number of the interpreter required
2	2 bytes	Hex `0D0A` literal
3	4 bytes	Flag. Only the first byte is used. See details after the table.
4	4 bytes	Modification time of the source file, used to recompile this file when the source has been updated since. May be ignored.
5	4 bytes	Size of the source file. May be ignored.
6	N/A	The Code object of the python module.

The implemented flags are:

0x01: the PYC is hash-based: the 8 bytes of elements 4 and 5 store a hash of the source file instead of its timestamp and size;
0x02: the interpreter has to check that hash against the source file before using the PYC (only meaningful together with 0x01).
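The header layouts above can be parsed with a few struct calls. This sketch assumes the caller already knows, from the magic number, whether the file uses the 16-byte layout of Python 3.7+ or the 12-byte pre-3.7 layout used by Python 3.6; it does not map magic numbers to versions. It is checked against PYC files produced by the running interpreter (Python 3.7+ is needed for py_compile's invalidation_mode parameter).

```python
import importlib.util
import marshal
import os
import py_compile
import struct
import tempfile
import types


def parse_header(data: bytes, has_flags: bool) -> dict:
    """Split a PYC header into its fields (has_flags=True for Python 3.7+)."""
    magic = data[:4]
    if not has_flags:
        # 12-byte layout used by Python 3.6: magic, source mtime, source size
        mtime, size = struct.unpack_from("<II", data, 4)
        return {"magic": magic, "mtime": mtime, "size": size, "code_offset": 12}
    (flags,) = struct.unpack_from("<I", data, 4)
    if flags & 0x01:
        # Hash-based PYC: the next 8 bytes hold a hash of the source file
        return {"magic": magic, "flags": flags, "source_hash": data[8:16],
                "check_source": bool(flags & 0x02), "code_offset": 16}
    mtime, size = struct.unpack_from("<II", data, 8)
    return {"magic": magic, "flags": flags, "mtime": mtime, "size": size,
            "code_offset": 16}


results = {}
with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "demo.py")
    with open(source, "w") as f:
        f.write("print(4)\n")
    for mode in (py_compile.PycInvalidationMode.TIMESTAMP,
                 py_compile.PycInvalidationMode.CHECKED_HASH):
        pyc_path = py_compile.compile(source, cfile=source + "c", doraise=True,
                                      invalidation_mode=mode)
        with open(pyc_path, "rb") as f:
            data = f.read()
        info = parse_header(data, has_flags=True)
        code = marshal.loads(data[info["code_offset"]:])  # the module's code object
        results[mode.name] = (info, isinstance(code, types.CodeType))
        print(mode.name, info["flags"], info["magic"] == importlib.util.MAGIC_NUMBER,
              isinstance(code, types.CodeType))
```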

Code object structure

Python 0.9.8 (the oldest version found on GitHub)

Index	Type	ID	Meaning	Usual type
1	object	co_code	Code	String of bytecode
2	object	co_consts	Consts	Tuple
3	object	co_names	Names	Tuple of string
4	object	co_filename	Filename	String

From Python 0.9.9 to 1.2

Index	Type	ID	Meaning	Usual type
1	object	co_code	Code	String of bytecode
2	object	co_consts	Consts	Tuple
3	object	co_names	Names	Tuple of string
4	object	co_filename	Filename	String
5	object	co_name	Name of the module	String

From Python 1.3 to 1.4

Index	Type	ID	Meaning	Usual type
1	2 Bytes LE	co_argcount	Argument Count	Integer
2	2 Bytes LE	co_nlocals	Number of local variables	Integer
3	2 Bytes LE	co_flags	Flags	Integer
4	object	co_code	Code	String of bytecode
5	object	co_consts	Consts	Tuple
6	object	co_names	Names	Tuple of string
7	object	co_varnames	Variable names	Tuple of string
8	object	co_filename	Filename	String
9	object	co_name	Name of the module	String

From Python 1.5 to 2.0

Index	Type	ID	Meaning	Usual type
1	2 Bytes LE	co_argcount	Argument Count	Integer
2	2 Bytes LE	co_nlocals	Number of local variables	Integer
3	2 Bytes LE	co_stacksize	Size of the stack, used to allocate it	Integer
4	2 Bytes LE	co_flags	Flags	Integer
5	object	co_code	Code	String of bytecode
6	object	co_consts	Consts	Tuple
7	object	co_names	Names	Tuple of string
8	object	co_varnames	Variable names	Tuple of string
9	object	co_filename	Filename	String
10	object	co_name	Name of the module	String
11	2 Bytes LE	co_firstlineno	First line number	Integer
12	object	co_lnotab	Line Table	String

From Python 2.1 to 2.2

Index	Type	ID	Meaning	Usual type
1	2 Bytes LE	co_argcount	Argument Count	Integer
2	2 Bytes LE	co_nlocals	Number of local variables	Integer
3	2 Bytes LE	co_stacksize	Size of the stack, used to allocate it	Integer
4	2 Bytes LE	co_flags	Flags	Integer
5	object	co_code	Code	String of bytecode
6	object	co_consts	Consts	Tuple
7	object	co_names	Names	Tuple of string
8	object	co_varnames	Variable names	Tuple of string
9	object	co_freevars	Free variable names	Tuple of string
10	object	co_cellvars	Cell variable names	Tuple of string
11	object	co_filename	Filename	String
12	object	co_name	Name of the module	String
13	2 Bytes LE	co_firstlineno	First line number	Integer
14	object	co_lnotab	Line Table	String

From Python 2.3 to 2.7

Index	Type	ID	Meaning	Usual type
1	4 Bytes LE	co_argcount	Argument Count	Integer
2	4 Bytes LE	co_nlocals	Number of local variables	Integer
3	4 Bytes LE	co_stacksize	Stack Size	Integer
4	4 Bytes LE	co_flags	Flags	Integer
5	object	co_code	Code	String of bytecode
6	object	co_consts	Consts	Tuple
7	object	co_names	Names	Tuple of string
8	object	co_varnames	Variable names	Tuple of string
9	object	co_freevars	Free variable names	Tuple of string
10	object	co_cellvars	Cell variable names	Tuple of string
11	object	co_filename	Filename	String
12	object	co_name	Name of the module	String
13	4 Bytes LE	co_firstlineno	First line number	Integer
14	object	co_lnotab	Line Table	String

From Python 3.0 to 3.7

Index	Type	ID	Meaning	Usual type
1	4 Bytes LE	co_argcount	Argument Count	Integer
2	4 Bytes LE	co_kwonlyargcount	Keywords Only Argument Count	Integer
3	4 Bytes LE	co_nlocals	Number of local variables	Integer
4	4 Bytes LE	co_stacksize	Stack Size	Integer
5	4 Bytes LE	co_flags	Flags	Integer
6	object	co_code	Code	String of bytecode
7	object	co_consts	Consts	Tuple
8	object	co_names	Names	Tuple of string
9	object	co_varnames	Variable names	Tuple of string
10	object	co_freevars	Free variable names	Tuple of string
11	object	co_cellvars	Cell variable names	Tuple of string
12	object	co_filename	Filename	String
13	object	co_name	Name of the module	String
14	4 Bytes LE	co_firstlineno	First line number	Integer
15	object	co_lnotab	Line Table	String

From Python 3.8 to 3.10

Index	Type	ID	Meaning	Usual type
1	4 Bytes LE	co_argcount	Argument Count	Integer
2	4 Bytes LE	co_posonlyargcount	Position Only Argument Count	Integer
3	4 Bytes LE	co_kwonlyargcount	Keywords Only Argument Count	Integer
4	4 Bytes LE	co_nlocals	Number of local variables	Integer
5	4 Bytes LE	co_stacksize	Stack Size	Integer
6	4 Bytes LE	co_flags	Flags	Integer
7	object	co_code	Code	String of bytecode
8	object	co_consts	Consts	Tuple
9	object	co_names	Names	Tuple of string
10	object	co_varnames	Variable names	Tuple of string
11	object	co_freevars	Free variable names	Tuple of string
12	object	co_cellvars	Cell variable names	Tuple of string
13	object	co_filename	Filename	String
14	object	co_name	Name of the module	String
15	4 Bytes LE	co_firstlineno	First line number	Integer
16	object	co_lnotab (co_linetable in 3.10)	Line Table	String

Since Python 3.11

From Python 3.11 to 3.12 (the next stable version at the time of writing), here is the structure of a code object in a PYC file.

Index	Type	ID	Meaning	Usual type
1	4 Bytes LE	co_argcount	Argument Count	Integer
2	4 Bytes LE	co_posonlyargcount	Position Only Argument Count	Integer
3	4 Bytes LE	co_kwonlyargcount	Keywords Only Argument Count	Integer
4	4 Bytes LE	co_stacksize	Stack Size	Integer
5	4 Bytes LE	co_flags	Flags	Integer
6	object	co_code	Code	String of bytecode
7	object	co_consts	Consts	Tuple
8	object	co_names	Names	Tuple of string
9	object	co_localsplusnames	Names of local, cell and free variables	Tuple of string
10	object	co_localspluskinds	Kind of each of those variables	String
11	object	co_filename	Filename	String
12	object	co_name	Name of the module	String
13	object	co_qualname	Qualified Name	String
14	4 Bytes LE	co_firstlineno	First line number	Integer
15	object	co_linetable	Line Table	String
16	object	co_exceptiontable	Exception Table	String

Misc links

First, a useful link: a CyberChef recipe to convert my notation of a PYC file into a real binary file. It may fail, so check the file before trying to execute it 😅.

List of possible values for co_flags (flags of code objects): github.com/python/cpython @v3.12.0rc3 ./Include/cpython/code.h:178;
The CPython disassembler (the dis module): github.com/python/cpython @v3.12.0rc3 ./Lib/dis.py:454;
List of available opcodes: github.com/python/cpython @v3.12.0rc3 ./Include/opcode.h.

Highlight

Some time ago, I decided to highlight "lesser known" artists in my README files and blog posts. This is a gift for curious people who find these texts, a digital time capsule of sorts!

For this project, I would like to highlight Joanna Folivéli.

Joanna is an eclectic artist: she is an illustrator and a comic author, and she also makes music.

Personally, I love her drawings, which are beautiful and inclusive (did I already say I was transfem? :p), and her style.