The PYC file format is widely documented on the Internet, but it changed with Python 3.7. Good code to read in order to parse PYC files:
- The Coverage code to understand the PYC format: github.com/nedbat/coveragepy @latest ./lab/show_pyc.py:21
- The CPython parser to understand how to package the objects: github.com/python/cpython @v3.12.0rc3 ./Tools/build/umarshal.py:193. Note that it's only a python implementation of the real code, which is here: github.com/python/cpython @v3.12.0rc3 ./Python/marshal.c:1013.
- The CPython VM to understand how to decode the Python bytecode: github.com/python/cpython @v3.12.0rc3 ./Lib/dis.py:454
To get a list of the PYC magic numbers, refer to Google's pycnite repository, or to the less exhaustive list in CPython.
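As a quick local check, the magic number written by the interpreter at hand can be read from the standard library. This is only a small sketch; the variable names are mine.

```python
import importlib.util

# MAGIC_NUMBER is the 4-byte prefix the running interpreter writes to its PYC files.
magic = importlib.util.MAGIC_NUMBER

# The first two bytes are a little-endian version marker, the last two are "\r\n".
version_marker = int.from_bytes(magic[:2], "little")
print(magic.hex(), "->", version_marker)
```

The printed value depends on the interpreter used, so it has to be looked up in one of the lists above.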
The challenge
This year, the Binary Golf Grand Prix challenge is to make the smallest binary (including polyglot files) which has to:
- Produce exactly 1 copy of itself;
- Name the copy 4;
- Not execute the copied file;
- Print, return, or display the number 4.
Why PYC
A few weeks ago, during a CTF, I learned the hard way how the PYC file format is built. Thus, I decided to solve this challenge with this type of executable.
I know that a Python source file would be much shorter than a PYC file and would be a better fit for a BGGP, but it was really a file format I wanted to manipulate.
PYC format
In PYC files, everything uses little-endian byte order.
Schemas of the files are in the Annexes, see below.
Because minimum sizes increase with Python versions, we would be tempted to use a Python version before 3.7, or even before 2.0, but it depends on the bytecode. Indeed, each new version comes with new bytecodes and breaking changes.
In a PYC file, an item is encoded using, when needed, the TLV (Type, Length, Value) notation. The type is designated by a 1-byte identifier. Whether a length is present, and how many bytes it takes, depends on the type. The presence of the value depends on the type and the length.
Thus, here are some example types:
73 07 00 00 00 41 42 43 44 45 46 47
│ │ └─ "ABCDEFG"
│ └ Length: 7
└ 's': String, length will be composed of 4Bytes, LE
28 00 00 00 00
│ └─ Length: 0: No value
└─ '(': Tuple, length is a 4Bytes LE
69 99 88 77 66
│ └─ Value: 0x66778899
└─ 'i': Integer on 32 bits
4E
└─ 'N': None, no length, no value, just the Python "None"
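These four examples can be checked with the marshal module of a recent Python 3, which still understands these types. It is a minimal sketch: the dictionary keys are mine, and note that Python 3 returns the 's' type as a bytes object.

```python
import marshal

# The hex dumps are the four examples above, copied verbatim.
examples = {
    "string": "73 07 00 00 00 41 42 43 44 45 46 47",
    "tuple": "28 00 00 00 00",
    "int32": "69 99 88 77 66",
    "none": "4E",
}

decoded = {name: marshal.loads(bytes.fromhex(dump)) for name, dump in examples.items()}
for name, value in decoded.items():
    print(name, repr(value))
```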
Please note that starting with Python 3.4, a flag (0x80) may be added (binary OR) to the identifier. This fills the refs table, a table of referenced items, in their order of appearance in the file. Python 3.4 also introduces the Ref type, which indicates that the nth item previously referenced has to be reused.
Here are some examples on referenced item manipulation:
F3 07 00 00 00 41 42 43 44 45 46 47
│ │ └─ "ABCDEFG"
│ └ Length: 7
└ 's'|FLAG_REF: Referenced string, length will be composed of 4Bytes, LE
A8 00 00 00 00
│ └─ Length: 0: No value
└─ '(': Referenced tuple, length is a 4Bytes LE
E9 99 88 77 66
│ └─ Value: 0x66778899
└─ 'i': Referenced integer on 32 bits
CE
└─ 'N': Referenced none, no length, no value, just the Python "None"
72 01 00 00 00
│ └─ Reference to the item at index 1 of the refs table (indices start at 0)
└─ 'r': Reference
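To see the refs table at work, here is a small sketch using the marshal module of a recent Python 3. The payload is hand-built for the illustration: a plain tuple holding a flagged string followed by a reference to it. Indices in the refs table start at 0.

```python
import marshal

payload = bytes.fromhex(
    "28 02 00 00 00"                        # '(' : tuple of 2 items, not referenced
    "F3 07 00 00 00 41 42 43 44 45 46 47"   # 's'|0x80 : stored in the refs table at index 0
    "72 00 00 00 00"                        # 'r' : reuse the item at index 0
)

first, second = marshal.loads(payload)
print(first, first is second)
```

Both items of the tuple are the very same object, which is the whole point of the Ref type: the value is stored only once in the file.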
Code object format
Schemas of files are put in Annexes. See below.
The layout of a code object differs between versions. I was able to list the following ones. Please note that objects may be references to a registered one.
In order to reduce the size of the file, I decided to use the oldest version I was able to find or compile. The oldest CPython version I was able to compile is 2.0. I compiled it and pushed the image to Docker Hub (ajabep/oldpython:2.0).
Our bytecode
The names of variables and the contents of strings have to be encoded as strings. The Python bytecode itself is stored as a string. The way to read it changes across versions.
From 0.9.8 (the earliest version available in the Git repository) to Python 3.5, an instruction is made of a 1-byte opcode. It is extended to 3 bytes when the opcode takes an argument, which is a 2-byte integer encoded in little-endian.
Since Python 3.6, instructions are always encoded on 2 bytes. The 1st byte is the opcode, and the 2nd is the argument. This argument is ignored if the opcode does not take one.
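The two layouts can be sketched as two tiny decoders. This is a simplified illustration: the function names are mine, EXTENDED_ARG is ignored, and the threshold of 90 is the HAVE_ARGUMENT constant of CPython for these versions.

```python
HAVE_ARGUMENT = 90  # CPython: opcodes >= 90 take an argument


def decode_legacy(code: bytes):
    """Decode the pre-3.6 layout: 1-byte opcode, optional 2-byte LE argument."""
    pos, out = 0, []
    while pos < len(code):
        opcode = code[pos]
        if opcode >= HAVE_ARGUMENT:
            arg = int.from_bytes(code[pos + 1:pos + 3], "little")
            pos += 3
        else:
            arg = None
            pos += 1
        out.append((opcode, arg))
    return out


def decode_wordcode(code: bytes):
    """Decode the 3.6+ layout: every instruction is an (opcode, argument) pair."""
    return [(code[i], code[i + 1]) for i in range(0, len(code), 2)]


legacy = decode_legacy(bytes.fromhex("64 00 00 6b 00 00 69 02 00 04"))
wordcode = decode_wordcode(bytes.fromhex("64 00 04 00 6C 00 6A 01"))
print(legacy)
print(wordcode)
```

The sample bytes are taken from the dumps of this post: the first one is Python 2.0 bytecode, the second one Python 3.6 bytecode.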
Strings are always encoded using a 1-byte identifier, followed by the length, then the value (TLV). A regular string is encoded using the identifier s, and uses a 4-byte length. Since Python 1.6, the format supports the unicode object, identified by a u and using 4 bytes to store the length of the value. Since Python 3.4, it's also possible to use one of the following objects:
- ASCII string, identified by a or A and encoding the length on 4 bytes;
- Short ASCII string, storing only ASCII chars, identified by z or Z and encoding the length on 1 byte;
- One of the interned types, which save memory by reusing a string that is already loaded in memory. These items are identified by A (for an interned ASCII string), Z (for an interned short ASCII string), and t (for an interned unicode string, but that's only a fallback).
Because of the structure of the string encoding, and because we have to store the "names" as strings, we have to explore the possibility that evaluating a source string may give a shorter executable than coding everything in bytecode.
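To get a feeling for the overhead of each string type, here is a small sketch that encodes the same 2-char value by hand and decodes it back with the marshal module of a recent Python 3. The dictionary is mine. The s, u and a types carry a 4-byte length, while z carries a 1-byte one.

```python
import marshal

encodings = {
    "s": b"s" + (2).to_bytes(4, "little") + b"os",  # byte string, 4-byte length
    "u": b"u" + (2).to_bytes(4, "little") + b"os",  # unicode, 4-byte length
    "a": b"a" + (2).to_bytes(4, "little") + b"os",  # ASCII, 4-byte length
    "z": b"z" + bytes([2]) + b"os",                  # short ASCII, 1-byte length
}

sizes = {kind: len(blob) for kind, blob in encodings.items()}
values = {kind: marshal.loads(blob) for kind, blob in encodings.items()}
print(sizes)
print(values)
```

Each name costs 5 bytes of overhead with the old s type, and only 2 bytes with a short ASCII one.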
Naive approach
The naive approach is to create the solution by doing the following:
import shutil, sys
shutil.copy(sys.argv[0],'4')
print(4)
You may have noticed what looks like a typo in the shutil module name. But, no, it's not a typo: this module is named without a trailing s. Also, you would be tempted to use the __file__ variable. But, in this old Python 2 version, it didn't exist.
A silly compilation of the previous code using Python's native compileall module creates the following compiled code:
87 C6 // Python 2.0.1
0D 0A // Magic number
xx xx xx xx // Timestamp
( // Raw code ; type: PYC_object
'c' // Code object
00 00 // co_argcount
00 00 // co_nlocals
03 00 // co_stacksize
00 00 // co_flags
( // co_code
's' // String
3E 00 00 00 // Length
7f 00 00 // SET_LINENO 0
7f 01 00 // SET_LINENO 1
64 00 00 // LOAD_CONST 0 (None)
6b 00 00 // IMPORT_NAME 0 (shutil)
5a 00 00 // STORE_NAME 0 (shutil)
64 00 00 // LOAD_CONST 0 (None)
6b 01 00 // IMPORT_NAME 1 (sys)
5a 01 00 // STORE_NAME 1 (sys)
7f 02 00 // SET_LINENO 2
65 00 00 // LOAD_NAME 0 (shutil)
69 02 00 // LOAD_ATTR 2 (copy)
65 01 00 // LOAD_NAME 1 (sys)
69 03 00 // LOAD_ATTR 3 (argv)
64 01 00 // LOAD_CONST 1 (0)
19 // BINARY_SUBSCR
64 02 00 // LOAD_CONST 2 ('4')
83 02 00 // CALL_FUNCTION 2
01 // POP_TOP
7f 03 00 // SET_LINENO 3
64 03 00 // LOAD_CONST 3 (4)
47 // PRINT_ITEM
48 // PRINT_NEWLINE
64 00 00 // LOAD_CONST 0 (None)
53 // RETURN_VALUE
)
( // co_consts
'(' // Tuple
04 00 00 00 // Length
( // ID: 0
'N' // None
)
( // ID: 1
'i' // Integer32
00 00 00 00 // Value: 0
)
( // ID: 2
's' // String
01 00 00 00 // Length
"4"
)
( // ID: 3
'i' // Integer32
04 00 00 00 // Value: 4
)
)
( // co_names
'(' // Tuple
04 00 00 00 // Length
( // ID: 0
's' // String
06 00 00 00 // Length
"shutil"
)
( // ID: 1
's' // String
03 00 00 00 // Length
"sys"
)
( // ID: 2
's' // String
04 00 00 00 // Length
"copy"
)
( // ID: 3
's' // String
04 00 00 00 // Length
"argv"
)
)
( // co_varnames
'(' // Tuple
00 00 00 00 // Length
)
( // co_filename
's' // String
14 00 00 00 // Length
"/app/naiveapproch.py"
)
( // co_name
's' // String
01 00 00 00 // Length
"?"
)
01 00 // co_firstlineno
( // co_lnotab
's' // String
04 00 00 00 // Length
18 01 1A 01
)
)
This 194-byte payload may be shrunk even further.
Here is the list of operations I've done:
- Removed debug info;
- Set co_filename & co_name to an empty string. Python 2.0 does not support any other type here;
- Reused the 4 string for the print instruction;
- Removed the PRINT_NEWLINE instruction;
- Returned a value already on the stack, allowing us to remove a POP_TOP and a LOAD_CONST;
- Replaced the None loaded on the stack by a required constant. It removed 1 byte;
- Used a DUP_TOP when the same thing has to be loaded many times on the stack;
- Avoided storing and loading a value right before using it. Instead, I just keep it on top of the stack (see the import instructions);
- Created the 0 integer (for accessing argv[0]) via a comparison. This removes 5 bytes from the "consts" section and adds 2 bytes to the code one.
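The last trick of the list can be sketched in plain Python. The list below is a hypothetical stand-in for sys.argv: comparing an object with itself through an always-false comparison gives a value that indexes like 0, so no integer constant is needed.

```python
argv = ["./naive.pyc", "unused"]  # hypothetical stand-in for sys.argv

# Same idea as DUP_TOP, DUP_TOP, COMPARE_OP in the bytecode: an object is never
# "not itself", so the comparison yields a false value that indexes like 0.
index = argv is not argv
first_argument = argv[index]
print(first_argument)
```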
87 C6 // Python 2.0.1
0D 0A // Magic number
xx xx xx xx // Timestamp?
( // Raw code ; type: PYC_object
'c' // Code object
00 00 // co_argcount
xx 7x // co_nlocals
03 xx // co_stacksize
47 53 // co_flags ; Bytecode for 'PRINT_ITEM ; RETURN_VALUE
( // co_code
's' // String
21 00 00 00 // Length
64 00 00 // LOAD_CONST 0 ('4')
6b 00 00 // IMPORT_NAME 0 (shutil)
69 02 00 // LOAD_ATTR 2 (copy)
04 // DUP_TOP
6b 01 00 // IMPORT_NAME 1 (sys)
69 03 00 // LOAD_ATTR 3 (argv)
04 // DUP_TOP
04 // DUP_TOP
6A 03 00 // COMPARE_OP IS_NOT
19 // BINARY_SUBSCR
64 00 00 // LOAD_CONST 0 ('4')
83 02 00 // CALL_FUNCTION 2
64 00 00 // LOAD_CONST 0 ('4')
47 // PRINT_ITEM
53 // RETURN_VALUE
)
( // co_consts
'(' // Tuple
01 00 00 00 // Length
( // ID: 0
's' // String
01 00 00 00 // Length
"4"
)
)
( // co_names
'(' // Tuple
04 00 00 00 // Length
( // ID: 0
's' // String
06 00 00 00 // Length
"shutil"
)
( // ID: 1
's' // String
03 00 00 00 // Length
"sys"
)
( // ID: 2
's' // String
04 00 00 00 // Length
"copy"
)
( // ID: 3
's' // String
04 00 00 00 // Length
"argv"
)
)
( // co_varnames
'(' // Tuple
00 00 00 00 // Length
)
( // co_filename
's' // String
00 00 00 00 // Length
)
( // co_name
's' // String
00 00 00 00 // Length
)
01 00 // co_firstlineno
( // co_lnotab
's' // String
00 00 00 00 // Length
)
)
This gives us a 131-byte file.
I tried, without success, to optimize the payload further by exploiting the VM (it's an old VM with a lot of overflows) or by jumping into one of the strings we have.
I preferred spending my time exploring other paths, newer PYC versions, or creating a polyglot file.
Eval approach
Another silly idea I had was to create a PYC file calling the eval function. Indeed, this would reduce the size of the file, since it would contain one "long" string (probably stored as a short ASCII one) instead of a lot of small ones.
This may be a good idea, but because it would probably not be fun, and because it probably would not produce a polyglot file, I decided to not follow this path.
System approach: how to turn it polyglot
Yet another simple idea I had was to create a PYC file calling the system function, transforming this file into a polyglot one.
First, I started by compiling this Python 3 code:
import os
os.execl("/bin/sh", __file__, "-c", "cp -v $0 4")
I preferred using the execl function because its name is shorter than system (by 1 char) and because it brings a string that helps to make a polyglot payload for the /bin/sh binary (which is started using execl).
Because the file structure is shorter before Python 3.7, and because I wanted to use a stable Python version, I decided to use Python 3.6.
This gave the following file.
33 0D // Python 3.6
0D 0A // Magic number
// Bc Version <3.7
e2 88 fb 64 // Timestamp
3b 00 00 00 // File Size
( // Raw code
'c' & 0x80 // Code object & adding it in the reference table
00 00 00 00 // co_argcount
00 00 00 00 // co_kwonlyargcount
00 00 00 00 // co_nlocals
05 00 00 00 // co_stacksize
40 00 00 00 // co_flags ; = CO_NOFREE
( // co_code
's' // string object
1C 00 00 00 // string length
64 00 // LOAD_CONST 0 (0)
64 01 // LOAD_CONST 1 (None)
6C 00 // IMPORT_NAME 0 (os)
5a 00 // STORE_NAME 0 (os)
//
65 00 // LOAD_NAME 0 (os)
6a 01 // LOAD_ATTR 1 (execl)
64 02 // LOAD_CONST 2 ('/bin/sh')
65 02 // LOAD_NAME 2 (__file__)
64 03 // LOAD_CONST 3 ('-c')
64 04 // LOAD_CONST 4 ('cp -v $0 4')
83 04 // CALL_FUNCTION 4
01 00 // POP_TOP
64 01 // LOAD_CONST 1 (None)
53 00 // RETURN_VALUE
)
( // co_consts
')' // Small tuple
05 // Length
( // ID: 0
'i' & 0x80 // Int32 & adding it in the reference table
00 00 00 00 // Content
)
( // ID: 1
'N' // None
)
( // ID: 2
'z' // Small ASCII
07 // Length
"/bin/sh"
)
( // ID: 3
'z' // Small ASCII
02 // Length
"-c"
)
( // ID: 4
'z' // Small ASCII
0A // Length
"cp -v $0 4"
)
)
( // co_names
')' // Small tuple
03 // Length
( // ID: 0
'Z' & 0x80 // Small ASCII & adding it in the reference table
02 // Length
"os"
)
( // ID: 1
'Z' & 0x80 // Small ASCII & adding it in the reference table
05 // Length
"execl"
)
( // ID: 2
'Z' & 0x80 // Small ASCII & adding it in the reference table
08 // Length
"__file__"
)
)
( // co_varnames
')' & 0x80 // Small tuple & adding it in the reference table
00 // Length
)
( // co_freevars
'r' // Reference
05 00 00 00 // Pointer in the ref table
// It's the empty tuple defined in co_varnames
)
( // co_cellvars
'r' // Reference
05 00 00 00 // Pointer in the ref table
// It's the empty tuple defined in co_varnames
)
( // co_filename
'z' & 0x80 // Small ASCII & adding it in the reference table
0B // Length
"./system.py"
)
( // co_name
'Z' & 0x80 // Small ASCII & adding it in the reference table
08 // Length
"<module>"
)
01 00 00 00 // co_firstlineno
( // co_lnotab
's' // String
02 00 00 00 // Length
"\x08\x01"
)
)
After investigation, I identified the following constraints.
- The co_argcount and co_kwonlyargcount of the code object have to be 0 for a module.
- The co_nlocals and co_stacksize of a code object are signed integers and are used to allocate memory. Thus, the most significant bit has to be 0. And the bigger the last byte (the most significant one, since it's little-endian), the more likely it is to fail (by trying to allocate too much memory).
- The co_flags of a code object has to avoid some bits: 0x20, 0x80, 0x200. Apart from these three, the value can be set to anything we want.
Thus, the following fields are left free of any constraint.
- The timestamp of the PYC file;
- The file size of the PYC file;
- We may also put some arbitrary values, as strings, in a tuple such as the constants or the names.
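The constraints on the integer fields can be sketched as two small checks. The function names are mine, and the checks only cover the sign bit and the forbidden flag bits listed above, not the risk of allocating too much memory.

```python
FORBIDDEN_FLAG_BITS = 0x20 | 0x80 | 0x200  # bits listed above as unusable


def signed_field_ok(raw: bytes) -> bool:
    """A 4-byte little-endian signed field must not be negative."""
    return int.from_bytes(raw, "little", signed=True) >= 0


def flags_ok(raw: bytes) -> bool:
    """co_flags may hold anything except the forbidden bits."""
    return (int.from_bytes(raw, "little") & FORBIDDEN_FLAG_BITS) == 0


print(signed_field_ok(bytes.fromhex("41 00 00 00")))  # printable 'A' as low byte
print(flags_ok(bytes.fromhex("40 00 00 00")))         # CO_NOFREE only
```

Such a check helps when looking for printable bytes that can be hidden in these fields.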
I decided to try to make this payload polyglot with a shell and PHP. The first because it seemed easy, and the second because it's the first language I learned, and I practiced it for years.
I tried to make a polyglot file using the opcodes, but didn't find a good way to exploit it.
Shell polyglot
When Dash, the famous /bin/sh on Debian, meets an instruction that it can't execute, it shows an error, but continues by trying to execute the next thing identified as a command.
Thus, we may prefix the real command with a command separator (\n, ;, etc.).
To be sure that Dash will not meet a problematic char, we may try to insert, as early as possible in the file, a comment char (#). Dash will then skip to the end of the line (\n) without executing content we cannot fully control (such as bytecode). This is possible by putting the comment char in the timestamp or in the file size.
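The way a shell walks through such a file can be approximated with a few lines of Python. This is a deliberately simplified model (a real shell tokenizes much more than that), and the sample bytes are a hypothetical, shortened stand-in for the real file.

```python
def shell_view(data: bytes):
    """Split on newlines and flag lines that start with '#' as comments."""
    return [("comment" if line.startswith(b"#") else "command", line)
            for line in data.split(b"\n")]


sample = (
    bytes.fromhex("33 0D 0D 0A")      # magic number, ends with a newline
    + b"#<?php #"                     # timestamp and file size fields
    + b"c" + b"\x00" * 4              # hypothetical stand-in for the code object
    + b"\necho `cp -v $0 4`;#tail"    # shell payload stored in a string
)

for kind, line in shell_view(sample):
    print(kind, line)
```

In this model, the first line is a command the shell cannot find, the second one is fully commented out, and the third one holds the real payload.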
PHP polyglot
In order to make a PHP polyglot file, we have to change the payload. Indeed, if we execute the Dash payload, the $0 will refer to the /bin/sh binary and not to the file to copy. Actually, I didn't find a way to make a Dash payload which would be really polyglot.
Thus, we would be tempted to use a string containing <?php copy(__file__, 4); ?> in the PYC file. But this can be optimized further. Indeed, PHP has two interesting properties:
- An inline comment may be declared in the same way as in a shell script (thus using the # char);
- The \r char (hex 0x0D) is accepted as a line ending, thus closing inline comments.
Please note that we have to start the PHP code with <?php followed by a space.
Thus, we may put <?php # anywhere in the file, in a space we may fill freely, and store a string containing the PHP payload (\rcopy(__file__,4);?>) before the Dash payload.
The contiguous space for the "start PHP tag" may be in the timestamp or in the file size. To be sure the < char will not be a problem for Dash, we also have room to prefix this tag with another # to comment it out.
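The PHP side can be approximated in the same spirit. This is a simplified model of the PHP parser, not its real behavior: it only looks for the opening tag, drops the # comments up to the next \r or \n, and stops at the closing tag. The sample bytes are a hypothetical, shortened stand-in for the real file.

```python
import re


def php_code(data: bytes) -> bytes:
    """Return the code a simplified PHP parser would run inside <?php ... ?>."""
    start = data.index(b"<?php ") + len(b"<?php ")
    end = data.index(b"?>", start)
    body = data[start:end]
    # Drop '#' comments, which run until the next \r or \n.
    return re.sub(rb"#[^\r\n]*", b"", body).strip()


sample = (
    bytes.fromhex("33 0D 0D 0A")      # magic number
    + b"#<?php #"                     # timestamp and file size fields
    + b"c" + b"\x00" * 4              # hypothetical stand-in for the code object
    + b"\rcopy(__file__, 4);?>"       # PHP payload stored in a string
    + b"\necho `cp -v $0 4`;#tail"    # shell payload stored in a string
)

print(php_code(sample))
```

Everything between the comment char and the \r is skipped, so the bytecode in between never reaches the PHP interpreter as code.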
Result
The previous thoughts led to this file:
33 0D // Python 3.6
0D 0A // Magic number
// Bc Version <3.7
23 // ┐
"<?p" // ┴ Timestamp
"hp #" // File Size
( // Raw code
'c' // Code object
00 00 00 00 // co_argcount
00 00 00 00 // co_kwonlyargcount
xx xx xx 00 // co_nlocals
41 xx xx xx // co_stacksize
40 x0 xx xx // co_flags ; = CO_NOFREE
( // co_code
's' // string object
12 00 00 00 // string length
64 00 // LOAD_CONST 0 (0)
04 00 // DUP_TOP
6C 00 // IMPORT_NAME 0 (os)
6a 01 // LOAD_ATTR 1 (execl)
64 01 // LOAD_CONST 1 ('/bin/sh')
65 02 // LOAD_NAME 2 (__file__)
64 02 // LOAD_CONST 2 ('-c')
64 04 // LOAD_CONST 4 (cp cmd string)
83 04 // CALL_FUNCTION 4
)
( // co_consts
')' // Small tuple
05 // Length
(
'i'
00 00 00 00
)
( // ID: 1
'z' // Small ASCII
07 // Length
"/bin/sh"
)
( // ID: 2
'z' // Small ASCII
02 // Length
"-c"
)
( // ID: 3
'z' // Small ASCII
15 // Length
0D
"copy(__file__, 4);?>"
)
( // ID: 4
'z' // Small ASCII
14 // Length
0A
"echo `cp -v $0 4`;#"
)
)
( // co_names
')' // Small tuple
03 // Length
( // ID: 0
'Z' // Small ASCII
02 // Length
"os"
)
( // ID: 1
'Z' // Small ASCII
05 // Length
"execl"
)
( // ID: 2
'Z' // Small ASCII
08 // Length
"__file__"
)
)
( // co_varnames
')' // Small tuple
00 // Length
)
( // co_freevars
')' // Small tuple
00 // Length
)
( // co_cellvars
')' // Small tuple
00 // Length
)
( // co_filename
'z' // Small ASCII
00 // Length
)
( // co_name
'z' // Small ASCII
00 // Length
)
01 00 00 00 // co_firstlineno
( // co_lnotab
's' // String
00 00 00 00 // Length
""
)
)
Please note that this 163-byte file is a bit different from the one I submitted. Indeed, while explaining it here, I found a way to optimize the payload even more.
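The layout above can be assembled with a short Python sketch. Every free xx byte is filled with 0x00 here, which is an assumption made for the illustration and not necessarily a good choice. The result is not claimed to be identical to the submitted file, and it has not been checked against the three interpreters. The helper name is mine.

```python
def short(kind: bytes, text: bytes) -> bytes:
    """Encode a short string: 1-byte type, 1-byte length, then the value."""
    return kind + bytes([len(text)]) + text


FREE = 0x00  # assumption: placeholder for every free "xx" byte of the dump

header = bytes.fromhex("33 0D 0D 0A") + b"#<?p" + b"hp #"
ints = (
    bytes(8)                              # co_argcount, co_kwonlyargcount
    + bytes([FREE, FREE, FREE, 0x00])     # co_nlocals
    + bytes([0x41, FREE, FREE, FREE])     # co_stacksize
    + bytes([0x40, FREE, FREE, FREE])     # co_flags
)
code = bytes.fromhex("64 00 04 00 6C 00 6A 01 64 01 65 02 64 02 64 04 83 04")
consts = b")\x05" + b"i" + bytes(4) + b"".join([
    short(b"z", b"/bin/sh"),
    short(b"z", b"-c"),
    short(b"z", b"\rcopy(__file__, 4);?>"),
    short(b"z", b"\necho `cp -v $0 4`;#"),
])
names = b")\x03" + b"".join(short(b"Z", n) for n in (b"os", b"execl", b"__file__"))
empty_tuple = b")\x00"

pyc = (
    header + b"c" + ints
    + b"s" + len(code).to_bytes(4, "little") + code
    + consts + names
    + empty_tuple * 3                     # co_varnames, co_freevars, co_cellvars
    + short(b"z", b"") * 2                # co_filename, co_name
    + (1).to_bytes(4, "little")           # co_firstlineno
    + b"s" + bytes(4)                     # empty co_lnotab
)
print(len(pyc))
```

The length of the assembled bytes matches the 163 bytes announced above.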
Shell Trace
➜ BGGP4 $ ls
ajabep.pyc
➜ BGGP4 $ /bin/sh ./ajabep.pyc
: not found 1: 3
'./ajabep.pyc' -> '4'
➜ BGGP4 $ ls
4 ajabep.pyc
➜ BGGP4 $ sha256sum *
8541e6187e6230b71e3fc4e83be4f35c22cebcba9264791e5f27adc3afdd5e08 4
8541e6187e6230b71e3fc4e83be4f35c22cebcba9264791e5f27adc3afdd5e08 ajabep.pyc
➜ BGGP4 $
PHP Trace
➜ BGGP4 $ ls
ajabep.pyc
➜ BGGP4 $ php -f ajabep.pyc
3
##z
echo `cp -v $0 4`;#)ZosZexecl__file__)))zzs%
➜ BGGP4 $ ls
4 ajabep.pyc
➜ BGGP4 $ sha256sum *
8541e6187e6230b71e3fc4e83be4f35c22cebcba9264791e5f27adc3afdd5e08 4
8541e6187e6230b71e3fc4e83be4f35c22cebcba9264791e5f27adc3afdd5e08 ajabep.pyc
➜ BGGP4 $
Python Trace
➜ BGGP4 $ ls
ajabep.pyc
➜ BGGP4 $ docker run --rm -it -v ${PWD}:/app -w /app python:3.6 python3 ./ajabep.pyc
'./ajabep.pyc' -> '4'
➜ BGGP4 $ ls
4 ajabep.pyc
➜ BGGP4 $ sha256sum *
8541e6187e6230b71e3fc4e83be4f35c22cebcba9264791e5f27adc3afdd5e08 4
8541e6187e6230b71e3fc4e83be4f35c22cebcba9264791e5f27adc3afdd5e08 ajabep.pyc
➜ BGGP4 $
Post BGGP4
After the end of BGGP4, I discovered that CPython is able to load a module from a ZIP file.
Nevertheless, this challenge helped me to learn a lot about this format that I never took the time to investigate.
Thank you, Netspooky and all the teams of Binary Golf Grand Prix and tmpout.sh, for this awesome challenge! Take care of yourselves 🫂
Annexes
PYC file structure
Before Python 2.0
As far as I was able to investigate using the Git repository (starting with version 0.9.8), the versions before Python 2.0 use this PYC file format.
ID | Length | Meaning |
---|---|---|
1 | 2 bytes | Magic number disclosing the version number of the interpreter required |
2 | 2 bytes | Hex 0D0A literal |
3 | 4 bytes | Modification timestamp of the source file, used to recompile this file when the original Python file has been updated since the compilation. Will possibly not be used. |
4 | N/A | The Code object of the python module. |
Because it only uses 8 bytes more than the code object, including 4 free of constraints, it may really be a premium choice for this Golf challenge.
From Python 2.0 to 3.6
From Python 2.0, and before 3.7, the compiled Python file is formatted as follows:
ID | Length | Meaning |
---|---|---|
1 | 2 bytes | Magic number disclosing the version number of the interpreter required |
2 | 2 bytes | Hex 0D0A literal |
3 | 4 bytes | Modification timestamp of the source file, used to recompile this file when the original Python file has been updated since the compilation. Will possibly not be used. |
4 | 4 bytes | Size of the python file. Will possibly not be used. |
5 | N/A | The Code object of the python module. |
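The table above can be read with the struct module. It is a minimal sketch with a function name of my own, applied to the 12 header bytes of the Python 3.6 dump shown earlier in this post.

```python
import struct


def parse_legacy_header(data: bytes):
    """Split the 12-byte header used from Python 2.0 to 3.6."""
    magic, crlf, timestamp, source_size = struct.unpack("<H2sII", data[:12])
    if crlf != b"\r\n":
        raise ValueError("missing 0D 0A literal")
    return {"magic": magic, "timestamp": timestamp, "source_size": source_size}


header = parse_legacy_header(bytes.fromhex("33 0D 0D 0A e2 88 fb 64 3b 00 00 00"))
print(header)
```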
Since Python 3.7
From Python 3.7 to 3.12 (the next stable version), the file has undergone a small change. A flag field has been inserted before item 3, shifting the index of the following elements. Also, the meaning of the 8 bytes of the previous elements 3 and 4 changes if a flag is set (see details after the table).
ID | Length | Meaning |
---|---|---|
1 | 2 bytes | Magic number disclosing the version number of the interpreter required |
2 | 2 bytes | Hex 0D0A literal |
3 | 4 bytes | Flag. Only the first byte is used. See details after the table. |
4 | 4 bytes | Modification timestamp of the source file, used to recompile this file when the original Python file has been updated since the compilation. Will possibly not be used. |
5 | 4 bytes | Size of the python file. Will possibly not be used. |
6 | N/A | The Code object of the python module. |
The implemented flags are:
- 0x01: the comparison with the source file is made using a checksum stored in the 8 bytes of elements 4 and 5;
- 0x02: the interpreter has to check the checksum before executing the file.
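These flags can be observed with the py_compile module of Python 3.7 or later, which exposes the three invalidation modes. It is a small sketch: the helper and the file name are mine.

```python
import pathlib
import py_compile
import tempfile


def header_flags(mode: py_compile.PycInvalidationMode) -> int:
    """Compile a tiny module with the given mode and return its flags field."""
    with tempfile.TemporaryDirectory() as tmp:
        source = pathlib.Path(tmp, "demo.py")   # hypothetical file name
        source.write_text("print(4)\n")
        target = py_compile.compile(
            str(source), cfile=str(source) + "c", doraise=True, invalidation_mode=mode
        )
        data = pathlib.Path(target).read_bytes()
    # The flags field is the 4 bytes right after the magic number and 0D 0A.
    return int.from_bytes(data[4:8], "little")


flags = {mode.name: header_flags(mode) for mode in py_compile.PycInvalidationMode}
print(flags)
```

A timestamp-based file has no flag set, an unchecked hash sets only 0x01, and a checked hash sets both bits.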
Code object structure
Python 0.9.8 (the oldest version found on GitHub)
Index | Type | ID | Meaning | Usual type |
---|---|---|---|---|
1 | object | co_code | Code | String of bytecode |
2 | object | co_consts | Consts | Tuple |
3 | object | co_names | Names | Tuple of string |
4 | object | co_filename | Filename | String |
From Python 0.9.9 to 1.2
Index | Type | ID | Meaning | Usual type |
---|---|---|---|---|
1 | object | co_code | Code | String of bytecode |
2 | object | co_consts | Consts | Tuple |
3 | object | co_names | Names | Tuple of string |
4 | object | co_filename | Filename | String |
5 | object | co_name | Name of the module | String |
From Python 1.3 to 1.4
Index | Type | ID | Meaning | Usual type |
---|---|---|---|---|
1 | 2 Bytes LE | co_argcount | Argument Count | Integer |
2 | 2 Bytes LE | co_nlocals | Number of local variables? | Integer |
3 | 2 Bytes LE | co_flags | Flags | Integer |
4 | object | co_code | Code | String of bytecode |
5 | object | co_consts | Consts | Tuple |
6 | object | co_names | Names | Tuple of string |
7 | object | co_varnames | Variable names | Tuple of string |
8 | object | co_filename | Filename | String |
9 | object | co_name | Name of the module | String |
From Python 1.5 to 2.0
Index | Type | ID | Meaning | Usual type |
---|---|---|---|---|
1 | 2 Bytes LE | co_argcount | Argument Count | Integer |
2 | 2 Bytes LE | co_nlocals | Number of local variables? | Integer |
3 | 2 Bytes LE | co_stacksize | Size of the stack, used to allocate it | Integer |
4 | 2 Bytes LE | co_flags | Flags | Integer |
5 | object | co_code | Code | String of bytecode |
6 | object | co_consts | Consts | Tuple |
7 | object | co_names | Names | Tuple of string |
8 | object | co_varnames | Variable names | Tuple of string |
9 | object | co_filename | Filename | String |
10 | object | co_name | Name of the module | String |
11 | 2 Bytes LE | co_firstlineno | First Line Number | Integer |
12 | object | co_lnotab | Line Table | String |
From Python 2.1 to 2.2
Index | Type | ID | Meaning | Usual type |
---|---|---|---|---|
1 | 2 Bytes LE | co_argcount | Argument Count | Integer |
2 | 2 Bytes LE | co_nlocals | Number of local variables? | Integer |
3 | 2 Bytes LE | co_stacksize | Size of the stack, used to allocate it | Integer |
4 | 2 Bytes LE | co_flags | Flags | Integer |
5 | object | co_code | Code | String of bytecode |
6 | object | co_consts | Consts | Tuple of string |
7 | object | co_names | Names | Tuple of string |
8 | object | co_varnames | Variable names | Tuple of string |
9 | object | co_freevars | ? | Tuple |
10 | object | co_cellvars | ? | Tuple |
11 | object | co_filename | Filename | String |
12 | object | co_name | Name of the module | String |
13 | 2 Bytes LE | co_firstlineno | First Line Number | Integer |
14 | object | co_lnotab | Line Table | String |
From Python 2.3 to 2.7
Index | Type | ID | Meaning | Usual type |
---|---|---|---|---|
1 | 4 Bytes LE | co_argcount | Argument Count | Integer |
2 | 4 Bytes LE | co_nlocals | Number of local variables? | Integer |
3 | 4 Bytes LE | co_stacksize | Stack Size | Integer |
4 | 4 Bytes LE | co_flags | Flags | Integer |
5 | object | co_code | Code | String of bytecode |
6 | object | co_consts | Consts | Tuple of string |
7 | object | co_names | Names | Tuple of string |
8 | object | co_varnames | Variable names | Tuple of string |
9 | object | co_freevars | ? | Tuple |
10 | object | co_cellvars | ? | Tuple |
11 | object | co_filename | Filename | String |
12 | object | co_name | Name of the module | String |
13 | 4 Bytes LE | co_firstlineno | First Line Number | Integer |
14 | object | co_lnotab | Line Table | String |
From Python 3.0 to 3.7
Index | Type | ID | Meaning | Usual type |
---|---|---|---|---|
1 | 4 Bytes LE | co_argcount | Argument Count | Integer |
2 | 4 Bytes LE | co_kwonlyargcount | Keywords Only Argument Count | Integer |
3 | 4 Bytes LE | co_nlocals | Number of local variables? | Integer |
4 | 4 Bytes LE | co_stacksize | Stack Size | Integer |
5 | 4 Bytes LE | co_flags | Flags | Integer |
6 | object | co_code | Code | String of bytecode |
7 | object | co_consts | Consts | Tuple of string |
8 | object | co_names | Names | Tuple of string |
9 | object | co_varnames | Variable names | Tuple of string |
10 | object | co_freevars | ? | Tuple |
11 | object | co_cellvars | ? | Tuple |
12 | object | co_filename | Filename | String |
13 | object | co_name | Name of the module | String |
14 | 4 Bytes LE | co_firstlineno | First Line Number | Integer |
15 | object | co_lnotab | Line Table | String |
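The leading integer fields of these tables ("2 Bytes LE" in the old layouts, "4 Bytes LE" in the newer ones) can be read with the struct module. It is a minimal sketch with a function name of my own, applied to the first fields of the Python 2.0 and Python 3.6 dumps of this post.

```python
import struct


def read_int_fields(data: bytes, count: int, width: int):
    """Read `count` little-endian signed integers of `width` bytes each."""
    fmt = "<" + str(count) + {2: "h", 4: "i"}[width]
    return struct.unpack(fmt, data[:count * width])


# Python 2.0: co_argcount, co_nlocals, co_stacksize, co_flags
old_fields = read_int_fields(bytes.fromhex("00 00 00 00 03 00 00 00"), 4, 2)

# Python 3.6: co_argcount, co_kwonlyargcount, co_nlocals, co_stacksize, co_flags
new_fields = read_int_fields(
    bytes.fromhex("00 00 00 00 00 00 00 00 00 00 00 00 05 00 00 00 40 00 00 00"), 5, 4
)
print(old_fields, new_fields)
```

The same five fields cost 8 bytes in the Python 2.0 layout (which has only four of them) and 20 bytes in the Python 3.6 one.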
From Python 3.8 to 3.10
Index | Type | ID | Meaning | Usual type |
---|---|---|---|---|
1 | 4 Bytes LE | co_argcount | Argument Count | Integer |
2 | 4 Bytes LE | co_posonlyargcount | Positional-Only Argument Count | Integer |
3 | 4 Bytes LE | co_kwonlyargcount | Keywords Only Argument Count | Integer |
4 | 4 Bytes LE | co_nlocals | Number of local variables? | Integer |
5 | 4 Bytes LE | co_stacksize | Stack Size | Integer |
6 | 4 Bytes LE | co_flags | Flags | Integer |
7 | object | co_code | Code | String of bytecode |
8 | object | co_consts | Consts | Tuple of string |
9 | object | co_names | Names | Tuple of string |
10 | object | co_varnames | Variable names | Tuple of string |
11 | object | co_freevars | ? | Tuple |
12 | object | co_cellvars | ? | Tuple |
13 | object | co_filename | Filename | String |
14 | object | co_name | Name of the module | String |
15 | 4 Bytes LE | co_firstlineno | First Line Number | Integer |
16 | object | co_lnotab | Line Table | String |
Since Python 3.11
From Python 3.11 to 3.12 (the next stable version), here is the structure of a code object in a PYC file.
Index | Type | ID | Meaning | Usual type |
---|---|---|---|---|
1 | 4 Bytes LE | co_argcount | Argument Count | Integer |
2 | 4 Bytes LE | co_posonlyargcount | Positional-Only Argument Count | Integer |
3 | 4 Bytes LE | co_kwonlyargcount | Keywords Only Argument Count | Integer |
4 | 4 Bytes LE | co_stacksize | Stack Size | Integer |
5 | 4 Bytes LE | co_flags | Flags | Integer |
6 | object | co_code | Code | String of bytecode |
7 | object | co_consts | Consts | Tuple of string |
8 | object | co_names | Names | Tuple of string |
9 | object | co_localsplusnames | ? | Tuple |
10 | object | co_localspluskinds | ? | Tuple |
11 | object | co_filename | Filename | String |
12 | object | co_name | Name of the module | String |
13 | object | co_qualname | Qualified Name | String |
14 | 4 Bytes LE | co_firstlineno | First Line Number | Integer |
15 | object | co_linetable | Line Table | String |
16 | object | co_exceptiontable | Exception Table | Tuple |
Misc links
First, a useful link: a CyberChef recipe to convert my notation of a PYC file into a real binary file. It may fail, so verify the file before trying to execute it 😅.
- List of possible values for co_flags (flags of code objects): github.com/python/cpython @v3.12.0rc3 ./Include/cpython/code.h:178;
- The CPython VM: github.com/python/cpython @v3.12.0rc3 ./Lib/dis.py:454;
- List of available opcodes: github.com/python/cpython @v3.12.0rc3 ./Include/opcode.h.
Highlight
Some time ago, I decided to highlight "lesser known" artists on my README files and blog posts. This is a gift for curious people who find these texts, a digital time capsule of sorts!
For this project, I would like to highlight Joanna Folivéli.
Joanna is an eclectic artist. She's an illustrator, a comic author, and also makes music.
For my part, I love her drawings: beautiful, inclusive (did I already say I was transfem? :p), and her style.