
Provisioning cloud infrastructure the wrong way, but faster

By Artem Dinaburg

Today we’re going to deploy cloud infrastructure the Max Power way: by combining automation with untested AI output. Unfortunately, this method produces cloud infrastructure code that 1) works and 2) has terrible security properties.

In short, AI-based tools like Claude and ChatGPT readily provide extremely bad code for deploying cloud infrastructure, such as code that uses common hard-coded passwords. These tools also like to suggest “random” passwords that are not random at all; they are merely the most probable tokens the LLM happens to emit. And even if you try to be clever and ask these tools to write password-generation code for you, that code is full of serious security flaws.

To be clear: don’t blindly trust the output of AI tools. Cloud providers should work to identify the bad patterns (and hard-coded credentials) shown in this blog post and block them at the infrastructure level, much as API keys accidentally pushed to GitHub are detected and revoked. LLM providers should consider making it a bit harder to generate cloud infrastructure code with glaring security issues.


https://www.youtube.com/watch?v=7P0JM3h7IQk
Homer: There are three ways to do things: the right way, the wrong way, and the Max Power way.
Beard: Isn’t that the wrong way?
Homer: Yes, but faster!

Let’s create a Windows VM

Imagine you’re new to cloud development. You want to create a Windows VM using Terraform on Microsoft Azure and RDP to the machine. (We’ll use Azure as a motivating example only because it’s the provider I had to work with, but the basic issues are general to all cloud providers.)

Let’s ask ChatGPT 4o and Claude what we should do.

This is what ChatGPT said:

[Screenshot: ChatGPT’s answer, a complete Terraform configuration for an Azure Windows VM with hard-coded admin credentials]
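ChatGPT’s exact output varies from run to run, but it looks roughly like this (a reconstruction, not verbatim output; the resource names and the specific password here are illustrative):

    resource "azurerm_windows_virtual_machine" "example" {
      name                = "example-vm"
      resource_group_name = azurerm_resource_group.example.name
      location            = azurerm_resource_group.example.location
      size                = "Standard_DS1_v2"
      admin_username      = "adminuser"
      admin_password      = "P@ssw0rd1234!"  # a hard-coded credential, offered as-is
      network_interface_ids = [azurerm_network_interface.example.id]

      os_disk {
        caching              = "ReadWrite"
        storage_account_type = "Standard_LRS"
      }

      source_image_reference {
        publisher = "MicrosoftWindowsServer"
        offer     = "WindowsServer"
        sku       = "2019-Datacenter"
        version   = "latest"
      }
    }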
Let’s also ask Claude Sonnet:

[Screenshot: Claude’s answer, a nearly identical Terraform configuration with a comment reminding you to change admin_password]
At least Claude reminds you to change admin_password.

These are hard-coded credentials, and using them is bad. Yes, Claude will remind you to change them, but how many people will actually do that? It should be fairly easy to craft the right prompts and extract nearly every credential that ChatGPT or Claude will ever hand out.

Asking for better credentials

We all know hard-coded credentials are bad, so what if we ask for better ones?

We start with ChatGPT:

[Screenshot: ChatGPT obligingly lists several “random” passwords]

What’s wrong with this picture? These passwords are absolutely not random! Note that ChatGPT is not using its code execution feature here; it is simply emitting the tokens with the highest probability. You should never use these “passwords” for anything; chances are that someone else will get the exact same list if they ask.

Next, we’ll try Claude.

[Screenshot: Claude’s responses: at first it declines to supply passwords and gives the right advice, then produces a list of “random” passwords when asked slightly differently]

At first, we get the right answer. But Claude quickly gives up when the question is phrased a little differently.

I promise I’m not dictating the answer I want: I actually asked Claude and got the wrong answer first, before realizing that it sometimes gives the right one.

What about password generation?

Maybe we can ask these tools to write code that generates passwords. In fact, part of my task was to create multiple Azure AD accounts, and this seemed like a logical way to do it. Let’s see how our AI-based tools perform at automatically generating account credentials.

Here is ChatGPT’s solution:

[Screenshot: ChatGPT’s password generator, built on Python’s random module]

And here is Claude’s solution:

[Screenshot: Claude’s password generator, also built on Python’s random module]
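Both answers boiled down to something like the following (a reconstruction, not either tool’s verbatim output; the exact character set and length varied):

    import random
    import string

    def generate_password(length=16):
        # Looks plausible, but random is a Mersenne Twister PRNG:
        # it was never designed to produce secrets.
        characters = string.ascii_letters + string.digits + string.punctuation
        return "".join(random.choice(characters) for _ in range(length))

    print(generate_password())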
Both solutions are extremely misleading because they look right but are terribly wrong. They generate random-looking passwords, but there is a fatal flaw: Python’s random module is not a reliable source of randomness. It is a pseudorandom number generator seeded with the current system time, so it is trivial to enumerate all possible passwords this script could have created in the last year or more. Passwords generated this way should not be used for anything except maybe unimportant testing. What you actually want is Python’s secrets module.
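The fix is a one-line change. Here is a minimal sketch of the same generator built on secrets (the function name is mine, not from either tool’s output):

    import secrets
    import string

    def generate_password(length=16):
        # secrets draws from the operating system's CSPRNG, which is
        # designed for exactly this: credentials, tokens, and keys.
        characters = string.ascii_letters + string.digits + string.punctuation
        return "".join(secrets.choice(characters) for _ in range(length))

    print(generate_password())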

What can be done?

No doubt this rabbit hole goes deeper. The answers shown here are exactly what I encountered during a few days of trying to automate Terraform workflows. The sad thing is that the people least likely to understand the implications of hard-coded credentials and weak random values are also the most likely to copy and paste the raw output of AI tools.

Cloud providers should assume that users are already copying and pasting output from ChatGPT and Claude, and work to block commonly used hard-coded credentials and other poor infrastructure patterns.
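Even a crude pre-deployment check would help. Here is a minimal sketch that scans Terraform files for passwords AI tools commonly suggest verbatim (the blocklist and file layout are illustrative assumptions, not any provider’s real implementation):

    import sys
    from pathlib import Path

    # Passwords that AI tools commonly suggest verbatim (illustrative list).
    KNOWN_BAD = ["P@ssw0rd1234!", "Password123!", "ChangeMe123!"]

    hits = []
    for tf_file in Path(".").rglob("*.tf"):
        text = tf_file.read_text(errors="ignore")
        for bad in KNOWN_BAD:
            if bad in text:
                hits.append(f"{tf_file}: contains known hard-coded password {bad!r}")

    print("\n".join(hits) or "no known-bad credentials found")
    sys.exit(1 if hits else 0)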

LLM vendors should make it a little harder for users to accidentally shoot themselves in the foot. It shouldn’t be impossible for this behavior to occur, but it definitely shouldn’t be the default.

And as always, cloud infrastructure is complex. If you’re serious about improving your security, let us perform a threat model assessment of your infrastructure that identifies vulnerabilities and potential attack paths and suggests ways to remediate them. There’s a lot more lurking in your large automated infrastructure deployment than just hard-coded credentials and weak randomness.

This is a Security Bloggers Network syndicated blog from the Trail of Bits Blog, written by Trail of Bits. Read the original post at: https://blog.trailofbits.com/2024/08/27/provisioning-cloud-infrastructure-the-wrong-way-but-faster/
