Victor Garcia

I've been thinking about what programming languages are best for AI and because of this I started thinking of languages as all the possible programs that can be written with them that compile. When you think of programming languages this way you can then consider the ratio of "correct" programs to incorrect programs. Correct here meaning it actually solves the problem without bugs. Bear with me, I'm not a math guy and below is some pseudo-math.

The search space of programs

Programming language X has N possible programs that can be written (that compile) with it. Programming language Y has more restrictions (ie a type system, immutable data or a borrow checker) and there are M possible programs that can be written with it.

If both programming languages are Turing complete, infinitely many programs can be written in both languages. Let's say we limit the maximum number of characters in the program.

Intuitively M < N.

Now let's say the restriction eliminates an entire class of bugs from the software. If you had infinite monkeys randomly typing on keyboards, the languages with restrictions that eliminate entire classes of bugs would have a higher probability of being "correct" through pure chance. (It's not as simple as more restrictive languages are automatically better, restrictions can make correct programs impossible too).

Now what if we add unit tests and integration tests? In a sense we are further limiting the search space but by verifying logic rather than syntax.

The above observation doesn't take into account how quickly you can actually search the space of programs to arrive at the correct one (compile times). I talk about this more below.

Why this matters for AI coding agents

This perspective is more useful now that we have coding agents. It feels less like AI coding agents are "writing" programs as a human would and more like they are "searching" for the right program that can fulfill the automated checks we give it. The reason I suspect this is that a lot of programs that AIs write could never be justified by a human through reason.

It wasn't too long ago that you wouldn't be surprised to see AI generate something like the code below in order to forcefully pass test cases:

/*
 * AI-generated code that "cheats" to pass tests
 */
function frobnicate(x: number): number {
  if (x === 2) {
    // Specific test input
    return 4; // Expected output from test
  }
  return x * 2 + 1; // General logic
}

Thankfully the models are better now but I think the above gives a good intuition for what the models are actually doing when they generate code.

Skill matters for the AI

We don't automatically get the benefits of advanced type systems if the AI doesn't take advantage of it. The decisions made when writing the program can also limit the search space of possible programs. Here is an example using React:

/*
 * Bad: Multiple boolean states allow invalid combinations
 */
const [isLoading, setIsLoading] = useState(false);
const [error, setError] = useState<string | null>(null);
const [data, setData] = useState<User | null>(null);

// 2 ^ 3 = 8 combinations
// Invalid states possible:
//   isLoading=true  + error="fail"  (loading AND error?)
//   isLoading=false + error="fail" + data={...}  (error AND data?)

/*
 * Good: Discriminated union makes invalid states impossible
 */
type State =
  | { status: "idle" }
  | { status: "loading" }
  | { status: "error"; error: string }
  | { status: "success"; data: User };

const [state, setState] = useState<State>({ status: "idle" });

// Exactly 4 states - invalid states are impossible
switch (state.status) {
  case "idle":
    return null;
  case "loading":
    return <Spinner />;
  case "error":
    return <div>{state.error}</div>;
  case "success":
    return <div>{state.data.name}</div>;
}

To take this further we can even control the possible transitions between states using types by using state machines (like XState) or reducers.

If we were "vibe coding" and went with the first approach, all future AI coding agent runs could potentially introduce bugs that wont be automatically caught by the type system.

AI doesn't default to better patterns (yet)

Blursed monkey — You stupid monkey! - Me

When I prompted Opus 4.5 in the web ui with the following

Generate an example user profile react component that retrieves the user profile using fetch. Don't use any styling, pretend there is an api on localhost:3000 that we can make the request to.

it generated the example with invalid states, not the one that makes invalid states impossible.

Can we predict what the best programming languages for AI are?

Here are the variables I've come up with:

No. of "correct" programs - The total number of programs written in the language that accomplish the task. Multiple programs can be correct (ie you can have the program with the fewest number of characters or the program that is easiest for humans to read, it's correct as long as it solves the problem it was intended to solve)

Total No. of programs - Every possible program that can compile.

Skill - The AI's "skill" at using the language. We can say this is the ratio of the probability the model picks correct over the probability that a correct choice is picked at random. This is probably the most hand wavy concept here.

Check time - how quickly can the AI verify that the program it has written is "correct".

So if we were to come up with a formula

timeToCorrectSolution = (totalPrograms * checkTime) / (correctPrograms * skill)

Another thing to consider is that LLMs cannot think while the program is being checked for correctness. So LLMs will be less effective at using languages with long build times relative to humans.

So how does our formula fare against empirical data? Unfortunately there are no papers that really measure this. The closest I can find is a study from August that used a benchmark to evaluate the performance of different models on different programming languages. The benchmark uses a one-shot prompt approach, not an iterative coding agent. There's no feedback loop where the LLM refines its output based on execution results. But for completeness sake here is an image of the results:

Reasoning model benchmark

Image Source: Which Language is Best for AI Code Generation? (Revelry)

What this study might be measuring is a partial view of "Skill" as we defined above. It captures how well models write code in each language, but misses how effectively they respond to type errors and compiler feedback across iterations.

My prediction

There is a lot of magical thinking with AI, that AI will be omnipotent and will code directly in binary. I doubt this will be the case. Generating code then verifying its correctness is expensive and time consuming. The majority of code will be generated in languages that are very quick to compile and verify the correctness of.

My prediction is that LLMs are very good at and will become even better at using Typescript. I suspect this isn't the answer most people want to hear. Typescript has a very powerful type system, compiles very quickly (and this will only improve with the native typescript compiler), and a lot of tooling that the models are being optimized for (browser use) is synergistic with it. Typescript also has the property of being very easy to sandbox which is important given tool use and mcp are falling out of favor compared to scripting as ways to interact with external systems.