Neuroevolutionary Wordle - Toroidal Cell Grid Spatially-Aware Genetic Algorithms

My project is being run on a 5070 Ti GPU. At time of writing, this is pretty good domestic hardware, but it isn’t a datacentre-grade GPU. The amount of VRAM available is 16GB. I’ve been looking at ways of squeezing the best outcomes out of the resources available.

The genotypes contain the model weights for input encoders, the dense trunk of the model, and the output embeddings. Every genome takes memory. Every genome needs a model body, output embeddings, metadata, scores, pointers, and temporary evaluation state. The more ambitious the model gets, the fewer organisms I can afford to keep alive at once. That creates an obvious problem for a genetic algorithm: small populations are easier to evaluate, but they are also easier to homogenise.

There is a lot to be said about the exact figure, but provisionally we could be looking at population sizes of around 13,000 organisms per generation.

One area of interest is, essentially, trying to get a smaller population to behave like a larger one. This means preserving genetic diversity and delaying convergence.

If the whole population is competing against the same training data, with the same selection pressures, and the same global breeding pool, then the GA has a strong incentive to collapse onto whatever works early. That may be useful if the first good-looking solution is also the best one, but it does not offer a good exploration of the fitness landscape, and we are not trying to write a hill-climbing algorithm.

The current design treats the population as a spatial structure: a toroidal cell grid.

The Population Is a Grid

Instead of thinking of the population as a flat array of organisms, it is useful to conceptualise it as arranged on a two-dimensional grid. Each organism lives in a cell. The grid wraps at the edges, which makes it toroidal: going off the right-hand side brings you back on the left; going off the top brings you back at the bottom.

That wrapping matters. A normal rectangular grid has awkward edge and corner cells with fewer neighbours. A toroidal grid does not. Every cell has the same shape of neighbourhood. There is no privileged centre and no neglected edge.
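
A minimal Python sketch of that difference, using the simplest possible neighbourhood (the four adjacent cells). The function names are illustrative only and are not taken from the project's code:

```python
def wrap(coord: int, size: int) -> int:
    """Wrap a coordinate onto a ring of `size` cells.

    Python's % always returns a non-negative result for a positive
    divisor, so -1 % 8 == 7.
    """
    return coord % size


STEPS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def bounded_neighbours(x, y, width, height):
    """Adjacent cells on a grid with hard edges: off-grid cells are dropped."""
    return [
        (x + dx, y + dy)
        for dx, dy in STEPS
        if 0 <= x + dx < width and 0 <= y + dy < height
    ]


def toroidal_neighbours(x, y, width, height):
    """Adjacent cells on a wrapped grid: every cell has exactly four."""
    return [(wrap(x + dx, width), wrap(y + dy, height)) for dx, dy in STEPS]
```

On an 8 by 8 grid, the corner cell (0, 0) has only two neighbours under the bounded version, but four under the toroidal one, the same as every other cell.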

The important point is not the visual metaphor. It is that each organism now has a location, and that location can be used to make two things spatially aware:

  1. partner selection during breeding;
  2. training data sharding during fitness evaluation.

Those two choices give different regions of the population slightly different evolutionary pressures, while still allowing useful information to spread across the population over time.

Figure: Spatially aware genetic algorithm grid

Spatially Aware Partner Selection

In a traditional GA, breeding partners are often selected from the whole population. That is simple, but it also means that any globally successful organism can rapidly dominate the gene pool. If selection pressure is too strong, diversity collapses.

In this design, organisms preferentially breed with nearby organisms on the grid.

The exact scheme can be changed, but the basic idea is simple: when producing a child for a given cell, the candidate parents are selected from a neighbourhood around that cell rather than from the entire population. The neighbourhood may be defined by Manhattan distance. For example, a radius of 3 might mean that a cell can select partners from organisms up to three steps away, counting horizontal and vertical steps together, with wrapping at the grid edges.
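
As a sketch of what a wrapped Manhattan neighbourhood looks like in code (the names below are hypothetical, and this version excludes the centre cell, which is a choice rather than a requirement):

```python
def ring_distance(a: int, b: int, size: int) -> int:
    """Shortest distance between two positions on a ring of `size` cells."""
    d = abs(a - b) % size
    return min(d, size - d)


def toroidal_manhattan(p, q, width, height) -> int:
    """Manhattan distance between cells p and q on a wrapped grid."""
    return ring_distance(p[0], q[0], width) + ring_distance(p[1], q[1], height)


def neighbourhood(cell, radius, width, height):
    """All cells within `radius` Manhattan steps of `cell`, excluding it."""
    cx, cy = cell
    cells = set()
    for dx in range(-radius, radius + 1):
        # Steps not spent horizontally can be spent vertically.
        remaining = radius - abs(dx)
        for dy in range(-remaining, remaining + 1):
            cells.add(((cx + dx) % width, (cy + dy) % height))
    cells.discard(cell)
    return sorted(cells)
```

Provided the grid is wide and tall enough that the diamond does not wrap onto itself (at least 2r + 1 cells in each direction), a radius of r gives 2r(r + 1) candidate cells, so a radius of 3 gives 24 potential partners for every cell, including those in the corners.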

This creates local genetic neighbourhoods. Good traits can spread, but they have to diffuse across the grid rather than immediately conquering it. Different regions can spend time exploring different parts of the search space.
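
To make the diffusion effect concrete, here is a hedged sketch of local parent selection. It assumes tournament selection, a `fitness` dictionary keyed by cell where higher is better, and a neighbourhood that includes the centre cell. None of those details come from the actual design; they are placeholders for whatever scheme is used:

```python
import random


def local_cells(cell, radius, width, height):
    """Cells within `radius` Manhattan steps of `cell` on a wrapped grid."""
    cx, cy = cell
    cells = set()
    for dx in range(-radius, radius + 1):
        remaining = radius - abs(dx)
        for dy in range(-remaining, remaining + 1):
            cells.add(((cx + dx) % width, (cy + dy) % height))
    return sorted(cells)


def pick_parents(cell, fitness, radius, width, height, rng, tournament_size=3):
    """Pick two distinct parents for `cell` from its local neighbourhood."""
    candidates = local_cells(cell, radius, width, height)

    def tournament(exclude=None):
        # Sample a few local candidates and keep the fittest one.
        pool = [c for c in candidates if c != exclude]
        entrants = rng.sample(pool, min(tournament_size, len(pool)))
        return max(entrants, key=lambda c: fitness[c])

    first = tournament()
    second = tournament(exclude=first)
    return first, second
```

With this kind of scheme, an exceptionally fit organism at (6, 6) on a 12 by 12 grid can never be chosen as a parent for cell (0, 0) at radius 3. Its genes can only get there over several generations, by way of the cells in between.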

That is useful for this project because I am not only trying to optimise a small fixed vector of parameters. The model has a structured genome, a growing action space, and a non-trivial fitness evaluation. Premature convergence is a real risk.

Spatially Aware Training Data Shards

Training data shards are dropped at random on the grid. This may happen once every 50 generations, for example. The shards have an effective radius, which increases with the age of each shard. During fitness evaluation, a genotype is only tested against training data shards for which it is within range.

Once a training data shard’s effective radius increases to the point where the whole grid is within range, it is effectively infinite and stops growing. By the end of the GA run, the fitness landscape is uniform across the grid.
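
The shard mechanics can be sketched roughly as follows. Several details here are assumptions made for illustration: that the radius grows linearly with age, that range is measured by wrapped Manhattan distance, and the specific values of `initial_radius` and `growth_per_generation`. The `Shard` class and function names are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Shard:
    x: int
    y: int
    born: int  # generation in which the shard was dropped


def ring_distance(a, b, size):
    d = abs(a - b) % size
    return min(d, size - d)


def max_torus_distance(width, height):
    """Largest possible wrapped Manhattan distance between two cells."""
    return width // 2 + height // 2


def shard_radius(shard, generation, width, height,
                 initial_radius=2.0, growth_per_generation=0.1):
    """Effective radius: grows with age, capped once it covers the grid."""
    age = generation - shard.born
    radius = initial_radius + growth_per_generation * age
    return min(radius, max_torus_distance(width, height))


def visible_shards(cell, shards, generation, width, height):
    """Indices of the shards that `cell` is evaluated against."""
    visible = []
    for i, s in enumerate(shards):
        distance = (ring_distance(cell[0], s.x, width)
                    + ring_distance(cell[1], s.y, height))
        if distance <= shard_radius(s, generation, width, height):
            visible.append(i)
    return visible
```

The cap in `shard_radius` is the point at which a shard becomes "effectively infinite": on a wrapped grid no two cells are further apart than half the width plus half the height, so a radius of that size already reaches every organism.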

The effect is slower convergence. Genetic diversity stays broader, because organisms at different points on the grid are evaluated against different training data.

Growing the Radius Over Time

The training data radius should not stay small forever.

A permanently local fitness function would risk producing organisms that are only good in their own little patch of the world. That is not the end goal. The end goal is a model that can play Wordle well across the whole training distribution.

So the radius grows over time.

At the beginning, small local shards encourage exploration. Many regions of the grid can discover different partial strategies without being immediately wiped out by global selection pressure. Later, as the radius expands, those local strategies are tested against a broader set of cases. Weak local tricks should start to fail. Robust strategies should survive contact with neighbouring shards.

Eventually, the evaluation can approach a much more global fitness function. By that point, the hope is that the population has already developed a richer set of candidate behaviours than it would have under a single uniform selection regime.

This gives the GA a curriculum-like structure:

  • early generations: many local landscapes, high diversity, weaker global pressure;
  • middle generations: overlapping landscapes, useful traits begin spreading;
  • later generations: broad evaluation, stronger convergence pressure.

The curriculum is not based on manually ranking the difficulty of Wordle cases. It falls out of the geometry of the grid and the growth of the shard radius.

Why Toroidal?

The toroidal part is mostly about avoiding edge cases, in both senses of the phrase.

On a normal grid, organisms in the middle have a full neighbourhood, while organisms at the edges and corners have smaller or distorted neighbourhoods. That means the geometry of the population accidentally changes the algorithm. Edge organisms would have different breeding and training dynamics for no principled reason.

Wrapping the grid fixes that. Every cell has the same topology. Every organism has neighbours in every direction. Locality exists, but borders do not.

That makes the implementation cleaner and the evolutionary dynamics easier to reason about. It also maps nicely onto GPU indexing: a cell can be identified by (x, y), and wrapped neighbour coordinates can be computed with modular arithmetic.
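
A small sketch of that indexing, assuming the population is stored as a flat array in row-major order (an assumption about layout, with made-up function names):

```python
def cell_to_index(x: int, y: int, width: int) -> int:
    """Flat array index of cell (x, y) in row-major order."""
    return y * width + x


def index_to_cell(index: int, width: int):
    """Inverse of cell_to_index: recover (x, y) from a flat index."""
    return index % width, index // width


def neighbour_index(index: int, dx: int, dy: int, width: int, height: int) -> int:
    """Flat index of the cell offset by (dx, dy), wrapping at the edges."""
    x, y = index_to_cell(index, width)
    return cell_to_index((x + dx) % width, (y + dy) % height, width)
```

One caveat when porting this to C or CUDA: there, the `%` operator can return a negative result for a negative left operand, so the usual idiom is `(x + dx + width) % width`, which is safe as long as the offset is no larger than the grid dimension.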

What I Am Hoping to Learn

This may not work.

It may preserve too much diversity for too long. It may slow convergence too much. It may create local specialists that fail to generalise. The radius schedule may matter more than expected. The training data layout may need to be carefully shuffled to avoid accidentally making some areas too easy or too hard.

But it is testable.

The comparison is straightforward: run a baseline GA with global partner selection and uniform training data, then compare it with the toroidal version. Measure not just wall-clock time and best fitness, but also diversity across the population, variance between regions, and how quickly useful strategies spread.
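
Two of those measurements could be sketched like this. The specific metrics are my assumptions about what "diversity" and "variance between regions" might mean in practice (mean pairwise Euclidean distance between genome vectors, and variance of mean fitness across square blocks of the grid). Other definitions would be equally reasonable:

```python
import itertools
import math
import statistics


def mean_pairwise_distance(genomes):
    """Average Euclidean distance over all pairs of genome vectors.

    This is quadratic in the number of genomes, so in practice it would
    be run on a random sample rather than the whole population.
    """
    pairs = list(itertools.combinations(genomes, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def regional_variance(fitness, block):
    """Variance of mean fitness between block-by-block regions of the grid.

    `fitness` maps (x, y) cells to scores.
    """
    regions = {}
    for (x, y), score in fitness.items():
        regions.setdefault((x // block, y // block), []).append(score)
    region_means = [statistics.fmean(scores) for scores in regions.values()]
    return statistics.pvariance(region_means)
```

A fully converged population would score close to zero on both. Tracking the two numbers per generation for the baseline and the toroidal run would show whether the spatial structure is actually holding diversity open for longer.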

The question is not whether a toroidal grid is aesthetically pleasing. The question is whether spatial structure can substitute for some of the population size that I cannot afford.

If it can, then this becomes a useful trick for running more interesting evolutionary experiments on ordinary hardware. If it cannot, then at least the failure should be informative.

Either way, it gives the genetic algorithm a schedule for progression.

And for this project, that might be exactly what it needs.