Neuroevolution and Wordle - Creating the Wordlists

An inspection of the original JavaScript from the New York Times website tells us that there are 14,855 words the player may use as a guess, and 2,309 words that are allowed to be the solution. The solutions are typically not plurals of 4-letter nouns, nor third-person forms of 4-letter verbs. They tend to be words any English speaker would know.

January 2028 will presumably be like Y2K for word games: the solution list runs out, and there will be no more Wordle. Or perhaps the JavaScript loops round again. Yet another global supply chain crisis for which we don’t have all the answers.

Sizing of the Action Space

The action space of valid guesses in Wordle is around 15k words. This is a problem of scale for machine learning on domestic hardware. Many of the words in the list turn out to be obscure botany or Egyptology terms that aren’t widely known, and there are undoubtedly users who have played Wordle successfully without knowing what an aimag is.

The large action space, coupled with the fact that most of the actions are ‘bad’, makes learning difficult. Try picking 6 random words out of the list and playing them in today’s Wordle, and you will learn almost nothing about what works. With so little signal from random play, we cannot jump straight in with Reinforcement Learning.

The size of the neural net’s output layer matters too. An early draft of the design called for an output layer of around 15k logits, one per valid guess: far from ideal for domestic hardware.

Ideal Size for the Action Space

I have aimed for an action space of around 5k words, roughly a third of the full guess list. This seems a sensible compromise between coverage and size.
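To put rough numbers on the difference, here is a back-of-the-envelope sketch of the size of the final dense layer for each action space. The hidden width of 256 is purely an assumption for illustration, as is the helper name `output_layer_params`. The real network may look quite different.

```python
def output_layer_params(hidden_width: int, n_actions: int) -> int:
    """Count weights plus biases of a dense layer mapping
    hidden_width inputs to n_actions logits."""
    return hidden_width * n_actions + n_actions


# Assumed hidden width, chosen only for illustration.
HIDDEN_WIDTH = 256

full = output_layer_params(HIDDEN_WIDTH, 14_855)    # one logit per valid guess
reduced = output_layer_params(HIDDEN_WIDTH, 5_000)  # target action space

print(f"full guess list : {full:,} parameters")
print(f"~5k action space: {reduced:,} parameters")
print(f"reduction factor: {full / reduced:.2f}x")
```

Under these assumptions the output layer shrinks by roughly a factor of three. For neuroevolution that plausibly means a correspondingly smaller set of weights to mutate and evaluate in every generation.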

Data Sources

Apart from the NYT website, there is a large corpus of word-frequency data taken from TV subtitles in American English, published by the Department of Psychology at Ghent University. After a few bash one-liners to massage the data, we have an action space of 4,739 words with the following properties:

  • All words in the ‘action space’ are in the ‘allowed guesses’;
  • All words in the ‘allowed solutions’ are in the ‘action space’;
  • The ‘action space’ is made up of commonly used words on American television.
$$ \text{action space} = \bigl( (\text{5k common words}) \cap (\text{allowed guesses}) \bigr) \cup (\text{allowed solutions}) $$
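The same construction can be sketched with Python sets. This is not the bash used for the actual wordlists, only an illustration of the formula. The function names and the toy word lists are hypothetical, and the second property in the list above holds only on the assumption that every allowed solution is also an allowed guess.

```python
def build_action_space(common_words, allowed_guesses, allowed_solutions):
    """Return (common AND guesses) OR solutions as a sorted list."""
    # Keep only 5-letter entries from the frequency list, lower-cased.
    common = {w.lower() for w in common_words if len(w) == 5}
    guesses = set(allowed_guesses)
    solutions = set(allowed_solutions)
    return sorted((common & guesses) | solutions)


def check_properties(action_space, allowed_guesses, allowed_solutions):
    """Check: action space is within guesses, and contains all solutions."""
    actions = set(action_space)
    return actions <= set(allowed_guesses) and set(allowed_solutions) <= actions


# Toy data standing in for the real lists (assumed, not the real files).
common = ["house", "about", "Their", "zzzzz", "the", "which"]
guesses = ["house", "about", "their", "which", "aimag", "crane"]
solutions = ["house", "crane"]

actions = build_action_space(common, guesses, solutions)
print(actions)
print(check_properties(actions, guesses, solutions))
```

In the toy data, aimag is a valid guess but not a common word, so it is dropped. crane is not in the common list but is a solution, so the union brings it back in.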

Summary

The action space is now 4,739 words, all of which are valid Wordle guesses. It consists of commonly used words plus every possible Wordle solution.