The Tango Problem

Personal experiments with genetic algorithms and neuroevolution, in Rust, with the Bevy engine.

A number of "Psychics" are placed inside a 45x45 play area and subjected to one of the following two challenges:

  1. Get as close as possible to a "Beacon of Light" placed randomly in the space before the 100-turn limit elapses. Walls are disabled in this experiment.
  2. Paint as many walls pink as possible within the space before the 100-turn limit elapses. Walls are enabled in this experiment.

The Psychics do not "know" these objectives. All they are given is a series of inputs representing their senses, such as:

  • The angle between themselves and the Beacon of Light

  • The North-West-East-South direction in which the Beacon of Light is located

  • Which adjacent tiles are solid

  • Which tiles in range 2 of themselves are solid

  • Which adjacent tiles are painted walls

  • Which adjacent tiles are unpainted walls

A neural network receives these inputs as a vector of floating-point numbers (for example, "the Beacon is North of you" could look like [1.0, 0.0, 0.0, 0.0], "the Beacon is East of you" would be [0.0, 0.0, 1.0, 0.0]).

Then, snazzy things happen inside one or more hidden layer(s), and a vector of floating-point numbers comes out (such as [0.32, 0.55, 0.38, 0.85, 0.11]). These correspond to actions, such as [Move North, Move West, Move East, Move South, Paint Nearby Walls]. The highest number is the chosen action.
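For the curious, here is a rough sketch of that whole sense-to-action pipeline in plain Rust. Everything in it (names, layer shapes, the tanh squashing) is invented for illustration rather than lifted from the project's actual code; it only shows the idea of one-hot encoding the Beacon's direction, pushing it through a single hidden layer, and picking the highest-scoring action.

```rust
// Illustrative sketch only: one-hot encode the Beacon's direction, feed it
// through one hidden layer, and pick the highest-scoring action.

#[derive(Clone, Copy, Debug)]
enum Action {
    MoveNorth,
    MoveWest,
    MoveEast,
    MoveSouth,
    PaintNearbyWalls,
}

const ACTIONS: [Action; 5] = [
    Action::MoveNorth,
    Action::MoveWest,
    Action::MoveEast,
    Action::MoveSouth,
    Action::PaintNearbyWalls,
];

/// One-hot encoding of which side of the Psychic the Beacon is on,
/// in [North, West, East, South] order ("North of you" -> [1.0, 0.0, 0.0, 0.0]).
/// The sign conventions depend on the coordinate system you pick.
fn encode_beacon_direction(dx: i32, dy: i32) -> [f32; 4] {
    let mut input = [0.0; 4];
    if dy.abs() >= dx.abs() {
        if dy > 0 { input[0] = 1.0 } else { input[3] = 1.0 }
    } else if dx < 0 {
        input[1] = 1.0
    } else {
        input[2] = 1.0
    }
    input
}

/// A tiny fully connected network: inputs -> one hidden layer -> action scores.
struct Brain {
    hidden_weights: Vec<Vec<f32>>, // hidden_weights[i][j]: input j -> hidden neuron i
    output_weights: Vec<Vec<f32>>, // output_weights[a][i]: hidden neuron i -> action a
}

impl Brain {
    fn decide(&self, inputs: &[f32]) -> Action {
        // Hidden layer: weighted sum of the inputs, squashed by tanh.
        let hidden: Vec<f32> = self
            .hidden_weights
            .iter()
            .map(|row| row.iter().zip(inputs).map(|(w, x)| w * x).sum::<f32>().tanh())
            .collect();
        // Output layer: one raw score per action.
        let scores: Vec<f32> = self
            .output_weights
            .iter()
            .map(|row| row.iter().zip(&hidden).map(|(w, h)| w * h).sum::<f32>())
            .collect();
        // The highest number is the chosen action.
        let best = scores
            .iter()
            .enumerate()
            .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
            .map(|(index, _)| index)
            .unwrap();
        ACTIONS[best]
    }
}
```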

After each 100-turn sequence, each Psychic has their fitness calculated - a measure of how well they performed. Initially, this was based only on the objective, such as "the number of tiles separating a Psychic from the Beacon" or "the number of tiles painted." The best Psychics are selected through "roulette pick" (everyone has a chance to be chosen, and that chance increases with the fitness score), some mutations are applied to their neural networks' weights, and a new generation is born, which will hopefully fare better than the last.
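As a minimal sketch of what "roulette pick" plus mutation can look like, assuming the rand crate and networks flattened into plain Vec<f32> weight lists (the project's real types are more involved):

```rust
use rand::Rng;

/// Roulette-wheel ("fitness proportionate") pick: everyone can be chosen,
/// but a higher fitness score means a bigger slice of the wheel.
/// Assumes every fitness value is non-negative and the total is above zero.
fn roulette_pick<'a>(population: &'a [(Vec<f32>, f32)], rng: &mut impl Rng) -> &'a [f32] {
    let total: f32 = population.iter().map(|(_, fitness)| fitness).sum();
    let mut spin = rng.gen_range(0.0..total);
    for (weights, fitness) in population {
        if spin < *fitness {
            return weights;
        }
        spin -= fitness;
    }
    &population.last().unwrap().0 // fallback against floating-point rounding
}

/// Mutation: each weight of a child network has a small chance of a nudge.
fn mutate(weights: &mut [f32], mutation_rate: f64, strength: f32, rng: &mut impl Rng) {
    for w in weights.iter_mut() {
        if rng.gen_bool(mutation_rate) {
            *w += rng.gen_range(-strength..strength);
        }
    }
}
```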

But this proved insufficient. I first observed the "Tango Problem" in the Beacon experiment: large numbers of Psychics would simply move down until they were on the same Y coordinate as the Beacon, then bob up and down over and over.

Why was this happening? After some struggling, I figured it out.

A lucky Psychic would occasionally spawn on the same X coordinate as the Beacon. They then "figured out" that they could just move down until the Beacon was reached, then keep moving up and down over and over. This was gloriously rewarded by the system - after all, they had perfectly completed the task of reaching the Beacon and staying on it!

Other Psychics would start copying this behaviour - except they were not actually on the same X coordinate as the Beacon, and as such would fail the test, forming a line passing through the Beacon across the entire level. Naturally, there would always be at least one lucky Psychic succeeding through this strategy, causing the behaviour to repeat itself.

My answer was to add, in the fitness function, a HUGE reward for Psychics who would dare to use more than just 2 different actions. This immediately solved the issue, as the "bob up and down and hope to be lucky" strategy could no longer result in the maximum possible fitness a Psychic could reach. Here is the result!
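For concreteness, the reshaped fitness looked roughly like the sketch below. The constants are invented for illustration; the real values were hand-tuned.

```rust
/// Sketch of the reshaped Beacon fitness. Proximity still matters, but a
/// disproportionately large bonus goes to Psychics that used more than two
/// distinct actions over their 100 turns. The "bob up and down" strategy
/// only ever uses two actions, so it can no longer reach the maximum score.
fn beacon_fitness(final_distance_to_beacon: f32, distinct_actions_used: usize) -> f32 {
    let proximity = 1.0 / (1.0 + final_distance_to_beacon);
    let diversity_bonus = if distinct_actions_used > 2 { 10.0 } else { 0.0 };
    proximity + diversity_bonus
}
```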

The painting experiment followed. It, too, experienced struggles related to the Tango Problem (which, I believe, is known as "getting stuck in a local maximum" in the literature): some Psychics would occasionally be very lucky and spawn next to many walls, which meant they could just stay there, paint their surroundings, never move, and still score 3-7 points without effort. Psychics that actually saw an unpainted wall, moved towards it and painted it were not rewarded as well.

Once again, I added extra objectives:

  • Receive a penalty for staying in the same place for too long (this resulted in Psychics bobbing up and down just like in the original Tango Problem and not doing anything else)
  • Receive a penalty for re-painting already painted walls (this resulted in Psychics being terrified of ever painting anything)
  • Receive a bonus for moving far away from the spawn location (this resulted in Psychics all rushing the bottom of the screen and painting it, leaving the rest of the level unpainted)

Significant tweaking may eventually produce even better results, but for now, a balance of all these objectives does succeed in achieving a "trickle-down" strategy where all Psychics move towards the bottom of the screen, painting walls as they fall. (Creatures overlapping walls at the bottom is purely a visual glitch - I am not quite sure how to spawn just the right number of Bevy entities in each simulation without causing major performance issues.)
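A blended fitness along those lines might look something like the sketch below. Every weight here is invented purely for illustration (the real formula was hand-tweaked and kept changing); the point is only that each term has to stay gentle enough not to dominate the others.

```rust
/// Illustrative blend of the painting objectives discussed above.
/// Clamped to zero so it stays usable with roulette-wheel selection.
fn painting_fitness(
    walls_painted: u32,
    repaint_attempts: u32,
    turns_spent_idle: u32,
    distance_from_spawn: f32,
) -> f32 {
    let paint_score = walls_painted as f32 * 2.0;
    let travel_bonus = distance_from_spawn * 0.1; // too big and everyone rushes one edge
    let repaint_penalty = repaint_attempts as f32 * 0.2; // too big and nobody dares to paint
    let idle_penalty = turns_spent_idle as f32 * 0.05; // too big and they just bob in place
    (paint_score + travel_bonus - repaint_penalty - idle_penalty).max(0.0)
}
```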


Overall, I am happy with how much I have learned from this mini-project. Studying genetic algorithms initially made it seem like they could handle anything with brute force and sufficient time, but I see now that the objectives, "sense" inputs and possible action outputs change everything. It is very easy for these algorithms to find something judged to be "good enough" and stick to it without much of an attempt to improve or innovate. I am aware a good algorithm should balance between exploitation and exploration, but the amount of tweaking required to reach this fine balance is truly a work of absolute precision and patience.

Rust is an interesting piece of technology. Its "fearless concurrency" was very useful in making this program able to BOTH display the results of a completed simulation and train more simulations in the background at the same time. As someone coming from dynamically typed languages like JavaScript and Python, it is truly astounding how many errors are prevented just through rust-analyzer's watchful eye. I am, however, disappointed by the long recompilation times with each added library, the rather poor step-by-step debugging support, and how rust-analyzer loves turning my CPU into a localized micro-sun.

Genetic Trials

"Evolution is war. The prey wishes to escape, the predator wishes to catch - one will inevitably suffer, from hunger or fang."

"When you step in a flowery meadow and frolick in joy through the bushes and butterflies... You are blind to the agonizing battlefield raging all around you. Each blade of grass strives to choke its neighbour, each insect would love to devour you whole had you been a hundredth of your size."

"Fu! And you say nature and its diversity is beautiful. With affection, you lick the hand which chokes you by the throat. We would laugh, had we not been feeling infinite pity for you in this moment."

The Games Foxes Play

(complete source code - rust source code | view all previous posts)

The advancements of this week are highly experimental and possibly insane. Given repeated troubles related to the player constantly breaking the smooth flow and design of the game, I am attempting the obvious: removing the player.

A grid of tiles contains a certain number of creatures. Each one operates as follows:

  • It receives some Senses, defined by the player, like the distance between itself and the yellow X Beacon, or the angle between itself and the Beacon.
  • The Senses pass through a simple neural network (only 1 hidden layer, so as not to turn my computer into a miniaturized sun)
  • The neural network chooses one from a list of Axioms, also defined by the player, and casts it. This could be moving one tile, or doing some cool ability.

These creatures are then tortured over and over again in my test chamber. Every 100 turns, each one is judged according to an efficiency score. The player defines the criteria, like "get as close to the Beacon as possible".

Through some genetic algorithm technomagic, all the creatures are returned to their start position, and their neural networks are tweaked to look more like those of the best performers, with some random mutations thrown in there for science.

Eventually, the creatures "learn" how to complete the objective defined by the player! I believe the buzzword for this technique is "neuroevolution".

In order for things to remain smooth, the game window actually makes you watch a re-enactment of a simulation that already happened while the game is simulating hundreds of 100-turn sequences in the background. This means that even if the CPU is choking through the neural networks, there is no visible FPS drop, only a mild slowdown of the training speed. Thank you, Rust, for this so-called "fearless concurrency".
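Under the hood this is nothing exotic: a background thread grinds through generations and pushes each finished replay over a channel, while the display side consumes them at its own pace. A stripped-down sketch of the idea, using plain std types, invented names, and no Bevy specifics:

```rust
use std::sync::mpsc;
use std::thread;

/// A recorded 100-turn sequence, ready to be re-enacted on screen.
/// (Illustrative type; the real project stores Bevy-friendly data.)
struct Replay {
    chosen_actions_per_turn: Vec<Vec<usize>>, // one action index per creature, per turn
    best_fitness: f32,
}

fn main() {
    let (tx, rx) = mpsc::channel::<Replay>();

    // Training thread: simulates and evolves as fast as the CPU allows,
    // sending each finished generation's replay to the display side.
    thread::spawn(move || loop {
        let replay = run_one_generation(); // stand-in for the actual training step
        if tx.send(replay).is_err() {
            break; // the display side has shut down
        }
    });

    // Display side: re-enacts whatever arrives, at a leisurely pace. In the
    // real project this lives inside Bevy systems rather than a plain loop.
    while let Ok(replay) = rx.recv() {
        println!("best fitness this generation: {}", replay.best_fitness);
        let _ = replay.chosen_actions_per_turn; // play back the recorded turns here
    }
}

/// Stand-in so the sketch compiles; the real version runs a 100-turn
/// simulation and applies the genetic algorithm.
fn run_one_generation() -> Replay {
    Replay { chosen_actions_per_turn: Vec::new(), best_fitness: 0.0 }
}
```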

While I find this educational and entertaining, the complexity of this design is ramping up very fast. I am currently trying to set up a paintball minigame of sorts where two factions of creatures compete to paint the map in their colour. I may fail and give up, or I may succeed and be inspired. We will see.

Videos:

An oddly satisfying anthill.

A tango line caused by a poorly thought out efficiency score formula. (walls disabled)

A well-trained population. (walls disabled)