An agent in this test should learn to play simple board games on its own. The agent's developer does not know the rules, the board characteristics, or the goal of any game; the agent has to discover them from reward signals. Estimated difficulty of the test for humans: ages 6-8.
Before each game, the agent receives a description of the playing field: the board size and the initial positions of its own and the opponent's pieces.
On each turn, the agent attempts to move a piece by specifying the piece ID and the coordinates of the target square. If the move cannot be made for any reason (it is prohibited by the rules or the target square is occupied), the platform responds with the error “Wrong move”. The agent keeps trying until the platform responds “The move is made”, at which point the agent's move is complete. The platform then makes the opponent's move and returns the positions of the pieces after that move.
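The trial-and-error move protocol can be sketched as a loop. This is a minimal sketch under assumptions: the platform is modelled as a hypothetical callable `try_move(piece_id, target)` that returns one of the two response strings quoted above, and because the agent knows nothing about the rules, each trial simply picks a random piece and a random square on the board. `make_move` and `max_trials` are illustrative names, not part of any real platform API.

```python
import random
from typing import Callable, Dict, Tuple

Pos = Tuple[int, int]

# Response strings quoted in the test description; how the platform
# actually delivers them is not specified, so this is an assumption.
WRONG_MOVE = "Wrong move"
MOVE_MADE = "The move is made"


def make_move(
    my_pieces: Dict[int, Pos],
    board_size: Tuple[int, int],
    try_move: Callable[[int, Pos], str],
    rng: random.Random,
    max_trials: int = 10_000,
) -> Tuple[int, Pos, int]:
    """Propose moves until the platform accepts one.

    Returns (piece_id, target, trials_used). Every proposal sent to the
    platform, accepted or not, counts as one trial.
    """
    width, height = board_size
    piece_ids = sorted(my_pieces)
    for trial in range(1, max_trials + 1):
        # No rules are known, so guess a random piece and a random square.
        piece_id = rng.choice(piece_ids)
        target = (rng.randrange(width), rng.randrange(height))
        if try_move(piece_id, target) == MOVE_MADE:
            return piece_id, target, trial
        # Any other response (normally WRONG_MOVE) means: try again.
    raise RuntimeError("no accepted move within max_trials")
```

A practical agent would replace the uniform random guess with proposals informed by earlier accepted and rejected moves; the loop structure stays the same.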
As the game progresses, the agent receives intermediate rewards whenever the platform judges that the agent's move has improved its position. At the end of the game, the agent is told whether it won or lost. In one test, the agent must play five games in a row, transferring the knowledge gained in each game to the next.
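One simple, hypothetical way to carry knowledge from one game to the next is to keep statistics that outlive a single game. The sketch below records, for each displacement `(dx, dy)`, how often the platform accepted it and the average intermediate reward it earned. `OffsetMemory` and its methods are illustrative names, and keying the statistics on displacement is an assumption that suits games where pieces move locally; it is not the method prescribed by the test.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Offset = Tuple[int, int]


class OffsetMemory:
    """Per-displacement statistics that persist across games (hypothetical)."""

    def __init__(self) -> None:
        self.tried: Dict[Offset, int] = defaultdict(int)
        self.accepted: Dict[Offset, int] = defaultdict(int)
        self.reward_sum: Dict[Offset, float] = defaultdict(float)

    def record_trial(self, offset: Offset, accepted: bool) -> None:
        """Log one proposal and whether the platform accepted it."""
        self.tried[offset] += 1
        if accepted:
            self.accepted[offset] += 1

    def record_reward(self, offset: Offset, reward: float) -> None:
        """Attribute an intermediate reward to the offset of the move just made."""
        self.reward_sum[offset] += reward

    def acceptance_rate(self, offset: Offset) -> float:
        n = self.tried.get(offset, 0)
        return self.accepted.get(offset, 0) / n if n else 0.0

    def mean_reward(self, offset: Offset) -> float:
        n = self.accepted.get(offset, 0)
        return self.reward_sum.get(offset, 0.0) / n if n else 0.0

    def ranked_offsets(self) -> List[Offset]:
        """Offsets ordered by acceptance rate, then by mean reward (best first)."""
        return sorted(
            self.tried,
            key=lambda o: (self.acceptance_rate(o), self.mean_reward(o)),
            reverse=True,
        )
```

Create one `OffsetMemory` before the first of the five games and reuse the same instance in every game, so that rejections and rewards from earlier games bias the proposals in later ones.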
A testing session consists of several games, for example 10 different games, each of which the agent must play 5 times. The games for each session are drawn at random from a pool of 50 games, so the composition of a session is practically never repeated. This makes it ineffective for the developer to tune the agent's algorithm to a specific test set.
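Session composition can be sketched as sampling without replacement. Assuming a pool of 50 game identifiers (the names are placeholders), the sketch picks 10 distinct games and schedules each one five times in a row. There are `math.comb(50, 10)` = 10,272,278,170 possible selections of 10 games, which is why any particular session is very unlikely to recur.

```python
import math
import random
from typing import List, Optional, Sequence

# Number of distinct 10-game selections from a pool of 50.
N_SELECTIONS = math.comb(50, 10)


def make_session(
    game_pool: Sequence[str],
    n_games: int = 10,
    repeats: int = 5,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Draw n_games distinct games and schedule each one `repeats` times in a row."""
    if rng is None:
        rng = random.Random()
    chosen = rng.sample(list(game_pool), n_games)  # without replacement
    return [game for game in chosen for _ in range(repeats)]
```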
The session score is computed from the number of wins and the total reward received across all games, each weighted by a certain coefficient.
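The scoring rule can be sketched as a weighted sum. The coefficients are not given in the description, so `win_weight` and `reward_weight` below are placeholders that would have to be replaced with the test's actual values.

```python
from typing import Sequence


def session_score(
    wins: Sequence[bool],
    rewards: Sequence[float],
    win_weight: float = 1.0,
    reward_weight: float = 0.1,
) -> float:
    """Weighted session score.

    wins: one flag per game played in the session.
    rewards: every intermediate reward received during the session.
    The default weights are placeholders, not the test's real coefficients.
    """
    return win_weight * sum(wins) + reward_weight * sum(rewards)
```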
Game 1. “Occupy the opposite row”:
The goal of the game is to occupy as many squares of the opposite row with your own pieces as possible, and to do so before the opponent. The game continues until all the pieces have reached the opposite row, but for no more than a fixed number of moves. A piece can move to any adjacent square, including diagonally. A piece may not move onto a square occupied by another piece.
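The Game 1 rules translate into a small legality check, a progress measure and an end-of-game test. This sketch assumes squares are `(x, y)` tuples with `0 <= x < width` and `0 <= y < height`, and that each side's opposite row is identified by its `y` index; all function names are hypothetical. In the test itself such checks live on the platform side, since the agent is not told the rules.

```python
from typing import Iterable, Set, Tuple

Pos = Tuple[int, int]


def is_legal_move_game1(
    src: Pos, dst: Pos, occupied: Set[Pos], size: Tuple[int, int]
) -> bool:
    """One step to any of the 8 neighbouring squares, on the board and empty."""
    width, height = size
    on_board = 0 <= dst[0] < width and 0 <= dst[1] < height
    step = max(abs(dst[0] - src[0]), abs(dst[1] - src[1]))
    return on_board and step == 1 and dst not in occupied


def opposite_row_count(my_pieces: Iterable[Pos], target_row: int) -> int:
    """How many of the agent's pieces already stand on the opposite row."""
    return sum(1 for _, y in my_pieces if y == target_row)


def game1_over(
    my_pieces: Iterable[Pos], my_row: int,
    opp_pieces: Iterable[Pos], opp_row: int,
    moves_made: int, move_limit: int,
) -> bool:
    """All pieces on their opposite rows, or the move limit reached.

    Assumption: "all the pieces" means the pieces of both sides.
    """
    all_home = all(y == my_row for _, y in my_pieces) and all(
        y == opp_row for _, y in opp_pieces
    )
    return all_home or moves_made >= move_limit
```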
Game 2. “Capture your enemy”:
The goal of the game is to capture as many of the opponent's pieces as possible while keeping your own. A piece captures an opponent's piece by jumping onto its square to the right, left, up or down, but not diagonally. Otherwise, a piece can move to any adjacent square, including diagonally. The winner is the side with more pieces remaining after a fixed number of moves.
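Game 2 differs from Game 1 mainly in how a move onto an opponent's square is treated. The sketch below classifies a proposed move, assuming (as in Game 1) that only adjacent squares are reachable and that landing on one's own piece is illegal. Treating a tie as a draw in `game2_winner` is also an assumption, since the description does not mention draws. Both function names are hypothetical.

```python
from typing import Set, Tuple

Pos = Tuple[int, int]


def classify_move_game2(
    src: Pos, dst: Pos, mine: Set[Pos], theirs: Set[Pos], size: Tuple[int, int]
) -> str:
    """Return "capture", "move" or "illegal" for a proposed Game 2 move."""
    width, height = size
    if not (0 <= dst[0] < width and 0 <= dst[1] < height):
        return "illegal"
    dx, dy = abs(dst[0] - src[0]), abs(dst[1] - src[1])
    if max(dx, dy) != 1:
        return "illegal"  # assumption: only adjacent squares are reachable
    if dst in mine:
        return "illegal"  # cannot land on one's own piece
    if dst in theirs:
        # Captures are orthogonal only: exactly one coordinate changes.
        return "capture" if dx + dy == 1 else "illegal"
    return "move"  # ordinary step, diagonals allowed


def game2_winner(my_count: int, opp_count: int) -> str:
    """Side with more pieces left wins; a tie is treated as a draw (assumption)."""
    if my_count > opp_count:
        return "agent"
    if opp_count > my_count:
        return "opponent"
    return "draw"
```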