The goal of an agent in this test is to learn to recognize an action from two pictures, “before” and “after”. The estimated difficulty of the test corresponds to that of a 6-7-year-old human.
The test consists of several steps. At each step, the agent is presented with two pictures that show a situation as it develops over time.
Picture 1 displays a ball falling on some surface
After showing the two-part picture, the platform asks the agent “What is it?”. The agent must either name the action it recognizes or say “I don’t know”. If the agent names the correct action, it receives one point. If the guess is incorrect, the platform responds with “No, it is <right action>”. If the agent says “I don’t know”, it receives no point and the platform names the right action.
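The scoring rule for a single step can be sketched as follows. This is only an illustration, not the platform's actual code: the function name `evaluate_step`, the constant `DONT_KNOW`, and the assumption that the agent's answer is compared to the right action as a plain string are all hypothetical.

```python
# Hypothetical sketch of how the platform might score one step.
DONT_KNOW = "I don't know"


def evaluate_step(agent_answer, right_action):
    """Return (points, platform_reply) for one step.

    Assumption: the agent's answer is either DONT_KNOW or a bare action
    name such as "falling", compared by exact string equality.
    """
    if agent_answer == DONT_KNOW:
        # No guess was made: no point, the platform names the action.
        return 0, f"it's {right_action}"
    if agent_answer == right_action:
        # Correct guess: one point.
        return 1, "correct!"
    # Wrong guess: no point, the platform corrects the agent.
    return 0, f"no, it's {right_action}"
```

For example, `evaluate_step("falling", "bouncing")` returns `(0, "no, it's bouncing")`.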
One test session consists of a number of different actions, for example 20 actions distributed over 50 steps. In each session, the actions and their variations are randomly sampled from an overall set of 100 actions, so an agent is never given the chance to take the same session twice. This prevents a developer from fine-tuning the agent’s algorithm to a specific test set.
Here is a 10-step demo run of the test:
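One possible way to draw such a session is sketched below. The function `sample_session` and its parameters are hypothetical. The sketch also assumes that every chosen action appears at least once and that the remaining steps are filled uniformly at random, which the description above does not specify. Variations of an action are not modelled.

```python
import random


def sample_session(all_actions, n_actions=20, n_steps=50, rng=None):
    """Return a list of n_steps action names drawn from all_actions.

    Assumptions (not stated in the test description): each of the
    n_actions chosen actions occurs at least once, and the remaining
    steps repeat chosen actions uniformly at random.
    """
    rng = rng or random.Random()
    # Pick the distinct actions used in this session.
    chosen = rng.sample(all_actions, n_actions)
    # Guarantee one occurrence of each, then fill the remaining steps.
    steps = chosen + [rng.choice(chosen) for _ in range(n_steps - n_actions)]
    rng.shuffle(steps)
    return steps


# Usage with placeholder action names standing in for the real set of 100.
actions = [f"action_{i}" for i in range(100)]
session = sample_session(actions)
print(len(session), len(set(session)))  # 50 20
```

Because both the chosen subset and the step order are random, two sessions are very unlikely to coincide.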
Step 1:
Platform: What is it?
Agent: I don’t know
Platform: it’s falling
Step 2:
Platform: What is it?
Agent: I don’t know
Platform: it’s toppling
Step 3:
Platform: What is it?
Agent: It's falling
Platform: correct!
Step 4:
Platform: What is it?
Agent: It's falling
Platform: no, it's bouncing
Step 5:
Platform: What is it?
Agent: It's toppling
Platform: correct!
Step 6:
Platform: What is it?
Agent: I don't know
Platform: it's explosion
Step 7:
Platform: What is it?
Agent: It's bouncing
Platform: correct!
Step 8:
Platform: What is it?
Agent: It's explosion
Platform: correct!
Step 9:
Platform: What is it?
Agent: I don't know
Platform: it's rolling down
Step 10:
Platform: What is it?
Agent: I don't know
Platform: it's recovery from explosion
In this session the agent got 4 points, as it correctly guessed the action 4 times. The goal is to gain the maximum number of points.
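The score of the demo run can be checked by tallying the transcript. In this small sketch the list `demo` is a hand transcription of the ten steps above, and using `None` for “I don’t know” is an arbitrary choice made for the illustration.

```python
# (agent's answer, right action) for the ten demo steps.
# None stands for "I don't know".
demo = [
    (None, "falling"),
    (None, "toppling"),
    ("falling", "falling"),
    ("falling", "bouncing"),
    ("toppling", "toppling"),
    (None, "explosion"),
    ("bouncing", "bouncing"),
    ("explosion", "explosion"),
    (None, "rolling down"),
    (None, "recovery from explosion"),
]

# One point for every step where the named action matches the right one.
score = sum(1 for answer, right in demo if answer == right)
print(score)  # 4
```

The four points come from steps 3, 5, 7 and 8.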