The Free Connectionist Q-learning Java Framework is an open source Java library for developing simple or complex learning systems. It can be used wherever an action is chosen based on the state of the environment and executing that action can be rewarded or punished.
No specialist knowledge is required. The framework module is small, easy-to-use and speeds up the development of your intelligent agents.
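To make the state, action and reward loop concrete, here is a minimal sketch of plain tabular Q-learning. It is not the framework's API: the class `TabularQLearner` and all of its method names are made up for illustration, and it uses a lookup table where the framework uses a neural network.

```java
import java.util.Random;

/** Hypothetical tabular Q-learner; an illustration only, not part of the framework. */
class TabularQLearner {
    private final double[][] q;    // q[state][action] = estimated long-term reward
    private final double alpha;    // learning rate
    private final double gamma;    // discount factor for future reward
    private final double epsilon;  // probability of trying a random action
    private final Random random;

    TabularQLearner(int states, int actions, double alpha, double gamma,
                    double epsilon, long seed) {
        this.q = new double[states][actions];
        this.alpha = alpha;
        this.gamma = gamma;
        this.epsilon = epsilon;
        this.random = new Random(seed);
    }

    /** Action with the highest estimated value in the given state. */
    int greedyAction(int state) {
        int best = 0;
        for (int a = 1; a < q[state].length; a++) {
            if (q[state][a] > q[state][best]) {
                best = a;
            }
        }
        return best;
    }

    /** Epsilon-greedy choice: mostly the best known action, sometimes a random one. */
    int chooseAction(int state) {
        if (random.nextDouble() < epsilon) {
            return random.nextInt(q[state].length);
        }
        return greedyAction(state);
    }

    /** Standard Q-learning update after the action was rewarded or punished. */
    void update(int state, int action, double reward, int nextState) {
        double target = reward + gamma * q[nextState][greedyAction(nextState)];
        q[state][action] += alpha * (target - q[state][action]);
    }

    double value(int state, int action) {
        return q[state][action];
    }
}
```

An agent would call `chooseAction` with the current state, execute the action, and then call `update` with the reward (positive) or punishment (negative) it received.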
Check out the video below showing Boston Dynamics' 4-legged BigDog robot traversing a number of different surfaces. This is really amazing!
News
Recently I got some support from one of the framework's users and changed the framework a little. I also improved Wanderbot, so it learns faster and avoids getting stuck.
Before I started with Q-learning, I was fascinated by this work: http://citeseer.ist.psu.edu/schmidhuber90making.html But when I tried a few times to implement it, it didn't work. Perhaps I didn't understand the algorithm, I don't know. Now I am thinking of implementing something similar: a recurrent neural network that does two things. First, it would try to build a model of the world (represented by the weights of the neural network), and that model would be used to predict what happens in the next step. Second, it would produce the movement-control output that, according to the model, gives the maximal reward. I'm not presenting clearly what I mean, am I?
The second idea (actually I would like to start with it) is to make two layers of Q-learning. The first layer would have one Q-learned network and the second would have, for example, 4 Q-learned networks, each able to choose between 6 actions. The first layer (the supervising network) would choose, based on the input, which of the 4 nets gets to choose the action to perform. That would enable the agent to develop 4 behaviours and use them depending on the situation. For example, in soccer: net 1 would be good at fetching the ball, net 2 would try to avoid enemy players when it has the ball, net 3 would be activated when the goal is near, and net 4 would be good at defense. The supervising network (SN) and the behavioral networks (BN) would all have the same input: signals from the sensors. The SN has 4 output actions and, as was said, each BN has 6. That is only an example, but the results could be promising. I have used 1- and 2-tiered NNs for the Q-learning algorithm, and in my experience 1-tiered is the best, but there are some complex situations in which 1 tier is not enough, so I started to think about the concept above with 2 separate Q-network layers.
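A rough sketch of how the two layers could fit together is shown below. It is only an illustration under several assumptions: plain Q-tables stand in for the Q-learned networks, the sensor input is reduced to a single discrete state index, and both the supervisor and the selected behaviour are updated with the same reward (one possible training scheme among several). The names `TwoLayerAgent` and `QTable` are hypothetical and not part of the framework.

```java
import java.util.Random;

/** Hypothetical two-layer agent: a supervisor picks a behaviour, the behaviour picks an action. */
class TwoLayerAgent {
    /** Plain Q-table standing in for one Q-learned network (simplifying assumption). */
    static class QTable {
        final double[][] q;

        QTable(int states, int actions) {
            q = new double[states][actions];
        }

        int best(int state) {
            int best = 0;
            for (int a = 1; a < q[state].length; a++) {
                if (q[state][a] > q[state][best]) {
                    best = a;
                }
            }
            return best;
        }

        void update(int s, int a, double reward, int next, double alpha, double gamma) {
            double target = reward + gamma * q[next][best(next)];
            q[s][a] += alpha * (target - q[s][a]);
        }
    }

    static final int BEHAVIOURS = 4; // outputs of the supervising network (SN)
    static final int ACTIONS = 6;    // outputs of each behavioral network (BN)

    private final QTable supervisor;
    private final QTable[] behaviours = new QTable[BEHAVIOURS];
    private final double alpha;
    private final double gamma;
    private final double epsilon;
    private final Random random;
    private int lastBehaviour;
    private int lastAction;

    TwoLayerAgent(int states, double alpha, double gamma, double epsilon, long seed) {
        supervisor = new QTable(states, BEHAVIOURS);
        for (int b = 0; b < BEHAVIOURS; b++) {
            behaviours[b] = new QTable(states, ACTIONS);
        }
        this.alpha = alpha;
        this.gamma = gamma;
        this.epsilon = epsilon;
        this.random = new Random(seed);
    }

    /** SN and BN see the same state; SN selects the BN, the BN selects the action. */
    int act(int state) {
        lastBehaviour = pick(supervisor, state, BEHAVIOURS);
        lastAction = pick(behaviours[lastBehaviour], state, ACTIONS);
        return lastAction;
    }

    /** Assumption: only the supervisor and the behaviour that acted are updated. */
    void learn(int state, double reward, int nextState) {
        supervisor.update(state, lastBehaviour, reward, nextState, alpha, gamma);
        behaviours[lastBehaviour].update(state, lastAction, reward, nextState, alpha, gamma);
    }

    private int pick(QTable table, int state, int choices) {
        return random.nextDouble() < epsilon ? random.nextInt(choices) : table.best(state);
    }

    int lastBehaviour() {
        return lastBehaviour;
    }

    double supervisorValue(int state, int behaviour) {
        return supervisor.q[state][behaviour];
    }

    double behaviourValue(int behaviour, int state, int action) {
        return behaviours[behaviour].q[state][action];
    }
}
```

In the soccer example, `act` would first let the supervisor decide between fetching, dribbling, shooting and defending, and only the chosen behaviour would then pick one of its 6 actions.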