Double deep Q-learning