I don’t have much experience with Keras, but I think the issue is twofold: your layer includes bias terms, and its weights are initialized from a normal rather than a uniform distribution. I changed the model definition to this:
model = Sequential([Dense(4, input_dim=16, bias=False, weights=[0.01 * np.random.uniform(size=[16, 4])])])
With that change it seems to be able to learn the task. I didn’t run it for the full 2000 episodes, though, since it was taking a while on the laptop I was using. Hope that solves your problem!
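
In case it helps to see why dropping the bias matters, here is a rough sketch in plain Python (no Keras) of what that layer computes. I'm assuming your states are fed in as one-hot vectors of length 16, which is a guess based on input_dim=16; the helper names (init_weights, one_hot, forward) are made up for illustration. With a one-hot input and no bias, the layer's output is just one row of the weight matrix, so it behaves like a Q-table whose entries all start as small values between 0 and 0.01.

```python
import random

N_STATES, N_ACTIONS = 16, 4  # sizes taken from Dense(4, input_dim=16)


def init_weights(n_in, n_out, scale=0.01, seed=0):
    """Small uniform weights in [0, scale), mirroring 0.01 * uniform(size=[16, 4])."""
    rng = random.Random(seed)
    return [[scale * rng.random() for _ in range(n_out)] for _ in range(n_in)]


def one_hot(index, size):
    """One-hot encoding of a state index (assumed input format)."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def forward(x, weights):
    """Bias-free linear layer: output[j] = sum_i x[i] * weights[i][j]."""
    n_out = len(weights[0])
    return [sum(x[i] * weights[i][j] for i in range(len(x))) for j in range(n_out)]


weights = init_weights(N_STATES, N_ACTIONS)
q_values = forward(one_hot(5, N_STATES), weights)

# For a one-hot input, the output is exactly row 5 of the weight matrix.
print(q_values == weights[5])
```

A bias term would add the same offset to every state's Q-values, which is one way the network can drift away from that table-like behaviour.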