These are good questions. Because of the nature of the Q-update, I would expect the values closer to the goal to be more accurate and the values further away to be less so: each update bootstraps on the estimated value of the next state, so accurate estimates propagate backward from the reward-providing state, roughly one step per update.
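A minimal sketch of that backward propagation, using a hypothetical 1-D corridor rather than the actual gridworld (states 0 to 4, goal at state 4 worth +1; all constants here are made up for the example). The behavior policy is uniform random, which is fine since Q-learning is off-policy:

```python
import random

N_STATES, GOAL = 5, 4
ALPHA, GAMMA = 0.5, 0.9

def step(s, a):
    # a: 0 = left, 1 = right; walls clamp the position
    s2 = max(0, min(N_STATES - 1, s + (1 if a == 1 else -1)))
    reward = 1.0 if s2 == GOAL else 0.0
    return s2, reward, s2 == GOAL

Q = [[0.0, 0.0] for _ in range(N_STATES)]
rng = random.Random(0)
for _ in range(500):
    s, done = 0, False
    while not done:
        a = rng.randrange(2)  # purely random exploration
        s2, r, done = step(s, a)
        # Standard Q-update: bootstrap on the best successor value,
        # which is how accuracy flows backward from the goal.
        target = r + (0.0 if done else GAMMA * max(Q[s2]))
        Q[s][a] += ALPHA * (target - Q[s][a])
        s = s2

state_values = [max(q) for q in Q[:GOAL]]
```

After training, `state_values` should increase monotonically toward the goal, approaching the true optimal values `GAMMA ** distance` (0.729, 0.81, 0.9, 1.0 here), with the states nearest the goal converging first.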
As for the path not being “optimal”: the stochasticity of the gridworld, as well as the danger of the pits, both shift what the optimal path actually is. It is also the case that, depending on how much exploration the agent is performing, it may not arrive at the absolute optimal path, but rather at a local optimum that achieves “good enough” reward.
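To see how stochasticity alone can change which path is optimal, here is a toy calculation (not the actual assignment gridworld; the rewards, slip probability, and path lengths are invented for illustration). A short route passes a pit and slips into it with probability `p` each step; a longer route is safe:

```python
GAMMA = 0.95

def risky_value(p, steps=2, goal_r=1.0, pit_r=-10.0):
    # Expected discounted return of the short route: at each of `steps`
    # transitions we advance with prob 1 - p, or slip into the pit
    # (terminal, reward pit_r) with prob p. Computed backward from the goal.
    v = 0.0
    for k in range(steps):
        r = goal_r if k == 0 else 0.0  # goal reward arrives on the last step
        v = p * pit_r + (1 - p) * (r + GAMMA * v)
    return v

def safe_value(steps=4, goal_r=1.0):
    # Deterministic longer route: the goal reward arrives after `steps`
    # transitions, so it is discounted by GAMMA ** (steps - 1).
    return GAMMA ** (steps - 1) * goal_r
```

With `p = 0` the short route wins (0.95 vs. about 0.857), but already at `p = 0.1` its expected return drops below the safe route's, so the truly optimal policy takes the detour even though it looks longer.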