I’m glad you enjoyed the article. If I understand you correctly, you’d like to classify sounds that are similar but come from several different sources? In that case you need one output unit per class: to classify 10 sounds, use an output layer with 10 units. That layer should also use a softmax nonlinearity, so that the activations of all the units sum to 1.0 and can be read as class probabilities.
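To make that concrete, here is a minimal NumPy sketch of a 10-unit softmax output layer. The feature size, the random weights, and the names `NUM_FEATURES`, `W`, `b`, and `classify` are assumptions for illustration only; in practice the weights would be learned and the input would be whatever features your network produces.

```python
import numpy as np

NUM_CLASSES = 10   # one output unit per sound class
NUM_FEATURES = 40  # hypothetical size of the feature vector fed to the layer

def softmax(logits):
    # Subtract the per-row maximum before exponentiating for numerical stability.
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

# Untrained, randomly initialised output layer (illustrative only).
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(NUM_FEATURES, NUM_CLASSES))
b = np.zeros(NUM_CLASSES)

def classify(features):
    # Linear layer followed by softmax: one probability per class.
    return softmax(features @ W + b)

# A batch of 3 made-up feature vectors.
probs = classify(rng.normal(size=(3, NUM_FEATURES)))
predicted = probs.argmax(axis=-1)  # index of the most likely class per example
```

Each row of `probs` sums to 1.0, and `predicted` holds the index of the highest-scoring class for each input.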
If, on the other hand, you want to score how similar any sound is to a given “control” sound, then you should indeed use a single output unit. Train it with a logistic loss that encourages a high score for “correct” sounds and a low score for all other sounds.
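As a rough sketch of the single-unit case, the snippet below computes a logistic (binary cross-entropy) loss on the raw score of one output unit. The function names and the example scores are my own assumptions; label 1 stands for a “correct” sound and label 0 for any other sound.

```python
import numpy as np

def sigmoid(z):
    # Squash a raw score into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def logistic_loss(logits, labels):
    # Numerically stable binary cross-entropy computed on raw scores:
    # max(z, 0) - z * y + log(1 + exp(-|z|))
    per_example = (np.maximum(logits, 0.0)
                   - logits * labels
                   + np.log1p(np.exp(-np.abs(logits))))
    return per_example.mean()

# Made-up raw scores from the single output unit for two sounds.
logits = np.array([4.0, -4.0])

# First sound matches the control (label 1), second does not (label 0).
good_loss = logistic_loss(logits, np.array([1.0, 0.0]))

# Same scores with the labels swapped: the unit is confidently wrong.
bad_loss = logistic_loss(logits, np.array([0.0, 1.0]))
```

Minimising this loss pushes the score up for matching sounds and down for everything else, so `good_loss` comes out much smaller than `bad_loss`. Passing the score through `sigmoid` gives a similarity value between 0 and 1.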
I hope that clarifies things some.