Most machine learning algorithms are sensitive to class imbalance in the
training data and tend to perform poorly on classes represented by only
a few examples. Neural nets applied to speech recognition are no exception,
but their situation is unusual in that the nets act as posterior probability
estimators rather than as classifiers. Most remedies designed to handle the class
imbalance problem in classification invalidate the proof that justifies
the use of neural nets as posterior probability models. In this paper
we examine one of these, the training scheme called probabilistic
sampling, and show that fortunately it is still applicable. First,
we argue that, in theory, it makes the net estimate scaled class-conditionals
instead of class posteriors, but that this causes no problems in the hidden
Markov model speech recognition framework and in fact fits that framework even
better. Second, we carry out experiments to demonstrate the feasibility
of this training scheme. Since in practice the conditions of the mathematical
proofs are unrealistic, in the experiments we create and examine a transition
between the conventional and the class-based sampling. The results show that
the best performance is indeed attained somewhere in between, and that it is
slightly better than the scores obtained in the traditional way.
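
As a rough illustration of the sampling scheme discussed above, classes can be
drawn with a probability that interpolates between the empirical class priors
(conventional sampling) and a uniform distribution over classes (purely
class-based sampling), after which a training example is drawn from the selected
class. The sketch below is only an assumed, minimal rendering of this idea; the
names (probabilistic_sampler, lam) and the list-of-labels data representation are
illustrative and not taken from the paper.

```python
import random
from collections import defaultdict

def probabilistic_sampler(labels, lam, rng=None):
    """Yield training-example indices drawn by probabilistic sampling.

    lam = 0.0 reproduces conventional (prior-based) sampling,
    lam = 1.0 samples every class with equal probability,
    intermediate values interpolate between the two.
    """
    rng = rng or random.Random(0)

    # Group example indices by class label.
    by_class = defaultdict(list)
    for idx, c in enumerate(labels):
        by_class[c].append(idx)

    classes = sorted(by_class)
    n = len(labels)
    # Class-selection probabilities: lam * uniform + (1 - lam) * class prior.
    probs = [lam / len(classes) + (1.0 - lam) * len(by_class[c]) / n
             for c in classes]

    while True:
        c = rng.choices(classes, weights=probs, k=1)[0]  # pick a class
        yield rng.choice(by_class[c])                    # pick one of its examples

# Usage: draw a small batch of indices from an imbalanced toy label set.
labels = ['a'] * 90 + ['b'] * 9 + ['c']
sampler = probabilistic_sampler(labels, lam=0.5)
batch = [next(sampler) for _ in range(8)]
print(batch)
```

Under this reading, moving lam toward 1 effectively divides the net's output
posteriors by the class priors, which is why the outputs come to approximate
scaled class-conditionals of the kind the HMM decoder needs.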