Thanks, got it. Actually, I tried several different activation functions for the output layer of my neural network. This gave me an understanding of why we should use sigmoid or softmax for classification tasks instead of, for example, ReLU. ReLU works very well for hidden layers, but it is unsuitable for the output layer in classification, because its outputs are unbounded non-negative values rather than probabilities.
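Here is a minimal NumPy sketch (my own illustration, not tied to any framework) of what I mean: softmax turns raw logits into a valid probability distribution, while ReLU does not:

```python
import numpy as np

def relu(z):
    # unbounded, non-negative outputs; do not sum to 1
    return np.maximum(0.0, z)

def softmax(z):
    # subtract the max for numerical stability
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([2.0, -1.0, 0.5])

print(relu(logits))     # [2.  0.  0.5]  -> not interpretable as class probabilities
print(softmax(logits))  # ~[0.79 0.04 0.18] -> non-negative and sums to 1
```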
Also, I struggled with the network predicting a single class when using softmax. It was frustrating that my neural network always predicted the same class, regardless of the input. Eventually, I figured out that I need 2 output neurons when using softmax for binary classification (with a single output neuron, softmax always outputs 1, so the prediction never changes). In general, softmax needs one output neuron per class for multiclass classification. But I also found that for binary classification it is often better to use the sigmoid function on a single output neuron as the final activation.
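To make this concrete, here is a sketch of the two output-layer setups in Keras (the hidden-layer size and the 10-feature input shape are just placeholder assumptions for the example):

```python
import tensorflow as tf
from tensorflow.keras import layers

# Option A: softmax needs one output neuron per class -- two for binary.
softmax_model = tf.keras.Sequential([
    layers.Dense(32, activation='relu', input_shape=(10,)),  # ReLU is fine in hidden layers
    layers.Dense(2, activation='softmax'),                   # one neuron per class
])
softmax_model.compile(optimizer='adam',
                      loss='sparse_categorical_crossentropy')  # integer labels 0/1

# Option B: sigmoid on a single output neuron, the usual choice for binary tasks.
sigmoid_model = tf.keras.Sequential([
    layers.Dense(32, activation='relu', input_shape=(10,)),
    layers.Dense(1, activation='sigmoid'),                   # outputs P(class == 1)
])
sigmoid_model.compile(optimizer='adam',
                      loss='binary_crossentropy')            # matches the sigmoid output
```

The two options are mathematically equivalent for two classes; the sigmoid version just uses one fewer parameter per output and avoids the single-neuron softmax trap described above.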
Hopefully these notes will help someone with similar tasks and issues.