Softmax function instead of Sigmoid in binary classification

Max16Neil

Hi. I am new to data science and image processing. I want to create a CNN model with Keras for binary image classification.

I know that the sigmoid function is widely used in binary classification problems because of its output range (0..1). But can I use the softmax function as the final activation instead?


Seb-at-Imaginghub

Hi @Max16Neil

For binary classification, it should give almost the same results, because softmax is a generalization of sigmoid to a larger number of classes. One difference may be computation time if you have a really large dataset; in that case, I would suggest sticking with the plain sigmoid function.
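To see why the two give the same results: for two classes, a softmax over the logits [0, z] yields exactly sigmoid(z) for the positive class. A minimal numpy sketch (standalone, not Keras-specific):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract max for numerical stability
    return e / e.sum()

# A single logit z for the positive class; the 2-class softmax over
# logits [0, z] gives [1 - sigmoid(z), sigmoid(z)].
z = 1.7
p_sigmoid = sigmoid(z)
p_softmax = softmax(np.array([0.0, z]))

print(p_sigmoid)     # probability of the positive class from sigmoid
print(p_softmax[1])  # the same probability from the 2-class softmax
```

So the two heads parameterize the same predictive distribution; the sigmoid version just has one fewer output neuron.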


Max16Neil

Thanks, got it. Actually, I have tried several different activation functions for the output layer of my neural network. This gave me an understanding of why we should use sigmoid or softmax for classification instead of, for example, ReLU. ReLU works very well for hidden layers, but it cannot be used as the output activation in classification.

I was also struggling with a single-class prediction problem when using softmax: my network always predicted the same class, regardless of the input. Eventually, I figured out that I need two output neurons when using softmax for binary classification, and in general one neuron per class for multiclass classification with softmax. Still, I found that for binary classification it is often better to use the sigmoid function as the final activation. Hopefully my notes here will help someone with similar tasks and issues.
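To illustrate the one-sigmoid-neuron vs. two-softmax-neurons point: when the predicted probabilities match, binary cross-entropy (sigmoid head) and categorical cross-entropy (softmax head) produce the same loss. A minimal numpy sketch (plain Python, not the original Keras model; the probability values are illustrative):

```python
import numpy as np

def binary_crossentropy(y, p):
    # y in {0, 1}; p is the predicted probability of class 1
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def categorical_crossentropy(y_onehot, probs):
    # y_onehot is one-hot over the classes; probs sums to 1
    return -np.sum(y_onehot * np.log(probs))

# One sigmoid output neuron: p = P(class 1)
p = 0.8
loss_sigmoid = binary_crossentropy(1, p)

# Two softmax output neurons: [P(class 0), P(class 1)]
probs = np.array([0.2, 0.8])
loss_softmax = categorical_crossentropy(np.array([0.0, 1.0]), probs)

print(loss_sigmoid, loss_softmax)  # identical: both equal -log(0.8)
```

In Keras terms, this corresponds to `Dense(1, activation='sigmoid')` with `binary_crossentropy` versus `Dense(2, activation='softmax')` with `categorical_crossentropy` (and one-hot labels for the latter).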
