Hi! Please help me clarify this: why can't we use squared error as the cost function for logistic regression and neural networks?
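If our cost function is $J(\theta) = \frac{1}{2}\,\mathrm{mean}\big((h(x) - y)^2\big)$ and $h(x)$ is a linear function, $h(x) = \theta_0 + \theta_1 x$, then $J(\theta) = \frac{1}{2}\,\mathrm{mean}\big((\theta_0 + \theta_1 x - y)^2\big)$. This is a quadratic in $\theta$, and we can prove it is convex because its second derivative is non-negative (the Hessian, $\mathrm{mean}\begin{pmatrix} 1 & x \\ x & x^2 \end{pmatrix}$, is positive semidefinite).

For the sigmoid hypothesis $h(x) = \sigma(\theta_0 + \theta_1 x)$ this is not true. So it turns out that the cost function is not convex, and there is a chance of getting stuck in a local minimum.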
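To check the convexity argument numerically, here is a minimal sketch (an illustration, not a proof for the general case). It fixes $\theta_0 = 0$, sweeps $\theta_1$ over a grid, and estimates $d^2 J / d\theta_1^2$ with central finite differences for both hypotheses. The toy data `x`, `y`, the grid range, and the helper names (`squared_error_cost`, `min_second_difference`) are hypothetical choices made for this example. A single negative second derivative along one slice is already enough to show that $J$ is not convex; for the linear case the slice only confirms the expected constant value $\mathrm{mean}(x^2)$.

```python
import numpy as np


def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


def identity(z):
    """Linear hypothesis: h(x) = theta0 + theta1 * x, passed through unchanged."""
    return z


def squared_error_cost(theta0, theta1, x, y, activation):
    """J(theta) = 1/2 * mean((h(x) - y)^2) with h(x) = activation(theta0 + theta1 * x)."""
    return 0.5 * np.mean((activation(theta0 + theta1 * x) - y) ** 2)


def min_second_difference(activation, x, y, theta0=0.0, lo=-6.0, hi=6.0, step=0.01):
    """Smallest central finite-difference estimate of d^2 J / d theta1^2
    along the slice theta0 = const, theta1 in [lo, hi]."""
    thetas = np.arange(lo, hi + step, step)
    costs = np.array([squared_error_cost(theta0, t, x, y, activation) for t in thetas])
    # (J(t + h) - 2 J(t) + J(t - h)) / h^2 approximates the second derivative
    second = (costs[2:] - 2.0 * costs[1:-1] + costs[:-2]) / step**2
    return second.min()


# Hypothetical toy data (not from the question): a separable 1-D classification set.
x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([0.0, 0.0, 1.0, 1.0])

print("linear h :", min_second_difference(identity, x, y))  # about mean(x^2) = 2.5
print("sigmoid h:", min_second_difference(sigmoid, x, y))   # negative, so J is not convex
```

With the linear hypothesis the estimate stays at $\mathrm{mean}(x^2)$ everywhere on the grid, while with the sigmoid it becomes negative for sufficiently negative $\theta_1$, where the squared sigmoid terms flatten out and become concave.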