It seems ReLU is often used as an activation function in neural networks, but I could not quite understand why. What advantages does it have compared to a regular linear activation?
This is quite a broad question, and I would recommend reading more about activation functions and ReLU in particular. But here is a short answer: a linear activation gives you a linear relationship between input and output, and stacking linear layers does not change that, because the composition of linear functions is itself linear. A deep network with only linear activations therefore has no more expressive power than a single linear layer. You generally want your model to capture non-linear relationships, and that requires a non-linear activation. Several activation functions achieve this, and ReLU is one of the simplest.
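Here is a minimal NumPy sketch to make that point concrete (the matrix shapes and variable names are just made up for illustration):

```python
import numpy as np

def relu(x):
    # ReLU is simply max(0, x), applied element-wise
    return np.maximum(0, x)

rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))  # first "layer" weights
W2 = rng.standard_normal((2, 4))  # second "layer" weights
x = rng.standard_normal(3)

# Two stacked linear layers collapse into a single linear map:
two_linear_layers = W2 @ (W1 @ x)
one_linear_layer = (W2 @ W1) @ x
print(np.allclose(two_linear_layers, one_linear_layer))  # True: depth added nothing

# With ReLU in between, the composition is no longer a single matrix,
# so the network can represent non-linear functions of x:
with_relu = W2 @ relu(W1 @ x)
print(with_relu)
```

The `allclose` check showing `True` is exactly the problem with linear activations: no matter how many layers you stack, you end up with one big matrix multiplication. Inserting ReLU breaks that collapse.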