Well, I would say that this is neither supervised nor unsupervised learning. It is rather a very practical way of applying neural networks. Briefly, you need to use one of the popular and powerful pretrained neural networks. When you feed an image to the already trained network, each layer produces its own features of the image, determined by the architecture and the learned weights (which in turn reflect the activation functions, regularization settings, etc. chosen for training). So, if you take one neural network and feed it an image of a car, then on some layer N you should obtain features that correspond, for example, to the wheels, doors, and windows of the car. And if you then take another image, one that depicts a bird, feed it into the identical neural network, and look at the same layer N of that network, you will notice features such as wings, beak, feathers, etc.

The features are represented as numbers, of course. But by comparing these numbers you will be able to say whether two images show the same thing or something different. It follows that you can measure the similarity between images by computing the distance between their feature vectors. The distance between a black Porsche and a red Mercedes will be smaller than the distance between the black Porsche and a lovely countryside landscape.
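To make the distance comparison concrete, here is a minimal sketch in plain Python. It assumes the feature vectors have already been read from layer N of some pretrained network. The three vectors below are made-up placeholder numbers, not real network outputs, and the choice of cosine distance and the name `cosine_distance` are my own assumptions (Euclidean distance is another common option).

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("feature vectors must be non-zero")
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical activations, invented for illustration only.
# In practice each vector would be the flattened output of layer N
# for one image, and would have hundreds or thousands of entries.
porsche = [0.9, 0.8, 0.1, 0.0]
mercedes = [0.8, 0.9, 0.2, 0.1]
landscape = [0.0, 0.1, 0.9, 0.8]

d_cars = cosine_distance(porsche, mercedes)
d_mixed = cosine_distance(porsche, landscape)

print("Porsche vs. Mercedes: ", d_cars)
print("Porsche vs. landscape:", d_mixed)
print("Cars are closer:", d_cars < d_mixed)
```

With real features the gap between the two distances will depend on the network and the layer you pick, so treat any threshold for "same" versus "different" as something to tune on your own images.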