I wanted to try out automatic picture recognition with a service like Google Vision, so I fed it 1,705 pictures of Austrian MPs.
The idea is fairly simple: you send a picture through the API and it returns information about what is going on in it. A lot of information, actually. The results contain key words about the possible content (what the program sees). If it thinks there is a face, it looks for clues about emotions like joy, sorrow or anger. Where is the nose, where are the eyes? In which direction is the person looking? There is also data about colors, possible links to search results online and safe search indicators.
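To make "sending a picture through the API" a bit more concrete, here is a minimal sketch of the JSON body that the Cloud Vision REST endpoint (images:annotate) expects for one image. The post does not say how the API was actually called, so the function name, the feature selection and the maxResults value are assumptions for illustration. Nothing is sent over the network here.

```python
import base64
import json

# Feature types matching the kinds of results described above:
# key words, faces/emotions, colors, web links and safe search.
FEATURES = [
    "LABEL_DETECTION",
    "FACE_DETECTION",
    "IMAGE_PROPERTIES",
    "WEB_DETECTION",
    "SAFE_SEARCH_DETECTION",
]


def build_annotate_request(image_bytes, max_results=10):
    """Build the request body for a single image (hypothetical helper)."""
    # The REST API takes the image as a base64-encoded string.
    content = base64.b64encode(image_bytes).decode("ascii")
    features = [{"type": name, "maxResults": max_results} for name in FEATURES]
    return {"requests": [{"image": {"content": content}, "features": features}]}


if __name__ == "__main__":
    # Placeholder bytes, not a real portrait.
    body = build_annotate_request(b"placeholder, not a real image")
    print(json.dumps(body, indent=2)[:300])
```

The body would then be POSTed to the images:annotate endpoint with valid credentials. The answer comes back as one entry per image, with sections such as labelAnnotations, faceAnnotations and imagePropertiesAnnotation.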
As I used portraits of Austrian politicians, I was mainly interested in the key words the API would associate with them, as well as data about faces and colors. Long story short: It works, but it has its limits.
Emotions are recognised quite well (as far as I can tell from a lot of testing, though I did not check every picture) and the color extraction works fine. It finds certain objects in the pictures, like beards or glasses, and there is even a hat. Neckties, however, are only identified if the contrast is high enough, so in grayscale pictures they are almost always missed (false negatives). Shadows in the wrong places are also your enemy - that's why there are women with moustaches.
The key words are a different story: many of the suggestions simply don't fit. Then again, what can you really expect? The API gets a single picture, a very classical (read: boring) portrait photo, with no context or other information. How much can you ask?
Sometimes it's interesting to look at the results first and only then at the picture - suddenly some key words make sense. For instance, certain traditional clothes are identified as uniforms, and women wearing bright jackets are automatically turned into doctors. In one (very old) portrait there is a sheet of music in the background, making the program think it sees a musician. Google Vision even labels some of the haircuts as afro, which is daring to say the least.
Besides the general critique, there are some insights to gain from the analysis: it's obvious how smiling in pictures became more and more of a convention over time, while moustaches got rarer (correlation, anyone?). No one looks angry or surprised, only a few look a little worried (did they all know what they were getting into?). And somehow modern fashion makes politicians look like financial advisors a lot of the time.
Some notes on the process
Not all faces are recognised; some very old, low-quality photos in particular fail. As almost all of the pictures are portraits, the background is a single color most of the time and not really interesting. I therefore deleted the main color of each picture, which is usually this background. Some key words were merged (politician and orator, for instance) and some were deleted entirely as they were not useful at all.
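The two clean-up steps could look roughly like the sketch below. It is not the code actually used. It assumes the colors come as a list of entries with a pixelFraction field (as in the API's dominant colors result) and treats the color covering the most pixels as the "main" one. The merge direction (orator into politician) and the list of dropped key words are made up for illustration.

```python
# Assumed merge table: the only pair named in the text is politician/orator,
# and which of the two survives is a guess.
MERGE = {"orator": "politician"}

# Hypothetical examples of key words considered not useful.
DROP = {"person"}


def drop_main_color(colors):
    """Remove the color with the largest pixel share (usually the background)."""
    if not colors:
        return []
    main = max(colors, key=lambda c: c["pixelFraction"])
    return [c for c in colors if c is not main]


def clean_labels(labels):
    """Lowercase, merge and de-duplicate key words, keeping their order."""
    cleaned = []
    for label in labels:
        name = label.lower()
        name = MERGE.get(name, name)
        if name in DROP or name in cleaned:
            continue
        cleaned.append(name)
    return cleaned


if __name__ == "__main__":
    # Invented sample values, not taken from the real results.
    colors = [
        {"color": {"red": 200, "green": 200, "blue": 205}, "pixelFraction": 0.7},
        {"color": {"red": 60, "green": 40, "blue": 30}, "pixelFraction": 0.2},
    ]
    print(drop_main_color(colors))
    print(clean_labels(["Politician", "Orator", "Person", "Beard"]))
```

Picking the main color by pixel share rather than by the API's score is a design choice here; for a portrait with a plain backdrop both will often point at the same color.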
The result is far from exact: there are many unclear and even plain wrong findings. To be fair (again), the API only returns key words with some sort of probability attached to them, and I set a rather low threshold of 0.5.
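As a small illustration of what the threshold does, the sketch below filters a list of key words by their score. The entries follow the shape of the API's label results (description plus score), but the sample values are invented, and whether the original cut-off was inclusive (>= 0.5) is an assumption.

```python
THRESHOLD = 0.5  # the rather low cut-off mentioned above


def keep_confident(label_annotations, threshold=THRESHOLD):
    """Return (key word, score) pairs whose score reaches the threshold."""
    return [
        (a["description"], a["score"])
        for a in label_annotations
        if a.get("score", 0.0) >= threshold
    ]


if __name__ == "__main__":
    # Invented example scores for one portrait.
    labels = [
        {"description": "suit", "score": 0.93},
        {"description": "financial adviser", "score": 0.58},
        {"description": "musician", "score": 0.51},
        {"description": "uniform", "score": 0.42},
    ]
    print(keep_confident(labels))        # low threshold: weak guesses slip through
    print(keep_confident(labels, 0.8))   # stricter threshold: fewer, safer key words
```

A higher threshold would remove many of the odd key words, at the price of fewer key words per picture overall.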
One final note on the pictures, which are clearly missing here: they can be found on the homepage of the Austrian Parliament. I decided against showing or linking to individual portraits for copyright reasons. More importantly, I don't want to single out or make fun of anyone based on their picture and the results - the point was only to test picture recognition.