19 December 2017 (Paris, France) – With the progress being made in Artificial Intelligence, a new question arises: will A.I. algorithms ever be as creative as humans? Since we consider art one of the most distinctively human abilities, the question carries real weight: can Artificial Intelligence create “art”?
This past year, as part of my AI program at ETH Zurich, I took the approach of deconstructing the characteristics of a good photography collection and teaching them to an image-processing Artificial Intelligence algorithm. As a source of photographs I used Google’s Street View service, retrieving various GPS points and processing the surroundings in the line of view. I also used the Facebook photo archive.
The reason is that Google’s latest artificial intelligence experiment is taking in Street View imagery from Google Maps and transforming it into professional-grade photography through post-processing — all without a human touch. The chap behind this is Hui Fang, a software engineer on Google’s Machine Perception team, one of the team members I met in Paris today at one of the Google Imaging workshops. Ok, it was really a Xmas party hiding behind a workshop. But when Google invites you, you go.
The project uses machine learning techniques to train a deep neural network to scan thousands of Street View images in California for shots with impressive landscape potential. The software then “mimics the workflow of a professional photographer” to turn that imagery into an aesthetically pleasing panorama such as this:
This place does not exist. AI created it.
I am going get into this a bit more deeply next week when I publish 52 Incredible Things I Learned At Technology Conferences This Year: A Weekly Waltz Through 2017.
As a photographer, I have always thought that pretty soon cameras would be able to automatically adjust highlights, shadows, contrast, color cast and saturation based on human aesthetic preferences. Controlling light and shadow is the trickiest thing for a photographer; it is why manual controls are needed for a pro’s unique art. But it can be done:
And today I found out we are pretty damn close to that “automatic adjustment function”, and we face the ugly downside: you can no longer trust what you see. Yes, the possibilities with AI are boundless, but just because you can does not mean you should. And just as the technology for cyber attacks is now “off the shelf”, these imaging machine-learning algorithms are built from easily accessible materials and open-source code that anyone with a working knowledge of deep learning could put together.
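At its simplest, the kind of automatic adjustment I keep imagining is just a tone curve applied to every pixel. The sketch below is a minimal, hypothetical illustration in pure Python (the curve and its parameter values are mine, not Google’s); a real auto-adjust would pick those parameters per image, for instance from a learned model:

```python
def apply_tone_curve(values, gamma=0.8, contrast=1.2):
    """Apply a simple tone curve to 8-bit luminance values.

    gamma < 1 lifts shadows; contrast > 1 steepens the mid-tones
    around mid-grey. Hypothetical defaults, for illustration only.
    """
    out = []
    for v in values:
        x = v / 255.0
        x = x ** gamma                  # shadow lift
        x = 0.5 + (x - 0.5) * contrast  # contrast around mid-grey
        out.append(round(min(max(x, 0.0), 1.0) * 255))
    return out

# A dark tone (64) comes out brighter; pure black stays black.
shadows = apply_tone_curve([0, 64, 128, 230])
```

Gamma below 1 brightens shadows more than it brightens highlights, which is roughly what a photographer does by hand when recovering a backlit scene.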
As I wrote this year from NIPS, the largest conference for machine learning and AI, attended by everybody who is anybody in the field, quantifying image quality and aesthetics has been a long-standing problem in image processing and computer vision. While technical quality assessment deals with measuring pixel-level degradations such as noise, blur and compression artifacts, aesthetic assessment captures the semantic-level characteristics associated with emotion and beauty in images. Recently, deep convolutional neural networks (CNNs) trained on human-labelled data have been used to address the subjective nature of image quality for specific classes of images, such as landscapes. But these approaches can be limited in scope, as they typically sort images into just two classes, low and high quality.
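The “technical” half of that split is the easier one to make concrete. A classic sharpness proxy, for example, is the variance of an image’s Laplacian: blur flattens local pixel differences, so the variance drops. Here is a minimal pure-Python sketch on a toy grayscale grid (the 5×5 patches are made up for illustration):

```python
def laplacian_variance(img):
    """Sharpness proxy: variance of a discrete Laplacian over a
    grayscale image given as a list of rows (values 0-255).
    Higher variance suggests more edges, i.e. a sharper image."""
    h, w = len(img), len(img[0])
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (img[y - 1][x] + img[y + 1][x]
                   + img[y][x - 1] + img[y][x + 1]
                   - 4 * img[y][x])
            responses.append(lap)
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)

flat = [[128, 128, 128, 128, 128] for _ in range(5)]   # no detail
edged = [[0, 0, 255, 255, 255] for _ in range(5)]      # vertical edge
```

The flat patch scores exactly 0 while the edged patch scores well above it. Real no-reference quality metrics combine several such signals, but the idea is the same: degradations leave measurable pixel-level traces.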
And now comes Google, introducing a deep CNN trained to predict which images a typical user would rate as looking good (technically) or attractive (aesthetically). It builds on the success of state-of-the-art deep object-recognition networks and their ability to recognize general categories of objects despite many variations. The proposed network not only scores images reliably, with high correlation to human perception, but is also useful for a range of labor-intensive and subjective tasks: intelligent photo editing, optimizing visual quality to increase user engagement, or minimizing perceived visual errors in an imaging pipeline.
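If I understood the presentation right, the clever part is that the network’s final layer does not emit a good/bad label but a probability distribution over the rating buckets human annotators used (1 to 10), and the scalar score is that distribution’s mean. A hedged sketch of just that last step, with made-up predictions standing in for the network’s output:

```python
def mean_score(probs):
    """Collapse a predicted distribution over rating buckets 1..10
    into a single aesthetic score: the distribution's mean."""
    assert abs(sum(probs) - 1.0) < 1e-6, "probabilities must sum to 1"
    return sum(p * bucket for bucket, p in enumerate(probs, start=1))

# Two hypothetical predictions with the same most-likely bucket (5),
# but the right-skewed one earns a higher mean score.
p_narrow = [0, 0, 0, 0.1, 0.8, 0.1, 0, 0, 0, 0]
p_skewed = [0, 0, 0, 0.1, 0.4, 0.2, 0.1, 0.1, 0.05, 0.05]
```

Keeping the whole distribution rather than a single label also preserves how much raters disagreed about an image, which is exactly the information a two-class approach throws away.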
You see where this is going. In a direct sense, the network (and others like it) can act as a reasonable, though imperfect, proxy for human taste in photos and possibly videos, and create … no humans required.