AI meets art, and builds business
AI is a moniker currently much bandied around in management-speak. It becomes altogether more comprehensible and concrete in down-to-earth operational settings.
Algorithms meet art
Manually re-drawing even one static image in a different artistic style has traditionally required serious artistic and artisanal skill, and almost always taken a prohibitively long time – which is precisely what people use Photoshop for. Doing the same thing for a video sequence, featuring countless continually moving pixels and points, would therefore require a Herculean effort, oodles of artistic manpower and vast resources.
As in so many manpower-intensive processes in industry and commerce, digitalisation seems to open up new possibilities here. In April 2016, Manuel Ruder, Alexey Dosovitskiy and Thomas Brox at the University of Freiburg in Germany submitted a computer science paper entitled “Artistic style transfer for videos”, presenting a new algorithm-based method for transferring the artistic style of one image (a painting, for example) to an entire film sequence. Their new method (the details of which lie far beyond my comprehension) apparently uses deep neural networks to generate consistent, stable stylised video sequences, even in films that feature considerable amounts of motion and where objects in the 3D scene block other objects from view. The authors reckon their method clearly outperforms previous methods, both qualitatively and quantitatively.
This all sounds very abstruse and nerdy, but it quickly becomes much clearer once you see what these AI algorithms can actually do: transferring and replicating an artistic style from a particular picture into a whole film sequence.
Deep neural networks, it seems, are built from multiple layers that each extract information from an image and pass their output on to the next layer. The first layers pick up low-level patterns such as colours and edges, while deeper layers respond to progressively more abstract structures, which – according to expert explanations – paves the way to effective object recognition. A key discovery was that artistic style can be captured by looking at the correlations between the feature responses within a layer, rather than at the information each layer holds directly. The big point is that this means the content of an image can be separated almost completely from its artistic style. That’s how the style of one picture – its artistic technique and colour profile – can effectively be copy-and-pasted onto the content of any other image.
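To make that a little more concrete, here’s a minimal sketch of the idea – my own illustration, not code from any of the papers – assuming the feature maps have already been extracted from a pre-trained network:

```python
import numpy as np

def gram_matrix(features: np.ndarray) -> np.ndarray:
    """Correlations between the feature maps within one layer.

    `features` is assumed to be one layer's activations, shape
    (channels, height, width), taken from a pre-trained network.
    G[i, j] sums F_i * F_j over all positions, so it records *which*
    features co-occur (texture, brushwork, colour) while discarding
    *where* they occur (the content).
    """
    c, h, w = features.shape
    flat = features.reshape(c, h * w)      # one row per feature map
    return flat @ flat.T / (c * h * w)     # normalised co-occurrence

def style_loss(gen_feats: np.ndarray, style_feats: np.ndarray) -> float:
    """How far the generated image's style is from the exemplar's."""
    diff = gram_matrix(gen_feats) - gram_matrix(style_feats)
    return float(np.sum(diff ** 2))
```

Minimising this style loss (alongside a separate content loss on the raw feature maps) is, in essence, what pastes the style onto the content.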
The next step on this journey of computer science discovery lay in images in motion. But even small differences between successive frames can lead to surprisingly large differences in the way the desired artistic style is applied, and whole new image areas keep coming into view as the foreground objects that covered them in previous frames move on. How is a computer to deal with that without producing flickering, jarring and rough transitions?
The redoubtable researchers found a solution by introducing a temporal constraint that “penalises” deviation between consecutive frames, following the optical flow of the original video in order to preserve smooth transitions. To improve the method’s consistency over longer periods of time, they also incorporated long-term motion estimates and developed a multi-pass algorithm to iron out any “bumps” created during style transfer. Clever stuff, eh?
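As a rough sketch of that idea – again my own illustration, not the authors’ code, and assuming an off-the-shelf optical-flow method supplies the flow and an occlusion mask – you warp the previous stylised frame along the flow of the original video, then penalise any deviation from it, except in occluded or newly revealed regions where the constraint is switched off:

```python
import numpy as np

def warp(prev_frame: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Warp the previous stylised frame into the current frame.

    prev_frame: (H, W, 3) stylised frame t-1.
    flow:       (H, W, 2) optical flow from frame t back to t-1,
                computed on the *original* (unstyled) video.
    Nearest-neighbour sampling keeps the sketch short; a real
    implementation would interpolate bilinearly.
    """
    h, w = flow.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    return prev_frame[src_y, src_x]

def temporal_loss(curr: np.ndarray, prev: np.ndarray,
                  flow: np.ndarray, mask: np.ndarray) -> float:
    """Penalise deviation from the warped previous frame.

    `mask` is 1 where the flow is reliable and 0 in occluded or
    newly revealed regions, so freshly visible areas are free to be
    stylised from scratch rather than forced to match stale pixels.
    """
    warped = warp(prev, flow)
    return float(np.mean(mask[..., None] * (curr - warped) ** 2))
```

Adding this term to the style and content losses is what keeps consecutive stylised frames from flickering independently of each other.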
Putting out a conceptual shingle
It’s not too hard to imagine these styles being edited or combined to produce mind-boggling hybrids, as well as adding new visual depth and texture to moving images. Apparently, the technique is still fairly computationally intensive, but that’s probably only a temporary bottleneck.
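Combining styles could, under the same assumptions as the Gram-matrix sketch earlier, be as simple as optimising a weighted sum of two style losses – the function name and weights below are purely hypothetical:

```python
# Hypothetical sketch, reusing style_loss() from the Gram-matrix
# example above: a 70/30 blend of two styles is just a weighted sum
# of their style losses during optimisation.
def blended_style_loss(gen_feats, style_a_feats, style_b_feats,
                       w_a: float = 0.7, w_b: float = 0.3) -> float:
    return (w_a * style_loss(gen_feats, style_a_feats)
            + w_b * style_loss(gen_feats, style_b_feats))
```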
One early showcase is the Deep Slow Flow project, which presents videos made with this kind of neural network algorithm. One example is this video from artist Danil Krivoruchko, in which slow-motion scenes from New York are reimagined in swirling brushstrokes and vibrant colours.
If it’s to be more than just a techno-geek academic achievement, or a “fashion of the month” artistic fad, this technique probably needs to be conceptualised more clearly for mainstream consumption. It seems to have potential to become much more than just “Photoshop for video”. Which is almost certainly why Google has a team called Magenta working on this kind of thing, and a startup named CreativeAI keeps track of projects that use computer code to mould music, imagery and other artistic expressions.
In my naive, non-expert perception, this technique seems to provide a possibly powerful “third path” for moving images. Path number one is film material that’s (relatively) undoctored, where we can more or less reckon that there’s some kind of unmediated correlation between what’s seen and what happened. Path number two consists of an increasing volume of film material that relies to varying degrees on CGI effects and/or computer animation – what we see (or think we see) never existed and never happened.
Artistic style transfer seems to lend a new, luscious sense of depth and texture to moving images – beguiling because they’re brimming with details never seen in real life. Can this perhaps become a third (curated, artistic, enriching) choice alongside real or fake?
Style becoming a commodity?
All this seems to herald an entirely new approach not just to filmmaking but also to art. When artistic style becomes a digital commodity that can be cut and pasted from one image to another, what does that portend for the work of artists? How will it impact the artistic claim to uniqueness, and the whole legacy body of copyright?
But as with many disruptive new technologies, there’s little point in wailing about the impact on restrictive legacy structures. We quickly get used to such technologies, and they become part of the mainstream experience. Photoshopping of still images takes place far more often than most of us mere mortals will ever realise, and “artistic style transfer” (there has to be a better, less geeky name in the pipeline …) really only does the same for moving images, albeit with several degrees of additional complexity.
Art/AI becoming business?
There seem to be some sophisticated, high-level business opportunities lurking here, particularly in light of the proliferation of visual media streams all around us. One example is Clarifai Inc., which claims to be able to understand any and every image and video using the world’s best visual recognition technology. Kinda sounds like what spy movies suggest the surveillance spooks can already do. Another group (see here) seems to be using the same or similar technology for navigation by autonomous drones.
The artists, communicators and entrepreneurs of the world have just been given what seems to be a very powerful new tool. It’ll be interesting to see whether this technique takes off – and where it goes.