Computer Algorithms for Cartoon Style Graphics

Jacob Settlemyre

As computer graphics hardware and rendering techniques become more sophisticated, much of the general public sees them as opportunities to create animations and still images closer to life than anything seen before. However, the appeal of traditionally drawn cartoons is unlikely to fade, and with the demand for a high production rate, or throughput, increasing in the animation industry, many professionals see a place for these technologies in the traditional animation and drawing production cycle, or pipeline (D. Sýkora et al. 2011). To that end, graphics researchers have worked to develop computer-based techniques that ease the production of traditional, hand-drawn cartoon effects without making the result look overtly artificial. While most tasks in computer science can be ranked objectively by metrics such as performance or, in the case of graphics, polygon count, cartoon rendering techniques can only be evaluated by having a human compare them to more traditional techniques (T. Inglis et al. 2013). As a result, progress in this field has been slow, but the few techniques that have been developed are all the more impressive. Techniques developed with varying success include methods for mapping textures to hand-drawn animation, morphing computer-generated animations to look more natural, distorting hand-drawn 2D images to animate them, and converting between pixel art and traditional vector-based art styles. These advances are the first steps towards a wholly computer-based creative process.

In an ideal world, computers would be involved in every step of the animation process. However, most artists prefer to work with traditional methods when originally drawing the frames of their animations. With this in mind, computers can be used to map textures to hand-drawn images, and then distort the textures to move with the frames of the animation. An artist can perform the creative work of drawing the outline of a character and the abstract texture that should be applied to it, while a computer performs the complicated matrix operations needed to distort that texture. Unfortunately, it is difficult for a computer to tell which parts of one frame correspond to the same parts of another; such matching points are known as correspondences (A. Ogale and Y. Aloimonos. 2005). Normally for this technique, artists would have to change their entire production pipeline and create their art in the form of polygonal meshes, much like 3D art, which can be an insurmountable burden for artists accustomed to traditional methods. A team of researchers from the Czech Technical University, Walt Disney Animation Studios, and other associated organizations, led by Daniel Sýkora, has presented a method called “TexToons” which is reportedly capable of determining correspondences between frames of a hand-drawn 2D animation and using that information to map simple textures to the objects in frame (D. Sýkora et al. 2011). With this technique, the artist must simply fill in “color scribbles” in corresponding parts of the frames, and the algorithm will automatically fill and map the texture to the rest of the image (D. Sýkora et al. 2009b). The algorithm is able to determine correspondences between frames by noting that the edges of a cartoon image are much more meaningful in defining its shape than the interior.

In terms of performance, this technique is significantly faster than any attempt by an artist to recreate it manually, and the effort required of the artist after drawing the frames and the basic texture is negligible. When reviewed by a professional artist, the results in many scenarios made the original line art animation look much more lifelike, which is very promising.

It has been noted that the technique is not effective when used with textures that have a high visual salience. For example, if one were to use the technique to map a checkerboard pattern onto an animated character’s clothing, it would be immediately apparent that the texture was not applied by a professional artist. Therefore, while this technique may be helpful in some cases, its actual application in an animation pipeline will likely be limited. Additionally, if an artist tried to use this technique in combination with more traditional texturing methods, the two kinds of texturing would likely not match visually. The most likely use case of this technique will be for creating characters that are deliberately stylized differently from the rest of the image. In these cases, this technique should greatly reduce production time and produce higher-quality work than traditional methods.

An artist may also find it convenient to have a computer help create the animation frames themselves. While traditional cartoon animation requires that a team of artists draw each and every frame individually (J. Bray and W. Carlson. 1919), modern, computer-based techniques allow artists to draw key-frames at important moments in the animation and let the computer interpolate between them. The downside of this technique is that it requires the artist to create the work in an application specifically designed for key-frame based animation, such as Adobe Flash (Adobe. 2010). A team from Trinity College Dublin, again led by Daniel Sýkora, has presented a technique that would allow an artist to draw key-frames arbitrarily and still have the computer automatically interpolate them (D. Sýkora et al. 2009a). The technique is specifically designed for articulating the limbs of a humanoid figure, although it can be used for other applications where similar distortions are needed. The advancement here again comes from being able to detect correspondences between frames of an animation. While the researchers would have hoped to do this using traditional computer vision algorithms, the vast majority of those algorithms are designed for processing real-world images, which do not contain the sort of distortions of form present in a cartoon animation. Instead, they developed a new approach specifically designed for cartoons, which works by registering shape data common to both images. It starts with a simple mesh, which an as-rigid-as-possible registration algorithm then fits to the shape of the cartoon. Once the mesh has been formed on both key-frames, it is trivial to interpolate between them.
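
The final interpolation step can be pictured with a minimal sketch. It assumes the two key-frame meshes already list their vertices in matching order, which is exactly what the registration step is meant to provide. The function names are hypothetical, and the straight-line vertex blend is a simplifying assumption rather than the interpolation used by the cited authors.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def interpolate_mesh(start: List[Point], end: List[Point], t: float) -> List[Point]:
    """Blend two meshes with matching vertex order.

    t = 0 returns the first key-frame, t = 1 the second.
    Assumption: each vertex travels in a straight line.
    """
    if len(start) != len(end):
        raise ValueError("meshes must have the same number of vertices")
    return [
        ((1.0 - t) * x0 + t * x1, (1.0 - t) * y0 + t * y1)
        for (x0, y0), (x1, y1) in zip(start, end)
    ]


def in_betweens(start: List[Point], end: List[Point], count: int) -> List[List[Point]]:
    """Return `count` evenly spaced intermediate meshes, excluding both key-frames."""
    return [interpolate_mesh(start, end, i / (count + 1)) for i in range(1, count + 1)]
```

A plain linear blend like this can visibly shrink a limb that rotates between key-frames, which is one reason a shape-preserving scheme may be preferred in practice.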

The technique is very impressive for animating cartoon images of human figures, as was the intended use case. However, the research team did note that some fine detail is lost during the transformation, which could be problematic in scenarios where a high level of detail is necessary. Additionally, the algorithm gets “confused” when an element it is trying to track is occluded by another one. It is also unable to handle 3D transformations, or even simple scaling and shearing. While these limitations could potentially be worked around, further research has not been conducted, and, similar to TexToons, this algorithm is likely to be useful only in extremely specific circumstances. In those cases, however, the increase in productivity will be very significant.

Although computers have long been able to produce animated images automatically through “tweening” techniques, one common complaint is that the resultant animation often appears too artificial. For example, if a computer uses a simple linear interpolation for its “tweening” algorithm, the animation will start, stop, and change direction abruptly, which is uncharacteristic of real-world movement. Even if one applies an easing function with continuous second or higher derivatives, the animation will still be too exacting to adequately mimic a human artist’s style. For example, a human artist will often exaggerate motion in order to emphasize it, or else to make up for the innate artificiality of cartoon animation in an attempt to make the movement appear more lifelike. Viewers have come to expect these stylistic touches from animation, so when they are missing the effect can be quite jarring. Of course, a professional artist with sufficient time and skill could recreate the effect on a computer frame by frame. However, this approach would be time consuming, and not applicable to arbitrary animations.
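
The difference between a linear “tween” and a smoother one can be seen in a small sketch. The helper names are hypothetical, and the cubic smoothstep curve is only one common choice of easing function, picked here as an assumption for illustration.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b for t in [0, 1]."""
    return a + (b - a) * t


def smoothstep(t):
    """Cubic ease-in/ease-out curve with zero slope at t = 0 and t = 1."""
    return t * t * (3.0 - 2.0 * t)


def tween(a, b, frames, ease=None):
    """Sample positions from a to b over `frames` steps (frames + 1 samples)."""
    positions = []
    for i in range(frames + 1):
        t = i / frames
        if ease is not None:
            t = ease(t)
        positions.append(lerp(a, b, t))
    return positions


def velocities(positions):
    """Per-frame displacement between consecutive samples."""
    return [q - p for p, q in zip(positions, positions[1:])]


linear = tween(0.0, 10.0, 4)             # constant speed from the first frame
eased = tween(0.0, 10.0, 4, smoothstep)  # slow start, fast middle, slow stop
```

The linear version moves at full speed from the very first frame and halts just as suddenly, while the eased version ramps up and down. Neither adds the anticipation or overshoot that an animator would draw by hand.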

In an attempt to make this task more automatic, a team from Microsoft Research and its partners led by Jue Wang has developed an algorithm called “the cartoon animation filter” that can be applied to any arbitrary animation to create that natural effect without requiring extensive skill from the creator (J. Wang et al. 2006). The filter is able to distort not just the animation, but also the image being animated itself, stretching and squeezing it by an amount proportional to the second derivative of the animation element’s motion. The motion itself is filtered with an inverted Laplacian of a Gaussian curve, which adds a smoothed copy of its negated second derivative back into the signal, keeping it smoother than a linear interpolation while capturing the exaggeration of a traditional animation (E. Weisstein; J. Wang et al. 2006). The only parameter the user of the filter needs to configure is a coefficient for the height of the Gaussian curve, which determines the extent to which the animation will be exaggerated. Because of this simplicity, and because the filter can be applied to any arbitrary animation, it should be possible to use this technique in situations where it would not be practical to employ a professional artist. The authors cite a PowerPoint presentation as a natural use case for this filter. The filter also has the advantage of being simple to calculate, which means it could have applications in real-time rendering scenarios, such as the animated elements of a video game. Of course, while the technique is very impressive and clearly applicable to many areas, the results will not always have the same exactitude as the work of a skilled artist.
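
A rough one-dimensional sketch of the idea follows: smooth the motion with a Gaussian, take its second difference, and subtract a scaled copy from the original signal. The function names, the default `sigma`, the clamped borders, and the omission of the stretching and squeezing of the image are all assumptions made for illustration; this is not the authors' implementation.

```python
import math


def gaussian_kernel(sigma, radius):
    """Normalized Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal, sigma):
    """Gaussian-smooth a list of samples, clamping indices at both ends."""
    radius = max(1, int(3 * sigma))
    kernel = gaussian_kernel(sigma, radius)
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * signal[j]
        out.append(acc)
    return out


def second_difference(signal):
    """Discrete second derivative; the two end samples are left at zero."""
    out = [0.0] * len(signal)
    for i in range(1, len(signal) - 1):
        out[i] = signal[i - 1] - 2.0 * signal[i] + signal[i + 1]
    return out


def cartoon_filter(signal, amplitude, sigma=2.0):
    """Subtract the smoothed second derivative, scaled by `amplitude`."""
    accel = second_difference(smooth(signal, sigma))
    return [x - amplitude * a for x, a in zip(signal, accel)]


# An object rests at 0, slides to 10 at constant speed, then rests again.
motion = [0.0] * 10 + [float(i) for i in range(1, 11)] + [10.0] * 10
filtered = cartoon_filter(motion, amplitude=5.0)
```

In the filtered motion the object dips slightly below 0 before it starts moving and swings past 10 before settling, which corresponds to the anticipation and follow-through that animators add by hand.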

When creating graphics for video games in the early days of computer graphics, artists had to deal with limited resources in terms of the number of colors and pixels early computers were able to render. As such, it was important to use every available pixel to its fullest in an attempt to convey as much detail as possible in a very small space. To people who have grown up with these sorts of video games, as well as others, this style, referred to as pixel art, is very appealing for its minimalistic qualities (N. Esposito. 2005). Even though modern, vector-based styles consist of much smoother lines and are able to contain far more detail in a given area, many people still wish to recreate the pixel art style. However, for artists who are accustomed to the modern style, it can be difficult to change mindsets to work in a pixel art style.

Additionally, traditional computer systems designed to automatically rasterize vector art often fail when rendering at a low resolution, as they are not designed to account for pixel-level detail, causing them to leave out important features. A research team from the University of Waterloo led by Tiffany Inglis has presented a method designed specifically for rasterizing vector art into a pixel art style (T. Inglis et al. 2013). First, rather than blindly mapping the original art onto an arbitrary grid, it distorts the image slightly so that important features fit neatly into the squares of the pixel grid.

Another important element of pixel art is being able to draw diagonal and curved lines properly, so the algorithm includes preprogrammed patterns for dealing with instances of diagonal lines. It adds further human-style touches to the final piece by moving pixels around to better preserve symmetry and to ensure that large clusters of black pixels near one another do not create an unpleasant effect. When rasterizing a series of images in an animation, it was noted that this technique had more temporal coherence than others. That is, when the same element was featured at different locations in different frames, it was rendered in the same way, rather than changing as the figure moved. The results are comparable to a professional pixel artist’s work, although it was noted that they do not contain the regular repeating pixel patterns that appear in human work. While the algorithm is several orders of magnitude faster than a human artist, it takes about ten times longer than traditional algorithms, making it impractical for real-time rendering environments. Despite these limitations, the algorithm has been put to use in an editor designed specifically for creating pixel art from scratch. Although this does not fit the original intended use case of automatically converting vector art to pixel art, it is nevertheless a powerful and useful application of this new technology.
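
To illustrate why diagonal lines need special care, the sketch below compares a naively rounded line with one whose slope is nudged so that every horizontal run of pixels has the same length, a pattern commonly favored in pixel art. The helper names and the slope-snapping rule are assumptions made for illustration and are not taken from the Waterloo team's algorithm.

```python
def naive_line(dx, dy):
    """Pixels from (0, 0) to (dx, dy), with 0 <= dy <= dx.

    Each column's y is the exact line height rounded to the nearest row,
    computed with integer arithmetic.
    """
    return [(x, (2 * x * dy + dx) // (2 * dx)) for x in range(dx + 1)]


def uniform_run_line(dx, dy):
    """Approximate the same line with equal-length horizontal runs.

    Assumption: the far endpoint may shift slightly so that every one of
    the dy + 1 rows holds the same number of pixels.
    """
    run = max(1, round((dx + 1) / (dy + 1)))
    return [(x, x // run) for x in range(run * (dy + 1))]


def run_lengths(pixels):
    """Length of each consecutive group of pixels that share a row."""
    runs = []
    previous_y = None
    for _, y in pixels:
        if y == previous_y:
            runs[-1] += 1
        else:
            runs.append(1)
            previous_y = y
    return runs


uneven = run_lengths(naive_line(11, 4))      # runs alternate between 2 and 3
even = run_lengths(uniform_run_line(11, 4))  # every run has length 2
```

The second line ends two pixels short of the requested endpoint, a small distortion of the kind the paragraph above describes, in exchange for a perfectly regular staircase.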

A team from Microsoft Research faced issues similar to those of the Waterloo team when attempting to develop an algorithm to do just the opposite: convert pixel art into a smooth, vector-based rendition (J. Kopf and D. Lischinski. 2011). While some people prefer the pixel art style, many others wish to play older video games in a more modern, smoothed-out art style that may be more appealing to the eye. However, traditional vectorization algorithms are designed to process real-world photographs of much higher resolution than pixel art, and therefore treat pixel-sized variations as outliers, producing undesirable effects when applied to pixel art, where every pixel is significant. Artists converting pixel art from one game to a style suited for a more powerful system have developed algorithms that upscale pixel art to a higher resolution, such as the original EPX algorithm developed by Eric Johnston at LucasArts (N. Berry. 2013). These techniques still produce only higher-resolution raster graphics, often at a fixed integer multiple of the original image size, rather than vectorized output that can be rendered at any resolution. The algorithm designed by the Microsoft team treats each pixel of the original image as a cell and builds a similarity graph to determine which pixels meeting at a diagonal are meant to be connected. It then uses this data to reshape the cells and smooth their boundaries into vector curves, creating a more natural effect. The results are much truer to the original spirit of the art than those of other vectorization techniques, and are superior to other algorithms designed for pixel art in their representation of natural curves. The results are not always perfect, however, and sometimes lead to details being blurred together or large splotches of black forming where one would expect lines to meet at a point.

Additionally, the algorithm is too slow to render images in real time, preventing the most obvious use case of automatically vectorizing the images output by a video game for display to the end user. However, as computer hardware becomes more powerful, it is possible that this algorithm may become viable for that application at some point in the future.
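
For contrast with the vector approach, the raster upscaling mentioned above can be sketched in a few lines. This follows a commonly described formulation of the EPX rules, in which each pixel becomes a 2×2 block whose corners borrow a neighbor's color when two adjacent neighbors agree. The function name and the clamped border handling are assumptions.

```python
def epx_scale2x(image):
    """Double the size of `image`, a list of equal-length rows of pixel values."""
    height = len(image)
    width = len(image[0])
    out = [[None] * (2 * width) for _ in range(2 * height)]
    for y in range(height):
        for x in range(width):
            p = image[y][x]
            # Four direct neighbors, clamped at the border (an assumption).
            a = image[max(y - 1, 0)][x]           # above
            b = image[y][min(x + 1, width - 1)]   # right
            c = image[y][max(x - 1, 0)]           # left
            d = image[min(y + 1, height - 1)][x]  # below
            p1 = p2 = p3 = p4 = p
            neighbors = [a, b, c, d]
            # When three or more neighbors match, the block stays a copy of p.
            if max(neighbors.count(n) for n in neighbors) < 3:
                if c == a:
                    p1 = a  # top-left corner
                if a == b:
                    p2 = b  # top-right corner
                if d == c:
                    p3 = c  # bottom-left corner
                if b == d:
                    p4 = d  # bottom-right corner
            out[2 * y][2 * x] = p1
            out[2 * y][2 * x + 1] = p2
            out[2 * y + 1][2 * x] = p3
            out[2 * y + 1][2 * x + 1] = p4
    return out


# A diagonal boundary between color 1 (upper right) and color 0 (lower left).
staircase = [
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]
scaled = epx_scale2x(staircase)
```

The output is still a grid of pixels, only twice as large in each direction, which is the limitation that motivates a true vector representation.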

Although these few results seem to show promise for the future of the industry, the sad state of the field is that most academics are uninterested in developing these sorts of techniques. Aside from the work of a few dedicated enthusiasts, the majority of research in this area has come, and will continue to come, from animation and game development studios looking for faster and cheaper ways to produce their works. Therefore, especially for studios such as Disney that have traditionally used hand-drawn animation, the increase in research reflects not academic interest but a changing paradigm in the animation industry. In the professional world, humans are becoming increasingly reliant on computers for everyday, integral parts of the workflow, tasks once thought impossible for a computer to accomplish.

Works Cited

Abhijit S. Ogale and Yiannis Aloimonos. 2005. “Shape and the stereo correspondence problem.” International Journal of Computer Vision, vol. 65, no. 3, 147-162.

Adobe Systems Inc. 2010. “Flash glossary: Tween.” From Adobe Developer Connection. http://www.adobe.com/devnet/flash/articles/concept_tween.html

Daniel Sýkora, John Dingliana, and Steven Collins. 2009a. “As-rigid-as-possible image registration for hand-drawn cartoon animations.” In Proceedings of the 7th International Symposium on Non-Photorealistic Animation and Rendering (NPAR ’09), Stephen N. Spencer (Ed.). ACM, New York, NY, USA, 25-33.

Daniel Sýkora, John Dingliana, and Steven Collins. 2009b. “LazyBrush: Flexible Painting Tool for Hand-drawn Cartoons.” Computer Graphics Forum, vol. 28, no. 2, 599-608.

Daniel Sýkora, Mirela Ben-Chen, Martin Čadík, Brian Whited, and Maryann Simmons. 2011. “TexToons: practical texture mapping for hand-drawn cartoon animations.” In Proceedings of the ACM SIGGRAPH/Eurographics Symposium on Non-Photorealistic Animation and Rendering (NPAR ’11), Stephen N. Spencer (Ed.). ACM, New York, NY, USA, 75-84.

Eric W. Weisstein. “Laplace Transform.” From MathWorld, a Wolfram Web Resource. http://mathworld.wolfram.com/LaplaceTransform.html

Johannes Kopf and Dani Lischinski. 2011. “Depixelizing pixel art.” In ACM SIGGRAPH 2011 papers (SIGGRAPH ’11), Hugues Hoppe (Ed.). ACM, New York, NY, USA, Article 99, 8 pages.

John Randolph Bray and Wallace Carlson. 1919. How Animated Cartoons are Made [Documentary]. United States.

Jue Wang, Steven M. Drucker, Maneesh Agrawala, and Michael F. Cohen. 2006. “The cartoon animation filter.” ACM Trans. Graph. 25, 3 (July 2006), 1169-1173.

Nick Berry. 2013. “Pixel Scalers.” From Data Genetics. http://www.datagenetics.com/blog/december32013/index.html

Nicolas Esposito. 2005. “How Video Game History Shows Us Why Video Game Nostalgia Is So Important Now.” University of Technology of Compiègne.

Tiffany C. Inglis, Daniel Vogel, and Craig S. Kaplan. 2013. “Rasterizing and antialiasing vector line art in the pixel art style.” In Proceedings of the Symposium on Non-Photorealistic Animation and Rendering (NPAR ’13), Stephen N. Spencer (Ed.). ACM, New York, NY, USA, 25-32.