Humans have used drawing to depict the visual world since earliest times. Sketching has been the fundamental method of depicting ideas and objects. Our aim is to tap those free-flowing curves and enable a machine to identify the drawings. Sketch-rnn, a recurrent neural network (RNN), is capable of producing stroke-based drawings of common objects. The model is trained on a dataset of human-drawn images spanning many different classes. It outlines a framework for conditional and unconditional sketch generation, and presents new, fast training methods for generating similar drawings in a vector format.

In recent times, there have been major developments in producing and modeling images using neural networks as generative tools. Generative Adversarial Networks (GANs) [1], variational inference, and autoregressive models [2] have been instrumental in this rapidly growing field. Almost all of the work thus far has focused on modeling low-resolution pixel images. Humans, however, do not interpret the world as a grid of pixels, but rather develop abstract concepts to represent what we see.

From an early age, we develop the ability to interpret what we see by drawing on paper with a pencil or crayon. In this way we learn to express a sequential, vector depiction of an image as a short sequence of strokes. In this paper we investigate an alternative to traditional pixel image modeling approaches and present a generative model for vector images. The goal is to train machines to draw and to form abstract concepts in a manner similar to humans.

The paper highlights a framework for both unconditional and conditional generation of vector images composed of a sequence of lines. The recurrent neural network-based generative model is capable of constructing sketches of common objects in a vector format. It develops a training method specific to vector images that makes training faster and more accurate. In the conditional generation model, we investigate the latent space the model produces to represent a vector image. The paper also discusses potential creative applications of the methodology.

The dataset consists of vector drawings obtained from Quick, Draw!, an online game in which players are asked to draw objects belonging to a particular object class in less than 20 seconds. QuickDraw includes hundreds of classes of common objects, and each class is a dataset of 70K training samples, in addition to 2.5K validation and 2.5K test samples.

The data format represents a sketch as a set of pen stroke actions, extending the binary pen stroke event into a multi-state event. In this format, the initial absolute coordinate of the drawing is located at the origin. A sketch is a list of points, and each point is a vector of 5 elements: (x, y, p1, p2, p3). The first two elements are the offset distances in the x and y directions of the pen from the previous point. The last 3 elements form a binary one-hot vector of 3 possible states. The first pen state, p1, indicates that the pen is currently touching the paper, and that a line will be drawn connecting the next point with the current point. The second pen state, p2, indicates that the pen will be lifted from the paper after the current point, and that no line will be drawn next. The final pen state, p3, indicates that the drawing has ended, and subsequent points, including the current point, will not be rendered.
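This 5-element point format can be illustrated with a small hypothetical sketch; the coordinates below are made up for illustration and are not taken from the dataset:

```python
import numpy as np

# A tiny hypothetical sketch: each row is (dx, dy, p1, p2, p3),
# where (dx, dy) are offsets from the previous point and
# (p1, p2, p3) is a one-hot pen state.
sketch = np.array([
    [ 5.0,  0.0, 1, 0, 0],   # pen down: a line will be drawn to the next point
    [ 0.0,  5.0, 1, 0, 0],   # pen down: continue drawing
    [-3.0, -2.0, 0, 1, 0],   # pen up: move without drawing
    [ 0.0,  0.0, 0, 0, 1],   # end of sketch: nothing more is rendered
])

def to_absolute(strokes):
    """Recover absolute coordinates from the offsets (origin at (0, 0))."""
    return np.cumsum(strokes[:, :2], axis=0)

points = to_absolute(sketch)
print(points)
```

Summing the offsets recovers the absolute pen positions, which is how a renderer would turn this list back into a drawing.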

Our training procedure follows the approach of the Variational Autoencoder, where the loss function is the sum of two terms: the Reconstruction Loss, LR, and the Kullback-Leibler Divergence Loss, LKL.
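Assuming the standard VAE setup with a diagonal-Gaussian encoder, where μ is the predicted mean, σ̂ the predicted log variance, and N_z the dimensionality of the latent vector, the two terms can be written as:

```latex
\mathrm{Loss} = L_R + L_{KL},
\qquad
L_{KL} = -\frac{1}{2 N_z} \sum_{i=1}^{N_z}
\left( 1 + \hat{\sigma}_i - \mu_i^2 - \exp(\hat{\sigma}_i) \right)
```

The KL term penalizes the encoder's distribution for drifting away from the standard normal prior, which is what keeps the latent space usable for sampling and interpolation.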

We train our model to optimize this two-part loss function. The Reconstruction Loss term maximizes the log-likelihood of the generated probability distribution explaining the training data S. We can calculate this reconstruction loss, LR, using the generated parameters of the probability density function and the training data S. LR is the sum of the log loss of the offset terms (x, y), Ls, and the log loss of the pen state terms (p1, p2, p3), Lp.
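A minimal sketch of this decomposition, simplified by using a single Gaussian per offset instead of the paper's full mixture density output (the function name and shapes here are illustrative assumptions):

```python
import numpy as np

def reconstruction_loss(offsets, pen_states, mu, log_sigma, pen_logits):
    """Simplified L_R = L_s + L_p for one sketch of T points.

    offsets:    (T, 2) target (dx, dy) values
    pen_states: (T, 3) one-hot targets (p1, p2, p3)
    mu, log_sigma: (T, 2) predicted Gaussian parameters (a single
                   Gaussian here instead of a mixture density network)
    pen_logits: (T, 3) unnormalized pen-state predictions
    """
    # L_s: negative log-likelihood of the offsets under the Gaussian
    var = np.exp(2.0 * log_sigma)
    ls = 0.5 * np.sum((offsets - mu) ** 2 / var
                      + 2.0 * log_sigma + np.log(2.0 * np.pi))
    # L_p: cross-entropy over the three pen states
    probs = np.exp(pen_logits)
    probs /= probs.sum(axis=1, keepdims=True)
    lp = -np.sum(pen_states * np.log(probs + 1e-9))
    return (ls + lp) / len(offsets)

# Toy call with perfect offset predictions and uniform pen-state logits
loss = reconstruction_loss(np.zeros((4, 2)), np.eye(3)[[0, 0, 1, 2]],
                           np.zeros((4, 2)), np.zeros((4, 2)),
                           np.zeros((4, 3)))
```

The two summands correspond directly to Ls and Lp in the text; in the actual model the offset likelihood comes from a mixture of bivariate Gaussians rather than the single Gaussian used here.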

We develop a system to model sketch drawings using recurrent neural networks. Sketch-rnn is able to produce possible ways to finish an existing, unfinished sketch drawing. Our model can also encode existing sketches into a latent vector and generate similar-looking sketches conditioned on that latent vector. This helps in grouping sketches under the same label and also in identifying outlier data. The model often encounters noisy data, which may have to be classified separately or excluded from modeling under the same label. By generating similar-looking sketches and completing existing ones, it achieves high prediction accuracy.

Neural network-based techniques have been developed for generative models of images, although most neural network research on image generation deals with pixel images [1, 2]. Comparatively little work has been done on vector image generation using neural networks. An earlier work [3] makes use of Hidden Markov Models to synthesize the lines and curves of a human sketch. More recent work [4] on handwriting generation with recurrent neural networks lays the groundwork for employing Mixture Density Networks [1] to produce continuous data points. Recent works following this approach have attempted to generate vectorized Kanji characters unconditionally [12] and to model Chinese characters as a list of pen stroke actions [8].

In addition to unconditionally generating sketches, we also explore encoding existing sketches into a latent space of embedding vectors. Previous work [2] outlined a methodology to combine Sequence-to-Sequence models with a Variational Autoencoder to model natural English sentences in a latent vector space. A related work [9] utilizes probabilistic program induction, rather than neural networks, to perform one-shot modeling of the Omniglot dataset of symbol images.

RNN-VAE is a generative model that improves on the RNN-AE to capture the global features of the input sentence. The model is a Sequence-to-Sequence Variational Autoencoder (VAE), similar to the architecture described in [2, 11].
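The step that distinguishes the VAE from a plain autoencoder is how the latent vector z is obtained from the encoder's outputs. A minimal sketch of that reparameterization step, with a hypothetical latent size of 128 (the encoder and decoder RNNs themselves are omitted):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_latent(mu, log_sigma_hat):
    """Reparameterization trick: the encoder outputs a mean mu and a
    log variance log_sigma_hat; z is drawn from that Gaussian via a
    deterministic transform of standard normal noise."""
    sigma = np.exp(log_sigma_hat / 2.0)
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps

# Hypothetical encoder outputs for a latent size N_z = 128
mu = np.zeros(128)
log_sigma_hat = np.zeros(128)
z = sample_latent(mu, log_sigma_hat)
```

The decoder RNN is then conditioned on z at every step, which is what lets the model generate sketches that resemble the encoded input.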

The general perspective is broadly similar to recent efforts in the computer vision community, in which large, unconditioned databases of visual phenomena are used to train recognition systems. High-profile examples of this include the Caltech-256 database of object images [Griffin et al. 2007], the SUN database of scenes [Xiao et al. 2010], and the LabelMe [Russell et al. 2008] and Pascal VOC [Everingham et al. 2010] databases of spatially annotated objects in scenes. The considerable effort that goes into building these databases has allowed techniques to learn increasingly effective classifiers and to compare recognition systems on common benchmarks. Our pipeline is similar to many modern computer vision algorithms, although we are working in a new domain that requires a new, carefully tailored representation.

In this work, we develop a strategy to model sketch drawings using recurrent neural networks. Sketch-rnn is able to generate possible ways to finish an existing, unfinished sketch drawing. The model can also encode existing sketches into a latent vector and generate similar-looking sketches conditioned on the latent space. We show what it means to interpolate between two different sketches by interpolating in the latent space, and also show that we can manipulate attributes of a sketch by augmenting the latent vector. We demonstrate the importance of enforcing a prior distribution on the latent vector for coherent vector image generation during interpolation. By making available a large dataset of sketch drawings, we hope to encourage further research and development in the area of generative vector image modeling. These methods aid in improving the accuracy of the model and making it more reliable.
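The interpolation idea can be sketched in a few lines; this is a minimal linear interpolation between two latent vectors (the latent size and the decoder are omitted, and spherical interpolation is an alternative when the prior is Gaussian):

```python
import numpy as np

def interpolate(z0, z1, n=5):
    """Return n latent vectors evenly spaced between z0 and z1.
    Decoding each intermediate z would yield a sketch that morphs
    from the first drawing into the second."""
    return [(1.0 - t) * z0 + t * z1 for t in np.linspace(0.0, 1.0, n)]

# Toy example with a hypothetical 4-dimensional latent space
path = interpolate(np.zeros(4), np.ones(4), n=3)
```

Because the KL term pushes the encoder toward the Gaussian prior, these intermediate latent vectors stay in a region the decoder has learned to handle, which is why the decoded interpolations remain coherent sketches.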

- C. M. Bishop. Mixture density networks. Technical Report, 1994.
- S. R. Bowman, L. Vilnis, O. Vinyals, A. M. Dai, R. Jozefowicz, and S. Bengio. Generating Sentences from a Continuous Space. CoRR, abs/1511.06349, 2015.
- H. Dong, P. Neekhara, C. Wu, and Y. Guo. Unsupervised Image-to-Image Translation with Generative Adversarial Networks. ArXiv e-prints, Jan. 2017.
- M. Eitz, J. Hays, and M. Alexa. How Do Humans Sketch Objects? ACM Trans. Graph. (Proc. SIGGRAPH).
- I. Goodfellow. NIPS 2016 Tutorial: Generative Adversarial Networks. ArXiv e-prints, Dec. 2017.
- A. Graves. Generating sequences with recurrent neural networks. arXiv:1308.0850, 2013.
- P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros. Image-to-Image Translation with Conditional Adversarial Networks. ArXiv e-prints, Nov. 2016.
- J. Jongejan, H. Rowley, T. Kawashima, J. Kim, and N. Fox-Gieg. The Quick, Draw! – A.I. Experiment. quickdraw.withgoogle.com, 2016.
- Recurrent Neural Network-Based Semantic Variational Autoencoder.
- Story-based CALL for Japanese Kanji Characters: A Study on Student Learning Motivation.
