What Do 50 Million Drawings Look Like

The Quick, Draw! Dataset

The Quick Draw Dataset is a collection of 50 million drawings across 345 categories, contributed by players of the game Quick, Draw!. The drawings were captured as timestamped vectors, tagged with metadata including what the player was asked to draw and in which country the player was located. You can browse the recognized drawings on quickdraw.withgoogle.com/data.

We're sharing them here for developers, researchers, and artists to explore, study, and learn from. If you create something with this dataset, please let us know by e-mail or at A.I. Experiments.

We have also released a tutorial and model for training your own drawing classifier on tensorflow.org.

Please keep in mind that while this collection of drawings was individually moderated, it may still contain inappropriate content.

Content

  • The raw moderated dataset
  • Preprocessed dataset
  • Get the data
  • Projects using the dataset
  • Changes
  • License

The raw moderated dataset

The raw data is available as ndjson files separated by category, in the following format:

Key | Type | Description
key_id | 64-bit unsigned integer | A unique identifier across all drawings.
word | string | Category the player was prompted to draw.
recognized | boolean | Whether the word was recognized by the game.
timestamp | datetime | When the drawing was created.
countrycode | string | A two letter country code (ISO 3166-1 alpha-2) of where the player was located.
drawing | string | A JSON array representing the vector drawing.

Each line contains one drawing. Here's an example of a single drawing:

  {
    "key_id":"5891796615823360",
    "word":"nose",
    "countrycode":"AE",
    "timestamp":"2017-03-01 20:41:36.70725 UTC",
    "recognized":true,
    "drawing":[[[129,128,129,129,130,130,131,132,132,133,133,133,133,...]]]
  }

The format of the drawing array is as follows:

  [
    [  // First stroke
      [x0, x1, x2, x3, ...],
      [y0, y1, y2, y3, ...],
      [t0, t1, t2, t3, ...]
    ],
    [  // Second stroke
      [x0, x1, x2, x3, ...],
      [y0, y1, y2, y3, ...],
      [t0, t1, t2, t3, ...]
    ],
    ...  // Additional strokes
  ]

Where x and y are the pixel coordinates, and t is the time in milliseconds since the first point. x and y are real-valued while t is an integer. The raw drawings can have vastly different bounding boxes and number of points due to the different devices used for display and input.
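
As a rough illustration of the layout above, the following Python sketch parses one ndjson line and reports its stroke count, point count, bounding box, and drawing duration. The sample record and the helper name summarize_drawing are made up for this example; only the field names and the [x, y, t] stroke layout come from the format description.

```python
import json

# Hypothetical record in the raw layout described above (values are made up).
sample_line = json.dumps({
    "key_id": "1",
    "word": "nose",
    "countrycode": "AE",
    "timestamp": "2017-03-01 20:41:36.70725 UTC",
    "recognized": True,
    "drawing": [
        [[10.0, 20.5, 30.0], [5.0, 15.0, 5.0], [0, 16, 33]],   # first stroke
        [[40.0, 42.0], [8.0, 30.0], [120, 180]],               # second stroke
    ],
})


def summarize_drawing(line):
    """Parse one ndjson line and return basic statistics for its strokes."""
    record = json.loads(line)
    strokes = record["drawing"]
    # Each stroke is [xs, ys, ts]; flatten each channel across strokes.
    xs = [x for stroke in strokes for x in stroke[0]]
    ys = [y for stroke in strokes for y in stroke[1]]
    ts = [t for stroke in strokes for t in stroke[2]]
    return {
        "word": record["word"],
        "n_strokes": len(strokes),
        "n_points": len(xs),
        "bounding_box": (min(xs), min(ys), max(xs), max(ys)),
        "duration_ms": max(ts),  # t counts milliseconds since the first point
    }


summary = summarize_drawing(sample_line)
print(summary)
```

For a real file, the same function can be applied to each line of an opened .ndjson file in turn, which avoids loading a whole category into memory.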

Preprocessed dataset

We've preprocessed and split the dataset into different files and formats to make it faster and easier to download and explore.

Simplified Drawing files (.ndjson)

We've simplified the vectors, removed the timing information, and positioned and scaled the data into a 256x256 region. The data is exported in ndjson format with the same metadata as the raw format. The simplification process was:

  1. Align the drawing to the top-left corner, to have minimum values of 0.
  2. Uniformly scale the drawing, to have a maximum value of 255.
  3. Resample all strokes with a 1 pixel spacing.
  4. Simplify all strokes using the Ramer–Douglas–Peucker algorithm with an epsilon value of 2.0.
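
The four steps can be sketched in Python roughly as follows. This is not the code used to produce the published files; the function names (rdp, resample, simplify_drawing), the use of linear interpolation for resampling, and the final rounding to integers are assumptions made for illustration.

```python
import numpy as np


def rdp(points, epsilon):
    """Ramer-Douglas-Peucker simplification of an (N, 2) array of points."""
    if len(points) < 3:
        return points
    start, end = points[0], points[-1]
    chord = end - start
    chord_len = np.hypot(chord[0], chord[1])
    offsets = points - start
    if chord_len == 0:
        # Closed stroke: fall back to distance from the shared endpoint.
        dists = np.hypot(offsets[:, 0], offsets[:, 1])
    else:
        # Perpendicular distance from each point to the start-end chord.
        dists = np.abs(chord[0] * offsets[:, 1] - chord[1] * offsets[:, 0]) / chord_len
    idx = int(np.argmax(dists))
    if dists[idx] <= epsilon:
        return np.array([start, end])
    left = rdp(points[: idx + 1], epsilon)
    right = rdp(points[idx:], epsilon)
    return np.vstack([left[:-1], right])  # drop the duplicated split point


def resample(points, spacing=1.0):
    """Resample a polyline at `spacing` intervals of arc length (linear interpolation)."""
    seg = np.hypot(*np.diff(points, axis=0).T)
    cum = np.concatenate([[0.0], np.cumsum(seg)])
    if cum[-1] == 0:
        return points[:1]
    targets = np.append(np.arange(0.0, cum[-1], spacing), cum[-1])
    x = np.interp(targets, cum, points[:, 0])
    y = np.interp(targets, cum, points[:, 1])
    return np.column_stack([x, y])


def simplify_drawing(strokes, epsilon=2.0):
    """Apply the four steps to strokes given as [xs, ys] or [xs, ys, ts]; timing is dropped."""
    polylines = [np.column_stack([s[0], s[1]]).astype(float) for s in strokes]
    all_points = np.vstack(polylines)
    origin = all_points.min(axis=0)                       # step 1: top-left becomes (0, 0)
    extent = (all_points.max(axis=0) - origin).max()
    scale = 255.0 / extent if extent > 0 else 1.0         # step 2: uniform scale to 0..255
    result = []
    for line in polylines:
        line = (line - origin) * scale
        line = resample(line, 1.0)                        # step 3: 1 pixel spacing
        line = rdp(line, epsilon)                         # step 4: RDP with epsilon
        # Assumption: store integer pixel coordinates.
        result.append([line[:, 0].round().astype(int).tolist(),
                       line[:, 1].round().astype(int).tolist()])
    return result


# Made-up raw drawing with two strokes in the [xs, ys, ts] layout.
raw = [
    [[100, 200, 300], [50, 50, 130], [0, 40, 90]],
    [[150, 150], [60, 120], [200, 260]],
]
simplified = simplify_drawing(raw)
print(simplified)
```

Scaling by the larger of the two dimensions keeps the aspect ratio, so only one axis is guaranteed to reach 255.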

There is an example in examples/nodejs/simplified-parser.js showing how to read ndjson files in NodeJS.
Additionally, the examples/nodejs/ndjson.md document details a set of command-line tools that can help explore subsets of these quite large files.

Binary files (.bin)

The simplified drawings and metadata are also available in a custom binary format for efficient compression and loading.

There is an example in examples/binary_file_parser.py showing how to load the binary files in Python.
There is also an example in examples/nodejs/binary-parser.js showing how to read the binary files in NodeJS.

Numpy bitmaps (.npy)

All the simplified drawings have been rendered into a 28x28 grayscale bitmap in numpy .npy format. The files can be loaded with np.load(). These images were generated from the simplified data, but are aligned to the center of the drawing's bounding box rather than the top-left corner. See here for the code snippet used for generation.
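
A minimal loading sketch in Python is shown below. It assumes each category file holds one flattened 28x28 image per row (shape (N, 784), dtype uint8), which is worth checking against the downloaded file; the example writes a small stand-in file first so that it runs without the real data, and the file name cat.npy is only a placeholder.

```python
import os
import tempfile

import numpy as np

# Stand-in for a downloaded category file. Assumed layout: one flattened
# 28x28 uint8 image per row, i.e. an array of shape (N, 784).
rng = np.random.default_rng(0)
fake = rng.integers(0, 256, size=(5, 784), dtype=np.uint8)
path = os.path.join(tempfile.mkdtemp(), "cat.npy")
np.save(path, fake)

# Load the file and restore the 2-D image shape.
flat = np.load(path)
images = flat.reshape(-1, 28, 28)
print(images.shape, images.dtype)
```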

Get the data

The dataset is available on Google Cloud Storage as ndjson files separated by category. See the list of files in Cloud, or read more about accessing public datasets using other methods. As an example, to easily download all simplified drawings, one way is to run the command gsutil -m cp 'gs://quickdraw_dataset/full/simplified/*.ndjson' .

Full dataset separated by categories

  • Raw files (.ndjson)
  • Simplified drawings files (.ndjson)
  • Binary files (.bin)
  • Numpy bitmap files (.npy)

Sketch-RNN QuickDraw Dataset

This data is also used for training the Sketch-RNN model. An open source TensorFlow implementation of this model is available in the Magenta Project (link to GitHub repo). You can also read more about this model in this Google Research blog post. The data is stored in compressed .npz files, in a format suitable for inputs into a recurrent neural network.

In this dataset, 75K samples (70K Training, 2.5K Validation, 2.5K Test) have been randomly selected from each category and processed with RDP line simplification with an epsilon parameter of 2.0. Each category is stored in its own .npz file, for example, cat.npz.

We have also provided the full data for each category, if you want to use more than 70K training examples. These are stored with the .full.npz extensions.

  • Numpy .npz files

Note: For Python3, load the npz files using np.load(data_filepath, encoding='latin1', allow_pickle=True)

Instructions for converting Raw ndjson files to this npz format are available in this notebook.
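
The sketch below shows that load call in context. It assumes the archive exposes train, valid, and test arrays whose entries are (n_points, 3) arrays of (dx, dy, pen_lifted) offsets, as in the Sketch-RNN stroke format; those key names, the layout, and the helpers make_fake_npz and to_absolute are assumptions for illustration, so inspect data.files on a real download before relying on them.

```python
import os
import tempfile

import numpy as np


def make_fake_npz(path):
    """Write a tiny stand-in archive so the example runs without a download."""
    def sketches(n):
        out = np.empty(n, dtype=object)
        for i in range(n):
            # Assumed row layout: (dx, dy, pen_lifted).
            out[i] = np.array([[5, 0, 0], [0, 5, 1], [-5, 0, 0], [0, -5, 1]],
                              dtype=np.int16)
        return out
    np.savez_compressed(path, train=sketches(4), valid=sketches(1), test=sketches(1))


def to_absolute(stroke3):
    """Convert (dx, dy, pen_lifted) rows into a list of absolute-coordinate strokes."""
    xy = np.cumsum(stroke3[:, :2], axis=0)
    # A stroke ends at every row whose pen_lifted flag is 1.
    ends = np.where(stroke3[:, 2] == 1)[0] + 1
    return [s for s in np.split(xy, ends) if len(s) > 0]


data_filepath = os.path.join(tempfile.mkdtemp(), "cat.npz")
make_fake_npz(data_filepath)

data = np.load(data_filepath, encoding='latin1', allow_pickle=True)
train = data['train']
strokes = to_absolute(train[0])
print(len(train), [s.tolist() for s in strokes])
```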

Projects using the dataset

Here are some projects and experiments that are using or featuring the dataset in interesting ways. Got something to add? Let us know!

Creative and artistic projects

  • Letter collages by Deborah Schmidt
  • Face tracking experiment by Neil Mendoza
  • Faces of Humanity by Tortue
  • Infinite QuickDraw by kynd.info
  • Misfire.io by Matthew Collyer
  • Draw This by Dan Macnish
  • Scribbling Speech by Xinyue Yang
  • illustrAItion by Ling Chen
  • Dreaming of Electric Sheep by Dr. Ernesto Diaz-Aviles

Data analyses

  • How do you draw a circle? by Quartz
  • Forma Fluens by Mauro Martino, Hendrik Strobelt and Owen Cornec
  • How Long Does it Take to (Quick) Draw a Dog? by Jim Vallandingham
  • Finding bad flamingo drawings with recurrent neural networks by Colin Morris
  • Facets Dive x Quick, Draw! by People + AI Research Initiative (PAIR), Google
  • Exploring and Visualizing an Open Global Dataset by Google Research
  • Machine Learning for Visualization - Talk / article by Ian Johnson

Papers

  • A Neural Representation of Sketch Drawings by David Ha, Douglas Eck, ICLR 2018. code
  • Sketchmate: Deep hashing for million-scale human sketch retrieval by Peng Xu et al., CVPR 2018.
  • Multi-graph transformer for free-hand sketch recognition by Peng Xu, Chaitanya K Joshi, Xavier Bresson, ArXiv 2019. code
  • Deep Self-Supervised Representation Learning for Free-Hand Sketch by Peng Xu et al., ArXiv 2020. code
  • SketchTransfer: A Challenging New Task for Exploring Detail-Invariance and the Abstractions Learned by Deep Networks by Alex Lamb, Sherjil Ozair, Vikas Verma, David Ha, WACV 2020.
  • Deep Learning for Free-Hand Sketch: A Survey by Peng Xu, ArXiv 2020.
  • A Novel Sketch Recognition Model based on Convolutional Neural Networks by Abdullah Talha Kabakus, 2nd International Congress on Human-Computer Interaction, Optimization and Robotic Applications, pp. 101-106, 2020.

Guides & Tutorials

  • TensorFlow tutorial for drawing classification
  • Train a model in tf.keras with Colab, and run it in the browser with TensorFlow.js by Zaid Alyafeai

Code and tools

  • Quick, Draw! Polymer Component & Data API by Nick Jonas
  • Quick, Draw for Processing by Cody Ben Lewis
  • Quick, Draw! prediction model by Keisuke Irie
  • Random sample tool by Learning statistics is awesome
  • SVG rendering in d3.js example by Ian Johnson (read more about the process here)
  • Sketch-RNN Classification by Payal Bajaj
  • quickdraw.js by Thomas Wagenaar
  • ~ Doodler ~ by Krishna Sri Somepalli
  • quickdraw Python API by Martin O'Hanlon
  • RealTime QuickDraw by Akshay Bahadur
  • DataFlow processing by Guillem Xercavins
  • QuickDrawGH Rhino Plugin by James Dalessandro

Changes

May 25, 2017: Updated Sketch-RNN QuickDraw dataset, created .full.npz complementary sets.

License

This data is made available by Google, Inc. under the Creative Commons Attribution 4.0 International license.

Dataset Metadata

The following table is necessary for this dataset to be indexed by search engines such as Google Dataset Search.

property value
name The Quick, Draw! Dataset
alternateName Quick Draw Dataset
alternateName quickdraw-dataset
url
sameAs https://github.com/googlecreativelab/quickdraw-dataset
description The Quick Draw Dataset is a collection of 50 million drawings across 345 categories, contributed by players of the game "Quick, Draw!". The drawings were captured as timestamped vectors, tagged with metadata including what the player was asked to draw and in which country the player was located.\n\nExample drawings: ![preview](https://raw.githubusercontent.com/googlecreativelab/quickdraw-dataset/master/preview.jpg)
provider
property value
name Google
sameAs https://en.wikipedia.org/wiki/Google
license
property value
name CC BY 4.0
url

Source: https://github.com/googlecreativelab/quickdraw-dataset
