Using TensorFlow to Train a CNN Model to Recognize Rock-Paper-Scissors

In this tutorial, we will learn how to use the VIA Pixetto to
distinguish between the hand gestures of “Rock, Paper, Scissors.” Along
with the VIA Pixetto’s machine learning tool, we will write Python code
to train a CNN (convolutional neural network) model that the VIA Pixetto
can run to detect the hand shapes used in the game.

Step 1

First, enter the machine learning accelerator platform, log in to your
account, and click “Python”.

Under “Notebook”, click “Python 3”. This opens a notebook in which we can write and run Python code.

Step 2

Next comes the most involved part: writing the Python code. It can be
divided into the following steps:

  1. Importing packages
  2. Importing data
  3. Data visualization
  4. Data processing
  5. Model training
  6. Outputting the model

1. Importing packages

The packages we will use (with the versions used in this tutorial) are:
• tensorflow (2.2.0)
• tensorflow-addons (0.11.2)
• tensorflow_datasets (4.0.1+nightly)
• matplotlib (3.1.2)
• numpy (1.18.5)

import tensorflow as tf  
import tensorflow_datasets as tfds  
import tensorflow_addons as tfa  
%matplotlib inline  
import matplotlib.pyplot as plt  
import numpy as np  
import gc

2. Importing data

Once the packages are imported, we can load the rock_paper_scissors
dataset from tensorflow_datasets. tfds.load() loads the dataset and returns it together with its metadata, which we store in “dataset” and “dataset_info” respectively.

dataset, dataset_info = tfds.load(name='rock_paper_scissors', data_dir='tmp', with_info=True, as_supervised=True)
dataset_train = dataset['train']
dataset_test = dataset['test']

Setting “with_info=True” also returns the dataset information, so we can inspect the details of the rock_paper_scissors dataset later. Setting “as_supervised=True” makes each example a 2-tuple (image, label) instead of a dictionary of features, which is simpler to work with.
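As a rough illustration of the difference, the plain-Python sketch below shows the two shapes a single example can take. The dictionary keys 'image' and 'label' follow this dataset's feature names, while the 2×2 nested list is a made-up stand-in for a real image tensor.

```python
# Plain-Python stand-ins for one example from the dataset.
# A real image would be a 300x300x3 tensor; a 2x2 grid is used here for brevity.
tiny_image = [[0, 255], [255, 0]]

# Without as_supervised, each example is a dictionary of features.
example_as_dict = {'image': tiny_image, 'label': 2}

# With as_supervised=True, each example is an (image, label) 2-tuple.
example_as_tuple = (example_as_dict['image'], example_as_dict['label'])

# The tuple form can be unpacked directly, which is what loops such as
# `for image, label in dataset_train` rely on later in this tutorial.
image, label = example_as_tuple
print(label)  # → 2
```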

We can now view the details of the rock_paper_scissors dataset:

We can see that the dataset consists of images of shape (300, 300, 3): each image is 300×300 pixels with 3 colour channels (RGB). The training set has 2,520 images and the test set has 372 images, split across 3 classes: rock, paper, and scissors.
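The labels themselves are stored as integers. The sketch below imitates how a class-label feature maps between indices and names, assuming the index order rock = 0, paper = 1, scissors = 2 (the same order used when the labels are entered on the VIA Pixetto later). The helper functions are hypothetical stand-ins for int2str and str2int on dataset_info.features['label'], not the tfds implementation.

```python
# Hypothetical stand-in for dataset_info.features['label'].
# Assumed index order: rock = 0, paper = 1, scissors = 2.
CLASS_NAMES = ['rock', 'paper', 'scissors']

def int2str(index: int) -> str:
    """Map a numeric label to its class name."""
    return CLASS_NAMES[index]

def str2int(name: str) -> int:
    """Map a class name back to its numeric label."""
    return CLASS_NAMES.index(name)

print(int2str(0))           # → rock
print(str2int('scissors'))  # → 2
```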

train_size = dataset_info.splits['train'].num_examples
test_size = dataset_info.splits['test'].num_examples
dataset_classes = dataset_info.features['label'].num_classes
print('dataset name:', dataset_info.name)
print('train dataset:', dataset_train)
print('test dataset:', dataset_test)
print('train dataset size:', train_size)
print('test dataset size:', test_size)
print('number of classes in train and test dataset:', dataset_classes)
print('shape of images in train and test dataset:', dataset_info.features['image'].shape)

3. Data visualization

We can use plot_image(n) to display the nth image of the training set; change the value of n to display a different image:

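def plot_image(n=1):
    # Iterate over the first n examples; after the loop, image and label hold the nth one.
    for image, label in dataset_train.take(n):
        image = image.numpy()
        label = label.numpy()

    image_label = dataset_info.features['label'].int2str(label)

    plt.imshow(image)
    plt.title(image_label)
    plt.axis('off')
    plt.show()

plot_image(1)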
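plot_image does not index into the dataset directly; it iterates over the first n examples with take(n) and keeps the last one. The plain-Python sketch below reproduces that pattern, with itertools.islice standing in for take() and a hypothetical list of (image, label) pairs standing in for the dataset.

```python
from itertools import islice

def nth_item(iterable, n=1):
    """Return the n-th item (1-based) the way plot_image does:
    iterate over the first n items and keep the last one seen."""
    item = None
    for item in islice(iterable, n):  # islice plays the role of dataset.take(n)
        pass
    return item

# Hypothetical (image, label) pairs.
examples = [('img_a', 0), ('img_b', 1), ('img_c', 2)]
print(nth_item(examples, 2))  # → ('img_b', 1)
```

If n is larger than the dataset, the loop simply ends on the last available example.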


We can also display several images at once. Here we show 5, but the
number can be changed through the second argument of the function (up
to 15, the number of slots in the 3×5 grid). We will use this function
again after processing the data.

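def plot_dataset(dataset, num=5):
    plt.figure(figsize=(15, 15))
    plot_index = 0
    for image, label in dataset.take(num):
        image = image.numpy()
        label = label.numpy()

        image_label = dataset_info.features['label'].int2str(label)

        # Subplot indices start at 1, so increment before plotting.
        plot_index += 1
        plt.subplot(3, 5, plot_index)
        plt.axis('off')
        plt.title(image_label)
        plt.imshow(image)
    plt.show()

plot_dataset(dataset_train, 5)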

4. Data Processing

To reduce the amount of computation needed for training, we shrink the
images. Lowering the resolution makes them faster to process, in the same way that a video stream may drop its resolution to reduce buffering. The format_image function below also converts the pixel values to floats and scales them from the 0 to 255 range to the 0 to 1 range. We also set the batch size to 64, which is the number of images the model processes before it updates its parameters.

batch_size = 64  
image_size = 64  

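def format_image(image, label):
    # Convert to float, resize to image_size x image_size, and scale to [0, 1].
    image = tf.cast(image, tf.float32)
    image = tf.image.resize(image, (image_size, image_size))
    image /= 255
    return image, label

dataset_train = dataset_train.map(format_image)
dataset_test = dataset_test.map(format_image)

# Explore preprocessed training dataset images.
plot_dataset(dataset_train, 5)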
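To put the effect of format_image and the batch size into numbers, the short calculation below compares the number of values per image before and after resizing to 64×64, shows how dividing by 255 maps pixel values into the 0 to 1 range, and counts the batches in one pass over the 2,520 training images with a batch size of 64. It is plain arithmetic, not part of the training code.

```python
import math

original_values = 300 * 300 * 3   # values per image before resizing
resized_values = 64 * 64 * 3      # values per image after resizing to image_size=64
print(original_values, resized_values)             # → 270000 12288
print(round(original_values / resized_values, 1))  # → 22.0

# Dividing by 255 maps 8-bit pixel values into [0, 1].
print([v / 255 for v in (0, 51, 255)])  # → [0.0, 0.2, 1.0]

# Batches per pass over the training set with batch_size=64 (the last batch is partial).
train_size, batch_size = 2520, 64
print(math.ceil(train_size / batch_size), train_size % batch_size)  # → 40 24
```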

Next comes perhaps the most important stage: image preprocessing, or more precisely data augmentation.
We write one function per random transformation and then chain them together:

  • Image transposition: randomly transpose the image (swap its rows and columns)
  • Image flip: randomly flip the image vertically and horizontally
  • Image rotation: randomly rotate the image by a multiple of 90 degrees and, half of the time, by an additional random angle between 0 and 360 degrees
  • Color adjustment: randomly change the saturation, brightness, contrast, and hue
  • Color inversion: invert the image colors for roughly 20% of the images; this small amount of noise helps the model generalize and reduces overfitting
  • Image zoom: half of the time, randomly zoom into the center of the image

def image_transpose(image):
    # Transpose (swap height and width) 50% of the time.
    rand = tf.random.uniform(shape=[], minval=0.0, maxval=1.0, dtype=tf.float32)
    image = tf.cond(rand < 0.5,
                    lambda: tf.identity(image),
                    lambda: tf.image.transpose(image))
    return image

def image_flip(image: tf.Tensor) -> tf.Tensor:
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_flip_up_down(image)
    return image

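def image_rotate(image):
    # Rotate by a random multiple of 90 degrees (k = 0, 1, 2 or 3).
    image = tf.image.rot90(image, tf.random.uniform(shape=[], minval=0, maxval=4, dtype=tf.int32))
    rand = tf.random.uniform(shape=[], minval=0.0, maxval=1.0, dtype=tf.float32)

    def random_rotate(image):
        # Rotate by a random angle between 0 and 360 degrees, converted to radians.
        image = tfa.image.rotate(
            image, tf.random.uniform(shape=[], minval=0 * np.pi / 180, maxval=360 * np.pi / 180, dtype=tf.float32))
        return image

    # Apply the arbitrary-angle rotation 50% of the time.
    image = tf.cond(rand < 0.5,
                    lambda: tf.identity(image),
                    lambda: random_rotate(image))
    return image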

def image_color(image: tf.Tensor) -> tf.Tensor:
    image = tf.image.random_saturation(image, lower=0.5, upper=3)
    image = tf.image.random_brightness(image, max_delta=0.2)
    image = tf.image.random_contrast(image, lower=0.8, upper=1)
    image = tf.image.random_hue(image, max_delta=0.03)
    image = tf.clip_by_value(image, clip_value_min=0, clip_value_max=1)
    return image

def image_inversion(image: tf.Tensor) -> tf.Tensor:
    rand = tf.random.uniform(shape=[], minval=0.0, maxval=1.0, dtype=tf.float32)
    # Invert the colors (1 - x) for roughly 20% of the images.
    image = tf.cond(rand < 0.8,
                    lambda: tf.identity(image),
                    lambda: tf.math.add(tf.math.multiply(image, -1), 1))
    return image

def image_zoom(image: tf.Tensor, min_zoom=0.8, max_zoom=1.0) -> tf.Tensor:
    image_width, image_height, image_colors = image.shape
    crop_size = (image_width, image_height)

    # Generate crop settings, ranging from a 1% to 20% crop.
    scales = list(np.arange(min_zoom, max_zoom, 0.01))
    boxes = np.zeros((len(scales), 4))

    for i, scale in enumerate(scales):
        x1 = y1 = 0.5 - (0.5 * scale)
        x2 = y2 = 0.5 + (0.5 * scale)
        boxes[i] = [x1, y1, x2, y2]

    def random_crop(img):
        # Create different crops for an image
        crops = tf.image.crop_and_resize(
            [img], boxes=boxes, box_indices=np.zeros(len(scales), dtype=np.int32), crop_size=crop_size)
        # Return a random crop
        return crops[tf.random.uniform(shape=[], minval=0, maxval=len(scales), dtype=tf.int32)]

    choice = tf.random.uniform(shape=[], minval=0., maxval=1., dtype=tf.float32)

    # Only apply cropping 50% of the time
    return tf.cond(choice < 0.5, lambda: image, lambda: random_crop(image))
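The crop boxes in image_zoom are easier to follow with concrete numbers. The sketch below recomputes them in plain Python for the default min_zoom=0.8 and max_zoom=1.0. Each box is centered and given in normalized coordinates; tf.image.crop_and_resize expects boxes as [y1, x1, y2, x2], but because these boxes are symmetric the x/y order makes no difference.

```python
# Plain-Python mirror of the crop boxes built in image_zoom (min_zoom=0.8, max_zoom=1.0).
min_zoom, max_zoom, step = 0.8, 1.0, 0.01

# np.arange(0.8, 1.0, 0.01) yields 20 scales: 0.80, 0.81, ..., 0.99.
num_scales = round((max_zoom - min_zoom) / step)
scales = [min_zoom + i * step for i in range(num_scales)]

boxes = []
for scale in scales:
    # A centered box covering `scale` of the height and width, in [0, 1] coordinates.
    low = 0.5 - 0.5 * scale
    high = 0.5 + 0.5 * scale
    boxes.append((low, low, high, high))

print(len(boxes))                        # → 20
print([round(v, 3) for v in boxes[0]])   # → [0.1, 0.1, 0.9, 0.9]
print([round(v, 3) for v in boxes[-1]])  # → [0.005, 0.005, 0.995, 0.995]
```

The smallest box keeps 80% of each side and the largest keeps 99%, which is where the "1% to 20% crop" in the comment comes from.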

Once all the functions are defined, we chain them in augment_data and apply it to the entire training set with the map() function.

def augment_data(image, label):
    image = image_flip(image)
    image = image_color(image)
    image = image_zoom(image)
    image = image_transpose(image)
    image = image_inversion(image)
    image = image_rotate(image)
    return image, label

dataset_train_augmented = dataset_train.map(augment_data)

# Explore augmented training dataset images.
plot_dataset(dataset_train_augmented, 5)

Because the augmentation involves many random choices, the images we get will differ every time this part is run.
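To see what the geometric and inversion steps do to pixel positions and values, the sketch below applies plain-Python versions of them to a 2×2 grid, in the same order as augment_data. It is a simplified model under stated assumptions: the grid stands in for an image, the probabilities mirror the ones in the functions above, and the color adjustment, zoom, and arbitrary-angle rotation are left out.

```python
import random

def transpose(grid):
    # Swap rows and columns, like tf.image.transpose.
    return [list(row) for row in zip(*grid)]

def flip_left_right(grid):
    return [row[::-1] for row in grid]

def flip_up_down(grid):
    return [list(row) for row in grid[::-1]]

def rot90(grid):
    # One counter-clockwise quarter turn, the direction tf.image.rot90 uses.
    return flip_up_down(transpose(grid))

def invert(grid):
    # Values are in [0, 1] after format_image, so inversion is 1 - x.
    return [[1 - v for v in row] for row in grid]

def augment(grid, rng):
    # Each step fires with a fixed probability, mirroring the random flips and tf.cond calls.
    if rng.random() < 0.5:
        grid = flip_left_right(grid)
    if rng.random() < 0.5:
        grid = flip_up_down(grid)
    if rng.random() < 0.5:
        grid = transpose(grid)
    if rng.random() >= 0.8:            # inversion roughly 20% of the time
        grid = invert(grid)
    for _ in range(rng.randrange(4)):  # 0 to 3 quarter turns
        grid = rot90(grid)
    return grid

grid = [[0.0, 0.25], [0.5, 1.0]]
print(rot90(grid))   # → [[0.25, 1.0], [0.0, 0.5]]
print(invert(grid))  # → [[1.0, 0.75], [0.5, 0.0]]
print(augment(grid, random.Random(0)))
```

Passing a seeded generator makes the sequence of augmentations reproducible, while an unseeded one gives different results on every run, which is the behavior we see in the notebook.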

5. Model training

Before training the model, we shuffle the data so that the samples
arrive in random order. The shuffle buffer is as large as the training set, the batch size is the 64 set earlier, and the prefetch buffer size is set automatically using tf.data.experimental.AUTOTUNE.

dataset_train_batches = dataset_train_augmented.shuffle(
    buffer_size=train_size).batch(batch_size=batch_size).prefetch(
    buffer_size=tf.data.experimental.AUTOTUNE)
dataset_test_batches = dataset_test.batch(batch_size)

print(dataset_train_batches)
print(dataset_test_batches)
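shuffle() does not load the whole dataset and permute it at once; it keeps a buffer of buffer_size elements and repeatedly emits a random element from it. The sketch below models that idea and the batching step in plain Python (prefetching is left out). It is a simplified illustration, not the tf.data source code.

```python
import random

def buffered_shuffle(items, buffer_size, rng):
    """Keep up to buffer_size elements, emit a random one whenever the
    buffer is full, and refill the freed slot from the input."""
    buffer = []
    for item in items:
        buffer.append(item)
        if len(buffer) == buffer_size:
            index = rng.randrange(len(buffer))
            buffer[index], buffer[-1] = buffer[-1], buffer[index]
            yield buffer.pop()
    # Drain what is left once the input is exhausted.
    while buffer:
        index = rng.randrange(len(buffer))
        buffer[index], buffer[-1] = buffer[-1], buffer[index]
        yield buffer.pop()

def batch(items, batch_size):
    """Group items into lists of batch_size; the final batch may be smaller."""
    current = []
    for item in items:
        current.append(item)
        if len(current) == batch_size:
            yield current
            current = []
    if current:
        yield current

rng = random.Random(42)
shuffled = list(buffered_shuffle(range(10), buffer_size=10, rng=rng))
batches = list(batch(shuffled, 4))
print([len(b) for b in batches])  # → [4, 4, 2]
```

With buffer_size equal to the dataset size, as in the code above, any element can come out first, so the result is a full shuffle. With a buffer of 1 the order is unchanged, and small buffers only mix nearby elements.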

Next, we build the CNN model.
It uses the following layer types:

• tf.keras.layers.Convolution2D()
• tf.keras.layers.MaxPooling2D()
• tf.keras.layers.Dense()
• tf.keras.layers.Flatten()
• tf.keras.layers.Dropout()

model = tf.keras.Sequential([
    tf.keras.layers.Convolution2D(input_shape=(image_size, image_size, 3), filters=64, kernel_size=3, activation='relu'),
    tf.keras.layers.Convolution2D(filters=64, kernel_size=3, activation='relu'),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),
    tf.keras.layers.Convolution2D(filters=128, kernel_size=3, activation='relu'),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),
    tf.keras.layers.Convolution2D(filters=128, kernel_size=3, activation='relu'),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),
    # Flatten the feature maps into a vector before the dense layers.
    tf.keras.layers.Flatten(),
    # Dropout reduces overfitting; the rate of 0.5 is an assumption (not given in the original).
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(units=512, activation=tf.keras.activations.relu),
    tf.keras.layers.Dense(units=dataset_classes, activation=tf.keras.activations.softmax)
])

model.summary()

We can use model.summary() to view the complete CNN model. Besides
listing each layer, it reports the total number of parameters, which for this model is 2,621,507.
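The total of 2,621,507 can be checked by hand. The calculation below assumes 64×64×3 inputs, Keras' default 'valid' padding, and a Flatten layer between the last pooling layer and the 512-unit Dense layer; with those assumptions it reproduces the figure reported by model.summary().

```python
def conv2d_params(kernel, in_channels, filters):
    # Each filter has kernel * kernel * in_channels weights plus one bias.
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs, units):
    # One weight per input for each unit, plus one bias per unit.
    return inputs * units + units

def conv_out(size, kernel):
    # 'valid' padding with stride 1 shrinks each side length by kernel - 1.
    return size - kernel + 1

def pool_out(size):
    # 2x2 max pooling with stride 2 halves the side length (rounding down).
    return size // 2

size, total = 64, 0
total += conv2d_params(3, 3, 64)
size = conv_out(size, 3)            # 62
total += conv2d_params(3, 64, 64)
size = pool_out(conv_out(size, 3))  # 60 -> 30
total += conv2d_params(3, 64, 128)
size = pool_out(conv_out(size, 3))  # 28 -> 14
total += conv2d_params(3, 128, 128)
size = pool_out(conv_out(size, 3))  # 12 -> 6

flattened = size * size * 128       # Flatten and Dropout add no parameters
total += dense_params(flattened, 512)
total += dense_params(512, 3)
print(flattened, total)  # → 4608 2621507
```

About 90% of the parameters (2,359,808) sit in the first Dense layer, which is why the flattened feature size matters so much for model size.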

Now we can start the optimization process.

The optimizer we will use is RMSprop. After creating it, we compile
the model with it.
We compute steps_per_epoch by dividing the size of the training set by
the batch size, which sets how many batches make up one training epoch.
We do the same with the test set to get validation_steps, the number of
batches used for validation after each epoch.
Note that we also use an early-stopping callback: if val_accuracy does not improve for 5 consecutive epochs, training stops.

rmsprop_optimizer = tf.keras.optimizers.RMSprop(learning_rate=0.001)

model.compile(optimizer=rmsprop_optimizer,
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

steps_per_epoch = train_size // batch_size
validation_steps = test_size // batch_size

print('steps_per_epoch:', steps_per_epoch)
print('validation_steps:', validation_steps)

# Stop training if val_accuracy has not improved for 5 epochs.
early_stopping = tf.keras.callbacks.EarlyStopping(patience=5, monitor='val_accuracy')
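The sketch below models the two ideas from this step in plain Python: the floor division behind steps_per_epoch and validation_steps, and a simplified version of the early-stopping rule (a strictly higher val_accuracy counts as an improvement, and training stops after 5 epochs without one). The validation scores are invented purely to exercise the function.

```python
def early_stopping_epoch(val_scores, patience=5):
    """Return the 0-based epoch at which training would stop, or None.

    Simplified model of EarlyStopping(monitor='val_accuracy', patience=5).
    """
    best = float('-inf')
    wait = 0
    for epoch, score in enumerate(val_scores):
        if score > best:
            best = score
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

train_size, test_size, batch_size = 2520, 372, 64
print(train_size // batch_size, test_size // batch_size)  # → 39 5

# Invented validation accuracies: the best value appears at epoch 2.
scores = [0.60, 0.75, 0.90, 0.88, 0.89, 0.87, 0.90, 0.85]
print(early_stopping_epoch(scores))  # → 7
```

Because the step counts use floor division, each validation pass covers 5 × 64 = 320 of the 372 test images.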

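Now we can start the training process. Remember that training may stop before all of the epochs are completed because of the early_stopping callback. The epoch count below is an assumed value, since the original one is not given:

epochs = 15  # assumed value; early stopping may end training sooner
training_history = model.fit(dataset_train_batches.repeat(), epochs=epochs, steps_per_epoch=steps_per_epoch,
                             validation_data=dataset_test_batches.repeat(), validation_steps=validation_steps, callbacks=[early_stopping])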

We can also plot how the loss and accuracy change over the epochs for both the training and test sets.

def plot_training_history(training_history):
    loss = training_history.history['loss']
    val_loss = training_history.history['val_loss']
    accuracy = training_history.history['accuracy']
    val_accuracy = training_history.history['val_accuracy']
    plt.figure(figsize=(18, 6))

    plt.subplot(1, 2, 1)
    plt.title('Training and Test Loss')
    plt.plot(loss, label='Training set')
    plt.plot(val_loss, label='Test set', linestyle='--')
    plt.legend()
    plt.grid(linestyle='--', linewidth=1, alpha=0.5)

    plt.subplot(1, 2, 2)
    plt.title('Training and Test Accuracy')
    plt.plot(accuracy, label='Training set')
    plt.plot(val_accuracy, label='Test set', linestyle='--')
    plt.legend()
    plt.grid(linestyle='--', linewidth=1, alpha=0.5)

    plt.show()

plot_training_history(training_history)


The chart on the left shows that the training loss reached its lowest point at epoch 15, where it coincided with the test loss. In the chart on the right, the test accuracy likewise closely follows the training accuracy.

Finally, we can evaluate the final accuracy of the model:

train_loss, train_accuracy = model.evaluate(dataset_train.batch(batch_size))
test_loss, test_accuracy = model.evaluate(dataset_test.batch(batch_size))

print('Training Loss: ', train_loss)  
print('Training Accuracy: ', train_accuracy)  
print('Test Loss: ', test_loss)  
print('Test Accuracy: ', test_accuracy)

6. Outputting the Model

After training, we can save the CNN model. The VIA Pixetto uses TensorFlow Lite (.tflite) files, so we convert the model into that format and save two versions. Briefly, here are the differences:

  • “model.tflite” is the unoptimized model. At 10.5 MB it is too large, so we also create a smaller version
  • “quant_model.tflite” is a reduced version of model.tflite. It uses quantization to store the weights in a lower-precision format, which cuts the file size to 2.63 MB while keeping accuracy within an acceptable range

converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
with open('model.tflite', 'wb') as f:
    f.write(tflite_model)

# Post-training quantization with the default optimizations.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_quant_model = converter.convert()
with open('quant_model.tflite', 'wb') as f:
    f.write(tflite_quant_model)
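The size reduction follows from storing each weight in 8 bits instead of 32. The sketch below shows a simplified symmetric 8-bit scheme (one scale per tensor) and the matching size arithmetic for 2,621,507 parameters. It only illustrates the idea; the scheme applied by the TFLite converter with tf.lite.Optimize.DEFAULT differs in its details, and the real file sizes also include other model data.

```python
def quantize_int8(weights):
    """Symmetric 8-bit quantization sketch: one float scale per tensor,
    each weight stored as an integer in [-127, 127].
    Assumes at least one weight is nonzero."""
    scale = max(abs(w) for w in weights) / 127
    return [round(w / scale) for w in weights], scale

def dequantize(q_values, scale):
    # Recover approximate float weights from the stored integers.
    return [q * scale for q in q_values]

weights = [0.5, -0.25, 0.127, -1.27]
q, scale = quantize_int8(weights)
print(q)  # → [50, -25, 13, -127]
restored = dequantize(q, scale)
# The rounding error per weight is at most half a quantization step.
print(max(abs(a - b) for a, b in zip(weights, restored)) <= scale / 2 + 1e-12)  # → True

# Rough file-size arithmetic: 4 bytes per float32 weight vs 1 byte per int8 weight.
params = 2_621_507
print(round(params * 4 / 1e6, 2), round(params * 1 / 1e6, 2))  # → 10.49 2.62
```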

The files will be saved back into our MLS notebook area:

Now we can download quant_model.tflite to our computer and prepare to upload it to the VIA Pixetto.

Step 3

First, connect the VIA Pixetto to your PC using a Micro USB 2.0 cable. When the green, blue, and red LEDs are lit, it means the VIA Pixetto is successfully connected.

Step 4

The steps for uploading the .tflite file to the VIA Pixetto are similar to those in the previous tutorials, but this time we will use the “Neural Network Identification” function. First, open VIA Utility and select “Neural Network Identification” in the function area.

Next, select the “model path” below and upload “quant_model.tflite”.
Remember to click the OK button when finished.

After the upload is complete, we need to change the label name.
Click “Label Edit” in “Tools” in the upper left corner, and enter rock,
paper, and scissors at indexes 0, 1, and 2.
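The final Dense layer uses softmax, so the model outputs one probability per class in index order 0, 1, 2. The sketch below shows, in plain Python, how such an output maps to a label with the rock, paper, scissors order entered above. It assumes the predicted class is simply the index with the highest probability; the exact post-processing done on the VIA Pixetto is not described here, and the input values are made up.

```python
import math

LABELS = ['rock', 'paper', 'scissors']  # index order entered in Label Edit

def softmax(logits):
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_label(probabilities):
    # The predicted class is the index with the highest probability.
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return LABELS[best]

probs = softmax([0.2, 2.5, -1.0])  # made-up raw scores
print(predict_label(probs))  # → paper
```

If the labels were entered in a different order, the device would report the wrong name for each class, which is why the indices must match the training labels.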

We should now be able to point the VIA Pixetto at specific hand
gestures and have it recognize if it’s a rock, paper, or scissors.

Congratulations, we are done! Don’t forget to share your projects with us on social media using #VIAPixetto!
