As you may know, the larger the dataset you use as input to a deep learning training process, the better the results. So what happens if you only have a limited dataset? You run into the overfitting problem.
There are many approaches to augmenting data. The simplest include adding noise and applying transformations to existing data. Imputation and dimensionality reduction can be used to add samples in sparse areas of the dataset. More advanced approaches simulate data based on dynamic or evolutionary systems. I'll focus first on the two simplest approaches: adding noise and applying transformations.
Transformations help you meet the requirement of a huge dataset when your original dataset is not large enough. You don't need to hunt for novel images to add to your dataset. Why? Because neural networks aren't smart to begin with. For instance, a poorly trained neural network would think that the three tennis balls shown below are distinct, unique images.
So, to get more data, we just need to make minor alterations to our existing dataset: minor changes such as flips, translations, or rotations. Our neural network would think these are distinct images anyway.
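As a minimal sketch of the idea, here is how a single NumPy image array (a toy stand-in for a real image) can be turned into several "new" training samples:

```python
import numpy as np

# A stand-in image: 4x4 pixels, 3 channels.
img = np.arange(48, dtype=np.float32).reshape(4, 4, 3)

# Minor alterations: flips and right-angle rotations.
augmented = [
    img,                 # original
    np.fliplr(img),      # horizontal flip
    np.flipud(img),      # vertical flip
    np.rot90(img, k=1),  # 90 degree rotation
    np.rot90(img, k=2),  # 180 degree rotation
    np.rot90(img, k=3),  # 270 degree rotation
]

# Six samples from one image -- the network sees each as a distinct input.
print(len(augmented))  # 6
```

Each variant carries the same label as the original, so one labeled image yields six training samples.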
A convolutional neural network that can robustly classify objects even when they are placed in different orientations is said to have the property of invariance. More specifically, a CNN can be invariant to translation, viewpoint, size, or illumination (or a combination of the above).
This essentially is the premise of data augmentation. In a real-world scenario, we may have a dataset of images taken in a limited set of conditions. But our target application may exist in a variety of conditions, such as different orientations, locations, scales, brightness, etc. We account for these situations by training our neural network with additional, synthetically modified data.
Before we explore these techniques, for simplicity, let us make one assumption: we don't need to consider what lies beyond the image's boundary. We'll use the techniques below in a way that keeps this assumption valid.
What would happen if we use a technique that forces us to guess what lies beyond an image’s boundary? In this case, we need to interpolate some information. We’ll discuss this in detail after we cover the types of augmentation.
For each of these techniques, we also specify the factor by which the size of your dataset is increased (the Data Augmentation Factor).
1. Flip
You can flip images horizontally and vertically. Some frameworks do not provide a function for vertical flips. But a vertical flip is equivalent to rotating an image by 180 degrees and then performing a horizontal flip. Below are examples of flipped images.
You can perform flips by using any of the following commands, from your favorite packages. Data Augmentation Factor = 2 to 4x
# NumPy. 'img' = A single image.
flip_1 = np.fliplr(img)
# TensorFlow. 'x' = A placeholder for an image.
shape = [height, width, channels]
x = tf.placeholder(dtype = tf.float32, shape = shape)
flip_2 = tf.image.flip_up_down(x)
flip_3 = tf.image.flip_left_right(x)
flip_4 = tf.image.random_flip_up_down(x)
flip_5 = tf.image.random_flip_left_right(x)
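If your framework lacks a vertical flip, the equivalence mentioned above is easy to verify in NumPy (a sketch, using a toy array in place of a real image):

```python
import numpy as np

img = np.arange(24, dtype=np.float32).reshape(2, 4, 3)

# A vertical flip directly...
flip_vertical = np.flipud(img)

# ...equals a 180 degree rotation followed by a horizontal flip.
flip_equivalent = np.fliplr(np.rot90(img, k=2))

print(np.array_equal(flip_vertical, flip_equivalent))  # True
```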
2. Rotation
One key thing to note about this operation is that image dimensions may not be preserved after rotation. If your image is a square, rotating it at right angles will preserve the image size. If it's a rectangle, rotating it by 180 degrees will preserve the size. Rotating the image by finer angles will change the final image size. We'll see how we can deal with this issue in the next section. Below are examples of square images rotated at right angles.
You can perform rotations by using any of the following commands, from your favorite packages. Data Augmentation Factor = 2 to 4x
# Placeholders: 'x' = A single image, 'y' = A batch of images
# 'k' denotes the number of 90 degree anticlockwise rotations
shape = [height, width, channels]
x = tf.placeholder(dtype = tf.float32, shape = shape)
rot_90 = tf.image.rot90(x, k=1)
rot_180 = tf.image.rot90(x, k=2)
# To rotate in any angle. In the example below, 'angles' is in radians
shape = [batch, height, width, 3]
y = tf.placeholder(dtype = tf.float32, shape = shape)
rot_tf_180 = tf.contrib.image.rotate(y, angles=3.1415)
# Scikit-Image. 'angle' = Degrees. 'img' = Input Image
# For details about 'mode', checkout the interpolation section below.
rot = skimage.transform.rotate(img, angle=45, mode='reflect')
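If you would rather keep the whole rotated image instead of dealing with clipped corners, scikit-image can also expand the output canvas (a sketch; `resize=True` grows the output so nothing is cut off, at the cost of a changed image size):

```python
import numpy as np
from skimage.transform import rotate

img = np.ones((100, 50), dtype=np.float64)

# Default: output keeps the input shape, so corners of the rotated
# rectangle are clipped.
rot_same = rotate(img, angle=45)

# resize=True: the canvas grows to hold the entire rotated image.
rot_full = rotate(img, angle=45, resize=True)

print(rot_same.shape)  # (100, 50)
print(rot_full.shape)  # larger in both dimensions
```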
3. Scale
The image can be scaled outward or inward. While scaling outward, the final image size will be larger than the original image size. Most image frameworks cut out a section from the new image, with size equal to the original image. We'll deal with scaling inward in the next section, as it reduces the image size, forcing us to make assumptions about what lies beyond the boundary. Below are examples of images being scaled.
You can perform scaling by using the following commands, using scikit-image. Data Augmentation Factor = Arbitrary.
# Scikit Image. 'img' = Input Image, 'scale' = Scale factor
# For details about 'mode', checkout the interpolation section below.
scale_out = skimage.transform.rescale(img, scale=2.0, mode='constant')
scale_in = skimage.transform.rescale(img, scale=0.5, mode='constant')
# Don't forget to crop the images back to the original size (for scale_out)
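The crop-back step can be sketched in plain NumPy: take the central region of the scaled-out image, with the same height and width as the original (assuming the scaled image is at least as large as the original):

```python
import numpy as np

def center_crop(img, out_height, out_width):
    """Crop the central out_height x out_width region of img."""
    h, w = img.shape[:2]
    top = (h - out_height) // 2
    left = (w - out_width) // 2
    return img[top:top + out_height, left:left + out_width]

# e.g. a 100x100 image scaled outward by a factor of 2.0
scaled_out = np.zeros((200, 200, 3), dtype=np.float32)
cropped = center_crop(scaled_out, 100, 100)
print(cropped.shape)  # (100, 100, 3)
```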
4. Crop
Unlike scaling, here we just randomly sample a section from the original image. We then resize this section to the original image size. This method is popularly known as random cropping. Below are examples of random cropping. If you look closely, you can notice the difference between this method and scaling.
You can perform random crops by using the following commands in TensorFlow. Data Augmentation Factor = Arbitrary.
# TensorFlow. 'x' = A placeholder for an image.
original_size = [height, width, channels]
x = tf.placeholder(dtype = tf.float32, shape = original_size)
# Use the following commands to perform random crops
crop_size = [new_height, new_width, channels]
seed = np.random.randint(1234)
x = tf.random_crop(x, size = crop_size, seed = seed)
output = tf.image.resize_images(x, size = [height, width])
5. Translation
Translation just involves moving the image along the X or Y direction (or both). In the following example, we assume that the image has a black background beyond its boundary, and is translated appropriately. This method of augmentation is very useful, as most objects can be located almost anywhere in the image. This forces your convolutional neural network to look everywhere.
You can perform translations in TensorFlow by using the following commands. Data Augmentation Factor = Arbitrary.
# pad_left, pad_right, pad_top, pad_bottom denote the pixel
# displacement. Set one of them to the desired value and rest to 0
shape = [batch, height, width, channels]
x = tf.placeholder(dtype = tf.float32, shape = shape)

# We use two functions to get our desired augmentation
x = tf.image.pad_to_bounding_box(x, pad_top, pad_left, height + pad_bottom + pad_top, width + pad_right + pad_left)
output = tf.image.crop_to_bounding_box(x, pad_bottom, pad_right, height, width)
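The same pad-then-crop trick can be sketched in plain NumPy for a single image (here, shifting the content 10 pixels to the right and filling the exposed strip with black):

```python
import numpy as np

img = np.ones((100, 100, 3), dtype=np.float32)
pad_left = 10

# Pad on the left, then crop back to the original width:
# the content moves right and a black strip appears on the left.
padded = np.pad(img, ((0, 0), (pad_left, 0), (0, 0)), mode='constant')
translated = padded[:, :100, :]

print(translated.shape)     # (100, 100, 3)
print(translated[0, 0, 0])  # 0.0 -- the new black border
```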
6. Gaussian Noise
Overfitting usually happens when your neural network tries to learn high-frequency features (fine-grained patterns) that may not be useful. Gaussian noise, which has zero mean, essentially has data points at all frequencies, effectively distorting the high-frequency features. This also means that lower-frequency components (usually, your intended data) are distorted as well, but your neural network can learn to look past that. Adding just the right amount of noise can enhance the learning capability.
A toned down version of this is the salt and pepper noise, which presents itself as random black and white pixels spread through the image. This is similar to the effect produced by adding Gaussian noise to an image, but may have a lower information distortion level.
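Salt-and-pepper noise is straightforward to add with NumPy (a sketch, assuming pixel values in [0, 1]: each pixel independently becomes white or black with a small probability):

```python
import numpy as np

def salt_and_pepper(img, amount=0.05, seed=None):
    """Set a fraction `amount` of pixels to white (salt) or black (pepper)."""
    rng = np.random.default_rng(seed)
    noisy = img.copy()
    mask = rng.random(img.shape[:2])
    noisy[mask < amount / 2] = 0.0      # pepper: random black pixels
    noisy[mask > 1 - amount / 2] = 1.0  # salt: random white pixels
    return noisy

img = np.full((100, 100, 3), 0.5, dtype=np.float32)  # a flat gray image
noisy = salt_and_pepper(img, amount=0.05, seed=0)
```

Because the mask is two-dimensional, a corrupted pixel is flipped across all three channels at once, which gives pure black-and-white specks rather than colored ones.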
You can add Gaussian noise to your image by using the following command, on TensorFlow. Data Augmentation Factor = 2x.
# TensorFlow. 'x' = A placeholder for an image.
shape = [height, width, channels]
x = tf.placeholder(dtype = tf.float32, shape = shape)

# Adding Gaussian noise
noise = tf.random_normal(shape=tf.shape(x), mean=0.0, stddev=1.0, dtype=tf.float32)
output = tf.add(x, noise)
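One caveat the snippet above glosses over: adding noise can push pixel values outside the valid range, so it is worth clipping the result. A NumPy sketch for images in [0, 1] (the stddev of 0.1 is an arbitrary choice; tune it to your data):

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((100, 100, 3)).astype(np.float32)  # values in [0, 1)

# Zero-mean Gaussian noise, then clip back to the valid pixel range.
noise = rng.normal(loc=0.0, scale=0.1, size=img.shape)
noisy = np.clip(img + noise, 0.0, 1.0)

print(noisy.min() >= 0.0, noisy.max() <= 1.0)  # True True
```

In TensorFlow, the equivalent clip is `tf.clip_by_value(output, 0.0, 1.0)`.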