In this post, we apply two regularization techniques: dropout and early stopping. Their purpose is to prevent overfitting, where the model fits the training data very well but cannot generalize to new data. When the model overfits, it has high accuracy on the training set but low accuracy on the test set. In addition to the regularization techniques, we also show how to save the model so that it can be used later.
# First, load the data. Please check the previous post on how to save the MNIST data
import numpy as np
MNIST_data = np.load('/home/vietanh/data/MNIST/MNIST_data.npz')
train_data = MNIST_data['train_data']
train_labels = MNIST_data['train_labels']
validation_data = MNIST_data['validation_data']
validation_labels = MNIST_data['validation_labels']
test_data = MNIST_data['test_data']
test_labels = MNIST_data['test_labels']
print 'MNIST train data shape is: ', train_data.shape
print 'MNIST train data label shape is: ', train_labels.shape
# Print a slice of pixel values from the first training image
print train_data[0,200:250]
We can see that there are 55000 training samples; each sample is an image of a digit from 0 to 9. Each image is 28 pixels by 28 pixels, so we can flatten it into a vector of 28 x 28 = 784 numbers.
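As a quick hedged sketch (the array raw_image below is hypothetical, just for illustration), flattening one 28 x 28 image with NumPy would look like:
# Illustrative sketch only: flatten a hypothetical 28x28 image into a 784-vector
raw_image = np.zeros((28, 28), dtype=np.float32)  # stands in for one raw MNIST image
flat_image = raw_image.reshape(784)               # same pixel values, now a single vector
print flat_image.shape                            # (784,)
In the .npz file loaded above, this flattening has already been done, so each row of train_data is a 784-vector and can be fed directly to the placeholder x defined below.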
import tensorflow as tf
# Reset the graph
tf.reset_default_graph()
# Create new session
sess = tf.Session()
# x: input images as flattened 784-vectors; W and b: weights and bias of the softmax layer
x = tf.placeholder(tf.float32, [None, 784], name='x')
W = tf.Variable(tf.zeros([784, 10]), name='W')
b = tf.Variable(tf.zeros([10]), name='b')
y = tf.nn.softmax(tf.matmul(x, W) + b, name='y')
# Dropout:
keep_prob = 1.0
y_drop_out = tf.nn.dropout(y, keep_prob, name='y_drop_out')
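# Note (usage sketch, not in the original code): with keep_prob = 1.0 the dropout above
# keeps every unit, so it has no regularizing effect. A common pattern is to define
# keep_prob as a placeholder, feed it a value such as 0.5 during training and 1.0 at
# evaluation time, for example:
#   keep_prob = tf.placeholder(tf.float32, name='keep_prob')
#   sess.run(train_op, {x: train_batch, y_: train_label, keep_prob: 0.5})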
# Saver used to write the best model found so far to disk:
saver_best = tf.train.Saver(name='best_model')
# Early stopping: stop training if the validation accuracy has not improved for more than this many epochs
epoch_early_stopping = 3
# y_ has shape [None, 10], where None is the batch size and 10 is the label size.
# Each label is a one-hot vector; for example, if the digit is 3 then the corresponding
# one-hot vector is [0,0,0,1,0,0,0,0,0,0]
y_ = tf.placeholder(tf.float32, [None, 10], name='y_')
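# Cross-entropy loss, averaged over the batch: -sum_i y_[i] * log(p[i]), where p is the
# (dropout-applied) softmax output y_drop_out.
# Aside (not in the original code): tf.nn.softmax_cross_entropy_with_logits computes the
# same quantity from the raw logits in a numerically safer way.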
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y_drop_out), reduction_indices=[1]))
train_op = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy, name = 'train_op')
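# A prediction is correct when the index of the largest softmax output (argmax of y)
# matches the position of the 1 in the one-hot label (argmax of y_)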
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32), name = 'accuracy')
# Initialize the variables: this sets each variable to its initial value.
# In this case, variable W is set to a zeros array with shape [784, 10]
# and variable b to a zeros array with shape [10]
init_op = tf.global_variables_initializer()
sess.run(init_op)
epoch = 100
batch_size = 5000
no_train_batch = len(train_labels) // batch_size  # number of mini-batches per epoch
# Track the best validation accuracy seen so far and the epoch at which it occurred
best_accuracy_dev = -5; best_epoch = 0
for i in range(epoch):
    print '--------------'
    print 'Epoch ', i
    ptr = 0
    for j in range(no_train_batch):
        train_batch, train_label = train_data[ptr:ptr+batch_size,:], train_labels[ptr:ptr+batch_size,:]
        feed_dict_train = {x: train_batch, y_: train_label}
        # We train the network by running train_op, with the inputs supplied in feed_dict_train
        sess.run(train_op, feed_dict_train)
        accuracy_train = sess.run(accuracy, feed_dict_train)  # training accuracy on the current batch
        ptr += batch_size
    # We measure the accuracy on the validation set once per epoch
    accuracy_dev = sess.run(accuracy, feed_dict={x: validation_data, y_: validation_labels})
    print 'accuracy in dev set is', accuracy_dev
    # Save the best epoch
    if (accuracy_dev > best_accuracy_dev):
        # Save the best model seen so far
        saver_best.save(sess, '/home/vietanh/data/MNIST/model')
        print 'Saved best model'
        best_accuracy_dev = accuracy_dev
        best_epoch = i
    # Early stopping: if the validation accuracy has not improved for a fixed
    # number of epochs, we stop training.
    if (i - best_epoch > epoch_early_stopping):
        print 'Early Stopping'
        # Break out of the epoch loop.
        break
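Because the loop may stop after the validation accuracy has already started to degrade, the variables left in sess at this point are not necessarily the best ones. A minimal hedged sketch of restoring the saved best model and evaluating it on the test set (reusing the same graph, session, and checkpoint path as above; the test-set evaluation itself is not part of the original code) could look like:
# Hedged sketch: load the best checkpoint back into the current session
saver_best.restore(sess, '/home/vietanh/data/MNIST/model')
# Evaluate the restored (best) model on the held-out test set
accuracy_test = sess.run(accuracy, feed_dict={x: test_data, y_: test_labels})
print 'accuracy on test set is', accuracy_test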