Neural network examples

This document gives examples for writing code to use the neural network class in Waffles. Let's begin with an overview of the most important classes:


  • A GNeuralNet contains one or more GLayer objects.
  • Each GLayer contains one or more GBlock objects.
  • Each GBlock represents a block of network units, or artificial neurons. There are many types of blocks: GBlockLinear contains linear units. GBlockTanh contains tanh units. Etc.
  • GNeuralNet is also a type of GBlock, so you can nest entire neural networks inside of neural networks (see the sketch after this list).
  • GNeuralNetOptimizer is the base class for methods that train the weights of a GNeuralNet. There are several types of GNeuralNetOptimizer. For example, GSGDOptimizer trains the neural net with stochastic gradient descent.
  • GNeuralNetContext holds all of the context buffers that a thread needs to use (or train) a GNeuralNet in a thread-safe manner. (For example, two threads may feed a vector through the neural net without stomping over the activations of the other one.)
  • GNeuralNetLearner is a wrapper around GNeuralNet that implements GIncrementalLearner (which implements GSupervisedLearner). This class enables using a GNeuralNet in place of any other learning algorithm in Waffles.
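
Since a GNeuralNet is itself a GBlock, one network can be nested inside another simply by adding it as a block. Here is a minimal sketch of that idea (the layer sizes are arbitrary), assuming that add takes ownership of heap-allocated blocks, as in the examples below:

	// Build an inner network
	GNeuralNet* pInner = new GNeuralNet();
	pInner->add(new GBlockLinear(20));
	pInner->add(new GBlockTanh());

	// Nest it inside an outer network (a GNeuralNet is also a GBlock)
	GNeuralNet nn;
	nn.add(new GBlockLinear(50));
	nn.add(new GBlockTanh());
	nn.add(pInner);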



Logistic regression

Logistic regression fits your data with a single layer of logistic units. Here are the #includes that we are going to need for this example:

	#include <GClasses/GActivation.h>
	#include <GClasses/GHolders.h>
	#include <GClasses/GMatrix.h>
	#include <GClasses/GNeuralNet.h>
	#include <GClasses/GRand.h>
	#include <GClasses/GTransform.h>

We are going to need some data, so let's load some data from an ARFF file. Let's use Iris, a well-known dataset for machine learning examples:

	GMatrix data;
	data.loadArff("iris.arff");

"data" is a 150x5 matrix. Next, we need to divide this data into a feature matrix (the inputs) and a label matrix (the outputs):

	GDataColSplitter cs(data, 1); // the "iris" dataset has only 1 column of "labels"
	const GMatrix& inputs = cs.features();
	const GMatrix& outputs = cs.labels();

"inputs" is a 150x4 matrix of real values, and "outputs" is a 150x1 matrix of categorical values. Neural networks typically only support continuous values, but the labels in the iris dataset are categorical, so we will convert them to use a real-valued representation (also known as a categorical distribution, a one-hot representation, or binarized representation):

	GNominalToCat nc;
	nc.train(outputs);
	GMatrix* pRealOutputs = nc.transformBatch(outputs);

pRealOutputs points to a 150x3 matrix of real values. Now, let's further divide our data into a training portion and a testing portion:

	GRand r(0);
	GDataRowSplitter rs(inputs, *pRealOutputs, r, 75);
	const GMatrix& trainingFeatures = rs.features1();
	const GMatrix& trainingLabels = rs.labels1();
	const GMatrix& testingFeatures = rs.features2();
	const GMatrix& testingLabels = rs.labels2();

Now, we are ready to train a layer of logistic units that takes 4 inputs and gives 3 outputs. The activation function is specified as a separate layer:

	GNeuralNet nn;
	nn.add(new GBlockLinear(3));
	nn.add(new GBlockLogistic());

To train our model, we will create an optimizer. We will use stochastic gradient descent (SGD). We also set the learning rate here:

	GSGDOptimizer optimizer(nn);
	optimizer.setLearningRate(0.05);
	optimizer.optimizeWithValidation(trainingFeatures, trainingLabels);

Let's test our model to see how well it performs:

	double sse = nn.measureLoss(testingFeatures, testingLabels);
	double mse = sse / testingLabels.rows();
	double rmse = sqrt(mse);
	std::cout << "The root-mean-squared test error is " << to_str(rmse) << "\n";

Finally, don't forget to delete pRealOutputs:

	delete(pRealOutputs);

Or, preferably:

	std::unique_ptr<GMatrix> hRealOutputs(pRealOutputs);


Classification

The previous example was not actually very useful because root-mean-squared error only tells us how poorly the neural network fit our continuous representation of the data. It does not really tell us how accurate the neural network is for classifying this data. So, instead of transforming the data to meet the model, we can transform the model to meet the data. Specifically, we can use the GAutoFilter class to turn the neural network into a classifier:

	GNeuralNetLearner learner;
	learner.nn().add(new GBlockLinear(3));
	learner.nn().add(new GBlockLogistic());
	GAutoFilter af(&learner, false); // false means the auto-filter does not need to delete the learner when it is destroyed.

Now, we can train the auto-filter using the original data (with nominal labels). We no longer need to explicitly train the neural network because when we train the auto-filter, it trains the inner model.

	af.train(inputs, outputs);

The auto-filter automatically filters the data as needed for its inner model, but ultimately behaves as a model that can handle whatever types of data you have available. In this case, it turns a neural network into a classifier, since "outputs" contains 1 column of categorical values. So, now we can obtain the misclassification rate as follows:

	double mis = af.sumSquaredError(inputs, outputs);
	std::cout << "The model misclassified " << to_str(mis)  <<
		" out of " << to_str(outputs.rows()) << " instances.\n";

Why does a method named "sumSquaredError" return the total number of misclassifications? Because it uses Hamming distance for categorical labels, which reports a squared-error of 1 for each misclassification.

(Note that if you are training a big network with big data, then efficiency may be critical. In such cases, it is generally better to use the approach of transforming the data to meet the model, so that it does not waste a lot of time transforming data during training.)



Adding layers

Layers are added in feed-forward order. The first layer added is the input layer. The last layer added is the output layer. It is common to alternate between layers whose blocks have weights (such as GBlockLinear) and layers whose blocks introduce non-linearity (such as GBlockTanh). Example:

	nn.add(new GBlockLinear(1000));
	nn.add(new GBlockTanh);
	nn.add(new GBlockLinear(300));
	nn.add(new GBlockTanh);
	nn.add(new GBlockLinear(90));
	nn.add(new GBlockTanh);
	nn.add(new GBlockLinear(10));
	nn.add(new GBlockTanh);

The layers may be resized as needed when the enclosing neural network is resized to fit the training data.

A GBlockLinear just produces linear combinations of its inputs with no activation function. To use an activation function, add a layer with some nonlinear block. Some examples include: GBlockRectifiedLinear, GBlockSoftPlus, GBlockGaussian, GBlockBentIdentity.
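
For example, to use rectified linear units instead of tanh after one of the linear layers, the same pattern applies (assuming GBlockRectifiedLinear, like GBlockTanh, takes no constructor arguments):

	nn.add(new GBlockLinear(300));
	nn.add(new GBlockRectifiedLinear()); // ReLU activation instead of tanh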



Blocks with weights

There are several types of weighted blocks that you can use in your neural networks. These include:

  • GBlockLinear - A traditional fully-connected feed-forward block of network units.
  • GBlockRestrictedBoltzmannMachine - A restricted Boltzmann machine block.
  • GBlockConvolutional1D - A 1-dimensional block of convolutional units.
  • GBlockConvolutional2D - A 2-dimensional block of convolutional units.
  • (Other block types are currently under development...)


Stopping criteria

The GDifferentiableOptimizer::optimizeWithValidation method divides the training data into a training portion and a validation portion. The default uses 65% for training and 35% for validation. Suppose you wanted to change this ratio to 60/40. This would be done as follows:

	optimizer.optimizeWithValidation(features, labels, 0.4);

By default, training continues until validation accuracy does not improve by 0.2% over a window of 100 epochs. If you wanted to change this to 0.1% over a window of 10 epochs, then you could do this prior to calling optimizeWithValidation:

	optimizer.setImprovementThresh(0.001);
	optimizer.setWindowSize(10);

You can also train for a set number of epochs instead of using validation. For example, to optimize for 1000 epochs:

	optimizer.setEpochs(1000);
	optimizer.optimize(features, labels);

By default, optimize will run stochastically through the entire set of training samples each epoch. To use mini-batches instead, set the batch size and (optionally) the number of batches per epoch before calling optimize:

	optimizer.setBatchSize(25);
	optimizer.setBatchesPerEpoch(4);
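
Putting those calls together, a fixed-length mini-batch training run might look like this (the epoch and batch settings are just illustrative):

	optimizer.setEpochs(1000);
	optimizer.setBatchSize(25);
	optimizer.setBatchesPerEpoch(4);
	optimizer.optimize(features, labels);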


Training incrementally

Sometimes, it is preferable to train your neural network incrementally, instead of simply calling the "optimizeWithValidation" method. For example, you might want to use a custom stopping criterion, you might want to report validation accuracy before each training epoch, you might want to decay the learning rate in a particular manner, etc. The following example shows how such things can be implemented:

	nn.beginIncrementalLearning(trainingFeatures.relation(), trainingLabels.relation());
	GRandomIndexIterator ii(trainingFeatures.rows(), nn.rand());
	for(size_t epoch = 0; epoch < total_epochs; epoch++)
	{
		// Report validation accuracy
		double rmse = sqrt(nn.measureLoss(validateFeatures, validateLabels) / validateLabels.rows());
		std::cout << to_str(rmse) << "\n";
		std::cout.flush();
	
		// Train
		ii.reset();
		size_t index;
		while(ii.next(index))
		{
			optimizer.optimizeIncremental(trainingFeatures[index], trainingLabels[index]);
		}
	
		// Decay the learning rate
		optimizer.setLearningRate(optimizer.learningRate() * 0.98);
	}
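
For example, here is a minimal sketch of a custom stopping rule that halts when the validation RMSE has not improved for 20 consecutive epochs (the patience value is arbitrary, and the variables are the same ones used above):

	double bestRmse = 1e300;
	size_t epochsWithoutImprovement = 0;
	while(epochsWithoutImprovement < 20)
	{
		// Measure validation RMSE and track the best value seen so far
		double rmse = sqrt(nn.measureLoss(validateFeatures, validateLabels) / validateLabels.rows());
		if(rmse < bestRmse)
		{
			bestRmse = rmse;
			epochsWithoutImprovement = 0;
		}
		else
			epochsWithoutImprovement++;

		// Do one epoch of training
		ii.reset();
		size_t index;
		while(ii.next(index))
			optimizer.optimizeIncremental(trainingFeatures[index], trainingLabels[index]);
	}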


Serialization

You can write your neural network to a file:

	GDom doc;
	doc.setRoot(nn.serialize(&doc));
	doc.saveJson("my_neural_net.json");

Then, you can load it from the file again:

	GDom doc;
	doc.loadJson("my_neural_net.json");
	GLearnerLoader ll;
	GNeuralNet* pNN = (GNeuralNet*)ll.loadLearner(doc.root());


MNIST

A popular test for a neural network is the MNIST dataset. Here is some code that trains a neural network with this data:

#include <iostream>
#include <cmath>
#include <GClasses/GApp.h>
#include <GClasses/GError.h>
#include <GClasses/GMatrix.h>
#include <GClasses/GNeuralNet.h>
#include <GClasses/GActivation.h>
#include <GClasses/GRand.h>
#include <GClasses/GTransform.h>
#include <GClasses/GVec.h>
#include <GClasses/GHolders.h>

using namespace GClasses;
using std::cerr;
using std::cout;

int main(int argc, char *argv[])
{
	// Load the data
	GMatrix train;
	train.loadArff("/somepath/data/mnist/train.arff");
	GMatrix test;
	test.loadArff("/somepath/data/mnist/test.arff");
	GMatrix rawTestLabels(test, 0, test.cols() - 1, test.rows(), 1);

	// Preprocess the data
	GDataPreprocessor dpFeat(train,
		0, 0, // rowStart, colStart
		train.rows(), train.cols() - 1, // rowCount, colCount
		false, false, true, // allowMissing, allowNominal, allowContinuous
		-1.0, 1.0); // minVal, maxVal
	dpFeat.add(test, 0, 0, test.rows(), test.cols() - 1);
	GDataPreprocessor dpLab(train,
		0, train.cols() - 1, // rowStart, colStart
		train.rows(), 1, // rowCount, colCount
		false, false, true, // allowMissing, allowNominal, allowContinuous
		-1.0, 1.0); // minVal, maxVal
	dpLab.add(test, 0, test.cols() - 1, test.rows(), 1);
	GMatrix& trainFeatures = dpFeat.get(0);
	GMatrix& trainLabels = dpLab.get(0);
	GMatrix& testFeatures = dpFeat.get(1);
	GMatrix& testLabels = dpLab.get(1);

	// Make a neural network
	GNeuralNet nn;
	nn.add( new GBlockLinear(80), new GBlockTanh(),
		new GBlockLinear(30), new GBlockTanh(),
		new GBlockLinear(10), new GBlockTanh());

	// Prepare for training
	GSGDOptimizer optimizer(nn);
	optimizer.setLearningRate(0.01);
	nn.init(trainFeatures.cols(), trainLabels.cols(), optimizer.rand());
	cout << "% Training patterns: " << to_str(trainFeatures.rows()) << "\n";
	cout << "% Testing patterns: " << to_str(testFeatures.rows()) << "\n";
	cout << "% Topology:\n";
	cout << nn.to_str("% ") << "\n";
	cout << "@RELATION neural_net_training\n";
	cout << "@ATTRIBUTE internal_rmse_train real\n";
	cout << "@ATTRIBUTE internal_rmse_test real\n";
	cout << "@ATTRIBUTE misclassification_rate real\n";
	cout << "@DATA\n";

	// Train
	GRandomIndexIterator ii(trainFeatures.rows(), optimizer.rand());
	for(size_t epoch = 0; epoch < 10; epoch++)
	{
		// Validate
		double sseTrain = nn.measureLoss(trainFeatures, trainLabels);
		cout << to_str(std::sqrt(sseTrain / trainFeatures.rows())) << ", ";
		double sseTest = nn.measureLoss(testFeatures, testLabels);
		cout << to_str(std::sqrt(sseTest / testFeatures.rows())) << ", ";
		double mis = nn.measureLoss(testFeatures, rawTestLabels);
		cout << to_str((double)mis / testFeatures.rows()) << "\n";
		cout.flush();

		// Do an epoch of training
		ii.reset();
		size_t index;
		while(ii.next(index))
			optimizer.optimizeIncremental(trainFeatures[index], trainLabels[index]);
	}
	return 0;
}

Here are the results that I get:

% Training patterns: 60000
% Testing patterns: 10000
% Topology: 784 -> 80 -> 30 -> 10
% Total weights: 65540
@RELATION neural_net_training
@ATTRIBUTE internal_rmse_train real
@ATTRIBUTE internal_rmse_test real
@ATTRIBUTE misclassification_rate real
@DATA
1.0062320115962, 1.0062882869588, 0.9235
0.35827917057482, 0.35285602819834, 0.0588
0.29920038633219, 0.30503380076365, 0.0455
0.29897590956784, 0.30469283251474, 0.0436
0.27005888129929, 0.28255127487034, 0.0382
0.26209564354629, 0.28000584880966, 0.039
0.25383945740352, 0.27075626507427, 0.0361
0.23419057786692, 0.25915069511568, 0.0335
0.23568715771943, 0.26541501669647, 0.036
0.23326108174385, 0.26323053581819, 0.0332

The right-most column shows that we get 332 misclassifications after about 2 minutes of training. You can get much better accuracy using bigger layers, but then training will take longer too.
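
For example, you could replace the network construction step above with a wider topology (the layer sizes here are arbitrary; larger layers generally train more slowly):

	nn.add( new GBlockLinear(1000), new GBlockTanh(),
		new GBlockLinear(300), new GBlockTanh(),
		new GBlockLinear(10), new GBlockTanh());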

