This file describes the most-significant changes in each major release.
No new major features--just a lot of performance work, bug fixes, and general code polishing.
Separated the automatic data filtering from the supervised learners. Now, use the GAutoFilter class for that functionality. Improved neural net documentation. Added support for CUDA GPU-parallelized neural network layers. Added several regularization methods for neural networks, including drop-out, drop-connect, max-norm, L2, and L1 regularization. Added convolutional neural network layers for 1D and 2D inputs. Added a method to initialize a neural network with the fast Fourier transform to model time-series data, and added the fourier_net demo. Added a method to pretrain neural network layers by stacking autoencoders. Added softmax layers. Added support for heterogeneous activation functions within each neural network layer. Added a rectified linear activation function. Added the hybrid NLPCA, content-based filter, and content-boosted collaborative filter recommendation systems. Improved Mac support. Ported to VC++ 2013. Dropped support for VC++ 2010. Now targets 64-bit builds on Windows by default. Improved CSV parsing.
Changed the license from LGPL to CC0. Added classes for stackable autoencoders and restricted Boltzmann machines. Polished up the GBayesianNetwork class and added examples and unit tests. Added support for CMake. Made the build process also support clang, and be more Mac-friendly. Simplified some important classes, including GMatrix and GNeuralNet. Enforced const correctness in more places. Nixed most uses of smart pointers. Made all learning algorithms thread-safe. Added thread-parallelism to several ensemble methods. Added support for binary division trees. Added some common activation functions. Added a tool to generate a vector of meta statistics about a dataset. Added several small-but-useful tools. Simplified the docs and web site.
Added a tool to regress arbitrary formulas to fit a dataset. Switched to SVG format for most plots instead of PNG. Added support for grids of plots to the GSVG class. Dropped the dependency on libpng and libz for the GClasses library. (It now only depends on standard libraries.) Added a tool to normalize all of the row-vectors in a dataset. Added a wavelet transform class. Improved compatibility with R data formats. Fixed some bugs and strengthened unit tests.
Added linear assignment solver and bipartite matching. Added a Gaussian Process model. Added a reservoir network model. Added a method to compute principal components with sparse matrices. Better support for const objects. Improved internal classes, such as the tokenizer and the to_str methods. Added Mersenne twister class. Switched from Subversion to Git for our code repository. Added feature to print random forests. Added a dynamic-survey demo app. Added Makefile support for lcov/gcov. Added ability to build a minimal version of the GClasses library with no dependencies and little cruft. Added multi-layer-perceptron weight averaging ensemble. Added a class for making SVG plots. Ported to VC++ 2010. (Dropped support for VC++ 2008.) Strengthened unit tests, rewrote some classes that were growing cobwebs, made some documentation improvements, and of course, fixed several bugs.
Added Bayesian Model Averaging ensembles. Added Bayesian Model Combination ensembles. Added AdaBoost. Added bipartite matching. Redesigned the self-organizing map. Added Wagging for multi-layer perceptrons. Redesigned the socket classes. Some performance improvements. Ported to FreeBSD. Ported to g++ 4.6.
Added a tool to auto-tune the parameters of supervised learners. Added an app for sparse matrix learning tools. Added an app for dimensionality reduction tools. Added an app for tools that process audio signals. Added BASH command-completion for all of the apps. Added singular-value-decomposition to the sparse matrix class. Improved the performance of our LLE implementation. Added the matrix factorization and nonlinear PCA collaborative filters. Added bagging ensembles of collaborative filters. Added options to print confusion matrices. Added the fuzzy k-means clustering algorithm. Enabled k-means to utilize custom distance metrics. Switch to JSON for our serialization format (instead of TWT). Improved the wizard tool. Merged the GSup and GClasses libraries. Added a random forest class. Added automatic missing-value imputation to supervised learners. Added calibration for all supervised learning algorithms that can predict distributions.
Added Win64 as a build target. Made all learning models support multi-dimensional labels. Made learning algorithms automatically handle data type-conversion. (It is no longer necessary to wrap them in an appropriate filter.) Improved the wizard tool. (It is now web-based, and provides default values.) Add support to train k-NN with sparse matrices. Significantly improved the documentation. Added some new activation functions. Added a simple machine learning demo. Added LU matrix decomposition. Brought back the jumper demo. Some performance improvements. Added the NeuroPCA algorithm. Improved the way command-line tools display usage information. Added a recommender system command-line tool with a few collaborative filtering algorithms. Added the security demo app. Improved standards compliance. Removed several redundant and not-very-useful classes. Redesigned learning interfaces to be more developer-friendly. Added several new unit tests.
Fixed issues with OSX compatibility. Added lots of comments to make the API docs more complete. Made GDecisionTree more conformant with the random forest algorithm. Added a recommender system demo. Added a demo for unsupervised back-prop. Updated GNeuralNet to support the dynamic addition and removal of nodes. Also added support for custom activation functions. Added a switch to the learn tool so you can specify which attributes to use for labels, and which attributes to ignore. Threw out GQueue, GAVLTree, and other classes that were redundant with the STL. (Also threw out everything that depended on these classes, such as the path-search algorithms and GRelationalTable. Sorry if you were using those--I think this is good churn, though.) Moved all the classes into a GClasses namespace. Separated all of the classes not related to machine learning into a supplemental library called GSup. (That means you now have to link to two libraries, GClasses and GSup, to get all of my functionality. Again, sorry if this breaks your code, but I think this is good churn.) Renamed the "categorize" filter to "nominaltocat". Improved the plot overview tool with respect to nominal features. Made GData::determinant more efficient. Added clarifications to the licenses of the demo apps. Restructured the demo apps to be entirely contained in a single folder. (This makes them easier to clone.) Rewrote GRand to be inherently 64-bit based. Also improved 64-bit compliance with GData, GTwt and other classes. Added a tool for aligning data. Added an uninstall option to the Makefile for Linux. Merged all of the Windows .sln files into a single monolithic solution that builds everything (to simplify testing). Added the hello_console and hello_web demo apps. Rewrote the interpolation demo app to show more interpolation and to support more algorithms. Added random and brute-force optimizers for baseline comparisons. Added a linear programming solver and a linear regression learner.
Some cosmetic changes to make the docs and some tools more friendly.
Added a graphical wizard tool to help build waffles command-line commands. Added a class for the Extended Kalman Filter. Added a sparse matrix class. Added an automatic attribute-selection tool. Added k-medoids and other clustering algorithms to the transform tool. Added code for converting text documents to sparse matrices. Added some text-mining tools. Added classic multidimensional scaling. Added the Isomap manifold learning algorithm. Added new features to the plot tool and improved how labels are drawn. Added support for missing neighbors in all the manifold learning algorithms. Added a feature to print decision trees. Added a tool for model visualization. Added a class for recurrent neural networks and other recurrent models. Added Backpropagation Through Time, MOSES, and other algorithms for training recurrent models. Enable filters to work in conjunction with incremental learners. Added simulated annealing. Switched to using C++ streams. Strengthened unit tests. Improved some interfaces. Fixed several bugs.
Added Locally Linear Embedding (LLE) to the transform tool and improved the Breadth First Unfolding manifold learning algorithm. Added the Kabsch algorithm for aligning data. Added singular value decomposition to the transform tool. Improved API docs. Further simplified the learning interface. Repaired some regressions with serialization. Added several unit tests.
Ported to 64-bit Linux. Ported to VC++ 2008. Added classes for hidden Markov models, equation parsing, intelligent neighbor-finding, drawing random values from various distributions, function plotting, improved algorithms for computing principal components, pruning manifold shortcuts, significance testing, singular value decomposition, kernel machines, Moore-Penrose pseudo-inverse, Dijkstra's algorithm, Floyd-Warshall, and Brandes' betweenness centrality. Improved the runtime performance of Manifold Sculpting. Added a tool for generating various datasets. Did a complete interface overhaul. (Yes, this will break your code when you upgrade. That's the price of moving forward.) Improved standards compliance and type safety. Added another transduction algorithm. Added a new demo for a machine learning journal site. Improved plotting tools. Added support for measuring transductive accuracy. Dumped some demos that I grew tired of maintaining. Fixed a regression in the naive Bayes algorithm. Added several unit tests. Dumped some dead code.
Added a script-friendly command-line interface for all of the data mining tools. Converted to standard containers and did a whole lot of clean-up, maintenance, and polishing on the code.
Models can now be persisted to/from a text-based format. Added incremental kd-tree. Added calibrator. Restructured some interfaces. Added new modelers. Added incremental support to some modelers. Added significance testing. Added chess demo and evolutionary jumper demo. Improved API docs, threw out dead code, and of course fixed a lot of bugs.
Split the demos into separate apps. Added some Q-learning classes and a couple of demos for them. Added several new supervised learning algorithms. Made a few GUI improvements. Redesigned the supervised learning interface to support output distributions instead of just classes or values, and to support semi-supervised learning. Added a semi-supervised learning algorithm. Added some code for Bayesian inference by MCMC using Metropolis and Gibbs sampling. Integrated a better pseudo-random-number-generator. Added code for doing Mixture of Gaussians by expectation maximization. Added code for Self Organizing Map. Added another hill-climbing algorithm. Added support for neural nets to the graphical data mining tool. Fixed 64-bit compatibility issues. And fixed a lot of bugs.
Added a new unified data mining tool that replaces the rank tool, the charting tool, and the predictive accuracy tool. Added confidence estimates to all the learning algorithms. Added a tool to make precision/recall charts. Added a tool for augmenting data sets. Seriously improved the GUI. Added various tools for data mining. Redesigned the GSupervisedLearner class. Made the charting tools smarter and more capable. Added support to run Waffles experiments on a cluster sans the GUI. Improved error checking, and of course, fixed a bunch of bugs.
Added some new image processing tools: a max-flow graph cut class, a region adjacency graph class, a video class, methods to compute gradient magnitude images, and a morphing class. Added a K-means clustering class, a couple of new learning algorithms, some code for computing eigenvectors, and a new tool for ranking learning algorithms. Improved the documentation and fixed many bugs.
Added a GBag class (for bagging ensembles), Random Forest, Arbitrary Arboretum, and PC Forest. Added some code for computing eigenvectors, an algorithm that computes principal components of data in many dimensions without needing to compute the covariance matrix, and code for generating random vectors (by generating random numbers with a Gaussian distribution). Fixed some bugs in A-star search, the KNN algorithm, and the Decision Tree class.
Added A-star search and a relational table class, made the KNN instance learner work better incrementally, fixed several stability bugs in the socket and HTTP server classes, and added a face-sorting manifold learning demo.
Added a new clustering algorithm and a discrete path search algorithm, did a good deal of general code clean up, integrated some changes contributed by Roger Pack into various classes, and fixed about ten bugs. Also thanks to Kevin Kemp for getting it to build on Mac without having to include a special framework in the package. I also added some unpolished tools for making charts and added a link to the API docs from the main menu.
This is a bug fix release. If you were getting a build error in GKeyboard.cpp on Windows, that's fixed now.
Ported to Mac OSX Tiger. Thanks to Helaman Ferguson for much of this work. It now works on Mac, Linux, and Windows.
Added a new efficient neighbor-finding class. The KNN algorithm is much faster now. Fixed many bugs. Added a ray-tracing demo. This isn't really related to machine learning, but I have some plans in the future to try combining it with learning algorithms to do model reconstruction. The manifold learning demo demonstrates both unsupervised and semi-supervised manifold learning now.
Added a particle swarm algorithm and some other search algorithms. Fixed some issues with the genetic algorithm. The Neural Net interpolation demo now compares several search algorithms. (Backprop is the clear winner, but it's not a totally fair comparison because backprop is running in incremental mode while the others are doing batch mode.)
This is mostly a bug-fixing release.
Added automated tests for several of the classes.
The project is now released to the public. It builds in VS6 on Windows and with g++ on Linux.