What is a neural network?
It’s a technique for building a computer program that learns from data, based very loosely on how brains work. Software “neurons” are connected together, allowing them to send messages to each other. The network is asked to solve a problem, which it attempts to do over and over, each time strengthening the connections that lead to success and diminishing those that lead to failure. For a more detailed introduction to neural networks, Michael Nielsen’s Neural Networks and Deep Learning is a good place to start.
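To make that loop concrete, here is a toy sketch in Python — not the app's actual code — of a single software neuron that repeatedly guesses, measures its error, and strengthens or weakens its input connections accordingly:

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy task: label each point -1 or +1 by the sign of its first coordinate.
X = rng.normal(size=(400, 2))
y = np.sign(X[:, 0])

w = rng.normal(size=2)   # connection strengths, one per input
b = 0.0                  # bias term
lr = 0.1                 # how far to adjust after each pass

for epoch in range(100):
    pred = np.tanh(X @ w + b)    # the neuron's guesses
    err = pred - y               # how wrong each guess was
    grad = err * (1 - pred**2)   # error scaled by tanh's slope
    # Strengthen connections that reduce the error, weaken the rest.
    w -= lr * (X * grad[:, None]).mean(axis=0)
    b -= lr * grad.mean()
```

Real networks stack many such neurons in layers, but the principle is the same: adjust the connections, a little at a time, in whichever direction reduces the error.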
More about the datasets
The data points are scaled and dimensionless, so it might be hard to tell what they represent. Here's a brief description of each dataset.
Classification datasets
Linear – A dataset that is linearly separable along the X1 dimension. You'll only need one feature and one neuron to learn this relationship.
Diagonal – The two classes are clustered along a diagonal, so you'll need more than one feature to build an adequate classifier. Note that the training data does not span the domain of the data, whereas the validation data does — so the variance is always high on this dataset.
Exclusive Or (XOR) – This is a classic problem in neural network research. It is the simplest non-linearly separable classification task that exists.
Gaussian – A classification problem with clusters (or blobs) each represented by a Gaussian distribution. Change the noise level to broaden or narrow the distribution.
Circle – The instances of one class encircle the instances of the other.
Spiral – Two classes spiraling around each other.
Moons – Two interleaving moons, or croissants if you prefer. Inspired by the sklearn make_moons dataset (see the sketch after this list).
Real data: sand vs shale – These are P-wave velocity (X1) and bulk density (X2) values for a selection of sandstones (cyan) and shales (dark blue) from the Rock Property Catalog. A standard scaler has been applied to the features, then multiplied by 2.
Real data: poro-perm – Porosity (X1) and the base-10 logarithm of permeability (X2) of Oligocene sandstones. Medium-to-coarse grained sands are in cyan, fine-to-medium grained sands are in dark blue. The dataset is modified from Taylor et al. 1993, dataset number 64 in USGS report 03-420, Porosity and Permeability from Core Plugs in Siliciclastic Rocks.
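As an aside, the two-moons pattern mentioned above is easy to reproduce yourself. Here is a minimal sketch using scikit-learn's make_moons — an illustration only, not the app's own generator — rescaled to the conventions described under "How do I make my own datasets?" below:

```python
from sklearn.datasets import make_moons

# Two interleaving half-moons with a little Gaussian noise.
X, y = make_moons(n_samples=400, noise=0.1, random_state=42)

# Rescale to the playground's conventions:
# z-score each feature, multiply by 2, and map labels to -1 / +1.
X = 2 * (X - X.mean(axis=0)) / X.std(axis=0)
y = 2 * y - 1
```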
Regression datasets
Plane – Data coordinates sampling a dipping planar surface.
Multi-Gaussian – Multiple clusters with Gaussian spatial distributions.
Real data: Porosity – This is a small dataset representing a porosity map. It is from Geoff Bohling at the Kansas Geological Survey, but we can no longer find the data online.
Real data: DTS from DTP and RHOB – Predicting S-wave sonic (DTS) wireline measurements from compressional sonic (DTP, X1) and bulk density (RHOB, X2). The data have been downsampled from well R-39, offshore Nova Scotia, available from the CNSOPB.
How do I make my own datasets?
You will need to define a machine learning task with two features and one target. About 400 records is ideal, split 50/50 between positive and negative classes for a classification problem. The features will be scaled using Z-score standardization (then multiplied by 2 for complicated reasons). The target should be -1 and 1 for classification, or in the range [-1, 1] for regression.
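As an example of how to prepare data, here's a minimal sketch in Python. The column names and the flat-list JSON layout are assumptions for illustration; check the exact schema the app expects before uploading.

```python
import json
import pandas as pd

# Hypothetical input: a CSV with two feature columns and one target column.
df = pd.read_csv('my_data.csv')

# Z-score standardize each feature, then multiply by 2,
# matching the scaling described above.
for col in ['x1', 'x2']:
    df[col] = 2 * (df[col] - df[col].mean()) / df[col].std()

# For classification, map the two classes to -1 and +1.
classes = sorted(df['target'].unique())
df['target'] = df['target'].map({classes[0]: -1, classes[1]: 1})

# Write out as JSON; here we assume a list of [x1, x2, target] records.
records = df[['x1', 'x2', 'target']].values.tolist()
with open('my_dataset.json', 'w') as f:
    json.dump(records, f)
```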
Once you have a JSON file, you can upload it using the button with the file_upload icon (top left).
Credits
This fork of the TensorFlow Playground was adapted by Matt Hall and Evan Bianco at Agile Scientific.
Here's some of what we've added or changed:
- Added several new real and synthetic datasets, with descriptions.
- More activation functions to try, including ELU and Swish.
- Change the regularization during training and watch the weights respond.
- Upload your own datasets!
- See the expression of the network in Python code.
- Lots of small bug fixes and cosmetic changes.
The original app was created by Daniel Smilkov and Shan Carter. It was a continuation of many people's previous work, most notably Andrej Karpathy's convnet.js demo and Chris Olah's articles about neural networks. Many thanks also to D. Sculley for help with the original idea, and to Fernanda Viégas, Martin Wattenberg, and the rest of the Big Picture and Google Brain teams for feedback and guidance.
Several of our changes were inspired or pioneered by David Cato's fork of the project, especially the Swish function, the code expression, and the on-the-fly regularization tweaking. Kudos and thanks to him and his collaborators!