## Tuesday, July 31, 2012

### Sookie’s Complicated Love Life Or: An Absolute Beginner’s Guide to Connectionist Modeling, Part III: Deep Networks

Here I continue my effort to write an accessible introduction to Parallel Distributed Processing. In Part I, I introduced units, connections, and connection strengths. In Part II, I introduced the concept of the activation function, and the difference between linear and saturating (e.g., sigmoidal) activation functions. Here, in Part III, I briefly describe the difference between shallow and deep networks at a level that is hopefully suitable for a beginner. With vampires (and werewolves, etc.)!

5. Complication: Deep Networks

At this point, the network is configured to tell Sookie how much danger, broadly categorized, she is in for any particular romantic configuration. Say, however, that Sookie wants a more detailed breakdown of the peril likely to befall her than simply whether or not she will be in danger. Perhaps she wants to know, for a given situation, how likely it is that she will 1) have her blood drained, 2) be shot by a werewolf, 3) be shot by a human hunting the shapeshifter, or 4) just be penetratingly chastised by the best friend of one of the humans. We might imagine that the likelihood of each of these consequences is related to how much danger Sookie is in overall (in toto). Thus, we might extend our network to look more like this:

Figure 7 is a very simple example of what we would call a deep network. It is deep because it has more than two layers. "Deep" networks can be arbitrarily deep: Sookie's network has three layers, but there's no reason it couldn't have 4, or 5, or 100. In a deep network, we still call the first layer the inputs and the last layer the outputs, but every layer in between is called hidden. In Sookie's network, the inputs are still the suitor units, the outputs are now the danger constituent units, and the DANGER! unit is a hidden unit. In the models we read about, the input and output units will usually (but not always) be more important to understand than the hidden units. That is because, unlike in Sookie's network, the hidden units won't always stand for easily labeled quantities: they are simply a way of mathematically transforming the input into the output.
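To make the three layers concrete, here is a minimal Python sketch of a network shaped like Sookie's. Everything numeric in it is an assumption for illustration: the four suitor input units, their on/off values, and all connection strengths are hypothetical, and a logistic sigmoid stands in as the saturating activation function. The DANGER! unit is the single hidden unit; the four danger constituents are the outputs.

```python
import math

def sigmoid(net_input):
    """Saturating activation function (logistic sigmoid)."""
    return 1.0 / (1.0 + math.exp(-net_input))

# Hypothetical input layer: which suitors are present (1) or absent (0).
suitors = {"vampire": 1.0, "werewolf": 1.0, "shapeshifter": 0.0, "human": 0.0}

# Hypothetical connection strengths from each suitor unit to the hidden DANGER! unit.
to_danger = {"vampire": 2.0, "werewolf": 1.5, "shapeshifter": 1.0, "human": 0.5}

# Hypothetical connection strengths from DANGER! to each output (danger constituent) unit.
to_outcomes = {
    "blood drained": 3.0,
    "shot by werewolf": 2.0,
    "shot by human hunter": 1.0,
    "chastised by best friend": 0.5,
}

# Layer 1 -> layer 2: weighted sum of suitor activations, squashed by the sigmoid.
danger = sigmoid(sum(suitors[s] * to_danger[s] for s in suitors))

# Layer 2 -> layer 3: the same computation, with DANGER! as the only sending unit.
outcomes = {name: sigmoid(danger * w) for name, w in to_outcomes.items()}

print(f"DANGER!: {danger:.3f}")
for name, activation in outcomes.items():
    print(f"{name}: {activation:.3f}")
```

Notice that the second step is the same weighted-sum-then-squash computation as the first; only the sending and receiving units differ.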

The advantage of deep networks is that the deeper they get, the more complex and fine-grained the problems they can solve; in this example, we have recoded simple "DANGER" into some of its constituent parts. However, deep networks come with a significant disadvantage: the more layers are added, the harder and slower they are to train. This is a consequence of back-propagation, the algorithm typically used to train this kind of network. The error signal that tells each connection how to change must travel backward through every layer, and with saturating activation functions it tends to shrink roughly exponentially with each layer it crosses, so the layers nearest the input learn very slowly. There are other, faster training algorithms, and training routines for deep networks are an active area of research in machine learning.
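A toy calculation shows one reason depth makes back-propagation hard. During training, the error signal sent backward to earlier layers is multiplied, at every layer it passes through, by a connection strength and by the slope of the sigmoid, which is never larger than 0.25. The sketch below assumes a hypothetical chain with one sigmoid unit per layer, every connection strength equal to 1, and every net input equal to 0 (where the slope is largest); it ignores everything else a real training algorithm does.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_slope(x):
    """Derivative of the sigmoid; its largest possible value is 0.25 (at x = 0)."""
    s = sigmoid(x)
    return s * (1.0 - s)

def backprop_signal(depth, weight=1.0, net_input=0.0):
    """Size of an error signal after passing backward through `depth` sigmoid layers.

    Simplifying assumption: one unit per layer, every connection has the same
    strength `weight`, and every unit receives the same net input.
    """
    signal = 1.0
    for _ in range(depth):
        # Each layer scales the signal by (connection strength) * (sigmoid slope).
        signal *= weight * sigmoid_slope(net_input)
    return signal

for depth in (1, 2, 5, 10):
    print(depth, backprop_signal(depth))
# prints:
# 1 0.25
# 2 0.0625
# 5 0.0009765625
# 10 9.5367431640625e-07
```

Under these assumptions the signal is 0.25 raised to the number of layers, so after 10 layers less than a millionth of it is left. This shrinking is often called the vanishing gradient problem.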

All of the networks we will read about in this class will be deep, to a greater or lesser extent. However, this shouldn't trip you up: deep networks work exactly the same way as our original two-layer network. That is, activation starts in the input layer and then flows to the second layer. In Sookie's case, activation begins with her paranormal suitors and flows to the DANGER! unit. Activation then flows in exactly the same way to subsequent layers. Again in Sookie's case, activation flows from the DANGER! unit into each of the constituent danger units in exactly the same manner that it flowed into the DANGER! unit in the first place. The networks we read about may seem a lot more complicated than this because of the way they are described, but in reality, activation flowing from one layer to the next is all that ever happens.
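The idea that activation flowing from one layer to the next is all that ever happens can be written as a single function applied in a loop. In this minimal sketch, the names `propagate` and `forward`, the list-of-lists weight layout, and all numbers are hypothetical choices for illustration: each layer is a list of rows of connection strengths, one row per receiving unit. Sookie's network is just two such layers, and a 100-layer network runs through exactly the same loop.

```python
import math
import random

def sigmoid(net_input):
    """Logistic sigmoid: a saturating activation function."""
    return 1.0 / (1.0 + math.exp(-net_input))

def propagate(activations, weights):
    """Pass activation from one layer to the next.

    weights[j][i] is the connection strength from sending unit i to
    receiving unit j. Each receiving unit sums its weighted inputs and
    passes the total through the sigmoid.
    """
    return [sigmoid(sum(w * a for w, a in zip(row, activations))) for row in weights]

def forward(inputs, layers):
    """Run a network of any depth by repeating the same step layer after layer."""
    activations = inputs
    for weights in layers:
        activations = propagate(activations, weights)
    return activations

# Sookie's network (hypothetical weights): 4 suitor units -> 1 DANGER! unit -> 4 outcome units.
sookie_layers = [
    [[2.0, 1.5, 1.0, 0.5]],        # suitors -> DANGER!
    [[3.0], [2.0], [1.0], [0.5]],  # DANGER! -> each danger constituent
]
print(forward([1.0, 1.0, 0.0, 0.0], sookie_layers))

# A 100-layer network with 4 units per layer and random (hypothetical) weights
# is handled by exactly the same loop.
random.seed(0)
deep_layers = [
    [[random.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(4)]
    for _ in range(100)
]
print(forward([1.0, 1.0, 0.0, 0.0], deep_layers))
```

Nothing in `forward` knows or cares how many layers there are; depth only changes how many times the loop runs.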