Tuesday, June 26, 2012

Sookie’s Complicated Love Life Or: An Absolute Beginner’s Guide to Connectionist Modeling, Part II: The Activation Function

Here I continue my effort to write an accessible introduction to Parallel Distributed Processing.  In Part I, I introduced units, connections, and connection strengths.  In Part II, I introduce the concept of the activation function and the difference between linear and saturating (e.g., sigmoidal) activation functions.  But with vampires!

3.  The Activation Function

Remember, the goal of this network is to answer questions like “If I date a young vampire and an old vampire simultaneously, how much danger will I be in?” or “Is it more dangerous to date a human and an old vampire, or a young vampire and a shapeshifter?”  In order to answer these questions with mathematical rigor, we need to find a way to put numbers on our units and connections, so that we can put a number on DANGER!

We already used our intuition to come up with some relative connection strengths: these were reflected in the thicknesses of the connections in Figure 3.  As it turns out, using intuition to come up with connection weights is, historically, the first method that was used to build this type of network.  However, it was quickly discovered that when a network has, say, 100,000 connections, this is not a very practical plan.  What works better for most of the networks we will consider is using a mathematical rule called a training algorithm to set the weights.  You won’t ever have to know much about training rules, other than that they exist and that they set weights automatically, without the modeler having to figure out 100,000 (or more!) weights.

We definitely don’t need to think about training rules right now.  Instead, let’s use the intuitive pipe-thicknesses we came up with earlier to assign some numerical weights to the network:

Figure 4

In Figure 4, numbers are assigned to weights based on our intuitions about how dangerous each type of suitor is.  That is, we thought that we would rank these potential suitors as follows in order of dangerousness, from least to most:

1: Human
2: Young Vampire
3 (tie): Shapeshifter, Werewolf
4: Old Vampire

The weights in Figure 4 are simply the numbers from that ranking. 

Now, how do these numbers help us to find numerical solutions to Sookie’s DANGER problem?  One way to think of them is as how much the DANGER unit would be activated for each type of suitor, if Sookie poured “1” drop of activation into that suitor’s unit.  So, if in her simulations Sookie activates the “Young Vampire” unit to “1”, the DANGER unit would be activated to “2”.  That number is, of course, meaningless by itself, and it only starts to make sense when we consider what would happen if Sookie activated the “Old Vampire” unit instead.  In that situation, based on these weights, the DANGER unit would be activated to “4”.  This reflects the idea that the old vampire is twice as dangerous as the young vampire.
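These single-suitor simulations can be sketched in a few lines of code.  This is just an illustration (the dictionary keys and the `danger` function name are my own inventions; the weights are the ones we assigned in Figure 4):

```python
# Hypothetical connection weights from Figure 4 (least to most dangerous).
weights = {"human": 1, "young_vampire": 2, "shapeshifter": 3,
           "werewolf": 3, "old_vampire": 4}

def danger(activations):
    # Each suitor's activation is multiplied by its connection weight,
    # and the products are summed on the DANGER unit.
    return sum(weights[suitor] * a for suitor, a in activations.items())

print(danger({"young_vampire": 1}))  # 2
print(danger({"old_vampire": 1}))    # 4
```

Pouring “1” of activation into a single unit simply reads off that unit’s weight, which is why the old vampire comes out exactly twice as dangerous as the young one.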

In this example, the activation function for our network is the following: the activation of the DANGER unit is the sum, across all suitor units, of each suitor’s activation multiplied by the weight of its connection to DANGER.

Work out the examples from above for yourself, to convince yourself this is true.  Remember, Sookie is starting out with simple simulations, so she’s only pouring “1” of activation into a single unit to start.

Our activation function permits quantification of some more of our intuitions.  For example, if Sookie enters into only a half-hearted relationship—pours in only “1/2” activation to a suitor—that is probably only half as dangerous as a more serious relationship.  This intuition is reflected by the fact that a weight has to be multiplied by its suitor’s activation before being “sent” to the DANGER unit.  Similarly, the more suitors Sookie engages in relationships with, the more items will go into the sum, resulting in more activation on the DANGER unit.  In fact, with this formula, we now have the mathematical machinery to answer questions of arbitrary complexity.  For example: “How dangerous is it to date 2 humans very seriously (human unit activation “6”) while simultaneously dating a shapeshifter and a werewolf half-heartedly (shapeshifter activation “.5” and werewolf activation “.5”) after having revoked young vampire’s invitation to my house (young vampire activation “-1”)?”  Check for yourself!
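That last question can be checked with a short script.  As before, the weights are the ones we assumed in Figure 4, and the function name is mine; the revoked invitation is modeled as negative activation, following the example:

```python
# Hypothetical connection weights from Figure 4 (least to most dangerous).
weights = {"human": 1, "young_vampire": 2, "shapeshifter": 3,
           "werewolf": 3, "old_vampire": 4}

def danger(activations):
    # Linear activation function: sum of weight * activation over all suitors.
    return sum(weights[suitor] * a for suitor, a in activations.items())

# Two humans dated very seriously (activation 6), a shapeshifter and a
# werewolf dated half-heartedly (0.5 each), and a young vampire whose
# invitation was revoked (negative activation).
total = danger({"human": 6, "shapeshifter": 0.5,
                "werewolf": 0.5, "young_vampire": -1})
print(total)  # 6*1 + 0.5*3 + 0.5*3 + (-1)*2 = 7.0
```

Notice that the revoked invitation actually subtracts from the total: negative activation times a positive weight lowers DANGER.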

Once again, exploring our intuitions about this model also reveals one of its flaws.  Consider the following:  In this model, every time Sookie adds a new suitor (either by increasing the activation in a pre-existing unit or adding some activation in a new unit—say she wants to start dating a necromancer, for some reason), the total amount of predicted DANGER will increase.  However, after a certain threshold of DANGER, Sookie will no longer be in DANGER at all, because she will simply be DEAD.  That is, DANGER is a quantity that saturates.  There’s no realistic situation where Sookie can be in infinite danger.  Mathematically, the problem here is that we have used a linear activation function, instead of a saturating activation function.  That is, the relationship between input (suitor) activation and output (DANGER) activation in our network looks like this:
Figure 5

When in reality it should look like this:

Figure 6
For essentially this very reason (though not usually spelled out so luridly), most of the networks we will study will use the sigmoidal activation shown in Figure 6. 
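A minimal sketch of the difference, using the standard logistic sigmoid (the linear function is the one our network has been using; both function names here are my own labels):

```python
import math

def linear(x):
    # Our current activation function: output grows without bound.
    return x

def sigmoid(x):
    # A saturating activation function: squashes any input into (0, 1).
    return 1 / (1 + math.exp(-x))

for x in [1, 4, 10, 100]:
    print(x, linear(x), round(sigmoid(x), 5))
```

However large the input gets, the sigmoid’s output is pinned just below 1: no amount of extra suitors can push Sookie past “maximally in danger,” which is exactly the saturation the linear function lacks.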

4.  Interim Summary

At this point, in a very serious sense, we have learned everything there ever will be to know about connectionist networks.  Namely, there will be units.  They will be connected.  The connections will differ in their weights, and most of the time weights will be determined by a learning algorithm.  Information will flow between units at different levels as numerically specified by the activation function, which will usually saturate.  Your first step in understanding any model that we read about will be to identify what kind of units there are, and how they are connected.  If you can do that, you will be in good shape.

However, the models we read about will be substantially more complicated than this one.  Next, I will talk about two of the most common complications we will encounter:  deep networks and networks with distributed representation. 
