Here I continue my effort to write an accessible introduction to Parallel Distributed Processing. In Part 1, I introduced units, connections, and connection strengths. In Part 2, I introduce the concept of the activation function, and the difference between linear and saturating (e.g., sigmoidal) activation functions. But with vampires!
3. The Activation Function
Remember, the goal of this network is to answer questions like “If I date a young vampire and an old vampire simultaneously, how much danger will I be in?” or “Is it more dangerous to date a human and an old vampire or a young vampire and a shapeshifter?” In order to answer these questions with mathematical rigor, we need to find a way to put numbers on our units and connections, in order to put a number on DANGER!
We already used our
intuition to come up with some relative connection strengths: these were reflected in the thicknesses of
the connections in Figure 3. As it turns out, using intuition to come up with connection weights was, historically, the first method used to build this type of network. However, it was quickly discovered that when
you have networks that have, say, 100,000 connections, this is not a very
practical plan. What works better for
most of the networks we will consider is using a mathematical rule called a training algorithm to set the
weights. You won’t ever have to know
much about training rules, other than that they exist and they set weights
automatically, without the modeler having to figure out 100,000 (or more!)
weights.
We definitely don’t need
to think about training rules right now.
Instead, let’s use the intuitive pipe thicknesses we came up with
earlier to assign some numerical weights to the network:

Figure 4 
In Figure 4, numbers are
assigned to weights based on our intuitions about how dangerous each type of
suitor is. That is, we thought that we
would rank these potential suitors as follows in order of dangerousness, from
least to most:
1: Human
2: Young Vampire
3 (tie): Shapeshifter, Werewolf
4: Old Vampire
The weights in Figure 4
are simply the numbers from that ranking.
Now, how do these
numbers help us to find numerical solutions to Sookie's DANGER problem? One way to think of them
is as how much the DANGER unit would be activated for each type of suitor, if
Sookie poured “1” drop of activation into that suitor’s unit. So, if in her simulations Sookie activates
the “Young Vampire” unit to “1”, the DANGER unit would be activated to “2”.
That number is, of course, meaningless by itself, and it only starts to make
sense when we consider what would happen if Sookie activated the “Old Vampire”
unit instead. In that situation, based
on these weights, the DANGER unit would be activated to “4”. This reflects the idea that the old vampire
is twice as dangerous as the young vampire.
In this example, the activation function for our network is
the following:
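Written out as a sketch (the symbols are assumed notation: $a_i$ is the activation poured into suitor unit $i$, and $w_i$ is the weight on its connection to the DANGER unit), the rule described in this section is a weighted sum:

```latex
\[
\text{DANGER} = \sum_{i} w_i \, a_i
\]
```

With the weights from Figure 4, activating only the Young Vampire unit to 1 gives $2 \times 1 = 2$, and activating only the Old Vampire unit to 1 gives $4 \times 1 = 4$.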
Work out the examples
from above for yourself, to convince yourself this is true. Remember, Sookie is starting out with simple
simulations, so she’s only pouring “1” of activation into a single unit to
start.
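For readers who like to check by running code, here is a minimal Python sketch of those single-suitor simulations. It assumes the activation function is the weighted sum described in this section and takes the weights from the ranking behind Figure 4; the names `WEIGHTS` and `danger` are made up for illustration.

```python
# Weights from the ranking behind Figure 4 (suitor unit -> DANGER unit).
WEIGHTS = {
    "human": 1,
    "young_vampire": 2,
    "shapeshifter": 3,
    "werewolf": 3,
    "old_vampire": 4,
}


def danger(activations):
    """Linear activation: sum of weight * activation over the given suitor units.

    `activations` maps suitor names to the amount of activation poured in;
    suitors that are left out count as 0.
    """
    return sum(WEIGHTS[suitor] * amount for suitor, amount in activations.items())


young = danger({"young_vampire": 1})
old = danger({"old_vampire": 1})
print(young)        # 2
print(old)          # 4
print(old / young)  # 2.0
```

The last line is the "twice as dangerous" comparison: the numbers only mean something relative to each other.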
Our activation function
permits quantification of some more of our intuitions. For example, if Sookie enters into only a halfhearted relationship (pours only “1/2” activation into a suitor), that is probably only half as dangerous as a more serious relationship. This intuition is reflected by the fact that a weight has to be multiplied by its suitor’s activation before being “sent” to the DANGER unit. Similarly, the more suitors Sookie engages in a relationship with, the more items will go into the sum, resulting in more activation on the DANGER unit. In fact, with this formula, we now have the mathematical machinery to answer questions of arbitrary complexity. For example, “How dangerous is it to date 2 humans very seriously (human unit activation “6”) while simultaneously dating a shapeshifter and a werewolf halfheartedly (shapeshifter activation “0.5” and werewolf activation “0.5”) after having revoked the young vampire’s invitation to my house (young vampire activation “1”)?”
Check for yourself!
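One way to check the long question above, as a hedged sketch: the weights come from the ranking behind Figure 4, the activations are the ones given in the question, and the activation function is assumed to be the plain weighted sum described in this section. The variable names are illustrative only.

```python
# Weights from the ranking behind Figure 4 (suitor unit -> DANGER unit).
weights = {
    "human": 1,
    "young_vampire": 2,
    "shapeshifter": 3,
    "werewolf": 3,
    "old_vampire": 4,
}

# Activations from the question; the old vampire is not involved, so it stays at 0.
activations = {
    "human": 6,
    "young_vampire": 1,
    "shapeshifter": 0.5,
    "werewolf": 0.5,
    "old_vampire": 0,
}

# Each suitor's contribution is weight * activation; DANGER is their sum.
contributions = {name: weights[name] * activations[name] for name in weights}
total_danger = sum(contributions.values())

print(contributions)
print(total_danger)  # 11.0
```

Under these assumptions the two humans contribute the most (6), even though a human has the smallest weight, because so much activation was poured into that unit.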
Once again, exploring
our intuitions about this model also reveals one of its flaws. Consider the following: In this model, every time Sookie adds a new
suitor (either by increasing the activation in a preexisting unit or adding
some activation in a new unit—say she wants to start dating a necromancer, for
some reason), the total amount of predicted DANGER will increase. However, after a certain threshold of DANGER,
Sookie will no longer be in DANGER at all, because she will simply be
DEAD. That is, DANGER is a quantity that
saturates. There’s no realistic situation where
Sookie can be in infinite danger.
Mathematically, the problem here is that we have used a linear activation function, instead of
a saturating activation
function. That is, the relationship
between input (suitor) activation and output (DANGER) activation in our network looks like this:
In reality, it should look like this:

Figure 6 
For essentially this
very reason (though not usually spelled out so luridly), most of the networks
we will study will use the sigmoidal activation function
shown in Figure 6.
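To see the difference in numbers rather than pictures, here is a small Python sketch comparing the linear rule with a saturating one. The logistic function is used as one common sigmoid; the exact curve in Figure 6, and any scaling or offset it uses, is not specified here, so treat the particular outputs as illustrative. The function names are hypothetical.

```python
import math


def linear_danger(net_input):
    """Linear activation: the output simply equals the summed, weighted input."""
    return net_input


def sigmoid_danger(net_input):
    """Logistic sigmoid (an assumed choice): output levels off near 1 as input grows."""
    return 1.0 / (1.0 + math.exp(-net_input))


# A few summed inputs, from no suitors at all to an absurd number of them.
for net in [0, 2, 4, 100]:
    print(net, linear_danger(net), round(sigmoid_danger(net), 4))
```

The linear output keeps growing with every extra suitor, while the sigmoid output flattens out near 1, which is the saturating behavior the DANGER unit needs.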
4. Interim Summary
At this point, in a very
serious sense, we have learned everything there ever will be to know about
connectionist networks. Namely, there
will be units. They will be connected. The connections
will differ in their weights, and
most of the time weights will be determined by a training algorithm. Information
will flow between units at different levels as numerically specified by the activation function, which will usually
saturate. Your first step in understanding any
model that we read about will be to identify what kind of units there are, and
how they are connected. If you can do
that, you will be in good shape.
However, the models we
read about will be substantially more complicated than this one. Next, I will talk about two of the most
common complications we will encounter:
deep networks and networks with distributed representation.