11.2 Learning in Spiking Models

In the previous section we have seen that the evolution of synaptic weights under a rate-based learning rule depends on correlations in the input. What happens if the rate-based learning rule is replaced by a spike-time dependent one?

In Section 11.2.1 we will derive an equation that relates the expectation value of the weight vector to statistical properties of pre- and postsynaptic spike trains. We will see that spike-time dependent plasticity is sensitive to spatial and temporal correlations in the input. In certain particularly simple cases spike-spike correlations can be calculated explicitly. This is demonstrated in Section 11.2.2 in the context of a linear Poisson neuron. This neuron model is also used in Section 11.2.3 for a comparison of spike-based and rate-based learning rules as well as in Section 11.2.4, where we revisit the static-pattern scenario of Section 11.1.2. Finally, in Section 11.2.5, we discuss the impact of stochastic spike arrival on the synaptic weights and derive a Fokker-Planck equation that describes the temporal evolution of the weight distribution.


11.2.1 Learning Equation

We will generalize the analysis of Hebbian learning developed in Section 11.1 to spike-based learning rules, using the phenomenological model of Section 10.3.1. In this model the synaptic weight $w_{ij}(t)$ is a piecewise continuous function of time that jumps whenever a presynaptic spike arrives or a postsynaptic action potential is triggered, i.e.,

\frac{\text{d}}{\text{d}t} w_{ij}(t) = a_0 + a_1^{\text{pre}} \, S_j(t) + a_1^{\text{post}} \, S_i(t) + S_j(t) \int_0^\infty W(s) \, S_i(t-s) \, \text{d}s + S_i(t) \int_0^\infty W(-s) \, S_j(t-s) \, \text{d}s \,, \qquad (11.42)

cf. Eqs. (10.14)-(10.15). As before we want to relate the synaptic weight change to the statistical properties of the input. Given the increased level of complexity, a few remarks about the underlying statistical ensemble are in order.

In the previous section we have considered the presynaptic firing rates $\nu_j$ as random variables drawn from an ensemble of input patterns $\xi_j^\mu$. The output rate, however, was a deterministic function of the neuronal input. In the context of spike-time dependent plasticity, we consider the set of presynaptic spike arrival times $(t_j^1, t_j^2, \dots)$ as a random variable. The underlying `randomness' can have several sources. For example, different stimulation paradigms may be selected one by one, in very much the same way as we have selected a new input pattern in the previous section. In contrast to the rate model, we do not want to restrict ourselves to deterministic neuron models. Hence, the randomness can also be produced by a stochastic neuron model that is used in order to account for noise; cf. Chapter 5. In this case, the output spike train can be a random variable even if the input spike trains are fixed. A simple example is the Poisson neuron model that generates output spikes via an inhomogeneous Poisson process with an intensity that is a function of the membrane potential. In any case, we consider the set of spike trains $(S_1, \dots, S_i, S_j, \dots, S_N)$, i.e., pre- and postsynaptic trains, to be drawn from a stochastic ensemble. The specific properties of the chosen neuron model are thus implicitly described by the association of pre- and postsynaptic trains within the ensemble. Note that this formalism includes deterministic models as a special case, if the ensemble contains only a single postsynaptic spike train for any given set of presynaptic spike trains. In the following, all averages denoted by $\langle \cdot \rangle_E$ are taken with respect to this ensemble.
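
The plasticity rule of Eq. (11.42) can be written down almost literally for spike trains given as lists of firing times. The following Python sketch illustrates this; the exponential shape of the learning window and all parameter values are illustrative assumptions, not values taken from the text.

\begin{verbatim}
import numpy as np

# Discrete implementation of the plasticity rule (11.42) for spike trains
# given as lists of firing times.  The exponential window shape and all
# parameter values are illustrative assumptions.

a0, a1_pre, a1_post = 0.0, 0.0, 0.0       # non-Hebbian terms switched off
A_plus, A_minus = 1.0e-3, -1.2e-3         # window amplitudes (arbitrary units)
tau_plus = tau_minus = 20.0e-3            # window time constants [s]


def W(s):
    """Learning window W(s), s = t_pre - t_post (assumed exponential)."""
    s = np.asarray(s, dtype=float)
    return np.where(s < 0.0,
                    A_plus * np.exp(s / tau_plus),     # pre before post: LTP
                    A_minus * np.exp(-s / tau_minus))  # post before pre: LTD


def evolve_weight(t_pre, t_post, w0=0.5, T=1.0):
    """Accumulate the weight change of Eq. (11.42) over the interval [0, T]."""
    w = w0 + a0 * T
    for t in t_pre:                  # terms triggered by presynaptic spikes
        w += a1_pre + W(t - t_post[t_post < t]).sum()
    for t in t_post:                 # terms triggered by postsynaptic spikes
        w += a1_post + W(t_pre[t_pre < t] - t).sum()
    return w


# Example: independent Poisson pre- and postsynaptic spike trains at 10 Hz.
rng = np.random.default_rng(0)
T, rate = 1.0, 10.0
t_pre = np.sort(rng.uniform(0.0, T, rng.poisson(rate * T)))
t_post = np.sort(rng.uniform(0.0, T, rng.poisson(rate * T)))
print("weight after 1 s:", evolve_weight(t_pre, t_post, T=T))
\end{verbatim}

Each presynaptic spike is paired with all earlier postsynaptic spikes (positive window argument, fourth term of Eq. (11.42)) and each postsynaptic spike with all earlier presynaptic spikes (negative window argument, fifth term).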

For the time being we are interested only in the long-term behavior of the synaptic weights and not in the fluctuations that are caused by individual spikes. As in Section 11.1.2 we therefore calculate the expectation value of the weight change over a certain interval of time,

\left\langle w_{ij}(t+T) - w_{ij}(t) \right\rangle_E = \left\langle \int_t^{t+T} \frac{\text{d}}{\text{d}t'} w_{ij}(t') \, \text{d}t' \right\rangle_E \,. \qquad (11.43)

With the abbreviation

\left\langle f(t) \right\rangle_T \equiv T^{-1} \int_t^{t+T} f(t') \, \text{d}t' \qquad (11.44)

we obtain from Eq. (11.42)

\frac{\left\langle w_{ij}(t+T) - w_{ij}(t) \right\rangle_E}{T} = a_0 + a_1^{\text{pre}} \, \langle \langle S_j(t) \rangle_T \rangle_E + a_1^{\text{post}} \, \langle \langle S_i(t) \rangle_T \rangle_E + \int_0^\infty W(s) \, \langle \langle S_i(t-s) \, S_j(t) \rangle_T \rangle_E \, \text{d}s + \int_{-\infty}^0 W(s) \, \langle \langle S_i(t) \, S_j(t+s) \rangle_T \rangle_E \, \text{d}s \,. \qquad (11.45)

If the time interval $T$ is long compared to typical interspike intervals then the time average is taken over many pre- or postsynaptic spikes. We can thus assume that the average $\langle \langle S_i(t) \, S_j(t+s) \rangle_T \rangle_E$ does not change if we replace $t$ by $t-s$, as long as $s \ll T$. Furthermore, if $W(s)$ decays to zero sufficiently fast as $|s| \to \infty$, then the integration over $s$ in the last term of Eq. (11.45) can be restricted to a finite interval determined by the width of the learning window $W$. In this case it is possible to replace $\langle \langle S_i(t) \, S_j(t+s) \rangle_T \rangle_E$ by $\langle \langle S_i(t-s) \, S_j(t) \rangle_T \rangle_E$ and to collect the last two terms of Eq. (11.45) into a single integral, provided that the width of the learning window is small compared to $T$. With this approximation we find

\frac{\left\langle w_{ij}(t+T) - w_{ij}(t) \right\rangle_E}{T} = a_0 + a_1^{\text{pre}} \, \langle \langle S_j(t) \rangle_T \rangle_E + a_1^{\text{post}} \, \langle \langle S_i(t) \rangle_T \rangle_E + \int_{-\infty}^\infty W(s) \, \langle \langle S_i(t-s) \, S_j(t) \rangle_T \rangle_E \, \text{d}s \,. \qquad (11.46)

The instantaneous firing rate $\nu_i(t)$ of neuron $i$ is the ensemble average of its spike train,

\nu_i(t) \equiv \langle S_i(t) \rangle_E \,. \qquad (11.47)

Similarly, we define the joint firing rate $\nu_{ij}$ of neurons $i$ and $j$ as

\nu_{ij}(t, t') \equiv \langle S_i(t) \, S_j(t') \rangle_E \,, \qquad (11.48)

which is the joint probability density of finding a spike of neuron $i$ at time $t$ and a spike of neuron $j$ at time $t'$. Note that $\nu_{ij}(t, t')$ is a probability density in both $t$ and $t'$ and thus has units of one over time squared.

Since averaging is a linear operation we can exchange ensemble average and time average. We obtain the following expression for the expected weight change in the interval from $t$ to $t+T$ as a function of the statistical properties of the spike trains,

\frac{\left\langle w_{ij}(t+T) - w_{ij}(t) \right\rangle_E}{T} = a_0 + a_1^{\text{pre}} \, \langle \nu_j(t) \rangle_T + a_1^{\text{post}} \, \langle \nu_i(t) \rangle_T + \int_{-\infty}^\infty W(s) \, \langle \nu_{ij}(t-s, t) \rangle_T \, \text{d}s \,. \qquad (11.49)

The time average $\langle \nu_{ij}(t-s, t) \rangle_T$ is the correlation function of pre- and postsynaptic spike trains on the interval $[t, t+T]$. This function clearly depends on the actual value of the weight vector. In deriving Eq. (11.49) we already had to assume that the correlations are a slowly varying function of time. For the sake of consistency we thus require that the weight vector itself is a slowly varying function of time. If this is the case then we can exploit the self-averaging property of the weight vector and argue that fluctuations around the expectation value are negligible, so that Eq. (11.49) is a good approximation for the actual value of the weight vector. We thus drop the ensemble average on the left-hand side of Eq. (11.49) and find for the time-averaged change of the synaptic weight the following learning equation,

\frac{\text{d}}{\text{d}t} \langle w_{ij}(t) \rangle_T = a_0 + a_1^{\text{pre}} \, \langle \nu_j(t) \rangle_T + a_1^{\text{post}} \, \langle \nu_i(t) \rangle_T + \int_{-\infty}^\infty W(s) \, \langle \nu_{ij}(t-s, t) \rangle_T \, \text{d}s \,; \qquad (11.50)

cf. (Kistler and van Hemmen, 2000a; Kempter et al., 1999). As expected, the long-term dynamics of the synaptic weights depends on the correlation of pre- and postsynaptic spike trains on the time scale of the learning window. In the following we will always use the smooth time-averaged weight vector $\langle w_{ij}(t) \rangle_T$, but for the sake of brevity we shall drop the angular brackets.
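
The learning equation (11.50) can be checked numerically in a case where the correlation term is known exactly: if pre- and postsynaptic spike trains are independent Poisson processes, then $\langle \nu_{ij}(t-s,t) \rangle_T = \nu_i \, \nu_j$ and the expected drift reduces to $a_1^{\text{pre}} \nu_j + a_1^{\text{post}} \nu_i + \bar{W} \, \nu_i \nu_j$ with $\bar{W} = \int W(s) \, \text{d}s$. The following sketch compares this prediction with the trial-averaged weight change produced by the single-trajectory rule (11.42); the window shape, rates and amplitudes are illustrative assumptions.

\begin{verbatim}
import numpy as np

# Numerical check of the learning equation (11.50) for independent
# pre- and postsynaptic Poisson trains (a0 = 0; all numbers are
# illustrative assumptions).

rng = np.random.default_rng(1)
nu_pre, nu_post, T = 10.0, 20.0, 10.0        # rates [Hz], trial length [s]
a1_pre, a1_post = 1e-4, -1e-4
A_plus, A_minus, tau = 1e-3, -1.2e-3, 20e-3


def W(s):
    return np.where(s < 0, A_plus * np.exp(s / tau), A_minus * np.exp(-s / tau))


def trial_dw():
    """Weight change accumulated over one trial under Eq. (11.42)."""
    t_pre = np.sort(rng.uniform(0, T, rng.poisson(nu_pre * T)))
    t_post = np.sort(rng.uniform(0, T, rng.poisson(nu_post * T)))
    dw = a1_pre * len(t_pre) + a1_post * len(t_post)
    for t in t_pre:                 # pair each pre spike with earlier post spikes
        dw += W(t - t_post[t_post < t]).sum()
    for t in t_post:                # pair each post spike with earlier pre spikes
        dw += W(t_pre[t_pre < t] - t).sum()
    return dw


W_bar = A_plus * tau + A_minus * tau         # int W(s) ds for this window
predicted = a1_pre * nu_pre + a1_post * nu_post + W_bar * nu_pre * nu_post
measured = np.mean([trial_dw() for _ in range(200)]) / T
print("predicted drift %.3e 1/s,  simulated drift %.3e 1/s" % (predicted, measured))
\end{verbatim}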


11.2.2 Spike-Spike Correlations

It is tempting to rewrite the correlation term $\langle \nu_{ij}(t-s, t) \rangle_T$ that appears on the right-hand side of Eq. (11.50) in terms of the instantaneous firing rates, $\langle \nu_i(t-s) \, \nu_j(t) \rangle_T$. This, however, is only allowed if the spike trains of neurons $i$ and $j$ were independent, i.e., if $\langle S_i(t-s) \, S_j(t) \rangle_E = \langle S_i(t-s) \rangle_E \, \langle S_j(t) \rangle_E$. Such an approach would therefore neglect the specific spike-spike correlations that are induced by presynaptic action potentials.

Correlations between pre- and postsynaptic spike trains do not only depend on the input statistics but also on the dynamics of the neuron model and the way new output spikes are generated. The influence of a single presynaptic spike on the postsynaptic activity can be measured by a peri-stimulus time histogram (PSTH) triggered on the time of presynaptic spike arrival; cf. Section 7.4.1. The form of the PSTH characterizes the spike-spike correlations between presynaptic spike arrival and postsynaptic action potential. For high noise, the spike-spike correlations contain a term that is proportional to the time course of the postsynaptic potential $\epsilon$, while for low noise this term is proportional to its derivative $\epsilon'$; cf. Fig. 7.12.

In the following, we will calculate the spike-spike correlations in a particularly simple case, the linear Poisson neuron model. As we will see, the spike-spike correlations contain in this case a term proportional to the postsynaptic potential $ \epsilon$. The linear Poisson neuron model can therefore be considered as a reasonable approximation to spiking neuron models in the high-noise limit.

11.2.2.1 Example: Linear Poisson neuron model

As a generalization of the analog neuron with linear gain function discussed in Section 11.1.2 we consider here a linear Poisson neuron. The input to the neuron consists of $N$ Poisson spike trains with time-dependent intensities $\nu_j(t)$. Similar to the SRM$_0$ neuron, the membrane potential $u_i$ of neuron $i$ is a superposition of postsynaptic potentials $\epsilon$ with $\int_0^\infty \epsilon(s) \, \text{d}s = 1$,

u_i(t) = \sum_j w_{ij} \int_0^\infty \epsilon(s) \, S_j(t-s) \, \text{d}s \,. \qquad (11.51)

In contrast to Section 4.2.3 we neglect refractoriness and external input.

Postsynaptic spikes are generated by an inhomogeneous Poisson process with an intensity $\nu_i^{\text{post}}$ that is a (semi-)linear function of the membrane potential,

\nu_i^{\text{post}}(t \,|\, u) = [u_i(t)]_+ \,. \qquad (11.52)

Here, $[\cdot]_+$ denotes the positive part of the argument in order to avoid negative rates. In the following, however, we will always assume that $u_i(t) \ge 0$. The notation $\nu_i^{\text{post}}(t \,|\, u)$ indicates that the output rate depends on the actual value of the membrane potential.

We thus have a doubly stochastic process (Bartlett, 1963; Cox, 1955) in the sense that, in a first step, a set of input spike trains is drawn from an ensemble characterized by Poisson rates $\nu_j^{\text{pre}}$. This realization of input spike trains then determines the membrane potential which, in a second step, produces a specific realization of the output spike train according to $\nu_i^{\text{post}}(t \,|\, u)$. It can be shown that, because of the finite duration of the postsynaptic potential $\epsilon$, the output spike trains generated by this composite process are no longer Poisson spike trains; their expectation value $\langle S_i(t) \rangle_E \equiv \nu_i^{\text{post}}(t)$, however, is simply the expectation value of the output rate, $\nu_i^{\text{post}}(t) = \langle \nu_i^{\text{post}}(t \,|\, u) \rangle_E$ (Kistler and van Hemmen, 2000a). Due to the linearity of the neuron model the output rate is given by a convolution of the input rates with the response kernel $\epsilon$,

\nu_i^{\text{post}}(t) = \sum_j w_{ij} \int_0^\infty \epsilon(s) \, \nu_j^{\text{pre}}(t-s) \, \text{d}s \,. \qquad (11.53)

The joint firing rate $\nu_{ij}^{\text{post,pre}}(t, t') = \langle S_i(t) \, S_j(t') \rangle_E$ of pre- and postsynaptic neuron is the joint probability density of finding an input spike at synapse $j$ at time $t'$ and an output spike of neuron $i$ at time $t$. According to Bayes' theorem this probability equals the probability of observing an input spike at time $t'$ times the conditional probability of observing an output spike at time $t$ given the input spike at time $t'$, i.e.,

\nu_{ij}^{\text{post,pre}}(t, t') = \langle S_i(t) \,|\, \text{input spike at } t' \rangle_E \; \langle S_j(t') \rangle_E \,. \qquad (11.54)

In the framework of a linear Poisson neuron, the term $\langle S_i(t) \,|\, \text{input spike at } t' \rangle_E$ equals the sum of the expected output rate (11.53) and the specific contribution $w_{ij} \, \epsilon(t - t')$ of a single (additional) input spike at time $t'$. Altogether we obtain

\nu_{ij}^{\text{post,pre}}(t, t') = \nu_i^{\text{post}}(t) \, \nu_j^{\text{pre}}(t') + w_{ij} \, \epsilon(t - t') \, \nu_j^{\text{pre}}(t') \,. \qquad (11.55)

The first term on the right-hand side is the `chance level' of finding two spikes at $t$ and $t'$, respectively, if the neurons were firing independently at rates $\nu_i^{\text{post}}(t)$ and $\nu_j^{\text{pre}}(t')$. The second term describes the correlation that is due to the synaptic coupling. If the presynaptic neuron has fired a spike at $t'$ then the chance for the postsynaptic neuron to fire a spike at time $t > t'$ is increased by $w_{ij} \, \epsilon(t - t')$. Note that this expression respects causality: the probability of finding first a postsynaptic spike and then a presynaptic spike is just chance level because $\epsilon(t - t') = 0$ for $t < t'$.
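
For the linear Poisson neuron the spike-spike correlations of Eq. (11.55) can be made visible in a simulation: the cross-correlogram between the output spike train and the (pooled) input spike trains consists of the chance level plus a causal bump proportional to the kernel $\epsilon$. The kernel shape, rates and weights in the following sketch are illustrative assumptions.

\begin{verbatim}
import numpy as np

# Cross-correlogram of a simulated linear Poisson neuron, to be compared
# with Eq. (11.55).  N identical synapses of weight w0 are pooled; the
# correlogram should equal nu_post*(N*nu_pre) plus the causal bump
# N*w0*eps(lag)*nu_pre.  All parameters are illustrative assumptions.

rng = np.random.default_rng(3)
dt, T = 1e-3, 200.0                        # time step and duration [s]
N, nu_pre, w0, tau_m = 100, 5.0, 0.04, 10e-3
steps = int(T / dt)

s = np.arange(0, 10 * tau_m, dt)           # epsilon kernel, unit area
eps = np.exp(-s / tau_m)
eps /= eps.sum() * dt

# pooled presynaptic spike counts per bin (superposition of N Poisson trains)
pooled = rng.poisson(N * nu_pre * dt, steps).astype(float)

# membrane potential, Eq. (11.51), and Poisson output spikes, Eq. (11.52)
u = w0 * np.convolve(pooled, eps, mode="full")[:steps]
post = (rng.random(steps) < u * dt).astype(float)

# joint rate of a post spike at lag k*dt after a (pooled) pre spike
lags = np.arange(50)
corr = np.array([np.dot(post[k:], pooled[:steps - k]) for k in lags]) / (T * dt)

nu_post = post.sum() / T
chance = nu_post * N * nu_pre
print("lag 5 ms: measured excess %.0f, predicted %.0f  (1/s^2)"
      % (corr[5] - chance, N * w0 * eps[5] * nu_pre))
\end{verbatim}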

11.2.2.2 Example: Learning equation for a linear Poisson neuron

If we use the result from Eq. (11.55) in the learning equation (11.50) we obtain

\frac{\text{d}}{\text{d}t} w_{ij}(t) = a_0 + a_1^{\text{pre}} \, \langle \nu_j^{\text{pre}}(t) \rangle_T + a_1^{\text{post}} \, \langle \nu_i^{\text{post}}(t) \rangle_T + \int_{-\infty}^\infty W(s) \, \langle \nu_i^{\text{post}}(t-s) \, \nu_j^{\text{pre}}(t) \rangle_T \, \text{d}s + w_{ij}(t) \, \langle \nu_j^{\text{pre}}(t) \rangle_T \, W_- \,, \qquad (11.56)

with $W_- = \int_0^\infty W(-s) \, \epsilon(s) \, \text{d}s$.

In linear Poisson neurons, the correlation between pre- and postsynaptic activity that drives synaptic weight changes consists of two contributions. The integral over the learning window in Eq. (11.56) describes correlations in the instantaneous firing rate. The last term on the right-hand side of Eq. (11.56) finally accounts for spike-spike correlations of pre- and postsynaptic neuron.

If we express the instantaneous firing rates $\nu_i(t)$ in terms of their fluctuations $\Delta\nu_i(t)$ around the mean $\langle \nu_i(t) \rangle_T$,

\nu_i(t) = \Delta\nu_i(t) + \langle \nu_i(t) \rangle_T \,, \qquad (11.57)

then we can rewrite Eq. (11.56) together with Eq. (11.53) as

\frac{\text{d}}{\text{d}t} w_{ij}(t) = a_0 + a_1^{\text{pre}} \, \langle \nu_j^{\text{pre}}(t) \rangle_T + a_1^{\text{post}} \, \langle \nu_i^{\text{post}}(t) \rangle_T + \langle \nu_i^{\text{post}}(t) \rangle_T \, \langle \nu_j^{\text{pre}}(t) \rangle_T \int_{-\infty}^\infty W(s) \, \text{d}s + \sum_k w_{ik}(t) \, Q_{kj}(t) + w_{ij}(t) \, \langle \nu_j^{\text{pre}}(t) \rangle_T \, W_- \qquad (11.58)

with

Q_{kj}(t) = \int_{-\infty}^\infty W(s) \int_0^\infty \epsilon(s') \, \langle \Delta\nu_k^{\text{pre}}(t-s-s') \, \Delta\nu_j^{\text{pre}}(t) \rangle_T \, \text{d}s' \, \text{d}s \,. \qquad (11.59)

Here we have implicitly assumed that the temporal averaging interval $T$ is much longer than the width of the learning window, the duration of a postsynaptic potential, and a typical interspike interval, so that $\langle \nu_i^{\text{post}}(t-s) \rangle_T \approx \langle \nu_i^{\text{post}}(t) \rangle_T$ and $\langle \nu_j^{\text{pre}}(t-s') \rangle_T \approx \langle \nu_j^{\text{pre}}(t) \rangle_T$.

The term containing $Q_{kj}(t)$ on the right-hand side of Eq. (11.58) shows how spatio-temporal correlations $\langle \Delta\nu_k^{\text{pre}}(t') \, \Delta\nu_j^{\text{pre}}(t) \rangle_T$ in the input influence the evolution of the synaptic weights. What matters are correlations on the time scale of the learning window and the postsynaptic potential.


11.2.3 Relation of Spike-Based to Rate-Based Learning

In Section 11.1.2 we have investigated the weight dynamics in the context of an analog neuron where the postsynaptic firing rate is an instantaneous function of the input rates. We have seen that learning is driven by (spatial) correlations within the set of input patterns. The learning equation (11.56) goes one step further in the sense that it explicitly includes time. Consequently, learning is driven by spatio-temporal correlations in the input.

In order to compare the rate-based learning paradigm of Section 11.1.2 with the spike-based formulation of Eq. (11.56) we thus have to disregard temporal correlations for the time being. We therefore consider a linear Poisson neuron with stationary input rates, $\langle \nu_j(t) \rangle_T = \nu_j(t) = \nu_j$, and assume that the synaptic weights change slowly compared to the width of the learning window and the postsynaptic potential. The weight dynamics is given by Eq. (11.56),

\frac{\text{d}}{\text{d}t} w_{ij}(t) = a_0 + a_1^{\text{pre}} \, \nu_j + a_1^{\text{post}} \, \nu_i + \bar{W} \, \nu_i \, \nu_j + W_- \, w_{ij}(t) \, \nu_j \,, \qquad (11.60)

with $\bar{W} = \int_{-\infty}^\infty W(s) \, \text{d}s$ and $W_- = \int_0^\infty W(-s) \, \epsilon(s) \, \text{d}s$. If we identify

c_0(w_{ij}) = a_0 \,, \quad c_1^{\text{pre}}(w_{ij}) = a_1^{\text{pre}} + w_{ij}(t) \, W_- \,, \quad c_1^{\text{post}}(w_{ij}) = a_1^{\text{post}} \,, \qquad (11.61)

and

c_2^{\text{corr}}(w_{ij}) = \bar{W} \,, \qquad (11.62)

we recover the general expression for synaptic plasticity based on the rate description given in Eq. (10.2). The total area $\bar{W}$ under the learning window thus plays the role of the correlation parameter $c_2^{\text{corr}}$ that is responsible for Hebbian or anti-Hebbian plasticity in a rate formulation. The spike-spike correlations simply give rise to an additional weight-dependent term $w_{ij}(t) \, W_-$ in the parameter $c_1^{\text{pre}}(w_{ij})$ that describes presynaptically triggered weight changes.
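
The two integrals that appear in the identification (11.61)-(11.62), the total area $\bar{W}$ of the learning window and its overlap $W_-$ with the kernel $\epsilon$, are easily evaluated for concrete shapes. The sketch below assumes an exponential window and an exponential kernel; all amplitudes and time constants are illustrative.

\begin{verbatim}
import numpy as np

# Evaluation of W_bar = int W(s) ds and W_minus = int_0^inf W(-s) eps(s) ds
# for an assumed exponential window and exponential kernel, and the resulting
# rate-model parameters of Eqs. (11.61)-(11.62).  Numbers are illustrative.

A_plus, A_minus = 1.0e-3, -1.1e-3
tau_plus, tau_minus, tau_m = 20e-3, 20e-3, 10e-3
a1_pre = 0.0

ds = 1e-5
s_neg = np.arange(-0.5, 0.0, ds)           # s < 0: pre before post
s_pos = np.arange(0.0, 0.5, ds)            # s > 0: post before pre

W_neg = A_plus * np.exp(s_neg / tau_plus)
W_pos = A_minus * np.exp(-s_pos / tau_minus)
eps = np.exp(-s_pos / tau_m) / tau_m       # unit-area kernel

W_bar = (W_neg.sum() + W_pos.sum()) * ds
W_minus = (A_plus * np.exp(-s_pos / tau_plus) * eps).sum() * ds

print("W_bar   numeric %.3e   analytic %.3e"
      % (W_bar, A_plus * tau_plus + A_minus * tau_minus))
print("W_minus numeric %.3e   analytic %.3e"
      % (W_minus, A_plus * tau_plus / (tau_plus + tau_m)))

w = 0.5                                    # an arbitrary weight value
print("c2_corr = %.3e,  c1_pre(w) = %.3e" % (W_bar, a1_pre + w * W_minus))
\end{verbatim}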

We may wonder what happens if we relax the requirement of strictly stationary rates. In the linear Poisson model, the output rate depends via Eq. (11.53) on the input rates and changes in the input rate translate into changes in the output rate. If the rate of change is small, we can expand the output rate

\nu_i^{\text{post}}(t-s) \approx \nu_i^{\text{post}}(t) - s \, \frac{\text{d}}{\text{d}t} \nu_i^{\text{post}}(t) + \mathcal{O}(s^2) \qquad (11.63)

on the right-hand side of Eq. (11.56),

\frac{\text{d}}{\text{d}t} w_{ij}(t) = a_0 + a_1^{\text{pre}} \, \nu_j^{\text{pre}}(t) + a_1^{\text{post}} \, \nu_i^{\text{post}}(t) + \bar{W} \, \nu_i^{\text{post}}(t) \, \nu_j^{\text{pre}}(t) + W_- \, w_{ij}(t) \, \nu_j^{\text{pre}}(t) - \nu_j^{\text{pre}}(t) \, \frac{\text{d}}{\text{d}t} \nu_i^{\text{post}}(t) \int_{-\infty}^\infty s \, W(s) \, \text{d}s \,. \qquad (11.64)

Here, we have dropped the temporal averages because rates are assumed to change slowly relative to T.

As compared to Eq. (11.60) we encounter an additional term that is proportional to the first moment $\int s \, W(s) \, \text{d}s$ of the learning window. This term has been termed differential-Hebbian (Roberts, 1999; Xie and Seung, 2000) and plays a certain role in the context of conditioning and reinforcement learning (Rao and Sejnowski, 2001; Montague et al., 1995).


11.2.3.1 Stabilization of Postsynaptic Rates

Another interesting property of a learning rule of the form (10.2) or (11.60) is that it can lead to a normalization of the postsynaptic firing rate and hence to a normalization of the sum of the synaptic weights. This can be achieved even without including higher order terms in the learning equation or postulating a dependence of the parameters a0, a1pre/post, etc., on the actual value of the synaptic efficacy.

Consider a linear Poisson neuron that receives input from $N$ presynaptic neurons with spike activity described by independent Poisson processes with rate $\nu^{\text{pre}}$. The postsynaptic neuron is thus firing at a rate $\nu_i^{\text{post}}(t) = \nu^{\text{pre}} \sum_{j=1}^N w_{ij}(t)$. From Eq. (11.56) we obtain the corresponding dynamics for the synaptic weights, i.e.,

\frac{\text{d}}{\text{d}t} w_{ij}(t) = a_0 + a_1^{\text{pre}} \, \nu^{\text{pre}} + a_1^{\text{post}} \, \nu^{\text{pre}} \sum_{k=1}^N w_{ik}(t) + (\nu^{\text{pre}})^2 \, \bar{W} \sum_{k=1}^N w_{ik}(t) + w_{ij}(t) \, \nu^{\text{pre}} \, W_- \,, \qquad (11.65)

with $\bar{W} = \int_{-\infty}^\infty W(s) \, \text{d}s$ and $W_- = \int_0^\infty \epsilon(s) \, W(-s) \, \text{d}s$. In this particularly simple case the weight dynamics is characterized by a fixed point for the sum of the synaptic weights, $\sum_j w_{ij}$, and, hence, for the postsynaptic firing rate $\nu_i^{\text{post}} = \nu_{\text{FP}}$,

\nu_{\text{FP}} = - \frac{a_0 + a_1^{\text{pre}} \, \nu^{\text{pre}}}{a_1^{\text{post}} + \nu^{\text{pre}} \, \bar{W} + N^{-1} \, W_-} \,. \qquad (11.66)

This fixed point is attractive if the denominator is negative. Since $\nu_i^{\text{post}}$ is a firing rate we have the additional requirement that $\nu_{\text{FP}} \ge 0$. Altogether we thus have two conditions for the parameters of the learning rule, i.e., $a_1^{\text{post}} + \nu^{\text{pre}} \, \bar{W} + N^{-1} \, W_- < 0$ and $a_0 + a_1^{\text{pre}} \, \nu^{\text{pre}} \ge 0$. Note that, apart from the term $N^{-1} \, W_-$, we would obtain a completely analogous result from the rate formulation in Eq. (10.2) if we identify $c_2^{\text{corr}} = \bar{W}$; cf. Eq. ([*]). Note further that the linearity is not essential for the stabilization of the postsynaptic rate. Any model where the output rate is a monotonic function of the sum of the synaptic weights yields qualitatively the same result.
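
The approach to the fixed point (11.66) can be illustrated by integrating the averaged weight dynamics (11.65) with a simple Euler scheme. All parameter values in the following sketch are illustrative assumptions, chosen such that the two conditions derived above are satisfied.

\begin{verbatim}
import numpy as np

# Euler integration of the averaged weight dynamics, Eq. (11.65), for N
# identical Poisson inputs.  Parameter values are illustrative assumptions
# chosen such that a1_post + nu_pre*W_bar + W_minus/N < 0 and
# a0 + a1_pre*nu_pre >= 0, so that the fixed point (11.66) is attractive.

N, nu_pre = 100, 10.0
a0, a1_pre, a1_post = 0.0, 1e-4, -1e-4
W_bar, W_minus = -2e-5, 1e-4                 # assumed window integrals

nu_FP = -(a0 + a1_pre * nu_pre) / (a1_post + nu_pre * W_bar + W_minus / N)
print("predicted fixed-point rate: %.2f Hz" % nu_FP)

rng = np.random.default_rng(4)
w = rng.uniform(0.001, 0.005, N)             # small initial weights
dt = 1.0                                     # [s] step for the slow dynamics

for step in range(101):
    nu_post = nu_pre * w.sum()               # linear Poisson neuron, Eq. (11.53)
    if step % 20 == 0:
        print("t = %4.0f s   nu_post = %.2f Hz" % (step * dt, nu_post))
    dw = (a0 + a1_pre * nu_pre + a1_post * nu_post
          + W_bar * nu_pre * nu_post + W_minus * nu_pre * w)
    w += dt * dw
\end{verbatim}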


11.2.4 Static-Pattern Scenario

In order to illustrate the above results with a concrete example we revisit the static-pattern scenario that we have already studied in the context of analog neurons in Section 11.1.2. We consider a set of static patterns $\{\vec{\xi}^\mu; \, 1 \le \mu \le p\}$ that are presented to the network in a random sequence $(\mu_1, \mu_2, \dots)$ during time steps of length $\Delta t$. Presynaptic spike trains are described by an inhomogeneous Poisson process with a firing intensity that is determined by the pattern that is currently presented. Hence, the instantaneous presynaptic firing rates are piecewise constant functions of time,

\nu_j^{\text{pre}}(t) = \sum_k \xi_j^{\mu_k} \, \Theta[t - (k-1)\,\Delta t] \, \Theta[k\,\Delta t - t] \,. \qquad (11.67)

Due to the randomness by which the patterns are presented, the input does not contain any non-trivial temporal correlations. We thus expect to obtain the very same result as in Section 11.1.2, i.e., that the evolution of synaptic weights is determined by the correlation matrix of the input pattern set.

For linear Poisson neurons the joint firing rate of pre- and postsynaptic neuron is given by Eq. (11.55),

\nu_{ij}(t-s, t) = \nu_i^{\text{post}}(t-s) \, \nu_j^{\text{pre}}(t) + w_{ij}(t) \, \epsilon(-s) \, \nu_j^{\text{pre}}(t) \,. \qquad (11.68)

The postsynaptic firing rate is

\nu_i^{\text{post}}(t) = \sum_j \int_0^\infty w_{ij}(t-s) \, \epsilon(s) \, \nu_j^{\text{pre}}(t-s) \, \text{d}s \approx \sum_j w_{ij}(t) \int_0^\infty \epsilon(s) \, \nu_j^{\text{pre}}(t-s) \, \text{d}s \,, \qquad (11.69)

where we have assumed implicitly that the synaptic weights are approximately constant on the time scale defined by the duration of the postsynaptic potential $ \epsilon$ so that we can pull wij in front of the integral.

As usual, we are interested in the long-term behavior of the synaptic weights given by Eq. (11.56). We thus need the time average of $\nu_i^{\text{post}}(t-s) \, \nu_j^{\text{pre}}(t)$ over the interval $T$,

\langle \nu_i^{\text{post}}(t-s) \, \nu_j^{\text{pre}}(t) \rangle_T = \sum_k w_{ik}(t) \int_0^\infty \epsilon(s') \, \langle \nu_k^{\text{pre}}(t-s-s') \, \nu_j^{\text{pre}}(t) \rangle_T \, \text{d}s' \,. \qquad (11.70)

Due to the linearity of the neuron model, the correlation of input and output is a linear combination of the correlations $\langle \nu_k^{\text{pre}}(t-s) \, \nu_j^{\text{pre}}(t) \rangle_T$ in the input firing rates, which are independent of the specific neuron model. We assume that all patterns are presented once during the time interval $T$ that defines the time scale on which we are investigating the weight dynamics. For $s = 0$ the time average corresponds to an ensemble average over the input patterns and the input correlation function equals the correlation of the input patterns, $\langle \nu_k^{\text{pre}}(t) \, \nu_j^{\text{pre}}(t) \rangle_T = \langle \xi_k^\mu \, \xi_j^\mu \rangle_\mu$. Here, $\langle \cdot \rangle_\mu$ denotes an ensemble average over the set of input patterns. Since we have assumed that input patterns are presented randomly for time steps of length $\Delta t$, the correlation $\langle \nu_k^{\text{pre}}(t-s) \, \nu_j^{\text{pre}}(t) \rangle_T$ will be computed from two independent input patterns if $|s| > \Delta t$, i.e., $\langle \nu_k^{\text{pre}}(t-s) \, \nu_j^{\text{pre}}(t) \rangle_T = \langle \xi_k^\mu \rangle_\mu \, \langle \xi_j^\mu \rangle_\mu$. For $0 < |s| < \Delta t$ the input correlation interpolates linearly between these two limits. Altogether we obtain

\langle \nu_k^{\text{pre}}(t-s) \, \nu_j^{\text{pre}}(t) \rangle_T = \langle \xi_k^\mu \rangle_\mu \, \langle \xi_j^\mu \rangle_\mu + \left( \langle \xi_k^\mu \, \xi_j^\mu \rangle_\mu - \langle \xi_k^\mu \rangle_\mu \, \langle \xi_j^\mu \rangle_\mu \right) \Lambda(s / \Delta t) \,. \qquad (11.71)

Here, $\Lambda$ is the triangular function

\Lambda(s) = (1 - |s|) \, \Theta(1 - |s|) \,; \qquad (11.72)

cf. Fig. 11.9A. If we use this result in the learning equation (11.56) we find

\frac{\text{d}}{\text{d}t} w_{ij}(t) = \sum_k w_{ik}(t) \, \langle \xi_k^\mu \rangle_\mu \, \langle \xi_j^\mu \rangle_\mu \, \bar{W} + \sum_k w_{ik}(t) \, Q_{kj} + w_{ij}(t) \, \langle \xi_j^\mu \rangle_\mu \, W_- \,, \qquad (11.73)

with $\bar{W} = \int_{-\infty}^\infty W(s) \, \text{d}s$, $W_- = \int_0^\infty \epsilon(s) \, W(-s) \, \text{d}s$, and

Q_{kj} = \left( \langle \xi_k^\mu \, \xi_j^\mu \rangle_\mu - \langle \xi_k^\mu \rangle_\mu \, \langle \xi_j^\mu \rangle_\mu \right) \int_{-\infty}^\infty W(s) \int_0^\infty \epsilon(s') \, \Lambda\!\left( \frac{s+s'}{\Delta t} \right) \text{d}s' \, \text{d}s \,. \qquad (11.74)

Here we have used $\int_0^\infty \epsilon(s) \, \text{d}s = 1$ and dropped all non-Hebbian terms ($a_0 = a_1^{\text{pre}} = a_1^{\text{post}} = 0$).
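
The factor $\int W(s) \int \epsilon(s') \, \Lambda((s+s')/\Delta t) \, \text{d}s' \, \text{d}s$ in Eq. (11.74) can be evaluated numerically for different presentation times $\Delta t$ (cf. Fig. 11.9B). The sketch below uses a learning window with zero total area ($\bar{W} = 0$), so any non-zero overlap is entirely an effect of short presentations; window and kernel shapes are illustrative assumptions.

\begin{verbatim}
import numpy as np

# Overlap factor of Eq. (11.74) for several presentation times Dt, using an
# assumed window with zero total area (W_bar = 0) and an exponential kernel.

tau_plus = tau_minus = 20e-3
A_plus, A_minus = 1e-3, -1e-3       # W_bar = A_plus*tau_plus + A_minus*tau_minus = 0
tau_m = 10e-3

ds = 5e-4
s = np.arange(-0.25, 0.25, ds)      # learning-window argument
sp = np.arange(0.0, 0.25, ds)       # kernel argument s' >= 0

W = np.where(s < 0, A_plus * np.exp(s / tau_plus), A_minus * np.exp(-s / tau_minus))
eps = np.exp(-sp / tau_m) / tau_m   # unit-area kernel


def overlap(Dt):
    """int W(s) [int eps(s') Lambda((s+s')/Dt) ds'] ds evaluated on the grid."""
    x = (s[:, None] + sp[None, :]) / Dt
    Lam = np.clip(1.0 - np.abs(x), 0.0, None)     # triangular function (11.72)
    inner = (eps[None, :] * Lam).sum(axis=1) * ds
    return (W * inner).sum() * ds


for Dt in (10e-3, 50e-3, 500e-3):
    print("Dt = %3.0f ms   overlap = %+.3e" % (Dt * 1e3, overlap(Dt)))
\end{verbatim}

For long presentations the overlap approaches $\bar{W} = 0$, whereas for presentations comparable to the width of the window it stays positive; this is the regime in which the weight dynamics is governed by the covariance matrix $Q_{kj}$.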

Figure 11.9: Static-pattern scenario. A. Temporal correlations in the firing rate of presynaptic neurons have a triangular shape $\Lambda(s/\Delta t)$ (solid line). The correlation between pre- and postsynaptic neurons involves a convolution with the response kernel $\epsilon(s)$ (dashed line). B. The definition of the matrix $Q_{kj}$ in Eq. (11.74) contains the overlap of the learning window $W(s)$ (dashed line) and the convolution $\int \epsilon(s') \, \Lambda[(s+s')/\Delta t] \, \text{d}s'$ (solid line). If the duration of one presentation is long compared to the width of the learning window and the response kernel $\epsilon$, then the overlap approximately equals the area $\bar{W}$ below the learning window. If the presentation is short, as shown here, then the overlap may be different from zero, even if $\bar{W} = 0$.

In order to understand this result let us first consider the case where both the width of the learning window and the postsynaptic potential are small compared to the duration $\Delta t$ of one pattern presentation. The integral over $s'$ in the definition of the matrix $Q_{kj}$ is the convolution of $\epsilon$ with a triangular function centered around $s = 0$ that has a maximum value of unity. Since $\epsilon$ is normalized, the convolution yields a smoothed version of the originally triangular function that is approximately equal to unity in a neighborhood of $s = 0$; cf. Fig. 11.9B. If the learning window is different from zero only in this neighborhood, then the integral over $s$ in Eq. (11.74) is just $\bar{W}$, the area under the learning window. We can thus collect the first two terms on the right-hand side of Eq. (11.73) and obtain

\frac{\text{d}}{\text{d}t} w_{ij}(t) = \sum_k w_{ik}(t) \, \langle \xi_k^\mu \, \xi_j^\mu \rangle_\mu \, \bar{W} + w_{ij}(t) \, \langle \xi_j^\mu \rangle_\mu \, W_- \,. \qquad (11.75)

Apart from the non-Hebbian term $w_{ij}(t) \, \langle \xi_j^\mu \rangle_\mu \, W_-$, the weight dynamics is determined by the correlation matrix $\langle \xi_k^\mu \, \xi_j^\mu \rangle_\mu$ of the (unnormalized) input patterns. This is exactly what we would have expected from the comparison of rate-based and spike-based learning; cf. Eq. (11.60).

More interesting is the case where the time scale of the learning window is of the same order of magnitude as the duration of a pattern presentation. In this case, the integral over $s$ in Eq. (11.74) is different from $\bar{W}$, and we can choose a learning window with $\bar{W} = 0$ so that the first term on the right-hand side of Eq. (11.73) vanishes. In this case, the weight dynamics is no longer determined by $\langle \xi_k^\mu \, \xi_j^\mu \rangle_\mu$ but by the matrix $Q_{kj}$,

\frac{\text{d}}{\text{d}t} w_{ij}(t) = \sum_k w_{ik}(t) \, Q_{kj} + w_{ij}(t) \, \langle \xi_j^\mu \rangle_\mu \, W_- \,, \qquad (11.76)

which is proportional to the properly normalized covariance matrix of the input patterns,

Q_{kj} \propto \langle \xi_k^\mu \, \xi_j^\mu \rangle_\mu - \langle \xi_k^\mu \rangle_\mu \, \langle \xi_j^\mu \rangle_\mu = \left\langle \left( \xi_k^\mu - \langle \xi_k^\mu \rangle_\mu \right) \left( \xi_j^\mu - \langle \xi_j^\mu \rangle_\mu \right) \right\rangle_\mu \,. \qquad (11.77)

If we assume that all presynaptic neurons have the same mean activity, $\langle \xi_k^\mu \rangle_\mu = \langle \xi_j^\mu \rangle_\mu \equiv \langle \xi^\mu \rangle_\mu$, then we can rewrite Eq. (11.76) as

\frac{\text{d}}{\text{d}t} w_{ij}(t) = \sum_k w_{ik}(t) \left[ Q_{kj} + \delta_{kj} \, \langle \xi^\mu \rangle_\mu \, W_- \right] \,. \qquad (11.78)

The eigenvectors and the eigenvalues of the matrix in square brackets are (apart from a common additive constant $\langle \xi^\mu \rangle_\mu \, W_-$ for the eigenvalues) the same as those of the matrix $Q$. We have already seen that this matrix is proportional to the properly normalized covariance matrix of the input patterns. If the proportionality constant is positive, i.e., if the integral over $s$ in Eq. (11.74) is positive, then the dynamics of the weight vector is determined by the principal component of the set of input patterns.
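
In the regime $\bar{W} = 0$ the weight dynamics (11.78) thus performs, in effect, a principal-component analysis of the input patterns. The sketch below draws random patterns with one strongly correlated group of inputs, integrates the linear dynamics, and compares the resulting weight direction with the principal eigenvector of the pattern covariance matrix. The pattern statistics, the proportionality constant of $Q$, and the diagonal shift are illustrative assumptions.

\begin{verbatim}
import numpy as np

# Weight dynamics of Eq. (11.78) for the static-pattern scenario.  Q is taken
# to be the pattern covariance matrix (proportionality constant set to one);
# pattern statistics and all constants are illustrative assumptions.

rng = np.random.default_rng(5)
N, p = 20, 500                                   # input neurons, patterns
base = rng.uniform(5.0, 15.0, N)                 # mean rates
patterns = base + rng.normal(0.0, 2.0, (p, N))   # xi^mu, one pattern per row
patterns[:, :5] += rng.normal(0.0, 4.0, (p, 1))  # correlated group of inputs
patterns = np.clip(patterns, 0.0, None)          # rates cannot be negative

Q = np.cov(patterns, rowvar=False)               # proportional to Eq. (11.77)
c = base.mean() * 1e-2                           # diagonal shift <xi>*W_minus (assumed)
M = Q + c * np.eye(N)

w = rng.uniform(0.0, 1.0, N)
dt = 1e-3
for _ in range(2000):
    w += dt * (M @ w)                            # dw_j/dt = sum_k w_k M_kj (M symmetric)
    w /= np.linalg.norm(w)                       # keep the norm fixed for readability

pc = np.linalg.eigh(Q)[1][:, -1]                 # principal component of the patterns
print("alignment |cos(angle)| with principal component: %.3f" % abs(np.dot(w, pc)))
\end{verbatim}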


11.2.5 Distribution of Synaptic Weights

If spike arrival times are described as a stochastic process, the weight vector itself is also a random variable that evolves along a fluctuating trajectory. In Section 11.2.1, we have analyzed the expectation value of the synaptic weights smoothed over a certain interval of time. In the limit where the synaptic weights evolve much more slowly than typical pre- or postsynaptic interspike intervals, an approximation of the weight vector by its expectation value is justified. However, if the synaptic efficacy can be changed substantially by only a few pre- or postsynaptic spikes, then the fluctuations of the weights have to be taken into account. Here, we investigate the resulting distribution of synaptic weights in the framework of a Fokker-Planck equation (Rubin et al., 2001; van Rossum et al., 2000).

Figure 11.10: Transitions of weight values due to synaptic plasticity. The probability density $P(w, t)$ increases if small weights increase, $w' \longrightarrow w' + A_+(w')$, or if large weights decrease, $w'' \longrightarrow w'' - A_-(w'')$.

We consider a single neuron $i$ that receives input from several hundred presynaptic neurons. All presynaptic neurons fire independently at a common constant rate $\nu^{\text{pre}}$. We are interested in the probability density $P(w, t)$ for the synaptic weight of a given synapse. We assume that all weights are restricted to the interval $[0, w^{\text{max}}]$ so that the normalization $\int_0^{w^{\text{max}}} P(w, t) \, \text{d}w = 1$ holds. Weight changes due to potentiation or depression of synapses induce changes in the density function $P(w, t)$. The Fokker-Planck equation that we will derive below describes the evolution of the distribution $P(w, t)$ as a function of time; cf. Fig. 11.10.

For the sake of simplicity, we adopt a learning window with two rectangular phases, i.e.,

W(s) = \begin{cases} A_+(w_{ij}) & \text{for } -d < s < 0 \\ -A_-(w_{ij}) & \text{for } 0 < s < d \\ 0 & \text{else} \end{cases} \qquad (11.79)

cf. Fig. 11.11A. Synapses are potentiated if the presynaptic spike shortly precedes the postsynaptic one. If the order of spike firing is reversed, the synapse is depressed.

There are basically two possibilities to restrict the synaptic weights to the interval $[0, w^{\text{max}}]$: we can impose either hard or soft bounds on the weight dynamics; cf. Section 10.2.1. Hard bounds mean that the weights are simply no longer increased (decreased) once the upper (lower) bound is reached. Soft bounds, on the other hand, gradually slow down the evolution if the weight approaches one of its bounds. A simple way to implement soft bounds in our formalism is to define (Kistler and van Hemmen, 2000a)

A_+(w_{ij}) = (w^{\text{max}} - w_{ij}) \, a_+ \,, \qquad (11.80)
A_-(w_{ij}) = w_{ij} \, a_- \,, \qquad (11.81)

with positive constants $a_+$ and $a_-$. The choice of how the bounds are implemented turns out to have an important influence on the weight distribution $P(w, t)$ (Rubin et al., 2001; van Rossum et al., 2000).

Figure 11.11: A. Rectangular learning window $W(t_j^{(f)} - t_i^{(f)})$. LTP occurs if the presynaptic spike arrives before the postsynaptic one, whereas LTD occurs if the order of timing is reversed. B. Whether LTP or LTD is dominant depends on the overlap between the learning window $W(s)$ (dashed line) and the correlations (solid line) between pre- and postsynaptic spike firing. The correlations consist of a constant bias term and a time-dependent term with a peak at negative values of $s$; cf. Eq. (11.55).

In order to derive the evolution of the distribution P(w, t) we consider transitions in the `weight space' induced by pre- and postsynaptic spike firing. The evolution is described by a master equation of the form

\frac{\partial}{\partial t} P(w, t) = - p_+(w, t) \, P(w, t) - p_-(w, t) \, P(w, t) + \int_0^{w^{\text{max}}} \delta[w - w' - A_+(w')] \, p_+(w', t) \, P(w', t) \, \text{d}w' + \int_0^{w^{\text{max}}} \delta[w - w' + A_-(w')] \, p_-(w', t) \, P(w', t) \, \text{d}w' \,; \qquad (11.82)

cf. Fig. 11.10. Here, $p_+$ (or $p_-$) is the probability per unit time that a presynaptic spike falls in the positive (or negative) phase of the learning window. Using the definition of the joint firing rate of pre- and postsynaptic neuron,

\nu^{\text{post,pre}}(t, t') = \langle S^{\text{post}}(t) \, S^{\text{pre}}(t') \rangle_E \,, \qquad (11.83)

we have

p_+(w, t) = \int_{-d}^0 \nu^{\text{post,pre}}(t-s, t) \, \text{d}s \,, \qquad (11.84)
p_-(w, t) = \int_0^d \nu^{\text{post,pre}}(t-s, t) \, \text{d}s \,; \qquad (11.85)

cf. Fig. 11.11B.
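
For the linear Poisson neuron with stationary rates, the pairing rates follow by inserting Eq. (11.55) into Eqs. (11.84) and (11.85): $p_+ = d \, \nu^{\text{post}} \nu^{\text{pre}} + w \, \nu^{\text{pre}} \int_0^d \epsilon(s) \, \text{d}s$ and $p_- = d \, \nu^{\text{post}} \nu^{\text{pre}}$, so that the causal spike-spike correlations bias the process towards potentiation. The short sketch below evaluates these expressions; the kernel shape and all numbers are illustrative assumptions.

\begin{verbatim}
import numpy as np

# Pairing rates p_plus and p_minus for a linear Poisson neuron with
# stationary rates, Eqs. (11.84)-(11.85) combined with Eq. (11.55).
# Kernel shape and all numbers are illustrative assumptions.

nu_pre, nu_post, w, d, tau_m = 10.0, 20.0, 0.3, 50e-3, 10e-3

ds = 1e-5
s = np.arange(0.0, d, ds)
eps = np.exp(-s / tau_m) / tau_m        # unit-area causal kernel

p_plus = d * nu_post * nu_pre + w * nu_pre * eps.sum() * ds
p_minus = d * nu_post * nu_pre
print("p_plus = %.2f 1/s,   p_minus = %.2f 1/s" % (p_plus, p_minus))
\end{verbatim}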

Equation (11.82) can be rewritten in the form of a Fokker-Planck equation if we expand the right-hand side to second order in the transition amplitudes A+ and A- (van Kampen, 1992),

\frac{\partial}{\partial t} P(w, t) = - \frac{\partial}{\partial w} \left[ A(w, t) \, P(w, t) \right] + \frac{\partial^2}{\partial w^2} \left[ B(w, t) \, P(w, t) \right] \qquad (11.86)

with

A(w, t) = p_+(w, t) \, A_+(w) - p_-(w, t) \, A_-(w) \,, \qquad (11.87)
B(w, t) = \tfrac{1}{2} \left[ p_+(w, t) \, A_+^2(w) + p_-(w, t) \, A_-^2(w) \right] \,. \qquad (11.88)

Figure 11.12: Stationary distribution of synaptic weights (schematic). A. With soft bounds, the distribution of weights $P_0(w)$ has a single peak. B. With hard bounds, the distribution peaks at the two boundaries $w = 0$ and $w = w^{\text{max}}$.

The Fokker-Planck equation (11.86) can be solved numerically to find stationary solutions. It turns out that the qualitative form of the distribution depends critically on how the bounds for the weights are implemented; cf. Rubin et al. (2001) and van Rossum et al. (2000) for details. With soft bounds the distribution is unimodal, whereas with hard bounds it peaks at both borders of the interval; cf. Fig. 11.12. Experimental data suggest a unimodal distribution, consistent with soft bounds (van Rossum et al., 2000).
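
The difference between soft and hard bounds can also be seen directly in a Monte-Carlo simulation of the jump process that underlies the master equation (11.82): in every time step a synapse is potentiated with probability $p_+ \, \text{d}t$ and depressed with probability $p_- \, \text{d}t$. The sketch below uses the pairing rates of the linear Poisson example with the output rate held fixed for simplicity; all rates, amplitudes and the pairing window $d$ are illustrative assumptions.

\begin{verbatim}
import numpy as np

# Monte-Carlo simulation of the weight jump process behind Eq. (11.82),
# comparing soft bounds, Eqs. (11.80)-(11.81), with hard bounds (fixed steps
# plus clipping).  The output rate is held fixed for simplicity; all numbers
# are illustrative assumptions.

rng = np.random.default_rng(7)
M, w_max = 2000, 1.0                      # number of synapses, upper bound
nu_pre, nu_post, d, tau_m = 10.0, 20.0, 50e-3, 10e-3
a_plus, a_minus = 0.02, 0.022             # step-size parameters
dt, T = 0.01, 200.0                       # [s]
eps_int = 1.0 - np.exp(-d / tau_m)        # int_0^d eps(s) ds, exponential kernel


def simulate(soft=True):
    w = rng.uniform(0.0, w_max, M)
    for _ in range(int(T / dt)):
        p_plus = d * nu_post * nu_pre + w * nu_pre * eps_int   # pairing rates
        p_minus = d * nu_post * nu_pre
        up = rng.random(M) < p_plus * dt
        down = rng.random(M) < p_minus * dt
        if soft:                          # soft bounds: steps shrink at the borders
            w = w + up * (w_max - w) * a_plus - down * w * a_minus
        else:                             # hard bounds: fixed steps, then clipping
            w = np.clip(w + up * w_max * a_plus - down * w_max * a_minus,
                        0.0, w_max)
    return w


for soft in (True, False):
    hist = np.histogram(simulate(soft), bins=10, range=(0.0, w_max))[0]
    print(("soft" if soft else "hard") + " bounds:", hist)
\end{verbatim}

With soft bounds the weights accumulate in a single interior peak, whereas with hard bounds they pile up at $w = 0$ and $w = w^{\text{max}}$, as in Fig. 11.12.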

