12.5 Transmission of Temporal Codes

While the relevance of spike timing in the millisecond range in cortical areas is a topic of intense debate, there are a few specific systems where temporal coding is generally accepted. One of the most prominent examples is the auditory system of the barn owl (Carr and Konishi, 1988,1990; Konishi, 1986; Carr, 1993,1995), and this is the system we will focus on in this section. Owls hunt at night. From behavioral experiments it is known that owls can locate sound sources even in complete darkness with a remarkable precision. To do so, the signal processing in the auditory pathway must achieve a temporal precision in the microsecond range with elements that are noisy, unreliable and rather slow. In this section, we use the results of previous chapters and show that spiking neurons that are driven in the sub-threshold regime are sensitive to temporal structure in the presynaptic input, in particular to synchronous spike arrival; cf. Sections 5.8 and 7.3. On the other hand, synchronous spike arrival is only possible if presynaptic transmission delays are appropriately tuned. A spike-time dependent Hebbian learning rule can play the role of delay tuning or delay selection (Eurich et al., 1999; Gerstner et al., 1996a,1997; Kempter et al., 1999; Senn et al., 2001a; Hüning et al., 1998).

We start this section with an outline of the problem of sound source localization and a rough sketch of the barn owl auditory pathway. We turn then to the problem of coincidence detection and the idea of delay selection by a spike-time dependent learning rule.

12.5.1 Auditory Pathway and Sound Source Localization

Barn owls use interaural time differences (ITD) for sound source localization (Carr and Konishi, 1990; Moiseff and Konishi, 1981; Jeffress, 1948). Behavioral experiments show that barn owls can locate a sound source in the horizontal plane with a precision of about 1-2 degrees of angle (Knudsen et al., 1979). A simple calculation shows that this corresponds to a temporal difference of a few microseconds (< 5 $\mu$ s) between the sound waves at the left and right ear. These small temporal differences must be detected and evaluated by the owl's auditory system; see Fig. 12.12.
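The "simple calculation" mentioned above can be sketched in a few lines. The sketch uses a far-field approximation in which the path-length difference between the ears is the ear separation times the sine of the azimuth. The ear separation of 5 cm and the speed of sound are assumptions made for illustration, not values taken from the text, and diffraction around the head is ignored.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s in air at room temperature (assumed)
EAR_SEPARATION = 0.05    # m, assumed effective distance between the owl's ears

def itd_seconds(azimuth_deg, ear_separation=EAR_SEPARATION, c=SPEED_OF_SOUND):
    """Interaural time difference for a distant source at the given azimuth.

    Far-field approximation: the extra path to the far ear is
    ear_separation * sin(azimuth).
    """
    return ear_separation * math.sin(math.radians(azimuth_deg)) / c

for angle in (1.0, 2.0):
    print(f"{angle:.0f} deg -> ITD = {itd_seconds(angle) * 1e6:.1f} microseconds")
```

Under these assumptions one degree of azimuth corresponds to roughly 2.5 $\mu$ s, so an angular precision of 1-2 degrees indeed requires resolving time differences of only a few microseconds.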

**Figure 12.12:** A. Jeffress model. Sound waves (dashed circles) from a source located to the right of the owl's head arrive at the two ears where they excite neuronal activity. Neuronal signals travel along transmission lines to an array of coincidence detectors (grey filled circles). The coincidence-detecting neurons respond, if signals from both sides arrive simultaneously. Due to transmission delays, the position of the coincidence detector activated by the signals (black filled circle) depends on the location of the external sound source. B. Auditory pathway (schematic). At the cochlea a sound wave is separated into its frequency components. Phase locked spikes are transmitted along the auditory nerve to the nucleus magnocellularis (NM), an intermediate processing step. Action potentials at the output of the NM are phase locked as well. The signals from both ears meet in the nucleus laminaris (NL). Neurons in the NL are sensitive to the interaural time difference (ITD) and can be considered as the coincidence detectors of the Jeffress model. In further processing steps, the output of neurons with different frequencies is combined to resolve remaining ambiguities; taken from Gerstner et al. (1998).

The basic principle of how such a time-difference detector could be set up was discussed by Jeffress (1948) more than 50 years ago. It consists of delay lines and an array of coincidence detectors. If the sound source is on the right-hand side of the auditory space, the sound wave arrives first at the right ear and then at the left ear. The signals propagate from both ears along transmission lines towards the set of coincidence detectors. A signal originating from a source located to the right of the owl's head stimulates a coincidence detector on the left-hand side of the array. If the location of the signal source is shifted, a different coincidence detector responds. The `place' of a coincidence detector is therefore a signature for the location of the external sound source; cf. Fig. 12.12. Such a representation has been called `place' coding (Carr and Konishi, 1990; Konishi, 1986).
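The place code of the Jeffress scheme can be illustrated with a toy array of coincidence detectors. In this sketch the number of detectors (21) and the delay per segment of transmission line (10 $\mu$ s) are hypothetical; the function simply searches for the detector at which the signals from the two ears arrive with the smallest mismatch.

```python
def jeffress_detector(itd_us, n_detectors=21, step_us=10.0):
    """Index of the coincidence detector activated for a given ITD.

    itd_us = t_left - t_right, i.e. positive when the sound reaches the
    right ear first (source on the right). Detector 0 is the leftmost one.
    The left-ear signal reaches detector k after k * step_us, the right-ear
    signal after (n_detectors - 1 - k) * step_us.
    """
    best_k, best_mismatch = None, float("inf")
    for k in range(n_detectors):
        arrival_left = itd_us + k * step_us              # left ear is excited later by itd_us
        arrival_right = (n_detectors - 1 - k) * step_us
        mismatch = abs(arrival_left - arrival_right)
        if mismatch < best_mismatch:
            best_k, best_mismatch = k, mismatch
    return best_k

for itd in (-100.0, 0.0, 100.0):
    print(f"ITD = {itd:+.0f} us -> detector {jeffress_detector(itd)}")
```

With these toy parameters a source straight ahead (ITD = 0) activates the central detector, while a source on the right activates a detector to the left of the centre, in line with the description above.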

Remarkably enough, such a coincidence detector circuit was found four decades later by Carr and Konishi (1990) in the nucleus laminaris of the barn owl. The existence of the circuit confirms the general idea of temporal difference detection by delayed coincidence measurement. It gives, however, no indication of how the precision of a few microseconds is finally achieved.

In order to better understand how precise spike timing arises, we have to study signal processing in the auditory pathway. Three aspects are important: frequency separation, phase locking, and phase-correct averaging.

The first few processing steps along the auditory localization pathway are sketched in Fig. 12.12B. The figure represents, of course, a simplified picture of auditory information processing, but it captures some essential ingredients. At both ears the sound wave is separated into its frequency components. Signals then pass an intermediate processing area called nucleus magnocellularis (NM) and meet at the nucleus laminaris (NL). Neurons there are found to be sensitive to the interaural time difference (ITD). Due to the periodicity of a sinusoidal wave, the ITD of a single frequency channel is really a phase difference and leaves some ambiguities. In the next processing step further up in the auditory pathway, information on phase differences from different frequency channels is combined to retrieve the temporal difference and hence the location of the sound source in the horizontal plane. Reviews of the basic principles of auditory processing in the owl can be found in Konishi (1993,1986).

Let us now discuss the first few processing steps in more detail. After cochlear filtering, different frequencies are processed by different neurons and stay separated up to the nucleus laminaris. In the following we may therefore focus on a single frequency channel and consider a neuron which responds best to a frequency of, say, 5kHz.

If the ear is stimulated with a 5kHz tone, neurons in the 5kHz channel are activated and fire action potentials. At first sight, the spike train looks noisy. A closer look, however, reveals that the pulses are phase locked to the stimulating tone: Spikes occur preferentially around some phase $\varphi_{0}^{}$ with respect to the periodic stimulus. Phase locking is, of course, not perfect, but subject to two types of noise; cf. Fig. 12.13. First, spikes do not occur at every cycle of the 5kHz tone. Often the neuron misses several cycles before it fires again. Second, spikes occur with a temporal jitter of about $\sigma$ = 40 $\mu$ s around the preferred phase (Sullivan and Konishi, 1984).

**Figure 12.13:** Spike trains in the auditory pathway show phase locking and can be described by a time dependent firing rate $\nu$ (t) [in kHz] which is modulated by the signal. Four samples of spike trains are shown at the bottom of the Figure.

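For the sake of simplicity we describe the spike train by a Poisson process with a periodically modulated rate

$\displaystyle \nu_{j}(t) = p \sum_{m=-\infty}^{\infty} G_{\sigma}(t - m\,T - \Delta_{j})$ ,

(12.13)

where $T$ is the period of the tone (e.g., $T$ = 0.2ms for a 5kHz signal), $G_{\sigma}$ is a normalized Gaussian with standard deviation $\sigma$ , and $\Delta_{j}$ = $\varphi_{j}$ $T$ /(2 $\pi$ ) is the delay associated with the preferred phase $\varphi_{j}$ of the spikes on input line $j$ . The amplitude $p$ with 0 < $p$ < 1 is the probability of firing in one period.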

Phase locking can be observed in the auditory nerve connecting the cochlea and the nucleus magnocellularis, in the nucleus magnocellularis, and also in the nucleus laminaris. The phase jitter $\sigma$ even decreases from one processing step to the next so that the temporal precision of phase locking increases from around 40 $\mu$ s in the nucleus magnocellularis to about 25 $\mu$ s in the nucleus laminaris. The precision of phase locking is the topic of the following subsection.
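A periodically modulated rate of this kind is easy to evaluate numerically. The sketch below takes the rate to be a periodic train of Gaussian bumps of width $\sigma$ centred at $\Delta_{j}$ + mT, each carrying a spike probability p per cycle; the value p = 0.2 is the one used later in this section, and the function names are chosen for illustration only. Integrating the rate over one period should return p, and the mean rate is then p/T.

```python
import math

T = 0.2        # ms, period of a 5 kHz tone
SIGMA = 0.04   # ms, temporal jitter (40 microseconds)
P = 0.2        # probability of a spike per cycle (value used later in the text)

def gaussian(x, sigma):
    """Normalised Gaussian density with standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def rate(t, delay=0.0, p=P, sigma=SIGMA, period=T, n_images=10):
    """Periodically modulated rate in spikes per ms (that is, in kHz).

    Gaussian bumps of weight p are centred at delay + m * period. Only the
    bumps within n_images periods of t are summed, which is ample as long
    as sigma is much smaller than n_images * period.
    """
    m0 = round((t - delay) / period)
    return p * sum(gaussian(t - delay - m * period, sigma)
                   for m in range(m0 - n_images, m0 + n_images + 1))

# Expected number of spikes per cycle: midpoint-rule integral over one period.
n = 2000
dt = T / n
spikes_per_cycle = sum(rate((k + 0.5) * dt) for k in range(n)) * dt
mean_rate_khz = spikes_per_cycle / T
print(f"spikes per cycle: {spikes_per_cycle:.4f}, mean rate: {mean_rate_khz:.3f} kHz")
```

With p = 0.2 and T = 0.2ms the mean rate is 1 kHz and the peak rate is close to 2 kHz, which illustrates why the neuron often skips several cycles between two spikes.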

12.5.2 Phase Locking and Coincidence Detection

We focus on a single neuron i in the nucleus laminaris (NL). The neuron receives input from neurons in the nucleus magnocellularis through about 150 synapses. All input lines belong to the same frequency channel. The probability of spike arrival at one of the synapses is given by Eq. (12.13) where j labels the synapses and T = 0.2ms is the period of the signal.

As a neuron model for i we take an integrate-and-fire unit with membrane time constant $\tau_{m}^{}$ and synaptic time constant $\tau_{s}^{}$ . From experiments on chickens it is known that the duration of an EPSP in the NL is remarkably short (< 1ms) (Reyes et al., 1994,1996). Neurons of an auditory specialist like the barn owl may be even faster. In our model equations, we have set $\tau_{m}^{}$ = $\tau_{s}^{}$ = 0.1ms. These values correspond to an EPSP with a duration of about 0.25ms.
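The quoted EPSP duration can be checked with a short calculation. The sketch assumes that, for equal membrane and synaptic time constants, the EPSP has the form (t/$\tau$) exp(1 - t/$\tau$), and it interprets "duration" as the full width at half maximum; both are assumptions made for this illustration.

```python
import math

TAU = 0.1  # ms, tau_m = tau_s

def epsp(t, tau=TAU):
    """EPSP for equal membrane and synaptic time constants, peak normalised to 1."""
    if t <= 0.0:
        return 0.0
    return (t / tau) * math.exp(1.0 - t / tau)

dt = 0.0005  # ms, sampling step
times = [k * dt for k in range(4000)]          # 0 to 2 ms
values = [epsp(t) for t in times]
t_peak = times[values.index(max(values))]      # time of the maximum
above_half = [t for t, v in zip(times, values) if v >= 0.5]
fwhm = above_half[-1] - above_half[0]          # full width at half maximum
print(f"peak at {t_peak:.3f} ms, width at half maximum {fwhm:.3f} ms")
```

Under these assumptions the EPSP peaks after one time constant (0.1ms) and has a half-maximum width of about 0.245ms, consistent with the duration of about 0.25ms stated above.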

The short duration of EPSPs in neurons in the NL and NM is due to an outward rectifying current which sets in when the membrane potential exceeds the resting potential (Manis and Marx, 1991; Oertel, 1983). The purely passive membrane time constant is in the range of 2ms (Reyes et al., 1994), but the outward rectifying current reduces the effective membrane resistance whenever the voltage is above the resting potential. In a conductance-based neuron model (cf. Chapter 2), all membrane currents would be described explicitly. In our integrate-and-fire model, the main effect of the outward rectifying current is taken into account by working with a short effective membrane time constant $\tau_{m}^{}$ =0.1ms. A membrane time constant of 0.1ms is much shorter than that found in cortical neurons where $\tau_{m}^{}$ $\approx$ 10 - 50ms seem to be typical values; see, e.g., Bernander et al. (1991). Note, however, that for temporal coding in the barn owl auditory system, $\tau_{m}^{}$ = 0.1ms is quite long as compared to the precision of phase locking of 25 $\mu$ s found in auditory neurons and necessary for successful sound source localization.

**Figure 12.14:** Phase Locking (schematic). Action potentials arrive periodically and are phase-locked to the stimulus in bundles of spikes (bottom). The postsynaptic potentials evoked by presynaptic spike arrival are summed and yield the total postsynaptic potential u(t) which shows a pronounced oscillatory structure. Firing occurs when u(t) crosses the threshold. The output spike is phase locked to the external signal, since the threshold crossing is bound to occur during a *rising* phase of u; adapted from Gerstner et al. (1998).

To get an intuitive understanding of how phase locking arises, let us study an idealized situation and take perfectly coherent spikes as input to our model neuron; cf. Fig. 12.14. Specifically, let us consider a situation where 100 input lines converge on the model neuron. On each line, spike arrival is given by Eq. (12.13) with $\sigma$ $\to$ 0 and p = 0.2. If the delays $\Delta_{j}^{}$ are the same for all transmission lines ( $\Delta_{j}^{}$ = $\Delta_{0}^{}$ ), then in each cycle a volley of 20±5 synchronized spikes arrives. The EPSPs evoked by those spikes are added as shown schematically in Fig. 12.14. The output spike occurs when the membrane potential crosses the threshold $\vartheta$ . Note that the threshold must be reached from below. It follows that the output spike must always occur during the rise time of the EPSPs generated by the last volley of spikes before firing.

Since the input spikes are phase-locked to the stimulus, the output spike will also be phase-locked to the acoustic waveform. The preferred phase of the output spike $\varphi_{i}^{}$ will, of course, be slightly delayed with respect to the input phase $\varphi_{0}^{}$ = $\Delta_{0}^{}$ (2 $\pi$ /T). The typical delay will be less than the rise time $\tau_{{\rm rise}}^{}$ of an EPSP. Thus, $\varphi_{i}^{}$ = ( $\Delta_{0}^{}$ +0.5 $\tau_{{\rm rise}}^{}$ ) (2 $\pi$ /T) will be a reasonable estimate of the preferred output phase.
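The numbers of this idealized example can be worked out directly. The sketch treats the number of spikes per volley as a binomial variable (100 lines, each active with probability p = 0.2) and evaluates the estimate of the preferred output phase given above. The rise time of 0.1ms is an assumption, inferred from the statement later in this section that $-\tau_{{\rm rise}}^{}$ /2 corresponds to -0.05ms.

```python
import math

N_LINES = 100     # input lines in the idealised example
P = 0.2           # probability that a line carries a spike in a given cycle
T = 0.2           # ms, period of the 5 kHz tone
TAU_RISE = 0.1    # ms, assumed EPSP rise time

volley_mean = N_LINES * P
volley_sd = math.sqrt(N_LINES * P * (1.0 - P))   # binomial standard deviation

def preferred_output_phase(delta0, tau_rise=TAU_RISE, period=T):
    """Estimate (delta0 + tau_rise / 2) * 2 * pi / T of the output phase."""
    return (delta0 + 0.5 * tau_rise) * 2.0 * math.pi / period

# Lag of the output phase behind the input phase delta0 * 2 * pi / T.
phase_lag = preferred_output_phase(0.0)
print(f"volley size: {volley_mean:.0f} +/- {volley_sd:.0f} spikes")
print(f"output phase lag: {math.degrees(phase_lag):.0f} degrees")
```

The binomial standard deviation comes out at 4 spikes, of the same order as the rough ±5 quoted above. With the assumed rise time, the output phase lags the input phase by about a quarter of a cycle.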

**Figure 12.15:** A. Membrane potential u(t) of an integrate-and-fire neuron as a function of time. B. Rate $\nu_{j}^{}$ (t) of presynaptic firing during 5kHz stimulation and four samples of input spike trains (vertical bars). The model neuron receives input from 154 presynaptic neurons (Carr and Konishi, 1990; Carr, 1993) in volleys of phase-locked spikes with a jitter of $\sigma$ = 40 $\mu$ s driven by a 5kHz tone. Spikes are generated by a stochastic process with periodically modulated rate (solid line in B). C. Histogram of spike arrival times (number of spikes N_s in bins of 5 $\mu$ s) summed over all 154 synapses. Each input spike evokes an excitatory postsynaptic potential (EPSP) shown on an enlarged voltage scale (same time scale) in the inset of A. The EPSPs from all neurons are added linearly and yield the membrane voltage u (A, main figure). With the spike input shown in C the membrane voltage exhibits oscillations (solid line). The model neuron fires (arrow) if u reaches a threshold $\vartheta$ . Firing must always occur during the time when u increases so that, in the case of coherent input, output spikes are phase-locked as well (see also Fig. 12.16A). If input spikes arrive incoherently, u(t) follows a trajectory with stochastic fluctuations but no systematic oscillations (dashed line in A); see Fig. 12.16B. Voltage in A: arbitrary units; the threshold $\vartheta$ is 36 times the amplitude of a single EPSP. Rate in B in kHz. Taken from Gerstner et al. (1996a).

Can we transfer the above qualitative arguments to a more realistic scenario? We have simulated a neuron with 154 input lines. At each synapse spikes arrive at a time-dependent rate as in Eq. (12.13). The temporal jitter has been set to $\sigma$ = 40 $\mu$ s. The delays $\Delta_{j}^{}$ (and hence the preferred phases) have a jitter of 35 $\mu$ s around some mean value $\Delta_{0}^{}$ . As before, p = 0.2 for all inputs.

A short interval taken from a longer simulation run with these input parameters is shown in Fig. 12.15. Part A shows the membrane potential u(t) as a function of time; Fig. 12.15B and C show the distribution of spike arrival times. Even though spike arrival is rather noisy, the trajectory of the membrane potential exhibits characteristic periodic modulations. Hence, following the same arguments as in Fig. 12.14 we expect the output spike to be phase-locked. Fig. 12.16A confirms our expectations: the distribution of output phases exhibits a pronounced peak. The width of the distribution corresponds to a temporal precision of $\sigma_{{\rm out}}^{}$ = 25 $\mu$ s, a significant increase in precision compared to the input jitter $\sigma$ = 40 $\mu$ s.

**Figure 12.16:** Phase histograms of output spikes. A. The input spikes from the 154 presynaptic neurons arrive *coherently* with the spiking statistics as shown in Figs. 12.15B and C. In this case, the distribution of output spikes exhibits a pronounced maximum indicating a high degree of phase locking. The width of the peak corresponds to a temporal precision of 25 $\mu$ s. B. If input spikes arrive incoherently, the histogram of output phases has no significant structure; taken from Gerstner et al. (1996a).

So far we have assumed that the delays $\Delta_{j}^{}$ have a small variation of 35 $\mu$ s only. Hence the preferred phases $\varphi_{j}^{}$ = $\Delta_{j}^{}$ (2 $\pi$ /T) are nearly identical for all input lines. If the preferred phases are drawn stochastically from a uniform distribution over [0, 2 $\pi$ ], then spike arrival at the neuron is effectively incoherent, even though the spikes on each input line exhibit phase-locking. If input spikes arrive incoherently, the temporal precision is lost and the output spikes have a flat phase distribution; see Fig. 12.16B.
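The difference between coherent and incoherent spike arrival already shows up in the expected membrane potential, before any threshold is considered. The sketch below convolves the summed periodic input rate of 154 lines with an EPSP kernel and compares the size of the resulting oscillation in the two cases. It makes several simplifying assumptions: the EPSP has the form (t/$\tau$) exp(1 - t/$\tau$) with peak amplitude 1, the coherent case uses identical delays (no 35 $\mu$ s spread), the incoherent case uses delays spaced evenly over one period rather than drawn at random, and firing and reset are ignored.

```python
import math

# Times in ms. N_LINES, P, SIGMA and TAU follow the text; DT is a numerical choice.
T, SIGMA, TAU, P, N_LINES, DT = 0.2, 0.04, 0.1, 0.2, 154, 0.001
K = round(T / DT)   # grid points per stimulus period

def epsp(t):
    """EPSP for tau_m = tau_s = TAU, peak normalised to 1 (assumed kernel)."""
    return (t / TAU) * math.exp(1.0 - t / TAU) if t > 0.0 else 0.0

def summed_rate(delays):
    """Total input rate (spikes per ms) on a grid covering one period."""
    norm = P / (SIGMA * math.sqrt(2.0 * math.pi))
    rate = [0.0] * K
    for d in delays:
        for k in range(K):
            x = (k * DT - d + 0.5 * T) % T - 0.5 * T   # offset from nearest bump
            rate[k] += norm * sum(math.exp(-0.5 * ((x - m * T) / SIGMA) ** 2)
                                  for m in range(-3, 4))
    return rate

def mean_potential(delays):
    """Expected potential in steady state: periodic rate convolved with the EPSP."""
    rate = summed_rate(delays)
    kernel = [epsp(n * DT) * DT for n in range(round(2.0 / DT))]  # ~0 beyond 2 ms
    return [sum(w * rate[(k - n) % K] for n, w in enumerate(kernel))
            for k in range(K)]

coherent = mean_potential([2.5] * N_LINES)
incoherent = mean_potential([2.5 + j * T / N_LINES for j in range(N_LINES)])
swing_coherent = max(coherent) - min(coherent)
swing_incoherent = max(incoherent) - min(incoherent)
print(f"mean potential: {sum(coherent) / K:.1f} EPSP amplitudes")
print(f"oscillation, coherent input:   {swing_coherent:.2f}")
print(f"oscillation, incoherent input: {swing_incoherent:.2e}")
```

In this sketch the coherent input produces a periodic modulation of several EPSP amplitudes around the mean, whereas the modulation vanishes for evenly spread delays. With randomly drawn phases a small residual modulation would remain, but the qualitative picture is the same: only coherent arrival provides the systematic oscillation that allows phase-locked threshold crossings.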

We conclude that spiking neurons are capable of transmitting phase information if input spikes arrive with a high degree of coherence. If input spikes arrive incoherently, the temporal information is lost. As we will see in the following subsection, this observation implies that the reliable transmission of temporal codes requires a mechanism for delay-line tuning.

12.5.3 Tuning of Delay Lines

Each neuron in the nucleus laminaris (NL) of the barn owl receives input from about 150 presynaptic neurons (Carr and Konishi, 1990; Carr, 1993). The high degree of convergence enables the neuron to increase the signal-to-noise ratio by averaging over many (noisy) transmission lines. As we have seen in the preceding section, the temporal precision of phase locking is indeed increased from 40 $\mu$ s in the input lines to 25 $\mu$ s in the output of our model neuron in the NL.

Such an averaging scheme, however, can work only if the preferred phases $\varphi_{j}^{}$ of all input lines are (nearly) the same. Otherwise the temporal precision is decreased or even lost completely as shown in Fig. 12.16B. To improve the signal-to-noise ratio, `phase-correct' averaging is needed. The question arises of how a neuron in the NL can perform correct averaging.

The total delay from the ear to the NL has been estimated to be in the range of 2-3ms (Carr and Konishi, 1990). Even if the transmission delays vary by only 0.1-0.2ms between one transmission line and the next, the phase information of a 5kHz signal is completely lost when the signals arrive at the NL. Therefore the delays must be precisely tuned so as to allow the neurons to perform phase-correct averaging.
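How quickly a spread of delays destroys the phase information can be estimated in closed form if the delays are assumed to be Gaussian distributed. In that case the coherence of the arrival phases is reduced by the factor exp(-(2$\pi$ s/T)²/2), where s is the standard deviation of the delays. Interpreting the variation "between one transmission line and the next" as such a standard deviation is an assumption of this sketch.

```python
import math

T_US = 200.0   # period of a 5 kHz tone in microseconds

def coherence_factor(delay_sd_us, period_us=T_US):
    """Coherence of arrival phases for Gaussian distributed delays.

    Returns exp(-(2*pi*sd/T)**2 / 2): 1 means perfectly aligned arrival,
    values near 0 mean that arrival phases cover the cycle almost evenly.
    """
    return math.exp(-0.5 * (2.0 * math.pi * delay_sd_us / period_us) ** 2)

for sd in (35.0, 100.0, 200.0):
    print(f"delay spread {sd:5.0f} us -> coherence {coherence_factor(sd):.3g}")
```

A spread of 35 $\mu$ s leaves a bit more than half of the coherence, whereas a spread of 0.1ms reduces it to less than one percent, so that for practical purposes the phase information of the 5kHz signal is gone.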

Precise wiring of the auditory connections could be set up genetically. This is, however, rather unlikely since the owl's head grows considerably during development. Moreover, while neurons in the nucleus laminaris of the adult owl are sensitive to the interaural phase difference, no such sensitivity was found for young owls (Carr, 1995). This indicates that delay tuning arises only later during development. It is clear that there can be no external supervisor or controller that selects the appropriate delays. What the owl needs is an adaptive mechanism which can be implemented locally and which achieves a tuning of appropriate delays.

**Figure 12.17:** Delay selection (schematic). A. When pulses are generated at the ear, they are phase locked to the periodic stimulus (dashed). B. Several transmission lines converge on a single coincidence detector neuron in the nucleus laminaris. In order to achieve a high temporal resolution, pulses should *arrive* synchronously at the coincidence detector neuron. With a broad distribution of delays, spike arrival is effectively asynchronous (spike train at top). Spike-time dependent Hebbian learning selects and reinforces some of the transmission lines and suppresses others (black crosses). After learning, pulses arrive with a high degree of coherence (spike train, middle). The periodic stimulus is represented for comparison (dashed line, bottom).

Tuning can be achieved either by selection of appropriate delay lines (Gerstner et al., 1996a) or by changes in axonal parameters that influence the transmission delay along the axon (Eurich et al., 1999; Senn et al., 2001a; Hüning et al., 1998). We focus on learning by delay selection; cf. Fig. 12.17. Immediately after birth a large number of connections are formed. We suppose that during an early period of post-natal development a tuning process takes place which selectively reinforces transmission lines with similar preferred phase and eliminates others (Gerstner et al., 1996a). The selection process can be implemented by a spike-time dependent Hebbian learning rule as introduced in Chapter 10.

**Figure 12.18:** Learning Window W as a function of the delay s between postsynaptic firing and presynaptic spike arrival. The graph on the right-hand side shows the boxed region around the maximum on an expanded scale. If W(s) is positive (negative) for some s, the synaptic efficacy is increased (decreased). The postsynaptic firing occurs at s = 0 (vertical dashed line). Learning is most efficient if presynaptic spikes arrive shortly before the postsynaptic neuron starts firing, as in synapse A. Another synapse B, which fires *after* the postsynaptic spike, is decreased. Taken from Gerstner et al. (1996a).

In order to illustrate delay selection in a model study, we assume that both ears are stimulated by a pure 5kHz tone with interaural time difference ITD=0. The effect of stimulation is that spikes arrive at the synapses with a periodically modulated rate $\nu_{j}^{}$ (t) as given by Eq. (12.13). During learning, synaptic weights are modified according to a spike-time dependent Hebbian learning rule

$\displaystyle \frac{{\text{d}}}{{\text{d}}t} w_{ij}(t) = S_{j}(t) \left[ a_1^{\text{pre}} + \int_{0}^{\infty} W(s) \, S_{i}(t-s) \; {\text{d}}s \right] + S_{i}(t) \int_{0}^{\infty} W(-s) \, S_{j}(t-s) \; {\text{d}}s$ .

(12.14)

The non-Hebbian term $a_1^{\text{pre}}$ in Eq. (12.14) is taken as small but positive. The learning window $W(s)$ is the one shown in Fig. 12.18. It has a negative integral $\overline{{W}}$ = $\int$ $W(s) \, {\text{d}}s$ < 0 and a maximum at $s^*$ = - 0.05ms. The choice $s^*$ = - $\tau_{{\rm rise}}^{}$ /2 guarantees stable learning (Gerstner et al., 1996a). As we have seen in Section 11.2.3, the combination of a learning window with negative integral and a positive non-Hebbian term $a_1^{\text{pre}}$ leads to a stabilization of the postsynaptic firing rate. Thus the postsynaptic neuron remains in the subthreshold regime during learning, where it is most sensitive to temporal coding; cf. Section 5.8. The rate stabilization in turn induces an effective competition between different synapses. Thus, we expect that some synapses grow at the expense of others that must decay.
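Integrated over time, a learning rule of this type reduces to a sum over spikes: every presynaptic spike contributes the non-Hebbian term, and every pair of pre- and postsynaptic spikes contributes the value of the learning window at the time difference s between presynaptic arrival and postsynaptic firing. The sketch below illustrates this with a hypothetical window, a narrow positive bump at s = -0.05ms on top of a broad negative trough. The functional form and all amplitudes are invented for illustration; only the position of the maximum and the sign of the integral follow the text.

```python
import math

S_STAR = -0.05   # ms, position of the maximum of W (from the text)
A1_PRE = 0.01    # hypothetical small positive non-Hebbian term

def window(s):
    """Hypothetical learning window W(s), with s = t_pre - t_post in ms.

    A narrow positive bump at S_STAR sits on a broad negative trough, so
    that the integral of W is negative. The shape is illustrative only.
    """
    bump = math.exp(-0.5 * ((s - S_STAR) / 0.05) ** 2)
    trough = 0.3 * math.exp(-0.5 * (s / 0.5) ** 2)
    return bump - trough

def weight_change(pre_times, post_times):
    """Total weight change accumulated over the given spike trains (times in ms).

    Each presynaptic spike adds A1_PRE; each pre/post pair adds
    W(t_pre - t_post), whichever of the two spikes comes first.
    """
    dw = A1_PRE * len(pre_times)
    for t_pre in pre_times:
        for t_post in post_times:
            dw += window(t_pre - t_post)
    return dw

integral = sum(window(-5.0 + k * 0.001) for k in range(10001)) * 0.001
grid = [-1.0 + k * 0.001 for k in range(2001)]
s_max = max(grid, key=window)
print(f"integral of W: {integral:.3f}, maximum at s = {s_max:.3f} ms")
print(f"pre 0.05 ms before post: dw = {weight_change([0.95], [1.0]):+.3f}")
print(f"pre 0.30 ms after post:  dw = {weight_change([1.3], [1.0]):+.3f}")
```

A synapse whose spikes arrive shortly before the postsynaptic spike is strengthened, while a synapse that is active at other times is weakened on average because the integral of the window is negative.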

The results of a simulation run are shown in Fig. 12.19. Before learning, the neuron receives input over about 600 synapses from presynaptic neurons. Half of the input lines originate from the left, the other half from the right ear. The total transmission delays $\Delta_{j}^{}$ are different between one line and the next and vary between 2 and 3ms. At the beginning of learning all synaptic efficacies have the same strength $w_{ij}$ = 1 for all $j$ . The homogeneous weight distribution becomes unstable during learning (Fig. 12.19, Middle). The instability can be confirmed analytically using the methods developed in Chapter 11 (Kempter, 1997). After learning the synaptic efficacies have either approached the upper bound $w^{\rm max}$ = 3 or decayed to zero. The transmission lines which remain after learning have either very similar delays, or delays differing by a full period (Fig. 12.19, Bottom). Thus, the remaining delays form a consistent pattern that guarantees that spikes arrive with a high degree of coherence.
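Why delays that differ by a full period are equivalent can be made explicit by reducing each delay modulo the stimulus period. In the following sketch delays are given as integers in microseconds to avoid rounding problems; the tolerance of 20 $\mu$ s and the example delays are hypothetical.

```python
T_US = 200   # period of the 5 kHz tone in microseconds

def preferred_phase_us(delay_us, period_us=T_US):
    """Arrival time within the stimulus cycle for a line with the given delay."""
    return delay_us % period_us

def coherent(delays_us, tolerance_us=20, period_us=T_US):
    """True if all lines deliver spikes within tolerance_us of the first line.

    Differences are measured on the circle, so delays that differ by whole
    periods count as identical.
    """
    ref = preferred_phase_us(delays_us[0], period_us)
    for d in delays_us:
        diff = abs(preferred_phase_us(d, period_us) - ref)
        if min(diff, period_us - diff) > tolerance_us:
            return False
    return True

print(coherent([2100, 2300, 2505, 2895]))   # delays near 100 us modulo 200 us
print(coherent([2100, 2200, 2250]))         # phases spread over the cycle
```

The first set of delays spans almost 0.8ms, yet all spikes arrive at nearly the same phase of the 5kHz tone; the second set is spread over the cycle and would not support phase-correct averaging.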

**Figure 12.19:** Development of tuning to a 5kHz tone. The left column shows the strength of synaptic efficacies $w_{ij}$ of all synapses. Synapses are indexed according to the delay $\Delta_{j}^{}$ of the corresponding transmission line and are plotted as $w_{ij}$ = w( $\Delta$ ). On the right, we show the vector strength (vs, solid line) and the output firing rate ( $\nu$ , dashed) as a function of the interaural time delay (ITD). **Top**. Before learning, there are 600 synapses (300 from each ear) with different delays, chosen randomly from a Gaussian distribution with mean 2.5ms and variance 0.3ms. All weights have unit value. The output is not phase-locked ( vs $\approx$ 0.1) and shows no dependence upon the ITD. **Middle**. During learning, some synapses are strengthened, others decreased. Those synapses which increase have delays that are similar or that differ by multiples of the period T = 0.2ms of the stimulating tone. The vector strength of the output increases and starts to depend on the ITD. **Bottom**. After learning, only about 150 synapses ( $\approx$ 75 from each ear) survive. Both the output firing rate $\nu$ and the vector strength vs show the characteristic dependence upon the ITD as seen in experiments with adult owls (Carr and Konishi, 1990). The neuron has the maximal response ( $\nu$ = 200Hz) for ITD=0, the stimulus used during the learning session of the model neuron. The vector strength at ITD=0 is vs $\approx$ 0.8 which corresponds to a temporal precision of 25 $\mu$ s. Taken from Gerstner et al. (1997).

The sensitivity of the output firing rate to the interaural time difference (ITD) and the degree of phase locking were tested before, during, and after learning (right column in Fig. 12.19). Before learning, the neuron shows no sensitivity to the ITD. This means that the neuron is not a useful coincidence detector for the sound source localization task. During learning, ITD sensitivity develops in a way similar to that found in experiments (Carr, 1995). After learning, the output rate is significantly modulated as a function of ITD. The response is maximal for ITD=0, the ITD used during learning. The form of the ITD tuning curves corresponds to experimental measurements.

To test the degree of phase locking in the output we have plotted the vector strength, vs, as a function of ITD. By definition the vector strength is proportional to the first Fourier component of the histogram of phase distributions; cf. Fig. 12.16. It is therefore a suitable measure of phase-locking. The vector strength is normalized so that vs = 1 indicates perfect phase locking (infinite temporal precision or $\sigma_{{\rm out}}^{}$ =0). Let us focus on the value of vs in the case of optimal stimulation (ITD=0). Before learning vs $\approx$ 0.1, which indicates that there is no significant phase locking. The value of vs $\approx$ 0.8 found after learning confirms that after the tuning of the synapses, phase locking is very pronounced.
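The vector strength can be computed directly from spike times. The sketch below uses the standard definition as the length of the mean resultant vector of the spike phases, which equals the normalized first Fourier component of the phase histogram. The second function converts a vector strength into an equivalent temporal jitter under the additional assumption that the phase distribution is Gaussian; the spike trains are artificial examples.

```python
import math

def vector_strength(spike_times, period):
    """Length of the mean resultant vector of the spike phases (between 0 and 1)."""
    c = sum(math.cos(2.0 * math.pi * t / period) for t in spike_times)
    s = sum(math.sin(2.0 * math.pi * t / period) for t in spike_times)
    return math.hypot(c, s) / len(spike_times)

def jitter_from_vs(vs, period):
    """Equivalent Gaussian jitter for a given vector strength.

    Assumes a Gaussian phase distribution, for which
    vs = exp(-(2*pi*sigma/period)**2 / 2).
    """
    return period / (2.0 * math.pi) * math.sqrt(-2.0 * math.log(vs))

T = 200.0                                            # microseconds
locked = [m * T + 30.0 for m in range(50)]           # always the same phase
spread = [m * T + m * T / 50.0 for m in range(50)]   # phases cover the cycle evenly

print(f"perfectly locked spikes: vs = {vector_strength(locked, T):.3f}")
print(f"evenly spread phases:    vs = {vector_strength(spread, T):.3f}")
print(f"vs = 0.8 corresponds to a jitter of {jitter_from_vs(0.8, T):.0f} us")
```

Under the Gaussian assumption a vector strength of 0.8 for a 5kHz tone translates into a jitter of roughly 21 $\mu$ s, in the same range as the temporal precision of 25 $\mu$ s quoted for the model neuron.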

To summarize, spike-time dependent Hebbian synaptic plasticity selects delay lines so that spikes arrive with maximal coherence. After learning, the postsynaptic neuron is sensitive to the interaural time difference, as it should be for neurons that are used for sound source localization. A temporal resolution in the range of a few microseconds can be achieved even though the membrane time constant and synaptic time constant are in the range of 100 microseconds.