# Experimental demonstration and tolerancing of a large-scale neural network (165,000 synapses), using phase-change memory as the synaptic weight element

Geoffrey W. Burr, Senior Member, Robert M. Shelby, Severin Sidler, Carmelo di Nolfo, Junwoo Jang, Irem Boybat, Student Member, Rohit S. Shenoy, Member, Pritish Narayanan, Member, Kumar Virwani, Member, Emanuele U. Giacometti, Bülent Kurdi, and Hyunsang Hwang, Member

*Abstract*—Using 2 phase-change memory (PCM) devices per synapse, a 3–layer perceptron network with 164,885 synapses is trained on a subset (5000 examples) of the MNIST database of handwritten digits using a backpropagation variant suitable for nonvolatile memory (NVM) + selector crossbar arrays, obtaining a training (generalization) accuracy of 82.2% (82.9%). Using a neural network (NN) simulator matched to the experimental demonstrator, extensive tolerancing is performed with respect to NVM variability, yield, and the stochasticity, linearity and asymmetry of the NVM-conductance response. We show that a bidirectional NVM with a symmetric, linear conductance response of high dynamic range is capable of delivering the same high classification accuracies on this problem as a conventional, software-based implementation of this same network.

# I. INTRODUCTION

Dense arrays of nonvolatile memory (NVM) and selector device-pairs (Fig. 1) can implement neuro-inspired non-Von Neumann computing [1], [2], using pairs [2] of NVM devices as programmable (plastic) bipolar synapses. Work to date has emphasized the Spike-Timing-Dependent-Plasticity (STDP) algorithm [1], [2], motivated by synaptic measurements in real brains. However, experimental NVM demonstrations have been limited in size ($\leq 100$ synapses), and few results have reported quantitative performance metrics such as classification accuracy. Worse yet, it has been difficult to be sure whether the relatively poor metrics reported to date might be due to immaturities or inefficiencies in the STDP learning algorithm (as it is currently implemented), rather than reflective of problems introduced by the imperfections of the NVM devices.

Unlike STDP, backpropagation is a widely used, well-studied method in training artificial neural networks, offering benchmark-able performance on datasets such as handwritten

G. W. Burr (e-mail:gwburr@us.ibm.com), R. M. Shelby, S. Sidler, C. di Nolfo, P. Narayanan, K. Virwani, and B. Kurdi are with IBM Research – Almaden, San Jose, CA 95120.

I. Boybat is now with EPFL, Lausanne, Switzerland.

J. Jang is with the Department of Creative IT Engineering, Pohang University of Science and Technology, Pohang 790-784, Korea.

R. S. Shenoy is now with Intel, Santa Clara, CA 95054.

E. U. Giacometti was an intern at IBM Research-Almaden, San Jose, CA in 2014.

H. Hwang is with the Department of Material Science and Engineering, Pohang University of Science and Technology, Pohang 790-784, Korea. Manuscript received March 9, 2015; revised May, 2015.



Fig. 1. Neuro-inspired non-Von Neumann computing [1], [2], in which neurons activate each other through dense networks of programmable synaptic weights, can be implemented using dense crossbar arrays of nonvolatile memory (NVM) and selector device-pairs.



Fig. 2. In forward evaluation of a multi-layer perceptron, each layer's neurons drive the next layer through weights  $w_{ij}$  and a nonlinearity f(). Input neurons are driven by pixels from successive MNIST images (cropped to  $22 \times 24$ ); the 10 output neurons identify which digit was presented.

digits (MNIST) [3]. Although proposed earlier, it gained great popularity in the 1980s [3], [4], and with the advent of GPUs, backpropagation now dominates the neural network field. In the present work, we use backpropagation to train a relatively simple multi-layer perceptron network (Fig. 2). During forward evaluation of this network, each layer's inputs  $(x_i)$  drive the next layer's neurons through weights  $w_{ij}$  and a nonlinearity f() (Fig. 2). Supervised learning occurs (Fig. 3) by backpropagating error terms  $\delta_j$  to adjust each weight  $w_{ij}$  as the second step. A 3–layer network is capable of accuracies, on previously unseen "test" images (generalization), of ~97% [3] (Fig. 4); even higher accuracy is possible by first "pre-training" the weights in each layer [5]. (Here we use tanh() as the



Fig. 3. In supervised learning, error terms  $\delta_j$  are backpropagated, adjusting each weight  $w_{ij}$  to minimize an "energy" function by gradient descent, reducing classification error between computed  $(x_l^D)$  and desired output vectors  $(g_l)$ .



Fig. 4. A 3–layer perceptron network can classify previously unseen ("test") MNIST handwritten digits with up to  $\sim$ 97% accuracy [3]. Training on a subset of the images sacrifices some generalization accuracy but speeds up training.

nonlinear function f(), and one bias (always-ON) neuron is added to each layer other than the output layer, in addition to those neurons shown in Fig. 2.) As with STDP, low-power neurons should be achievable by emphasizing brief spikes [6] and local-only clocking. However, note that no CMOS neuron circuitry is built or even specified in this paper; our focus is solely on the effects of the imperfections of the NVM elements.
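The forward evaluation described above can be sketched in a few lines of plain Python. The 528 inputs ($22 \times 24$ pixels), the 10 outputs, the tanh() nonlinearity, and the bias neurons come from the text; the hidden-layer sizes of 250 and 125 are an assumption, chosen here only because they reproduce the stated count of 164,885 synapses. The helper names and the random initial weights are hypothetical.

```python
import math
import random

# 528 inputs and 10 outputs are from the text; 250 and 125 are ASSUMED hidden sizes
# that happen to reproduce the stated synapse count.
LAYER_SIZES = [528, 250, 125, 10]

def synapse_count(sizes):
    """Count weights, adding one bias (always-ON) neuron to every non-output layer."""
    return sum((n_in + 1) * n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))

def init_weights(sizes, rng, scale=0.1):
    """One matrix per pair of layers: row i is the upstream neuron (last row is bias)."""
    return [[[rng.uniform(-scale, scale) for _ in range(n_out)]
             for _ in range(n_in + 1)]
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(pixels, weights):
    """Propagate one image: x_j = tanh(sum_i w_ij * x_i) at every layer."""
    x = list(pixels)
    for w in weights:
        x_with_bias = x + [1.0]          # bias neuron is always ON
        n_out = len(w[0])
        x = [math.tanh(sum(x_with_bias[i] * w[i][j] for i in range(len(w))))
             for j in range(n_out)]
    return x

rng = random.Random(0)
weights = init_weights(LAYER_SIZES, rng)
outputs = forward([rng.random() for _ in range(528)], weights)
predicted_digit = max(range(10), key=lambda j: outputs[j])
total_synapses = synapse_count(LAYER_SIZES)
```

With untrained random weights the predicted digit is meaningless; the sketch only shows the data flow and the synapse bookkeeping.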

We choose to work with phase-change memory (PCM) since we have access to large PCM arrays in hardware. We discuss the consequences of the fundamental asymmetry in PCM conductance response: the fact that small conductance increases can be implemented through "Partial–SET" pulses, but the RESET (conductance decrease) operation tends to be quite abrupt. However, we also discuss the use of bidirectional NVM devices (such as non-filamentary RRAM [7]). We show that such a bidirectional NVM with a symmetric, linear conductance response is fully capable of delivering the same high classification accuracies (on the problem we study, handwritten digit recognition) as a conventional, software-based implementation of the same neural network.



Fig. 5. By comparing total read signal between pairs of bitlines, summation of synaptic weights (encoded as conductance differences,  $w_{ij} = G^+ - G^-$ ) is highly parallel.



Fig. 6. Backpropagation calls for each weight to be updated by $\Delta w_{ij} = \eta x_i \delta_j$, where $\eta$ is the *learning rate*. Colormap shows log(occurrences), in the 1st layer, during NN training (blue curve, Fig. 4); white contours identify the quantized increase in the integer weight.



Fig. 7. In a crossbar array, efficient learning requires neurons to update weights in parallel, firing pulses whose overlap at the various NVM devices implements training. Colormap shows log(occurrences), in the 1st layer, during NN training (red curve, Fig. 8); white contours identify the quantized increase in the integer weight.



Fig. 8. Computer NN simulations show that a crossbar-compatible weight-update rule (Fig. 7) is just as effective as the conventional update rule (Fig. 6).



Fig. 9. The conductance response of an NVM device exhibits imperfections, including nonlinearity, stochasticity, varying maxima, asymmetry between increasing/decreasing responses, and non-responsive devices (at low or high G).

# II. CONSIDERATIONS FOR A CROSSBAR IMPLEMENTATION

By encoding synaptic weight in the conductance difference between a pair of nonvolatile memory devices,  $w_{ij} = G^+ - G^-$  [2], forward propagation simply compares total read signal on bitlines (Fig. 5). This can be performed by encoding x using some combination of voltage-domain or time-domain encoding (either number of read pulses or pulse duration). These CMOS circuitry choices are interesting and important topics, but are beyond the scope of this paper.
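As a minimal sketch of this differential read, the snippet below sums the read currents on the $G^+$ and $G^-$ bitlines of one downstream neuron and takes their difference. It assumes that $x_i$ is encoded as a read-voltage amplitude, which is only one of the encoding options mentioned above; the function names and the arbitrary conductance units are illustrative.

```python
def bitline_currents(x, g_plus, g_minus, v_read=1.0):
    """Total read current on the G+ and G- bitlines of one downstream neuron.

    Each upstream activation x_i scales the read voltage on its wordline
    (an assumed voltage-domain encoding), so each device contributes x_i * G.
    """
    i_plus = sum(v_read * xi * g for xi, g in zip(x, g_plus))
    i_minus = sum(v_read * xi * g for xi, g in zip(x, g_minus))
    return i_plus, i_minus

def weighted_sum(x, g_plus, g_minus):
    """Equivalent to sum_i x_i * w_ij with w_ij = G+ - G-."""
    i_plus, i_minus = bitline_currents(x, g_plus, g_minus)
    return i_plus - i_minus

x = [1.0, 0.5, 0.0]
g_plus = [3.0, 1.0, 2.0]
g_minus = [1.0, 2.0, 5.0]
result = weighted_sum(x, g_plus, g_minus)

# Same quantity computed from explicit weights, for comparison.
direct = sum(xi * (gp - gm) for xi, gp, gm in zip(x, g_plus, g_minus))
```

Both bitlines are read in the same operation, so the comparison of the two totals yields the weighted sum for a whole column at once.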

Any non-volatile memory device that can offer a nondestructive parallel read (as shown in Fig. 5) of memory states that can be smoothly adjusted up or down through a wide range of analog values could potentially be used in this application. Here we focus on NVM devices that offer a range of analog conductance states.

This paper is concerned with how real NVM devices will respond to programming instructions during in situ training of their artificial neural network. Unfortunately, the conventional backpropagation algorithm [4] calls for weight updates $\Delta w_{ij} \propto x_i \delta_j$ (Fig. 6), which forces upstream i and downstream j neurons to exchange information uniquely for each and every synapse. This serial, element-by-element information exchange between neurons is highly undesirable in a crossbar array implementation. One alternative is to have each neuron, upstream and downstream, fire pulses based on its local knowledge of $x_i$ and $\delta_j$, respectively. The presence of a nonlinear selector is critical to ensure that NVM programming occurs only when pulses from both the upstream and downstream neurons overlap. This allows neurons to modify weights in parallel, making learning much more efficient [1] (Fig. 7). (Note that to reduce peak power, one might choose to stagger these write pulses across the array, one sub-block at a time.) Fig. 8 shows, using a simulation of the neural network in Figs. 2,3, that this adaptation for nonvolatile memory implementation has no adverse effect on accuracy.
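The idea that each neuron acts only on local information can be illustrated with a toy pulse-counting scheme. This is a hypothetical sketch, not the pulse scheme of Fig. 7: it assumes each neuron converts its own value into a small signed pulse count using a gain of $\sqrt{\eta}$, and that the number of overlapping (programming) events at a synapse equals the product of the two counts. The names `local_pulse_count` and `overlap_update` and the cap of 4 pulses are illustrative.

```python
import math

def local_pulse_count(value, gain, max_pulses):
    """Signed pulse count a neuron derives from its own local value only."""
    n = min(max_pulses, int(abs(value) * gain + 0.5))   # round half up, then cap
    return n if value >= 0 else -n

def overlap_update(x_i, delta_j, eta, max_pulses=4):
    """Signed number of programming events at synapse (i, j).

    ASSUMPTION: every upstream pulse overlaps every downstream pulse, so the
    event count is the product of the two locally chosen pulse counts.
    A positive result means partial-SET pulses on G+, a negative one on G-.
    """
    gain = math.sqrt(eta)          # split the learning rate between both neurons
    n_up = local_pulse_count(x_i, gain, max_pulses)
    n_down = local_pulse_count(delta_j, gain, max_pulses)
    return n_up * n_down

eta = 16.0                          # learning rate in units of pulses (illustrative)
events = overlap_update(0.5, -0.25, eta)
exact = eta * 0.5 * -0.25           # conventional rule, eta * x_i * delta_j
small_x_events = overlap_update(0.1, 1.0, eta)
```

In this example the quantized product matches the conventional update exactly, while a small activation rounds to zero pulses and produces no update at all, which hints at why the learning rate has to be chosen with care.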

However, while modifying the update rule is clearly not a problem, the conductance response of any real nonvolatile memory device exhibits imperfections that can decidedly affect the neural network performance. These imperfections include nonlinearity, stochasticity, varying maxima, asymmetry between increasing/decreasing responses, and non-responsive devices at low or high conductance (Fig. 9). The initial version



Fig. 10. Bounding G values reduces NN training accuracy slightly, but unidirectionality and nonlinearity in G-response can strongly degrade accuracy. Figure insets map NVM-pair synapse states on a diamond-shaped plot of  $G^+$  vs.  $G^-$  (weight is vertical position) for a sampled subset of the weights.



Fig. 11. If G values can only be increased (asymmetric G-response), a synapse at point **A** ( $G^+$  saturated) can only increase  $G^-$ , leading to a low weight value (**B**). If response at small G values differs from that at large G (nonlinear G-response), alternating weight updates can no longer cancel. As synapses tend to get herded into the same portion of the G-diamond ( $\mathbf{C} \rightarrow \mathbf{D}$ ), the decrease in average weight can lead to network freeze-out.

of this work [8] was the first paper to study the relative importance of each of these factors. This expanded version adds significantly more explanatory details, as well as several new plots detailing paths for future improvement.

Bounding G values would appear to reduce neural network training accuracy slightly, as shown by the difference between the blue and red (top two) curves in Fig. 10. However,



Fig. 12. Synapses with large conductance values (inset, right edge of G-diamond) can be refreshed (moved left) while preserving the weight (to some accuracy), by RESETs to both G followed by a partial–SET of one. If such RESETs are too infrequent, weight evolution stagnates and NN accuracy degrades.



Fig. 13. Mushroom-cell [9], 1T1R PCM devices (180nm node) with 2 metal interconnect layers enable  $512 \times 1024$  arrays. A 1-bit sense amplifier measures *G* values, passing the data to software-based neurons. Conductances are increased by identical 25ns "partial–SET" pulses to increase  $G^+$  ( $G^-$ ) (Fig. 7), or by RESETs to both *G* followed by an iterative SET procedure (Fig. 12).

unidirectionality and nonlinearity in the *G*-response strongly degrade accuracy (bottom two curves, green and magenta). Figure insets (Fig. 10) map non-volatile memory pair synapse states on a diamond-shaped plot of  $G^+$  vs.  $G^-$  (weight is vertical position). In this context (Fig. 11), a PCM-based synapse with a highly *asymmetric G*-response (only partial–SET can be done gradually) moves only unidirectionally, from left-to-right. (Bipolar filamentary RRAM or CBRAM would have an identical problem, except that SET is the abrupt step and it is the RESET step which can be performed gradually.)

Once one G value is saturated, subsequent training can only increase the other G value, reducing weight magnitude. *Nonlinearity* in G-response further encourages weights of low value. If the response at small G values differs from that at large G, alternating weight updates no longer cancel. As synapses are herded into the same portion of the G-diamond (Fig. 11), the decrease in average weight can lead to network "freeze-out" (Fig. 10 inset). In such a condition, the network chooses to update very few if any weights, meaning that the network stops evolving towards higher accuracy. Worse yet, since the few weight updates that do occur are quite likely to lead to weight magnitude decay, previously trained information is steadily erased and accuracy can actually decrease (bottom two curves, Fig. 10).
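The weight decay described above can be reproduced with a toy model of a unidirectional, nonlinear synapse. The saturating response used here, in which each partial–SET pulse closes a fixed fraction of the gap to the maximum conductance, is an assumed shape for illustration and is not the measured PCM response; all names and numbers are hypothetical.

```python
G_MAX = 1.0

def partial_set(g, rate=0.2):
    """ASSUMED nonlinear response: each pulse closes a fixed fraction of the gap to G_MAX."""
    return g + rate * (G_MAX - g)

def apply_update(g_plus, g_minus, direction):
    """Unidirectional synapse: +1 pulses G+, -1 pulses G-; neither can be decreased."""
    if direction > 0:
        return partial_set(g_plus), g_minus
    return g_plus, partial_set(g_minus)

g_plus, g_minus = 0.8, 0.1           # initial weight = 0.7
initial_weight = g_plus - g_minus
history = []
for step in range(6):                # alternate +1, -1, +1, -1, ...
    direction = +1 if step % 2 == 0 else -1
    g_plus, g_minus = apply_update(g_plus, g_minus, direction)
    history.append(g_plus - g_minus)
```

Although the synapse receives equal numbers of increase and decrease requests, the device at low conductance takes larger steps than the one near saturation, so the weight shrinks from 0.7 to roughly 0.36 while both conductances drift toward the right side of the G-diamond.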

One solution to the highly asymmetric response of PCM devices is occasional RESET [2], moving synapses back to the left edge of the "G-diamond" while preserving weight value (using an iterative SET procedure, Fig. 12 inset). However, if this is not done frequently enough, weight stagnation will degrade neural network accuracy (Fig. 12). (An analogous approach for bipolar filamentary RRAM or CBRAM would be occasional SET.)

#### **III. EXPERIMENTAL RESULTS**

We implemented a 3–layer perceptron of 164,885 synapses (Figs. 2,3) on a  $500 \times 661$  array of mushroom-cell [9], 1T1R PCM devices (180nm node, Fig. 13). While the weight update algorithm (Fig. 7) is fully compatible with a crossbar implementation, our hardware allows only sequential access to each



Fig. 14. Although G values are measured sequentially, weight summation and weight update procedures in our software-based neurons closely mimic the column (and row) integrations and pulse-overlap programming needed for parallel operations across a crossbar array. However, since occasional RESET is triggered when both $G^+$ and $G^-$ are large, serial device access is required to obtain and then re-program individual conductances.



Fig. 15. Training and test accuracy for a 3–layer perceptron of 164,885 hardware-synapses, with all weight operations taking place on a $500 \times 661$ array of mushroom-cell [9] PCM devices (Fig. 13). Also shown is a matched computer simulation of this NN, using parameters extracted from the experiment.



Fig. 16. 50-point cumulative distributions of experimentally measured conductances for the 500  $\times$  661 PCM array, showing variability and stuck-ON pixel rate. Insets show the measured RESET accuracy, and the rate and stochasticity of *G*-response, plotted as a colormap of  $\Delta G$ -per-pulse vs. *G*.



Fig. 17. Fitted G-response vs. number of pulses (blue average, red  $\pm 1\sigma$  responses), obtained from our computer model (inset) for the rate and stochasticity of G-response ( $\Delta G$ -per-pulse vs. G) matched to experiment (Fig. 16).

PCM device (Fig. 13). For read, a sense amplifier measures G values, passing the data to software-based neurons. Although this measurement is performed sequentially, weight summation and weight update procedures in the software-based neurons closely mimic the column- and row-based integrations. (Again, since no particular CMOS circuitry has been specified, we assume that the 8-bit value of  $x_i$  is implemented completely accurately. Any problems introduced by inaccurate encoding of  $x_i$  values by real CMOS hardware could be easily assessed using our tolerancing simulator.)

Weights are increased (decreased) by identical "partial–SET" pulses (Fig. 7) to increase $G^+$ (increase $G^-$) (Fig. 14). The deviation from true crossbar implementation occurs upon occasional RESET (Fig. 12), triggered when either $G^+$ or $G^-$ are large, thus requiring both knowledge of and control over individual G values. Serial device access is required, both to measure the G values (to determine which are in the "L-shaped" region at the right side of the G-diamond) and then to fire two RESET pulses (at both $G^+$ and $G^-$) followed by an iterative SET procedure to increase one of those two conductances until the correct synaptic weight is restored. Since the time and energy associated with this process are large, it is highly desirable to perform occasional-RESET as infrequently and as inaccurately as possible.
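The occasional-RESET procedure can be sketched as follows. This is a simplified model under stated assumptions: RESET is idealized as returning a device to zero conductance, each partial–SET pulse adds a small Gaussian-distributed step, and the trigger threshold, step statistics, and tolerance are hypothetical values rather than parameters extracted from the experiment.

```python
import random

G_MAX = 1.0
RESET_THRESHOLD = 0.9     # hypothetical: "large" conductance that triggers a refresh

def partial_set(g, rng, mean_step=0.05, sigma=0.01):
    """One stochastic partial-SET pulse (ASSUMED Gaussian step, never negative)."""
    step = max(0.0, rng.gauss(mean_step, sigma))
    return min(G_MAX, g + step)

def occasional_reset(g_plus, g_minus, rng, tolerance=0.05, max_pulses=100):
    """RESET both devices, then partial-SET one until the weight is roughly restored."""
    if max(g_plus, g_minus) < RESET_THRESHOLD:
        return g_plus, g_minus                  # synapse not near the right edge
    target = g_plus - g_minus                   # weight to preserve
    g_plus, g_minus = 0.0, 0.0                  # idealized RESET of both devices
    for _ in range(max_pulses):
        if abs(target) - abs(g_plus - g_minus) <= tolerance:
            break                               # "inaccurate" restore is good enough
        if target > 0:
            g_plus = partial_set(g_plus, rng)
        else:
            g_minus = partial_set(g_minus, rng)
    return g_plus, g_minus

rng = random.Random(1)
new_plus, new_minus = occasional_reset(0.95, 0.60, rng)    # weight 0.35, G+ is large
untouched = occasional_reset(0.30, 0.10, rng)              # below threshold
```

Loosening `tolerance` shortens the iterative SET loop, which is the sense in which an infrequent and inaccurate RESET saves time and energy.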

Fig. 15 shows measured accuracies for a hardware-synapse neural network, with **all weight operations taking place on PCM devices**. To reduce test time, weight updates for each *mini-batch* of 5 MNIST examples were applied together. Fig. 16 plots measured *G*-response, stochasticity, variability, stuck-ON pixel rate, and RESET accuracy. By matching all parameters including stochasticity (Fig. 17) to those measured during the experiment, our neural network computer simulation was able to precisely reproduce the measured accuracy trends (Fig. 15).

### IV. TOLERANCING AND POWER CONSIDERATIONS

We can now use this matched neural network simulation to explore the importance of nonvolatile memory imperfections. Fig. 18 shows final training (test) accuracy as a function of variations in nonvolatile memory and neural network parameters away from the conditions used in our hardware demo (green dotted line). NN performance is highly robust to stochasticity (Fig. 18(a)), variable maxima (c), the presence of non-responsive devices (d,e), and infrequent and inaccurate RESETs (f,g). A mini-batch of size 1 allows weight updates to be applied immediately (Fig. 18(h)). However, as mentioned earlier, nonlinearity and asymmetry in G-response (Fig. 18(b)) limit the maximum possible accuracy (here, to  $\sim 85\%$ ), and require precise tuning of the learning rate and neuron-response (f') (Fig. 18(i,j)). Too low a learning rate and no weight receives any update; too high, and the imperfections in the NVM response generate chaos. The narrow distribution of these parameters means that the experiment must be tuned very carefully. An extension of an existing neural network technique to a crossbar-based neural network has been found to provide a much broader distribution of the learning rate. This technique is currently under investigation and will be the subject of a future publication.

## V. DISCUSSION

While the asymmetric *G*-response of PCM makes it necessary to occasionally stop training, measure all conductances, and apply RESETs and iterative SETs, energy usage can be reasonable if RESETs are infrequent (Fig. 19, inset), and if learning rate is low (Fig. 19).
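A back-of-the-envelope energy estimate follows directly from the per-pulse energies quoted in the caption of Fig. 19 (30 pJ per RESET and 3 pJ per partial–SET for highly-scaled PCM). The pulse counts in the example are made-up inputs for illustration, not measured values, and the function name is hypothetical.

```python
E_PARTIAL_SET_PJ = 3     # per partial-SET pulse, from the Fig. 19 caption
E_RESET_PJ = 30          # per RESET pulse, from the Fig. 19 caption

def programming_energy_pj(n_partial_set, n_reset):
    """Total programming energy in picojoules (read energy is ignored here)."""
    return n_partial_set * E_PARTIAL_SET_PJ + n_reset * E_RESET_PJ

# Hypothetical pulse counts for one training run.
energy_pj = programming_energy_pj(n_partial_set=1_000_000, n_reset=10_000)
energy_microjoules = energy_pj / 1e6
reset_share = (10_000 * E_RESET_PJ) / energy_pj
```

With these made-up counts, RESETs account for under a tenth of the programming energy even though each one costs ten times more than a partial–SET, which is consistent with the argument that infrequent RESETs keep the total reasonable.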

Neural network performance with bidirectional nonvolatile memory-based synapses can deliver high classification accuracy if G-response is linear and symmetric (Fig. 20, green curve) rather than nonlinear (red curve). Asymmetry in G-response (blue curve) strongly degrades performance. In Fig. 21, we further explore the trends with an ideal but nonlinear NVM, varying both the initial steepness of the G-response and the choice of "fully bidirectional" weight updates (when increasing weight, for instance, we both increase $G^+$ and decrease $G^-$ together) or "alternating bidirectional" (we choose one, but not both, of these two steps). Clearly, a less-steep response is favorable, and the distinction between fully or alternating bidirectional has the most impact for steeply nonlinear G-responses.

The most ideal NVM, with a linear and symmetric conductance response in both directions, would result in more regularly distributed weight values and fewer freeze-outs, leading to higher accuracies. In Fig. 22 and Fig. 23, we show that a gentle linear response (e.g., a large number of identical pulses are needed to change the conductance from minimum to maximum conductance and vice-versa) is advantageous compared to a steep response. While both the alternating bidirectional and fully bidirectional update schemes deliver higher accuracies than an NVM with a nonlinear conductance response, only the fully bidirectional update scheme reaches the same high test accuracies exhibited by networks in which the NVM conductances are unbounded (Fig. 23, inset). Fig. 24 replots the same data from Fig. 23 on a logarithmic vertical scale, to accentuate the high accuracy region.

The reason for this difference is shown in Fig. 25, where one example of a desired sequence of weight updates is contrasted



Fig. 19. Despite the higher power involved in RESET rather than partial-SET (30pJ and 3pJ for highly-scaled PCM [1]), total energy costs of training can be minimized if RESETs are sufficiently infrequent (inset). Low-energy training requires low learning rates, which minimize the number of synaptic programming pulses. At higher learning rates, even a bi-directional, linear NVM requiring no RESET and offering low power (1pJ per pulse) can lead to large training energy.




Fig. 20. NN performance is improved if G-response is linear and symmetric (green curve) rather than nonlinear (red). However, asymmetry between the up- and down-going G-responses (blue), if not corrected in the weight-update rule (Fig. 7), can strongly degrade performance by favoring particular regions of the G-diamond (Figs. 10,11).

Fig. 18. Matched simulations show that an NVM-based NN is highly robust to stochasticity, variable maxima, the presence of non-responsive devices, and infrequent or inaccurate RESETs. A mini-batch size of 1 avoids the need to accumulate weight updates before applying them. However, the nonlinear and asymmetric G-response limits accuracy to $\sim 85\%$, and requires learning rate and neuron-response (f') to be precisely tuned.



Fig. 21. NN performance with a bidirectional but nonlinear G-response (same basic shape as Fig. 20, red inset curve) improves as the response becomes more gentle and the initial slope is less steep. The choice between always updating both conductances when updating the weight ("fully bidirectional") and alternating between updating $G^+$ or $G^-$ but not both ("alternating bidirectional"), has the most impact when the G-response is steeply nonlinear. (Note that due to changes in learning rate and the slope of the nonlinear function f(), the red curve in Fig. 20 is not duplicated here.)



Fig. 22. NN performance when alternating between updating  $G^+$  or  $G^-$  but not both ("alternating bidirectional"), with a linear G-response. This update method cannot reach the performance of a network with unbounded synaptic weights, even when the dynamic range of the linear response is large (e.g., when the change due to any one pulse is only a small fraction of the range between the minimum and maximum conductances).



Fig. 23. NN performance (classification accuracy during training) when updating both  $G^+$  and  $G^-$  ("fully bidirectional" scheme), with a linear *G*-response. The inset shows that, when the dynamic range of the linear response is large, the classification accuracy can now reach that of the original network (a *test* accuracy of 94% when trained with 5,000 images; 97% when trained with all 60,000 images).



Fig. 24. The same NN performance data for a linear, symmetric NVM under the fully bidirectional scheme (Fig. 23), here replotted on a logarithmic scale that accentuates the high accuracy region. Here only the classification accuracy on the training set is shown, which can reach close to 100%, at which point the accuracy on the test set of 10,000 images that the network has never seen before (Fig. 23) becomes the only relevant way to gauge the network performance.

to the actual weight updates that get implemented in these two update schemes. In Fig. 25(b), we show that when the state of the synapse is at the boundaries of the G-diamond, there is a significant chance that the next weight update using the alternating bidirectional scheme will have little or no impact, simply because a conductance that is already saturated cannot be increased (decreased) any further. In the fully bidirectional update scheme, some amount of weight update will still occur at the edges of the G-diamond, leading to smaller discrepancies between the desired and actual weight changes, and thus higher performance. In addition, because the weights only move "up" and "down" the G-diamond in the fully bidirectional scheme, the synapses stay in the center stripe of the G-diamond (Fig. 26(b)), where they have access to the full dynamic range available. In contrast, because each weight update in the alternating bidirectional scheme moves along a diagonal line, some number of synapses end up at the edges of the G-diamond, where the effective dynamic range which they can access is significantly reduced (Fig. 26(a)).
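The difference between the two schemes at the edge of the G-diamond can be shown with a bounded, linear, bidirectional device model. The sketch assumes that the fully bidirectional scheme splits each requested weight change equally between $G^+$ and $G^-$, and that the alternating scheme applies the whole change to one device; this split is an assumption made for illustration, and the function names are hypothetical.

```python
G_MIN, G_MAX = 0.0, 1.0

def clip(g):
    """Conductances are bounded; a saturated device cannot move any further."""
    return max(G_MIN, min(G_MAX, g))

def fully_bidirectional(g_plus, g_minus, dw):
    """ASSUMED equal split: raise G+ by dw/2 and lower G- by dw/2."""
    return clip(g_plus + dw / 2), clip(g_minus - dw / 2)

def alternating_bidirectional(g_plus, g_minus, dw, use_plus):
    """Apply the whole requested change to only one of the two devices."""
    if use_plus:
        return clip(g_plus + dw), g_minus
    return g_plus, clip(g_minus - dw)

# Synapse on the boundary of the G-diamond: G+ is already saturated.
g_plus, g_minus, dw = 1.0, 0.5, 0.2
fp, fm = fully_bidirectional(g_plus, g_minus, dw)
ap, am = alternating_bidirectional(g_plus, g_minus, dw, use_plus=True)

full_change = (fp - fm) - (g_plus - g_minus)   # partial update still lands
alt_change = (ap - am) - (g_plus - g_minus)    # update is lost entirely
```

In this example the fully bidirectional update delivers half of the requested change through $G^-$, whereas the alternating update, whose turn falls on the saturated $G^+$, delivers none.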

These results demonstrate conclusively that NVM devices should be fully capable of delivering the same classification accuracy on the MNIST handwritten digits as a conventional implementation of this artificial neural network. All that is required of the NVM device is that it offer a bidirectional, linear, and symmetric response in conductance with large dynamic range (e.g., the change due to any one pulse represents only a small fraction of the entire conductance range available).

Other future work will be needed to demonstrate a full crossbar-array implementation, including dedicated CMOS circuitry for summation of synaptic weights during both forward and backpropagation through nearly-identical high-performance nonlinear selector devices. The values of neurons (x) and backpropagated errors  $(\delta)$  will need to be stored in CMOS circuitry and presented to the crossbar, through some combination of analog voltage levels, number of read pulses, and/or duration of read pulses. The need to synchronize write pulse timing between upstream and downstream neurons, and techniques to disperse the high-energy writes in time (to reduce the load on write drivers and voltage supplies) must also be addressed in future work.

#### VI. CONCLUSIONS

Using 2 phase-change memory (PCM) devices per synapse, a 3–layer perceptron with 164,885 synapses was trained with backpropagation on a subset (5000 examples) of the MNIST database of handwritten digits, reaching a training accuracy of 82.2% (82.9% on the test set). A weight-update rule compatible with NVM+selector crossbar arrays was developed, and was shown to have no adverse effect on accuracy. A novel "*G*-diamond" concept (Fig. 11) was introduced to illustrate issues created by nonlinearity and asymmetry in NVM conductance response. Asymmetry can be mitigated by an occasional RESET strategy (Fig. 12), which can be both infrequent and inaccurate (Figs. 12,18(f,g)).

Using a neural network (NN) simulator matched to the experimental demonstrator, extensive tolerancing was performed (Fig. 18). Results show that network parameters such as learning rate and the slope of the nonlinear neuron-response function (Figs. 18(i,j)), and the nonlinearity, symmetry, and bounded nature of the conductance response (Figs. 9–11, 20–26), are critical to achieving high performance in an NVM-based neural network.

Our results show that all NVM-based neural networks (not just those based on PCM) can be expected to be **highly resilient to random effects** (NVM variability, yield, and stochasticity), but will be **highly sensitive to "gradient" effects that act to steer all synaptic weights**. A learning rate just high enough to avoid network "freeze-out" is shown to be advantageous for both high accuracy and low training energy. We also show, through simulation, that a bidirectional NVM with a symmetric, linear conductance response of high dynamic range (each conductance step is relatively small) would be fully capable of delivering the same high classification accuracies on the MNIST handwritten digit database as a conventional, software-based implementation, ranging from >94% when trained on 5000 examples to >97% when trained on the full set of 60,000 training examples.



Fig. 25. Because of the finite bounds on conductance values, any desired sequence of weight changes (lefthand portion of (a)) will not be fully implemented in an NVM-based neuromorphic system. Parts (b) and (c) illustrate the actual weight updates that occur in (b) an "alternating bidirectional" update scheme, in which we alternate between updating $G^+$ or $G^-$ but not both, and (c) a "fully bidirectional" update scheme, in which we always update both $G^+$ and $G^-$. With the "alternating bidirectional" scheme, synapses whose conductance values are located at/near the boundaries of the G-diamond can potentially lead to a situation where a large weight update is completely ignored. In contrast, in the "fully bidirectional" scheme, such large weight updates are simply reduced in the magnitude of their effect, and synapses tend to remain in the center of the G-diamond.



Fig. 26. When the G-response is steeply nonlinear, a "fully bidirectional" scheme exhibits lower accuracy (see Fig. 21), because any single weight update could potentially make two overly large conductance changes instead of just one. However, the "fully bidirectional" scheme provides better performance for a linear response with high dynamic range (compare Figs. 22 and 23), because the small symmetric changes of each conductance move the synaptic weight up and down along the central vertical axis of the G-diamond. In contrast, the "alternating bidirectional" scheme can move some synapses to the left or right edges of the G-diamond, where the effective dynamic range (maximum weight magnitude) is significantly reduced.

#### REFERENCES

- [1] B. L. Jackson, B. Rajendran, G. S. Corrado, M. Breitwisch, G. W. Burr, R. Cheek, K. Gopalakrishnan, S. Raoux, C. T. Rettner, A. Padilla, A. G. Schrott, R. S. Shenoy, B. N. Kurdi, C. H. Lam, and D. S. Modha, "Nanoscale electronic synapses using phase change devices," ACM Journal On Emerging Technologies in Computing Systems, vol. 9, no. 2, p. 12, 2013.
- [2] M. Suri, O. Bichler, D. Querlioz, O. Cueto, L. Perniola, V. Sousa, D. Vuillaume, C. Gamrat, and B. DeSalvo, "Phase change memory as synapse for ultra-dense neuromorphic systems: application to complex visual pattern extraction," in IEDM Technical Digest, 2011, p. 4.4.
- [3] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, p. 2278, 1998.
- [4] D. Rumelhart, G. E. Hinton, and J. L. McClelland, "A general framework for parallel distributed processing," in Parallel Distributed Processing. MIT Press, 1986.
- [5] G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, p. 504, 2006.
- [6] B. Rajendran, Y. Liu, J.-S. Seo, K. Gopalakrishnan, L. Chang, D. J. Friedman, and M. B. Ritter, "Specifications of nanoscale devices and circuits for neuromorphic computational systems," IEEE Trans. Electr. Dev., vol. 60, no. 1, pp. 246-253, 2013.
- [7] J.-W. Jang, S. Park, G. W. Burr, H. Hwang, and Y.-H. Jeong, "Optimization of conductance change in $\mathrm{Pr}_{1-x}\mathrm{Ca}_x\mathrm{MnO}_3$-based synaptic devices for neuromorphic systems," IEEE Electron Device Letters, vol. 36, no. 5, pp. 457-459, 2015.
- [8] G. W. Burr, R. M. Shelby, C. di Nolfo, J. W. Jang, R. S. Shenoy, P. Narayanan, K. Virwani, E. U. Giacometti, B. Kurdi, and H. Hwang, "Experimental demonstration and tolerancing of a large-scale neural network (165,000 synapses), using phase-change memory as the synaptic weight element," in IEDM, 2014, p. 29.5.

- [9] M. Breitwisch, T. Nirschl, C. F. Chen, Y. Zhu, M. H. Lee, M. Lamorey, G. W. Burr, E. Joseph, A. Schrott, J. B. Philipp, R. Cheek, T. D. Happ, S. H. Chen, S. Zaidi, P. Flaitz, J. Bruley, R. Dasaka, B. Rajendran, S. Rossnagel, M. Yang, Y. C. Chen, R. Bergmann, H. L. Lung, and C. Lam, "Novel lithography-independent pore phase change memory," in Symposium on VLSI Technology, 2007, pp. 100-101.



**Geoffrey W. Burr** (S'87–M'96–SM'13) received the Ph.D. degree in electrical engineering from the California Institute of Technology, Pasadena, CA, USA, in 1996.

In 1996, he joined IBM Research – Almaden, San Jose, CA, USA, where he is currently a Principal Research Staff Member. His current research interests include nonvolatile memory and cognitive computing.

He joined IBM, Armonk, NY, USA, in 1978. He

Dr. Shelby is a fellow of the Optical Society of

**Severin Sidler** received the B.S. degree in electrical engineering from EPFL, Lausanne, Switzerland, where he is continuing his studies in electrical engineering. He expects to receive the M.S. degree in 2017. At present, he is an intern at IBM Research-







**Carmelo di Nolfo** received the B.S. and M.S. degrees in electrical and computer engineering from the University of Liège, Belgium, in 2012 and 2014, respectively. During his M.S. degree work, he studied at École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland and spent six months as an intern at IBM Research–Almaden, San Jose, CA, USA. He joined the SyNAPSE team at IBM Research–Almaden, San Jose, CA, USA in March 2015.



**Junwoo Jang** received the B.S. degree in electrical engineering from the Pohang University of Science and Technology (POSTECH), South Korea in 2012. He moved to the Department of Creative IT Engineering for his graduate studies, and expects to receive his Ph.D. degree in 2016. He spent four months visiting IBM Research–Almaden, San Jose, CA, USA in late 2013.

His current research interests include circuit design for cognitive computing systems.



**Irem Boybat** (S'15) received the B.S. degree in electronics engineering from Sabanci University, Istanbul, Turkey. She is continuing her studies in electrical engineering at EPFL, Lausanne, Switzerland, and is expected to receive the M.S. degree in 2015. She was an intern at IBM Research – Almaden, San Jose, CA, USA between September 2014 and February 2015.

Her current research interests include cognitive computing, semi-custom design and embedded systems.



**Rohit S. Shenoy** (M'04) received the B.Tech. degree in engineering physics from IIT Bombay, Mumbai, India, and the M.S. and Ph.D. degrees in electrical engineering from Stanford University, Stanford, CA, USA.

He was a Research Staff Member with IBM Research–Almaden, San Jose, CA, USA, from 2005 to 2014. He joined Intel, Santa Clara, CA, USA, as a Device Engineer, in 2014, where he is involved in NAND flash development.



**Pritish Narayanan** (M'14) received the Ph.D. degree in electrical and computer engineering from the University of Massachusetts Amherst, Amherst, MA, USA, in 2013.

He joined IBM Research – Almaden, San Jose, CA, USA, as a Research Staff Member. His current research interests include emerging technologies for logic, nonvolatile memory, and cognitive computing.



**Kumar Virwani** (S'05–M'10) received the Ph.D. degree from the University of Arkansas at Fayetteville, Fayetteville, AR, USA, in 2007.

He has been with IBM Research–Almaden, San Jose, CA, USA, since 2008. He has developed novel conducting scanning probe microscopy methods for electrical characterization of mixed ionic electronic conductor materials and devices. He is involved in projects on storage class memory, lithium-air batteries, low-k dielectrics, and photovoltaics.



**Emanuele U. Giacometti** received the B.S. in electronic engineering, and his International Masters in Nanotechnologies from the Politecnico di Torino, Torino, Italy in 2012 and 2014, respectively. After classes at the Institut Polytechnique de Grenoble (INP), France and École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland he spent six months as an intern at IBM Research–Almaden, San Jose, CA, USA. He will begin his Ph.D. studies in late 2015.





**Bülent N. Kurdi** received the Ph.D. degree from the Institute of Optics, University of Rochester, Rochester, NY, USA, in 1989.

After eleven years at IBM Research – Almaden, San Jose, CA, USA, he joined Wavesplitter Technologies, Inc., Fremont, CA, USA, in 2000. He returned to IBM Research – Almaden in 2003, where he is now Senior Manager of the Novel Device Prototyping & Characterization department.

**Hyunsang Hwang** received his Ph.D. in Materials Science from the University of Texas at Austin, Austin, Texas, USA in 1992. After five years at LG Semiconductor Corporation, he became a Professor of Materials Science and Engineering at Gwangju Institute of Science and Technology, Gwangju, South Korea in 1997. In 2012, he moved to the Materials Science and Engineering department at Pohang University of Science and Technology (POSTECH), South Korea.