Neuroprocessor based on combined memristor-diode crossbar
This paper presents the concept of an autonomous hardware device, a neuroprocessor, on which both neural networks of simple neurons used in information technologies and a biomorphic neural network simulating the work of a cortical column can be based.
The input device is intended for primary processing of audio and video signals (converting data obtained from the interface unit to the desired format), for encoding any other information as separate pulses and, if necessary, for converting these pulses into standard pulses of fixed amplitude and duration, similar to the biomorphic impulses of the brain.
The storage matrix, in addition to storing information, performs part of the processing for the neural network: it computes a weighted sum of the input pulses by multiplying the voltage of each input signal by the memristor conductance (the reciprocal of its resistance) according to Ohm's law and summing the resulting currents according to Kirchhoff's first law. The logical matrix simultaneously processes the digitized output pulses of the neurons of the memory matrix and switches them to the synapses of other neurons. In addition to performing logical operations, this matrix can solve the neural network problem of comparing the sum of signals with a threshold.
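The weighted summation performed by the storage matrix can be illustrated with a minimal Python sketch. It assumes an ideal crossbar (no wire resistance, no diode drop) in which the current through each cell equals the input voltage times the cell conductance, and the column current is the sum of the cell currents. The function name `crossbar_currents` and all numerical values are hypothetical and are not taken from the paper.

```python
def crossbar_currents(voltages, conductances):
    """Return the column currents of an ideal crossbar.

    voltages[i]        -- input voltage on row i, in volts
    conductances[i][j] -- conductance of the cell joining row i and column j, in siemens
    """
    n_cols = len(conductances[0])
    currents = [0.0] * n_cols
    for v, row in zip(voltages, conductances):
        for j, g in enumerate(row):
            # Ohm's law gives the cell current v * g;
            # Kirchhoff's first law sums the cell currents on column j.
            currents[j] += v * g
    return currents


# Hypothetical example: three input pulses feeding two neurons (columns).
v_in = [1.0, 0.0, 1.0]
g = [
    [1e-3, 1e-4],
    [1e-3, 1e-3],
    [1e-4, 1e-3],
]
i_out = crossbar_currents(v_in, g)
print(i_out)
```

Each column plays the role of one neuron, and the conductances in that column are its synaptic weights.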
The full operation of the neural network is realized when the outputs of the logical matrix are connected to the inputs of the memory matrix. Neurons from the memory matrix can thus be combined into a single network by means of the logical matrix. The logical device must multiply the key state matrix by the vector of input signals, thereby switching the signals of the memory matrix. Since the same matrix-vector multiplication is used in signal processing based on the Fourier transform, such a matrix becomes universal.
Once processing in the memory and logical matrices, which are linked by positive feedback, and the reception of new data are complete, the information is fed to the output device for final processing (spectral analysis, image compression and convolution filtering). The information prepared for transport is then transmitted to the interface unit.
In view of the scale of the neuroprocessor architecture and the large number of elements in the electrical circuit, the following general requirements are imposed on its nodes: a high degree of integration of elements when they are combined into an extremely large matrix; a minimal area occupied by a matrix cell on the chip; high speed and energy efficiency.
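The switching role of the logical matrix can be sketched in Python as a Boolean matrix-vector product. The sketch assumes that the key state matrix is binary (1 means the key is closed), that multiplication is a logical AND and that addition is a logical OR. The function name `route` and the example values are hypothetical.

```python
def route(key_states, signals):
    """Boolean product of a key state matrix and a vector of input signals.

    key_states[i][j] == 1 means the output of neuron j is switched to synapse line i.
    """
    return [
        int(any(k and s for k, s in zip(row, signals)))
        for row in key_states
    ]


# Hypothetical 3x3 switch: each line receives the outputs of the other two neurons.
keys = [
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
]
pulses = [1, 0, 0]          # only neuron 1 fires
routed = route(keys, pulses)
print(routed)               # [0, 1, 1]
```

Reprogramming the key states changes the network topology without changing the neurons themselves.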
3D MEMORY ARRAY WITH HIGH INTEGRATION OF ELEMENTS
The storage matrix based on a crossbar of complementary memristors, in contrast to the memory matrices used in information technologies, not only stores information but also weights and sums the voltages of the input signals passing through the memristors. However, it cannot be used as an ultra-large memory matrix of the neuroprocessor: its energy efficiency during recording is low and the output signal degrades strongly during reading, because the storage cell contains no nonlinear selective element.
The problem of energy efficiency of an ultra-large memory matrix is solved by using a complementary memristor-diode cell, which is a two-layer interconnection of complementary bipolar memristors and one isolating Zener diode. The diode makes it possible to reduce the degradation of the output signal when the input voltage pulses are summed.
The paper  presents nanotechnology for the manufacture of a superlarge (more than 10⁶ cells) multilayer storage array with non-volatile memory and a high degree of integration of elements based on a combined memristor-diode crossbar.
Technologically, a high degree of integration of elements can be achieved by sequentially building up planar two-layer memory matrices vertically on a chip, forming a 3D structure of identical, horizontally arranged and mirror-oriented combined crossbars (Fig.2). Because the connecting conductors are shortened, the energy efficiency of the matrix increases.
A two-layer combined crossbar can be considered a separate functional layer. Fig.3 shows the electrical circuit, and Fig.4 presents the topology of a fragment of the three-dimensional memory matrix made of three combined crossbars, explaining the principle of joining adjacent layers.
The memristor layer and the semiconductor layers of the diode can be formed by magnetron sputtering. Semiconductor layers with a donor or acceptor impurity and different doping levels are created by simultaneously sputtering cathodes made of a pure semiconductor and of a dopant.
UNIVERSAL 3D LOGICAL MATRIX WITH HIGH INTEGRATION
The Akers logic array based on memristors can be programmed to perform any logical function. However, a combinational circuit for multiplying a vector by a matrix cannot be implemented in one array, because the array has only one output. Using multiple arrays for this operation increases the number of elements and, correspondingly, the size of the logical device. At the same time, the Akers logic array has weak integration of elements, which is due to the large number of transistors in the cell and the high degradation of the output signal at a large matrix size. If the Akers array is used in a sequential circuit, performance decreases because each digit of the output vector is calculated in turn.
The Hewlett-Packard (HP) planar matrix, designed for processing video signals, multiplies a vector by a matrix in analog form. It can work as a digital logical matrix when logic signals are applied to the gates of the transistors. But even this matrix cannot be used as a super-large logical matrix of a neuroprocessor because of the low integration of elements: each transistor, with a minimum size of 4F², serves only one memristor of size 1F². It is also inadvisable to use this matrix as an input device of a neuroprocessor, since at an extremely large size its energy efficiency is very low.
In , nanotechnology for the manufacture of an extremely large 3D logical matrix using logic gates and memristor switches with a high degree of integration of elements is presented. The electrical circuit of the cell of the logical matrix, shown in Fig.5, is a combination of memristors with selective Zener diodes connected to one of the crossbar conductors. This conductor is in turn connected to the gate of a CMOS inverter. The diode is part of the logic circuit and also eliminates spurious currents during recording.
The combined memristor-diode crossbar is manufactured using the same vacuum nanotechnology as the crossbar of the memory matrix, while established technology is used to manufacture the transistors of the CMOS inverters.
A single functional layer is created on the chip, containing CMOS inverters in the lower layer and a combined memristor-diode crossbar in the upper layer. The outermost layer is oriented orthogonally to the lower one, which is a necessary condition for forming switching memristor crossbars between the layers (Fig.6). This configuration of the layers is optimal, since it allows the output buses of a layer to be used as conductors of the crossbar. The number of memristors electrically connected to one inverter is equal to the number of synapses (connections) of one neuron in the memory matrix.
SPICE MODELING OF MEMORY AND LOGICAL MATRICES
The matrices are modeled in LTspice XVII. Recording to the selected cell of the memory matrix changes the resistance of its memristors. The Zener diode acts as a selective element and prevents parasitic recording into neighboring crossbar cells via adjacent buses. To eliminate this parasitic recording in a matrix without diodes, half of the write voltage is applied to the unused buses, which increases power consumption.
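The cost of the half-voltage scheme in a matrix without diodes can be illustrated with a minimal Python sketch. It assumes ideal buses and linear cells of equal resistance, with the selected row at the full write voltage, the selected column at zero and all unused buses at half the write voltage. The function names and numerical values are hypothetical and do not reproduce the LTspice model.

```python
def write_voltages(n, sel_row, sel_col, v_write):
    """Voltage across every cell of an n x n crossbar under half-voltage biasing."""
    row_bias = [v_write if i == sel_row else v_write / 2 for i in range(n)]
    col_bias = [0.0 if j == sel_col else v_write / 2 for j in range(n)]
    return [[row_bias[i] - col_bias[j] for j in range(n)] for i in range(n)]


def parasitic_power(cell_voltages, sel_row, sel_col, r_cell):
    """Power dissipated in all cells except the selected one (linear cells assumed)."""
    total = 0.0
    for i, row in enumerate(cell_voltages):
        for j, v in enumerate(row):
            if (i, j) != (sel_row, sel_col):
                total += v * v / r_cell
    return total


# Hypothetical values: 4x4 matrix, 2 V write pulse, 10 kOhm cells.
volts = write_voltages(4, 0, 0, 2.0)
p_par = parasitic_power(volts, 0, 0, 1e4)
print(volts[0][0], volts[0][1], volts[1][0], volts[1][1])  # 2.0 1.0 1.0 0.0
print(p_par)
```

Under these assumptions the 2(n - 1) cells sharing a bus with the selected cell each see half of the write voltage, so the parasitic power grows linearly with the matrix size.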
As can be seen from Fig.7, the energy cost of recording one cell of complementary memristors in a 100×100 matrix is reduced by a factor of 8 when a Zener diode is added to each cell. In both cases, the worst possible recording conditions were used, in which all cells of the matrix were initially in the same state and the ratio of the memristor resistances in the high-resistance and low-resistance states was R = (Roff – Ron) / Ron = 10.
In the extremely large matrix proposed in this paper, individual pulses are weighted and summed. Working with individual input pulses can be considered as sequential reading. Fig.8 shows the degradation of the output voltage as a function of the size of the square N×N matrix for a single pulse of 1 V amplitude and three values of R: 10, 100 and 1000.
Fig.8 shows that without a Zener diode in the cells, the output voltage decreases almost to zero already in a 3×3 matrix. When the Zener diode is added, the output voltage decreases by 50% to 70% in a matrix of the same size, and a further increase in the size of the matrix has little effect on the magnitude of the output signal. A slowly varying output voltage level (about 0.3 V) is sufficient for the subsequent summation procedure.
To model the operation of the universal logical matrix, a circuit was chosen (Fig.9) that multiplies a 3×3 matrix by a vector of three components. Multiplication in several functional layers, each realizing a conjunction with inversion, is possible when positional number coding is used. Each input and output of the circuit is responsible for a particular numerical value. The blue block inverts the input signals; the light green block multiplies a vector component by a matrix element by redirecting the pulse to the corresponding bus. The summation block consists of two parts: the dark green blocks are a set of three-input AND-NOT elements corresponding to the unique combinations (sums) of the products obtained, and the lilac blocks transmit the unique sums to the output.
The proposed scheme is combinational and performs matrix-vector multiplication in one clock cycle.
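The idea of multiplying with positional coding can be illustrated by a Python sketch. It rests on assumptions that are not spelled out in the paper: a number is represented by a single pulse on the line whose index equals its value, multiplication by a matrix element redirects that pulse to the line of the product, and summation uses one three-input gate per combination of active product lines. All function names are hypothetical, and the sketch models the logic only, not the circuit of Fig.9.

```python
from itertools import product


def one_hot(value, n_lines):
    """Positional code: a single pulse on line `value`."""
    return [1 if k == value else 0 for k in range(n_lines)]


def scale(x_hot, a, n_lines):
    """Multiply by matrix element a by redirecting the pulse from line v to line a * v."""
    out = [0] * n_lines
    for v, bit in enumerate(x_hot):
        if bit:
            out[a * v] = 1
    return out


def add_three(p1, p2, p3, n_lines):
    """Sum three positionally coded products: one three-input gate per combination."""
    out = [0] * n_lines
    for i, j, k in product(range(len(p1)), range(len(p2)), range(len(p3))):
        if p1[i] and p2[j] and p3[k]:
            out[i + j + k] = 1
    return out


def matvec(matrix, x, max_val):
    """Multiply a 3x3 matrix by a 3-vector whose entries lie in 0..max_val."""
    prod_lines = max_val * max_val + 1
    sum_lines = 3 * max_val * max_val + 1
    result = []
    for row in matrix:
        prods = [scale(one_hot(v, max_val + 1), a, prod_lines) for a, v in zip(row, x)]
        result.append(add_three(prods[0], prods[1], prods[2], sum_lines).index(1))
    return result


m = [[1, 0, 1], [0, 1, 1], [1, 1, 0]]
y = matvec(m, [1, 1, 1], max_val=1)
print(y)  # [2, 2, 2]
```

Because every stage is a fixed routing or gating of pulses, the whole product can be evaluated without sequential carries, which is consistent with a combinational, single-cycle circuit.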
Fig.10 shows the switching of signals through different channels in a matrix of two functional layers. At the output of the first layer, inverted input signals are obtained, and the second layer performs their conjunction with inversion.
The matrix is programmed as follows: output y1 is connected to inputs x2 and x3, output y2 to x1 and x3, and output y3 to x1 and x2. The corresponding memristor conductivities are shown on the diagram in color: red indicates high conductivity and blue indicates low conductivity.
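The logic of this two-layer example can be sketched in Python. The sketch assumes ideal binary signals: the first layer inverts every input, and the second layer performs a conjunction with inversion (AND-NOT) over the inputs connected to each output. The function names and the 0-based connection table are illustrative, not part of the paper.

```python
def layer_invert(x):
    """First functional layer: invert every input signal."""
    return [1 - b for b in x]


def layer_nand(inverted, connections):
    """Second functional layer: AND-NOT over the inputs connected to each output."""
    return [1 - int(all(inverted[j] for j in conn)) for conn in connections]


# 0-based connection table for the programming described in the text:
# y1 <- x2, x3;  y2 <- x1, x3;  y3 <- x1, x2
CONNECTIONS = [(1, 2), (0, 2), (0, 1)]


def two_layer(x):
    return layer_nand(layer_invert(x), CONNECTIONS)


print(two_layer([1, 0, 0]))  # [0, 1, 1]
print(two_layer([0, 1, 0]))  # [1, 0, 1]
```

By De Morgan's law, inversion followed by AND-NOT is equivalent to a logical OR of the connected inputs, so a pulse on any connected input reaches the output.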
Fig.11 shows the time dependence of the power consumed in processing the input signal by a diode-memristor matrix of 12×12 cells in binary code and by four cells of the HP matrix with 64 possible states, which is equivalent to 6 bits. It follows from Fig.11 that the energy consumption (the area under the curve) of the diode-memristor matrix is 355 times lower than that of the HP matrix. The main energy consumers in the HP matrix are the operational amplifiers (2.7 mW for each op-amp), whereas in the developed matrix they are the CMOS inverters, which consume energy mainly during switching. Thus, as the number of cells increases, the difference in the energy consumption of these matrices will increase.
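Energy as the area under a power curve can be estimated from sampled data with the trapezoidal rule, as in the Python sketch below. The sample values are hypothetical and are not the data of Fig.11, so the resulting ratio does not correspond to the factor reported above.

```python
def energy(times, powers):
    """Trapezoidal estimate of the energy (J) from power samples (W) at given times (s)."""
    total = 0.0
    for k in range(1, len(times)):
        total += 0.5 * (powers[k] + powers[k - 1]) * (times[k] - times[k - 1])
    return total


# Hypothetical samples: a switching pulse versus a constant static draw.
t = [0.0, 1e-9, 2e-9, 3e-9]
p_switching = [0.0, 2e-6, 2e-6, 0.0]   # consumes power only while switching
p_static = [1e-3, 1e-3, 1e-3, 1e-3]    # consumes power continuously

e_switching = energy(t, p_switching)
e_static = energy(t, p_static)
print(e_static / e_switching)
```

A circuit that dissipates power only during switching accumulates far less area under the curve than one with a constant static draw over the same interval.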
The concept of hardware implementation of the neuroprocessor is presented and the main functional units of the device are described. Both neural networks used in information technologies and a biomorphic neural network for modeling the work of the cortical column can be based on the neuroprocessor.
Electrical circuits of super-large 3D storage and logical matrices designed on the basis of a combined memristor-diode crossbar, in which a high integration of elements has been achieved, are presented. SPICE modeling showed the high energy efficiency of these matrices.
The proposed logical matrix is universal. As a programmable logical matrix, it performs matrix-vector multiplication by successive conjunctions with inversion; as a switch it directs the output pulses of neurons to the synapses of other neurons; as part of the input device of the neuroprocessor it performs the primary processing of the signal in the digital mode by multiplying the matrix by a vector, converting the input data into the desired format; as part of the output device it compresses the information with the same multiplication for transmission to the interface