Integrated Circuits and Materials

An Online Book, First Edition by Dr. Yougui Liao (2018)

Error Correcting Code/Error Correction Codes (ECC) Circuits

Manufacturing an error-free DRAM module is very difficult. To address the resulting reliability and yield problems, two methods are currently used:
          i) Error correction codes (ECCs),
          ii) Cell replacement.

Unlike parity checking, ECCs can not only detect errors but also correct them.
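
The difference can be seen in a minimal Python sketch of a single even-parity bit. The function names and the 8-bit example word are illustrative assumptions, not taken from the source. A failed parity check only says that an odd number of bits flipped. It gives no hint about which bit, so nothing can be corrected, and a double flip goes unnoticed.

```python
def parity_bit(bits):
    """Even parity: the XOR of all bits in the word."""
    p = 0
    for b in bits:
        p ^= b
    return p


def parity_ok(bits, stored_parity):
    """True if the recomputed parity matches the stored parity bit."""
    return parity_bit(bits) == stored_parity


def flip(bits, *positions):
    """Return a copy of bits with the given positions inverted."""
    out = list(bits)
    for i in positions:
        out[i] ^= 1
    return out


word = [1, 0, 1, 1, 0, 0, 1, 0]      # hypothetical stored data
stored = parity_bit(word)            # parity bit stored alongside the data

# Every single-bit flip fails the check in exactly the same way,
# so the check cannot tell which bit to repair.
single_flip_results = [parity_ok(flip(word, i), stored) for i in range(len(word))]

# Two flips cancel out in the XOR and pass the check undetected.
double_flip_result = parity_ok(flip(word, 1, 5), stored)
```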

Error correcting codes (ECCs) are implemented as logic circuits in memory devices. More and more memory devices require ECC circuits, either integrated on-chip or supported off-chip, to cope with the increasing bit error rate of deeply scaled memory cells. For instance, to run code from high-density NAND Flash, one approach uses an ECC circuit to handle the high bit error rate and an embedded SRAM to handle the fact that NAND Flash can only be accessed sequentially. [1] NOR Flash memory generally guarantees 100% good bits without any need for ECC or controllers. Multi-level cell (MLC) technology can raise stricter reliability issues because of its small Vth window, and thus it requires more ECC bits, an intelligent wear-leveling scheme, restrictions on program/erase cycling, and so on. [1]

ECC is used to build reliable memory by encoding data before it is stored. At the producer of the information, the ECC takes a set of information bits and creates a set of redundant bits from them. These redundant bits are sent or stored together with the original information bits. The consumer of the information then uses the redundant bits to determine whether any errors occurred in transmission or storage. In the case of memory, the redundant bits are calculated and stored along with the original bits; when they are read back, they are examined to determine whether any errors occurred between the time the information was stored and the time it was retrieved.
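
The store-then-check flow described above can be sketched with a Hamming(7,4) code. The source does not name a specific code, so the choice of Hamming(7,4), the function names, and the bit ordering are assumptions made for illustration. Four information bits gain three redundant bits on the write side, and on the read side the recomputed checks (the syndrome) point at the position of a single flipped bit.

```python
def hamming74_encode(data):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword.

    Codeword layout (positions 1..7): p1 p2 d1 p4 d2 d3 d4.
    """
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4   # checks positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # checks positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # checks positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(word):
    """Return (data_bits, error_position); position 0 means no error seen.

    Corrects at most one flipped bit per codeword.
    """
    w = list(word)
    s1 = w[0] ^ w[2] ^ w[4] ^ w[6]
    s2 = w[1] ^ w[2] ^ w[5] ^ w[6]
    s4 = w[3] ^ w[4] ^ w[5] ^ w[6]
    pos = s1 + 2 * s2 + 4 * s4      # syndrome read as a 1-based position
    if pos:
        w[pos - 1] ^= 1             # flip the faulty bit back
    return [w[2], w[4], w[5], w[6]], pos


# "Write": compute redundant bits and store them with the data.
stored = hamming74_encode([1, 0, 1, 1])

# A single cell flips while the word sits in memory (position 5).
corrupted = list(stored)
corrupted[4] ^= 1

# "Read": the redundant bits locate and repair the error.
data, error_position = hamming74_decode(corrupted)
```

Here `data` comes back as the original four bits and `error_position` identifies the flipped cell. Two flips in the same codeword would be miscorrected by this small code, which is one reason practical memories use stronger codes.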

The cell replacement technique is used exclusively to solve the yield problem, while error correcting codes are used exclusively to solve the memory reliability problem. To reduce DRAM cost and to improve its reliability, two suggestions have been made:
        i) Rather than using redundant rows and columns for cell replacement during manufacturing, one can use the redundant cells as extra space within the DRAM to implement an ECC with higher error coverage, which requires more redundant bits. Errors are then corrected while the memory is in use, so the memory appears to be error free.
        ii) Implement a replacement method that works in connection with an error correcting code, so that replacement can be performed at any point in the life of the memory cells, not just during pre-package testing.
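
The second suggestion can be pictured with a toy model in which the ECC decoder reports each corrected error and a row is swapped for a spare once it has needed correction too often. Everything in this sketch is a hypothetical illustration: the class name, the per-row counter, the threshold of two corrections, and the row-level granularity are assumptions, and the ECC decoder itself is not modelled.

```python
class RemappableMemory:
    """Toy memory: one word per row, a few spare rows, and a remap table."""

    def __init__(self, n_rows, n_spares, threshold=2):
        self.rows = [0] * (n_rows + n_spares)                   # physical storage
        self.free_spares = list(range(n_rows, n_rows + n_spares))
        self.remap = {}        # logical row -> spare physical row
        self.corrected = {}    # logical row -> corrected-error count
        self.threshold = threshold

    def physical(self, row):
        """Translate a logical row to the physical row currently backing it."""
        return self.remap.get(row, row)

    def write(self, row, value):
        self.rows[self.physical(row)] = value

    def read(self, row):
        return self.rows[self.physical(row)]

    def report_corrected_error(self, row):
        """Called (by an assumed ECC decoder) after it corrects a read of `row`.

        Returns True if the row was moved to a spare as a result.
        """
        self.corrected[row] = self.corrected.get(row, 0) + 1
        if (self.corrected[row] >= self.threshold
                and row not in self.remap
                and self.free_spares):
            value = self.read(row)             # assumed already ECC-corrected
            self.remap[row] = self.free_spares.pop(0)
            self.write(row, value)             # now lands in the spare row
            return True
        return False


mem = RemappableMemory(n_rows=4, n_spares=1)
mem.write(2, 0xAB)
first = mem.report_corrected_error(2)    # below threshold, row stays put
second = mem.report_corrected_error(2)   # threshold reached, row is remapped
```

The point of the sketch is only that the ECC supplies the error reports that trigger replacement, so replacement no longer has to happen at test time.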

Such ECC circuits are normally implemented through the following process:
          i) determining the specification of the ECC,
          ii) coding at the register transfer level (RTL) to meet the specification,
          iii) verifying the behavior of the function-level operations,
          iv) generating the hardware via logic synthesis,
          v) checking for timing and performance violations.

The number of program and erase operations is limited, and ECC is indispensable for a NAND controller because of the relatively high bit error rate of the cells. [1]

Many error correction codes have been developed, providing coverage for various levels of errors. Higher levels of error coverage require more redundant bits to implement the code. On the other hand, with higher coverage, higher defect levels can be tolerated in production and less testing is needed, because the errors remain correctable. Replacement during the life of the memory would likewise rely on an error correction code, because errors must first be detected before the failing cells can be replaced.
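
As a rough illustration of how redundancy grows with coverage, the sketch below counts check bits for Hamming codes. This code family is assumed here for illustration; the source does not say which codes are meant. A single-error-correcting Hamming code over k data bits needs the smallest r with 2^r >= k + r + 1, and adding double-error detection (SECDED) costs one further parity bit.

```python
def hamming_check_bits(k):
    """Smallest r such that 2**r >= k + r + 1 (single-error correction)."""
    r = 1
    while 2 ** r < k + r + 1:
        r += 1
    return r


def secded_check_bits(k):
    """Single-error-correct, double-error-detect: one extra overall parity bit."""
    return hamming_check_bits(k) + 1


# Check bits needed for a few common data-word widths.
sec_bits = {k: hamming_check_bits(k) for k in (8, 16, 32, 64, 128)}
secded_bits = {k: secded_check_bits(k) for k in (8, 16, 32, 64, 128)}
```

For a 64-bit word this gives 7 check bits for correction alone and 8 with double-error detection, which matches the familiar 72-bit ECC memory word. Codes that correct several errors per word need more redundant bits than this.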


[1] Krzysztof (Kris) Iniewski, CMOS Processors and Memories, (2010).