This is a purely informative rendering of an RFC that includes verified errata. This rendering may not be used as a reference.

The following 'Verified' errata have been incorporated in this document: EID 560
Network Working Group                                         R.  Braden
Request for Comments: 1071                                           ISI
                                                              D.  Borman
                                                           Cray Research
                                                            C. Partridge
                                                        BBN Laboratories
                                                          September 1988


                    Computing the Internet Checksum


Status of This Memo

   This memo summarizes techniques and algorithms for efficiently
   computing the Internet checksum.  It is not a standard, but a set of
   useful implementation techniques.  Distribution of this memo is
   unlimited.

1.  Introduction

   This memo discusses methods for efficiently computing the Internet
   checksum that is used by the standard Internet protocols IP, UDP, and
   TCP.

   An efficient checksum implementation is critical to good performance.
   As advances in implementation techniques streamline the rest of the
   protocol processing, the checksum computation becomes one of the
   limiting factors on TCP performance, for example.  It is usually
   appropriate to carefully hand-craft the checksum routine, exploiting
   every machine-dependent trick possible; a fraction of a microsecond
   per TCP data byte can add up to a significant CPU time savings
   overall.

   In outline, the Internet checksum algorithm is very simple:

   (1)  Adjacent octets to be checksummed are paired to form 16-bit
        integers, and the 1's complement sum of these 16-bit integers is
        formed.

   (2)  To generate a checksum, the checksum field itself is cleared,
        the 16-bit 1's complement sum is computed over the octets
        concerned, and the 1's complement of this sum is placed in the
        checksum field.

   (3)  To check a checksum, the 1's complement sum is computed over the
        same set of octets, including the checksum field.  If the result
        is all 1 bits (-0 in 1's complement arithmetic), the check
        succeeds.

        Suppose a checksum is to be computed over the sequence of octets

        A, B, C, D, ... , Y, Z.  Using the notation [a,b] for the 16-bit
        integer a*256+b, where a and b are bytes, then the 16-bit 1's
        complement sum of these bytes is given by one of the following:

            [A,B] +' [C,D] +' ... +' [Y,Z]              [1]

            [A,B] +' [C,D] +' ... +' [Z,0]              [2]

        where +' indicates 1's complement addition. These cases
        correspond to an even or odd count of bytes, respectively.

        On a 2's complement machine, the 1's complement sum must be
        computed by means of an "end around carry", i.e., any overflows
        from the most significant bits are added into the least
        significant bits. See the examples below.

        Section 2 explores the properties of this checksum that may be
        exploited to speed its calculation.  Section 3 contains some
        numerical examples of the most important implementation
        techniques.  Finally, Section 4 includes examples of specific
        algorithms for a variety of common CPU types.  We are grateful
        to Van Jacobson and Charley Kline for their contribution of
        algorithms to this section.

        The properties of the Internet checksum were originally
        discussed by Bill Plummer in IEN-45, entitled "Checksum Function
        Design".  Since IEN-45 has not been widely available, we include
        it as an extended appendix to this RFC.

     2.  Calculating the Checksum

        This simple checksum has a number of wonderful mathematical
        properties that may be exploited to speed its calculation, as we
        will now discuss.


   (A)  Commutative and Associative

        As long as the even/odd assignment of bytes is respected, the
        sum can be done in any order, and it can be arbitrarily split
        into groups.

        For example, the sum [1] could be split into:

           ( [A,B] +' [C,D] +' ... +' [J,0] )

                  +' ( [0,K] +' ... +' [Y,Z] )               [3]

   (B)  Byte Order Independence

        The sum of 16-bit integers can be computed in either byte order.
        Thus, if we calculate the swapped sum:

           [B,A] +' [D,C] +' ... +' [Z,Y]                   [4]

        the result is the same as [1], except the bytes are swapped in
        the sum! To see why this is so, observe that in both orders the
        carries are the same: from bit 15 to bit 0 and from bit 7 to bit
        8.  In other words, consistently swapping bytes simply rotates
        the bits within the sum, but does not affect their internal
        ordering.

        Therefore, the sum may be calculated in exactly the same way
        regardless of the byte order ("big-endian" or "little-endian")
        of the underlaying hardware.  For example, assume a "little-
        endian" machine summing data that is stored in memory in network
        ("big-endian") order.  Fetching each 16-bit word will swap
        bytes, resulting in the sum [4]; however, storing the result
        back into memory will swap the sum back into network byte order.

        Byte swapping may also be used explicitly to handle boundary
        alignment problems.  For example, the second group in [3] can be
        calculated without concern to its odd/even origin, as:

              [K,L] +' ... +' [Z,0]

        if this sum is byte-swapped before it is added to the first
        group.  See the example below.

   (C)  Parallel Summation

        On machines that have word-sizes that are multiples of 16 bits,
        it is possible to develop even more efficient implementations.
        Because addition is associative, we do not have to sum the
        integers in the order they appear in the message.  Instead we
        can add them in "parallel" by exploiting the larger word size.

        To compute the checksum in parallel, simply do a 1's complement
        addition of the message using the native word size of the
        machine.  For example, on a 32-bit machine we can add 4 bytes at
        a time: [A,B,C,D]+'... When the sum has been computed, we "fold"
        the long sum into 16 bits by adding the 16-bit segments.  Each
        16-bit addition may produce new end-around carries that must be
        added.

        Furthermore, again the byte order does not matter; we could
        instead sum 32-bit words: [D,C,B,A]+'... or [B,A,D,C]+'... and
        then swap the bytes of the final 16-bit sum as necessary.  See
        the examples below.  Any permutation is allowed that collects

        all the even-numbered data bytes into one sum byte and the odd-
        numbered data bytes into the other sum byte.


   There are further coding techniques that can be exploited to speed up
   the checksum calculation.

   (1)  Deferred Carries

        Depending upon the machine, it may be more efficient to defer
        adding end-around carries until the main summation loop is
        finished.

        One approach is to sum 16-bit words in a 32-bit accumulator, so
        the overflows build up in the high-order 16 bits.  This approach
        typically avoids a carry-sensing instruction but requires twice
        as many additions as would adding 32-bit segments; which is
        faster depends upon the detailed hardware architecture.

   (2)  Unwinding Loops

        To reduce the loop overhead, it is often useful to "unwind" the
        inner sum loop, replicating a series of addition commands within
        one loop traversal.  This technique often provides significant
        savings, although it may complicate the logic of the program
        considerably.

   (3)  Combine with Data Copying

        Like checksumming, copying data from one memory location to
        another involves per-byte overhead.  In both cases, the
        bottleneck is essentially the memory bus, i.e., how fast the
        data can be fetched. On some machines (especially relatively
        slow and simple micro-computers), overhead can be significantly
        reduced by combining memory-to-memory copy and the checksumming,
        fetching the data only once for both.

   (4)  Incremental Update

        Finally, one can sometimes avoid recomputing the entire checksum
        when one header field is updated.  The best-known example is a
        gateway changing the TTL field in the IP header, but there are
        other examples (for example, when updating a source route).  In
        these cases it is possible to update the checksum without
        scanning the message or datagram.

        To update the checksum, simply add the differences of the
        sixteen bit integers that have been changed.  To see why this
        works, observe that every 16-bit integer has an additive inverse
        and that addition is associative.  From this it follows that
        given the original value m, the new value m', and the old

        checksum C, the new checksum C' is:

                C' = C + (-m) + m' = C + (m' - m)


3. Numerical Examples

   We now present explicit examples of calculating a simple 1's
   complement sum on a 2's complement machine.  The examples show the
   same sum calculated byte by bye, by 16-bits words in normal and
   swapped order, and 32 bits at a time in 3 different orders.  All
   numbers are in hex.

                  Byte-by-byte    "Normal"  Swapped
                                    Order    Order

        Byte 0/1:    00   01        0001      0100
        Byte 2/3:    f2   03        f203      03f2
        Byte 4/5:    f4   f5        f4f5      f5f4
        Byte 6/7:    f6   f7        f6f7      f7f6
                    ---  ---       -----     -----
        Sum1:       2dc  1f0       2ddf0     1f2dc

                     dc   f0        ddf0      f2dc
        Carrys:       1    2           2         1
                     --   --        ----      ----
        Sum2:        dd   f2        ddf2      f2dd

        Final Swap:  dd   f2        ddf2      ddf2


        Byte 0/1/2/3:  0001f203     010003f2       03f20100
        Byte 4/5/6/7:  f4f5f6f7     f5f4f7f6       f7f6f5f4
                       --------     --------       --------
        Sum1:         0f4f7e8fa    0f6f4fbe8      0fbe8f6f4

        Carries:              0            0              0

        Top half:          f4f7         f6f4           fbe8
        Bottom half:       e8fa         fbe8           f6f4
                          -----        -----          -----
        Sum2:             1ddf1        1f2dc          1f2dc

                           ddf1         f2dc           f2dc
        Carrys:               1            1              1
                           ----         ----           ----
        Sum3:              ddf2         f2dd           f2dd

        Final Swap:        ddf2         ddf2           ddf2

   Finally, here an example of breaking the sum into two groups, with
   the second group starting on a odd boundary:


                   Byte-by-byte    Normal
                                    Order

        Byte 0/1:    00   01        0001
        Byte 2/ :    f2  (00)       f200
                    ---  ---       -----
        Sum1:        f2   01        f201

        Byte 4/5:    03   f4        03f4
        Byte 6/7:    f5   f6        f5f6
        Byte 8/:     f7  (00)       f700
                    ---  ---       -----
        Sum2:                      1f0ea

        Sum2:                       f0ea
        Carry:                         1
                                   -----
        Sum3:                       f0eb

        Sum1:                       f201
        Sum3 byte swapped:          ebf0
                                   -----
        Sum4:                      1ddf1

        Sum4:                       ddf1
        Carry:                         1
                                   -----
        Sum5:                       ddf2

4.  Implementation Examples

   In this section we show examples of Internet checksum implementation
   algorithms that have been found to be efficient on a variety of
   CPU's.  In each case, we show the core of the algorithm, without
   including environmental code (e.g., subroutine linkages) or speci