Post

Why is the first element of an array indexed at 0 and not 1?

An array is, arguably, the oldest data structure. It was originally (and, in most cases still is) a contiguous area of memory used to store a collection of values of the same type. Its memory layout makes it fast, simple and convenient to iterate though, and it was used extensively in older programming languages, way before the plethora of data-structures and abstractions we have available now.

But, why is their first element usually indexed at 0? This is a supposed idiosyncrasy that has regularly tripped up Computer Science students everywhere, and even been responsible for some serious security vulnerabilities.

I believe the reasons are largely historic and due to the way programmers would iterate through arrays and access array elements in lower level languages.

To begin with, take the table, below, representing a memory layout. There are five locations between addresses 0x100 and 0x104 which contain values 20 to 24. This is our array and its base address is the address of the first element (0x100).

AddressValue
0x10020
0x10121
0x10222
0x10323
0x10424

Now, say our CPU has three registers, B (base), I (index) and V (value).

We can access elements of the above array by loading the base address into the B register then adding n to access the address n elements from the base.

1
2
3
    mov b, 0x100        ; Load base address into B registers
    mov v, ptr [b+3]    ; Add three to base address and de-reference into the `v` register.
                        ; b+3 == 0x103, so `v` now contains `23`

We could also iterate through this array in some pseudo assembly language.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
start:
    mov b, 0x100         ; Move base address (first element) of array into `b`
    xor i,i              ; Zero out `i`
top_of_loop:
    mov v, ptr [b+i]     ; Dereference pointer at address (`b` + `i`) and move value into `v`
    ...
    <do something with the value in v>
    ...
    cmp i,9              ; Was that the last item in the array?
    je exit              ; If yes, jump to `exit` ...
    inc i                ; ... else, increment i to next value
    jmp top_of_loop      ; Go back to top of the loop
exit:
    ret                  ; We're done

So, knowing the first address of our array was loaded into a register called b we could re-write the table at the top of this post as…

AddressValue
b + 020
b + 121
b + 222
b + 323
b + 424

…or even…

AddressValue
b[0]20
b[1]21
b[2]22
b[3]23
b[4]24
This post is licensed under CC BY 4.0 by the author.