Why is the first element of an array indexed at 0 and not 1?
An array is, arguably, the oldest data structure. It was originally (and, in most cases still is) a contiguous area of memory used to store a collection of values of the same type. Its memory layout makes it fast, simple and convenient to iterate though, and it was used extensively in older programming languages, way before the plethora of data-structures and abstractions we have available now.
But, why is their first element usually indexed at 0
? This is a supposed idiosyncrasy that has regularly tripped up Computer Science students everywhere, and even been responsible for some serious security vulnerabilities.
I believe the reasons are largely historic and due to the way programmers would iterate through arrays and access array elements in lower level languages.
To begin with, take the table, below, representing a memory layout. There are five locations between addresses 0x100
and 0x104
which contain values 20 to 24. This is our array and its base address is the address of the first element (0x100).
Address | Value |
---|---|
0x100 | 20 |
0x101 | 21 |
0x102 | 22 |
0x103 | 23 |
0x104 | 24 |
Now, say our CPU has three registers, B (base), I (index) and V (value).
We can access elements of the above array by loading the base address into the B
register then adding n to access the address n elements from the base.
1
2
3
mov b, 0x100 ; Load base address into B registers
mov v, ptr [b+3] ; Add three to base address and de-reference into the `v` register.
; b+3 == 0x103, so `v` now contains `23`
We could also iterate through this array in some pseudo assembly language.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
start:
mov b, 0x100 ; Move base address (first element) of array into `b`
xor i,i ; Zero out `i`
top_of_loop:
mov v, ptr [b+i] ; Dereference pointer at address (`b` + `i`) and move value into `v`
...
<do something with the value in v>
...
cmp i,9 ; Was that the last item in the array?
je exit ; If yes, jump to `exit` ...
inc i ; ... else, increment i to next value
jmp top_of_loop ; Go back to top of the loop
exit:
ret ; We're done
So, knowing the first address of our array was loaded into a register called b
we could re-write the table at the top of this post as…
Address | Value |
---|---|
b + 0 | 20 |
b + 1 | 21 |
b + 2 | 22 |
b + 3 | 23 |
b + 4 | 24 |
…or even…
Address | Value |
---|---|
b[0] | 20 |
b[1] | 21 |
b[2] | 22 |
b[3] | 23 |
b[4] | 24 |