Counting arrays from 0 simplifies the computation of the memory address of each element.
If an array is stored at a given position in memory (called its base address), the address of each element can be computed as
element(n) = address + n * size_of_the_element
If instead you number the first element 1, the computation becomes
element(n) = address + (n-1) * size_of_the_element
Not a huge difference, but it adds an unnecessary subtraction to every access.
Edit
Using the array index as an offset is not a requirement, just a habit. The offset of the first element could be hidden by the system and taken into account when allocating and referencing elements.
Dijkstra published a paper, "Why numbering should start at zero" (pdf), where he explains why starting at 0 is the better choice: it allows a cleaner representation of ranges.
None of the reasons you suggest really get to the heart of why we use zero-indexing in CS. Dijkstra's EWD831 explains why this convention works out the best. It comes down to the fact that we want to represent sequences of integers as half-open intervals that are inclusive on the start side.
To denote the subsequence of natural numbers 2, 3, ..., 12 without the pernicious three dots, four conventions are open to us:

a) 2 ≤ i < 13
b) 1 < i ≤ 12
c) 2 ≤ i ≤ 12
d) 1 < i < 13
To paraphrase Dijkstra: with (a) and (b) the difference between the bounds equals the length of the subsequence, and (a) additionally lets a range of natural numbers start at the natural 0 rather than the unnatural -1. Because of this, we want to write all of our intervals as (a).
Once you accept that (a) is the correct way of specifying intervals, indexing an array of length N as [0, N) is much nicer than [1, N+1).
Two additional notes in favor of using (a) for intervals. The first is that half-open intervals nicely decompose into other half-open intervals: [lo, hi) splits cleanly into [lo, mid) and [mid, hi), with no overlap and no gap. This makes implementing divide-and-conquer algorithms like merge sort significantly less error prone.
It's also a bit jarring to specify intervals with (b), because it feels unnatural to skip the first element when you're writing a loop.
As for how to explain the above to your students, I think that they just need to accept that breaking sequences of integers in half is something they'll do later on, and then a few examples will quickly point to [,) intervals and zero indexing as the most natural choice.
It's about offsets. You have an address, which points to the location in memory where the array begins. Then to access any element, you multiply the array index by the size of the element and add it to the starting address, to find the address for that element.
The first element is at the starting point, so you multiply the size of the element by zero to get zero which is what you add to the starting address to find the location of the first element.
The convention spread because programmers started out working in very low-level languages, where memory addresses were directly manipulated, and in most cases built up from there, keeping the same convention at each step so they wouldn't have to relearn anything or make mistakes when switching between conventions. It's still important to understand how this addressing works, especially when working with lower-level languages. I agree it can be a stumbling block for people first learning to program in a higher-level language.
The Wikipedia article on this topic also cites a common machine instruction used when working "backwards" and detecting the end of a loop, namely "decrement and jump if zero."
An exception: MATLAB and some other languages bucked the trend and went with an index starting at 1, apparently under the impression that it would be a first programming language for a lot of their target users and that for those folks, starting with 1 makes more intuitive sense. This causes some frustrations for the (relatively small subset of?) programmers who frequently switch between programming languages that start counting at different values.
Blind copying of C, just like ratchet freak said in his comment
The vast majority of "language designers" these days have never seen anything but C and its copies (C++, Java, JavaScript, PHP, and probably a few dozen others I never heard of). They have never touched FORTRAN, COBOL, LISP, PASCAL, Oberon, FORTH, APL, BLISS, SNOBOL, to name a few.
Once upon a time, exposure to multiple programming languages was MANDATORY in the computer science curriculum, and that didn't include counting C, C++, and Java as three separate languages.
Octal was used in the earlier days because it made reading binary instruction values easier. The PDP-11, for example, basically had a 4-bit opcode, two 3-bit register numbers, and two 3-bit addressing-mode fields. Expressing the word in octal made everything obvious.
Because of C's early association with the PDP-11, octal notation was included, since it was very common on PDP-11s at the time.
Other machines had instruction sets that didn't map well to hex. The CDC 6600 had a 60-bit word, with each word containing typically 2 to 4 instructions. Each instruction was 15 or 30 bits.
As for reading and writing values, this is a solved problem, with a well-known industry best practice, at least in the defense industry. You DOCUMENT your file formats. There is no ambiguity when the format is documented, because the document TELLS you whether you are looking at a decimal number, a hex number, or an octal number.
Also note: If your I/O system defaults to leading 0 meaning octal, you have to use some other convention on your output to denote hexadecimal values. This is not necessarily a win.
In my personal opinion, Ada did it best: 2#10010010#, 8#222#, 16#92#, and 146 all represent the same value. (That will probably get me at least three downvotes right there, just for mentioning Ada.)
Yes, lots. Fortran for example.
And then there are languages which allow array elements to start indexing at almost any integer. Fortran for example.