Naming Conventions (1)

2009-01-25

Nomen est omen. This old Latin proverb means something like “the name says it all”. The ancients were superstitious, and they believed that names carry special powers. A name was thought to predispose its bearer to certain fortunes or to certain qualities. Today, we have largely done away with such superstitions. In the scientific worldview, names are nothing but symbolic artefacts without intrinsic powers.

However, there is one field where this Latin proverb still applies, and where it is indeed more true than ever. Oddly, this field wasn’t even known to the Romans. I am talking about software development, of course. Names have a special importance to software, or perhaps better, the practice of naming does. The first thing I do when looking at a piece of software written by somebody else is to look at the names given to variables and other program elements. My experience has shown that the quality of the identifier names corresponds directly to the overall quality of the program code.

Identifier names are a crucial part of any program. They provide clues about semantics and program logic. They make or break code readability. They determine whether code is self-documenting or not. So, the old Latin proverb “nomen est omen” can be applied as follows: If you read through a piece of code for the first time and you have no idea what the variables are supposed to represent, or what the methods are supposed to accomplish, then this is a bad omen. It suggests that the author was not quite sure how to formulate the problem (or didn’t care) and it can be expected that the other aspects of the program are at least as confusing.

If you read through a piece of code and the identifier names are easily comprehensible and fit together like the pieces of a jigsaw puzzle, then this is a good omen. It suggests that the author had a clear idea of the task at hand. Naturally, there are many intermediate levels between these two opposites. While contemporary code editors and IDEs are very powerful, identifier naming is one of the things that cannot currently be automated by these tools.

It is up to the programmer to choose identifier names. Since good naming practice is essential for code maintainability, we will first define what makes a good naming practice and then look at some concrete examples of good and bad strategies. There are three basic ingredients for a good naming practice: (1) semantic precision, (2) consistency, (3) the right amount of verbosity.

The first aspect is by far the hardest to get right. Semantic precision means that the chosen name is appropriate, unambiguous, well defined, and compliant with conventions. Consistency means that names are formed according to common patterns and that terms are used consistently throughout the program. The right amount of verbosity relates to identifier length. It means that names do not leave anything open to guesswork while avoiding redundancy.

An identifier usually consists of a single word or a combination of words. In the latter case, the individual words are often set apart by using CamelCase or the “_” underscore character. One of the most commonly found questionable practices is the use of abbreviations instead of written-out words, for example rptCount instead of repeatCount. The word count indicates that this variable is a counter, but what is counted? Repetitions, receptors, recipients, red points, or something else? Reptiles perhaps? By adding a mere three characters and writing out the first word, the ambiguity is eliminated.
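To make the difference concrete, here is a minimal Python sketch. It uses the underscore style rather than CamelCase, and the function and parameter names are hypothetical, chosen only for illustration. Both functions do exactly the same thing; only the second one can be understood without guessing.

```python
# Abbreviated names: "rpt" could mean repeat, report, recipient, ...
def build_line_abbreviated(txt, rpt_cnt):
    return txt * rpt_cnt


# Written-out names: the signature alone explains the intent.
def build_line(text, repeat_count):
    """Return text repeated repeat_count times."""
    return text * repeat_count
```

The cost of the clearer version is a handful of extra characters; the benefit is that a reader no longer needs to open the function body to find out what is being counted.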

This doesn’t mean that abbreviations are always bad. For example, nothing speaks against widely used acronyms like URI for UniformResourceIdentifier, or LCD instead of LiquidCrystalDisplay. Likewise, domain-specific abbreviations are acceptable if the program is written within that domain, for example FOB (free on board) in the shipping domain, or VAT (value added tax) in the accounting domain. By definition this also includes abbreviations from the software domain, such as i18n for internationalisation, or ftp for file transfer protocol. In addition, there are a number of prefixes and suffixes used ubiquitously in programming, such as min, max, fmt, pos, len, num, cnt, etc., which every programmer understands.

Generally speaking, abbreviations and acronyms should be used sparingly and only when they are common and free of ambiguity. This also means that one-letter or two-letter variable names, such as a, b, c, f1, x2, etc. are generally a bad idea, because they say nothing about the content of the variable. There is one exception to this rule: loop indices. Since loop indices (or iterator variables) are only used to iterate through a collection of values, they don’t have any intrinsic meaning. So one might as well give them one-letter names. By convention, the letters i, j, k, etc. are used, where the alphabetic order corresponds to the loop nesting level. This means i is used for the outermost loop, j for the loop nested inside it, k for the next level down, and so on.
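The loop index convention might look as follows in Python. This is only a sketch: the function name and the three-level nested list it walks through are made up for the example.

```python
def count_zero_cells(grid):
    """Count the cells equal to zero in a three-level nested list."""
    zero_count = 0
    for i in range(len(grid)):                  # outermost loop
        for j in range(len(grid[i])):           # nested one level deep
            for k in range(len(grid[i][j])):    # nested two levels deep
                if grid[i][j][k] == 0:
                    zero_count += 1
    return zero_count
```

Note that only the indices get one-letter names. The result variable zero_count carries meaning, so it is written out.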

This is standard practice for loop indices, but in other cases an index position carries specific semantics, and then the index does have meaning. For example, one might define an array of counters, where counter[0] contains the number of students, counter[1] contains the number of passed tests and counter[2] contains the number of failed tests. Since the index numbers themselves don’t communicate any meaning, it is appropriate to define an enumerated type or a set of integer constants that conveys this meaning, for example STUDENTS=0, PASSED_TESTS=1, FAILED_TESTS=2, and so on.
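In Python, one way to express this is an IntEnum from the standard library, whose members can be used directly as list indices. The sketch below keeps the constant names from the example above; the class name CounterIndex, the function tally and the shape of its input are assumptions made for illustration.

```python
from enum import IntEnum


class CounterIndex(IntEnum):
    """Named positions in the counter array."""
    STUDENTS = 0
    PASSED_TESTS = 1
    FAILED_TESTS = 2


def tally(results):
    """Count students, passed tests and failed tests.

    results is a list with one entry per student; each entry is a list
    of booleans, where True means a passed test (hypothetical format).
    """
    counter = [0] * len(CounterIndex)
    for student_results in results:
        counter[CounterIndex.STUDENTS] += 1
        for passed in student_results:
            if passed:
                counter[CounterIndex.PASSED_TESTS] += 1
            else:
                counter[CounterIndex.FAILED_TESTS] += 1
    return counter
```

Compared with counter[1] += 1, the line counter[CounterIndex.PASSED_TESTS] += 1 tells the reader what is being counted without a comment or a lookup.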

This is all pretty much standard programming practice. Next time we will look at common identifier naming schemes, their merits and demerits, as well as language conventions.
