Naming Conventions (2)

2009-02-21

Language conventions are often omitted from formal programming conventions, which is a bit odd considering their importance. The most obvious language convention is the selection of the language itself. This point is so obvious, that it’s often missed. Mind you, not every programmer is fluent in English. Just a fraction are native English speakers. Hence, English should be chosen only if all members of the development team are comfortable with it. Otherwise, it is more appropriate to choose the native language of the developer team. If the developer team is international, English is likely to be used as the common basis. If the conventions prescribe English and if some team members are less fluent in English, identifier names must be treated with special care. I recently came across a piece of software developed in Germany where identifier names for an organisational structure where assigned as follows:

German Term What it means Identifier Name Used
Organisationseinheit organisational unit department
Unternehmen company division
Unternehmensbereich division divisionArea
Abteilung department section

The meaning of these identifiers doesn’t quite correspond to what an English speaker would expect them to mean, which is rather confusing. Yet, if you look up these terms in a dictionary, you find that all of them, with exception of divisionArea, are valid translations of the original German term. This is one example where it would have been better to use German identifiers instead. These terms appeared in source code that had English identifiers and German JavaDoc comments.

Perhaps as a rule of thumb, it’s better not to use English identifiers if the team is not comfortable with English documentation as well. Again, in an international environment this may not be an option. Therefore, it might be worthwhile to have an English speaker refactor code written by non-native speakers for the sake of clarity. Another aspect that pertains to language conventions is grammar. You might think that I am nitpicking when mentioning grammar in computer programs. The point is that proper grammar facilitates understanding. This is true for natural language as well as program code. The grammatical rules for identifier names are few and simple. Use nouns for class names and object references, because classes and objects abstract real world objects. In some cases, objects directly correspond to entities of the problem domain, such as Customer, Order, Department, User, etc. In other cases they do not.

It is important to qualify class names further if a single noun does not describe it sufficiently. Examples of qualified nouns are TransferredFunds, AppraisalStatistics, AvailableOptions. Use verbs for methods and functions, because methods abstract behaviour. Good examples for method names are addTax(), computeCRC(), checkAvailability(), getDepartment().

Avoid using nouns as method identifiers, such as department() instead of getDepartment() or adjectives such as available() instead of getAvailability(). An exception to this rule would be languages where methods directly represent properties, because property names should be either adjectives or nouns. Likewise, interfaces should be nouns, or in some special cases adjectives, such as Triangular, Centred, Deductible, etc. depending on their purpose. In the latter case, the adjective is derived from the verb deduct and by convention the interface defines a method with the same name, i.e. deduct(). Other examples for similarly named interfaces are Readable, Comparable, or Sortable.

Finally, we have identifiers for local variables, parameters, and fields. What should these look like? There are no hard rules for these program elements, since local variables, parameters and fields can represent anything from objects to properties, primitives, and even methods. It’s best if grammar conventions follow those of the type the identifier refers to.

Next I want to talk about identifier naming schemes. These are fixed sets of conventions for assigning names. Two widely used schemes are positional notation and Hungarian notation. Positional notation is typically used in legacy languages where identifier names are very short, as for example in older Fortran and Cobol programs. Another example is the 8.3 notation of MS-DOS file system. Identifiers that use positional notation are usually unintelligible without a key. An example would be APRPOCTT, where the first two characters AP stand for the main program module “Accounts Payable”, RP stands for the sub-module “reporting” and OCT1 means “operating costs total 1”.

Needless to say that this notation defies easy readability and should not be used unless absolutely necessary. A more widely used scheme is Hungarian notation. Hungarian notation is characterised by prefixes, and sometimes postfixes, which are added to the variable name and which carry type information. For example, in the original Hungarian notation suggested by Charles Simonyi for the C language, b stands for boolean, i for integer, p for pointer, sz for zero-terminated strings, and so on. These prefixes can be combined. The name arrpszMessages, for example, refers to an array of pointers to zero-terminated strings.

Sometimes Hungarian notation is applied to conceptual types rather than implementation types. For instance, the identifier arrstrMessages refers to an array of strings without saying anything about pointers or zero-termination. Hungarian notation is still used today with certain languages, such as C, C++ and Delphi. In C++ the following prefixes are often used to denote scope: g_ for global, m_ for class members (fields), s_ for static members, and _ for local variables. Delphi is unusual in the sense that Hungarian notation denote object types. The type names themselves are always prefixed with the letter T in Delphi, so a button class would be defined as TButton and an instance of that type would be prefixed with Btn or something similar, such as BtnOK, BtnCancel, BtnRetry, etc.

There is a lot of debate whether Hungarian notation is generally useful or not. The main criticism is that implementation types are somewhat irrelevant for the programmer writing code in a statically typed language, since types are checked by the compiler. When the programmer needs to know about the type of a variable or an object, modern IDEs can usually resolve it automatically. In this case, the prefixes just add unnecessary visual clutter. Hungarian notation still has its place in C, but it’s definitely not that useful with contemporary languages and development tools. The same can be said in principle about most other identifier naming schemes. Naming schemes generally attempt to add metadata to identifiers, or create artificial namespaces. Most modern languages provide better means to both ends.

For example, the Java language has annotations for metadata and packages for namespaces. There is usually no need to employ naming schemes with most modern languages. It’s probably more worthwhile to spent effort on finding intelligible and descriptive names rather than inventing clever naming schemes. Next time I will talk about the practical considerations in choosing good identifier names and provide some examples to illustrate best practices.

PreviousNext