Thursday, July 24, 2008

5 rules of variable naming.

When I was at uni some 10 years back now, I remember one of my lecturers telling me not to make variable names too long because you would get "pain in your fingers".

Well, rule #1 goes against that advice:

1. Make your variable names long and descriptive
Visual Studio has IntelliSense, Eclipse has its own code completion, and I'm sure whatever IDE you're using can finish your variable names off for you, too. Using long names prevents the ambiguity of short or cryptic names.

2. Put units in your variable names
If you are writing an engineering application, you are going to be using variables with units. Embed the unit in the variable name, for example, distanceInMM.

3. If you are using camel case, don't capitalise the inner part of commonly hyphenated or combined words.
Let me explain.

Callback is normally spelt as one word. So, pretty please, don't call your variable callBack.

4. Never, ever use the variable name temp. The only perfectly valid exception to this rule is when you're writing a swap function.

5. int i is perfectly valid in a small loop. I've met programmers who would crucify me for saying this, but when your loop is half a dozen lines of code long or less, int i is perfectly valid as a loop counter. It's so widely used, it's almost expected.
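
To make rules 2, 4 and 5 a bit more concrete, here is a minimal Java sketch. The class and method names (NamingSketch, totalDistanceInMM, swapFirstTwo) are made up for illustration and don't come from any real codebase.

```java
// Illustrative sketch only: all class and method names here are hypothetical.
class NamingSketch {

    // Rule 2: the unit is part of the name, so callers know what they get back.
    static double totalDistanceInMM(double[] segmentLengthsInMM) {
        double totalInMM = 0.0;
        // Rule 5: a plain i is fine as the counter of a short loop.
        for (int i = 0; i < segmentLengthsInMM.length; i++) {
            totalInMM += segmentLengthsInMM[i];
        }
        return totalInMM;
    }

    // Rule 4: a swap is the one place where temp says all there is to say.
    static void swapFirstTwo(int[] values) {
        int temp = values[0];
        values[0] = values[1];
        values[1] = temp;
    }

    public static void main(String[] args) {
        System.out.println(totalDistanceInMM(new double[] {10.0, 2.5}));
        int[] pair = {1, 2};
        swapFirstTwo(pair);
        System.out.println(pair[0] + " " + pair[1]);
    }
}
```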

55 comments:

Anonymous said...

Also, make sure your variable names are SPELLED correctly. For example:

// Good
int spelled = 20;

// Bad
int spelt = 0;


:-)

Barbara said...

What?? :) 'Spelt' is a word - maybe it's ... uhm ... some kind of bakery inventory app?

Scott Spear said...

I couldn't agree more with #5. I have talked with many people that don't allow the use of int i ever. You are right on in saying that it is so commonly used that it is expected. It is normal to see it and is understood to simply be a loop counter. I use it in my code as well. I love the list; nicely done.

Shams said...

Another thing we do in our office is have separate prefixes for member variables, parameters and local variables. There is good IDE support for these in both Eclipse and IDEA.

Anonymous said...

Spelt and spelled are a terrible example. That just depends on which dictionary you use, American or English. And we all know how you Americans like to murder the language...


http://dictionary.reference.com/browse/spelt

Anonymous said...

Actually, spelt was accidentally a good example :)
You have to settle on American/British spelling too, to avoid confusion with color/colour.

Fleejay said...

Thanks for the comments.

Shams, that's a really good point about having different prefixes. When programming in C++ I've used m_ for m_myMemberVariable and p for pMyPointer for the past few years. I've found that to be very useful.

MikMak said...

The use of Hungarian notation (HN) is common and, I think, a mistake worth noting. It commonly takes this form:

class Foo {
const char* m_pkcMyVar; // member variable - pointer to const char
};

At a company where I work, we use minimal HN like so:

class Foo {
const char* mMyVar; // member variable
};

This HN can be useful from time to time. While HN is not my favorite, this minimal form is livable, and HN in some form is common enough to add to your list (or to intentionally leave off).

doswheeler said...

I think the HN is quite useful. Very good article.

JT
www.FireMe.To/udi

Anonymous said...

'i' is a good variable name ?? Buried for bad advice.

Anonymous said...

I like the discussion on variable naming ... now on to the discussion what is the ideal programming language ... ;-P

Anonymous said...

Descriptive variable names are good. Long variable names are bad.

IntelliSense and code completion are good tools to increase productivity, but you shouldn't use them as a crutch.

Anonymous said...

I saw no mention of including the var type in the var name. For example iMiles. The i would represent that the var type is an int. Is this convention still used?

Anonymous said...

There is a very detailed analysis of variable naming issues at www.knosof.co.uk/cbook/sent792.pdf . It appears to be a very complicated problem with lots of trade-offs involving where the variable appears and the use to which it is put.

Rishi said...

int ii is slightly better because it is easy to trace in the code.

Anonymous said...

Using i in MATLAB loops is a killer, since i is predefined as the square root of -1. I would assume using it in other languages when writing electrical engineering programs wouldn't be a good idea either, due to a small possibility of confusion or having to shift between thinking contexts.

Anonymous said...

>int ii is slightly better because it is easy to trace in the code.

If you need to trace it in the code, your problem is not the length of the variable name, but the length of the method.

Hungarian notation and other variable prefix notations duplicate what your IDE is already telling you through formatting (or can find quite easily). If you're not using an IDE or other smart development environment, and you don't believe in decomposing your problem into small pieces that do one job well, this doesn't apply to you. Nor do many other good practices.

Neuro said...

Yes, but really long names cause problems by making code verbose and less easy to grasp at a glance.

And for dyslexic programmers, who often have short-term memory problems, a ReallyLongNameLikeThis causes more problems than it's worth.

And with the rise of less strictly typed languages like PHP, it's too easy to make subtle spelling errors that your IDE won't pick up and that only bite you in odd edge conditions.

And a lot of PHP programmers' employers won't spring for pro IDEs.

Anonymous said...

"spelt" is ok in the UK.

Anonymous said...

Please never do what a co-worker used to do....he was a good scripter, but maintaining his scripts was hell on earth.

He tended to name all variables things like "buster", "bubba" and "roscoe", etc. Try understanding or maintaining THAT!

Anonymous said...

And who was the idiot that decided that variable names like
HereIsAVeryLongVariable were more readable than here_is_a_very_long_variable?

Liam said...

#1 - long variable names: I would simply say, "make your variable names descriptive." Making it long doesn't necessarily add to its meaning or your understanding of its use.

#2 - units in your variable names: putting units (or any other identifying information) in a variable's name ... I dunno about that. We've developed many systems and techniques over the decades for tiered analysis, design, and development: data hiding, process hiding, "black boxing," cloud computing, decoupling the data and presentation from the mechanics (MVC paradigm). I think tagging a variable name with, say, a unit of measurement immediately screams "inflexible!" What if the system grows to include other measurement systems? A length is a length; the unit of length (as well as what it measures) is contextual and should remain abstract.

Similarly, the idea of embedding a data type in a variable's name: int iLength; float fTotal; etc. - again: bad idea. You're entangling the "how" with the "what."

Again, I hate the "m_" and "m" tagging for member variable/attribute/property names. You're supposed to be "data hiding" and "encapsulating" yet ... all that tagging (as well as "unit" and "type" tagging) does nothing but expose implementation!

#3 - Camel Case: I grew up in the world of naming my variables something_like_this. No one likes that anymore - which is fine; times change, techniques improve.

These days, I prefer camel case and absolutely abhor first-letter capitalization - especially on methods/member functions. For tight loops, heavy mathematics/analysis, I must fight for my right to use a succinct single-letter variable name. If such naming was/is good enough for the greats of Knuth, Wirth, and Hoare, it's good enough for me. ;)

#4 - "temp" as a var name: I would generally agree; and I think similarly for the rampant use of 'i' 'x' 'y' 'z' - etc. as a "quick variable."

#5 - see #3 and balance that against #4.

Anonymous said...

Real programmers use Whitespace, to which this advice does not apply ;)

Hal said...

My pet peeve is using "index" as the variable name in a for loop. for (index=0; index<max; index++). Sometimes used if the style guide forbids the use of i as the index variable.

The problem is that on some architectures, index is a widely used string-processing function, so if you later add certain include files, you now get an error.

hyp3r said...

I think one of the most important rules should be that you follow some sort of conventions and do it consistently. Don't use one convention half the time and another the other half.

Also, as for the naming_like_this versus namingLikeThis debate, when I was in college I used the former, but since I have started doing mainly PHP work, I use the latter. I do this because built-in PHP functions are typically named_like_this, and namingLikeThis helps avoid redefining existing functions by accident and eliminates the confusion as to whether a function is something built-in or something I have coded.

EclecticMix said...

The first version of Basic I learned allowed for up to 24 variables - each could only be a single character. Those days are past. Having worked as a professional programmer for the past 20+ years I can say with certainty that anything I write will have to be modified, due to the requester eventually changing their mind on something. Creating readable code allows one to look like a champ when it comes to making the changes quickly, so descriptive variable names are the beginning of this process.

Anonymous said...

I suggest using 'ii' instead of the single-character case; this allows for easy search & replace.

Ross said...

I think temp is a perfectly good variable name in lots of things, for example, linked list functions.

Ad Manager said...

We use $vTableName for local variables. $mCurrentIndex for internal properties of a class.

Dave said...

Agreed with most of the article with one exception being #1, "Make your variable names long and descriptive".

Most languages (and specifically the ones you mention as examples) use a variable lookup table for matching purposes, and the shorter the variable name the quicker the lookup occurs, hence shorter variable names = quicker processing time.

J. said...

Hate to be a stickler, but "Callback" is not camel case, "callback" is.

And anyone who ever tries to make any argument against long variable names is:
1. Lazy
2. A horrible programmer (I base this on the simple argument:
Your function is based on strong logic skills yet you can not get your brain around the notion: code is written once, it is read endlessly. Therefore, you suck).


The one thing I don't see in these discussions is nounAdj versus adjNoun. Think about the following:

firstName
lastName
dependantName

Now, compare that to
nameFirst
nameLast
nameDependant

Again, quit the "oh...but then I might have to actually type...or even think...for the money I get paid" and think about maintaining that a year or two.

Fleejay said...

"'i' is a good variable name ?? Buried for bad advice." I knew that would be controversial.

Liam, interesting point about entangling the how with the what. I think the lower you go in your software (ie closer towards hardware, or hidden away in the fathoms of library code) the more important my point becomes.

For example, if your robot moves distances in millimetres it makes it easier for the code maintainer if you include that in the variable name.

ian said...

1) Descriptive yes, long, not necessarily. In fact I'd go short and descriptive by choice. If a variable relates to a concept so complex it needs to be a page long, you need to refactor or at least comment.

2) I think this was mentioned above, but surely this is going to stuff you when you decide that inches should be an option as well as mm.

3) I guess it depends on the language. Where CallBack is a type, I'm going to call the variable callBack or even callback. I mostly code C# or Java, so you understand my current experience.

4) Agreed, it's far too long, use tmp instead. There are plenty of entirely viable places where tmp is acceptable, it should indicate to the user that this is a temporary place holder for a bit of data. I do know where you're coming from here, but I've rarely had problems with the tmp bit, just the weird rubbish on the right hand side of the assignment.

5) Agreed. i or c are used extensively in pretty much every bit of software I've worked on or with. I'm also happy with x,y,z and r,c. Beyond those obvious conventions I'd need a bit of convincing. I've never looped on b for example :)

Other than that, I like to prefix private members with _, my parameters all have a p prefix, and method variables start with lower case. This all allows me to use IntelliSense. My head thinks like this: "Set private member variable lemons to "mouse"". My fingers interpret that as "Press underscore, then l, and I'm probably there." It's the same with parameters: I know if I press p I'm probably going to get a parameter first go. I find Microsoft's naming recommendations counterproductive for this exact reason.

To whoever above suggested that data hiding was a valid reason to forgo these types of conventions: you've missed what data hiding is attempting to achieve. It isn't about hiding the internals from the author of that class.

HN is outmoded, and I can't stand working on code that uses it any more. What really annoys me though is working on a project where two or three different conventions have been used willy-nilly. That is hard, hard work.

Actually, just previewed and then spotted someone arguing that long variable names are the only way forward:

"And anyone who ever tries to make any argument against long variable names is:
1. Lazy
2. A horrible programmer (I base this on the simple argument:
Your function is based on strong logic skills yet you can not get your brain around the notion: code is written once, it is read endlessly. Therefore, you suck).
"

The fact it is read endlessly means it should not be overly verbose. Comically your argument could be accurate if by long variable names, you mean sufficiently long to convey clearly the intention of the programmer. However you didn't say that, so I'm chucking you back in the pond with your long variables and poorly constructed logic ;)

Seriously though, I saw something like the below not that long ago.

int theSumOfTwoOrMoreScoresDividedByTheNumberOfScores;

instead of:

int averageScores;

Or in my parlance

int avgScores;

Anonymous said...

what's wrong with TextBox1, TextBox2?

LOL

Anonymous said...

Most of what I have read here relates to this one question: who are you writing for? If for yourself (or your company or whatever), long variable names are the rule: hide it and make 'em work. If for other programmers (who read forever), long names can work against you. Be clear and concise, and use basic 'readability' rules in American English (if Brits think they have a higher level of understanding then they should understand us lowlifes) with proper capitalization, explaining if necessary, as in

firstName
lastName
dependantName

Now, compare that to
nameFirst
nameLast
nameDependant;


Or, if you are writing for computers, then long names don't matter.

'Long names' is a matter of knowing your audience.

Anonymous said...

you should never use 'i', you should use "i_am_usingThisAsAnIndexIntoSomething"

Anonymous said...

1. If the scope of a variable is small, then a small name will suffice. If the scope is large, as with a global variable, then a long wordy name is needed.

2. spelt is a kind of wheat, nothing to do with grammar and syntax.

-bobmc

ShahG said...

I follow a new naming convention that I find better to be used with VS2005.

instead of using

lblNameFirst, txtNameFirst
lblNameLast, txtNameLast

I find it better to use

NameFirstLabel, NameFirstTextbox
NameLastLabel, NameLastTextbox

Advantages:

1. It gives you complete name of the control with type.

2. With Intellisense when you press "dot" followed by "n" you'll get all controls starting with "Name" in this case.

3. VS2005's IntelliSense remembers the last used item for you. If you are working with "Name" controls, on pressing "dot" you will be taken to the last item used.

In short, it saves time and saves the trouble of finding correct control names, especially if you are reviewing the code after some time.

Comments????

Anonymous said...

I disagree completely with #2.

Putting units in your variable name does nothing to increase readability, and encourages the use of multiple units throughout the application. distanceInMM * distanceInInches? Who would do that?

Have one set of units throughout the class, and if you really need to, have your getters able to take another argument specifying the type of unit it should be returned as. Example in Java, with Unit as an enum/class:
public double getLength(Unit unitType)

Much better. A little more verbose*, but it will save you headaches later.

*remember that this is code for a distance measure that can output in multiple formats. Most things that store a distance would only need one format (my original point).
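
A minimal sketch of such a getter might look like the following. The Unit constants, the Rod class, and the choice to store the length internally in millimetres are all assumptions made for illustration.

```java
// Hypothetical sketch: Unit, Rod, and the internal millimetre storage are assumptions.
enum Unit {
    MILLIMETRE(1.0),
    INCH(25.4);

    // How many millimetres make up one of this unit.
    final double millimetresPerUnit;

    Unit(double millimetresPerUnit) {
        this.millimetresPerUnit = millimetresPerUnit;
    }
}

class Rod {
    // One unit throughout the class: the length is always stored in millimetres.
    private final double length;

    Rod(double lengthInMillimetres) {
        this.length = lengthInMillimetres;
    }

    // Conversion happens only at the boundary, when the caller asks for a unit.
    public double getLength(Unit unitType) {
        return length / unitType.millimetresPerUnit;
    }
}

class RodDemo {
    public static void main(String[] args) {
        Rod rod = new Rod(50.8);
        System.out.println(rod.getLength(Unit.MILLIMETRE));
        System.out.println(rod.getLength(Unit.INCH));
    }
}
```

Note that the constructor parameter still carries its unit in the name, since that is the one place where a caller has to know which unit goes in.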

Anonymous said...

NEVER have units in variable names. Use a defined unit set (typically SI) and stick with it. This in effect makes your code _unitless_ which everyone will benefit from in the long run.

arslanali said...

While developing web applications, I use the following naming convention.

For variables I use camel notation with the variable type in the name, like

intCount, strFirstName, arrUsers and objContact

and for function names I use descriptive names with underscores

function fetch_user_list
function get_all_users_count

etc

For me this naming convention is pretty useful

fragglet said...

I don't use an IDE, you inconsiderate clod! But you can autocomplete in vim using ctrl-p.

I normally use "i, j, k" as counters. I don't think it really adds anything to name it something like "loopCounter".

akuhn said...

PERSONALLY, I take particular care when naming methods. An often neglected detail is that an object's public method names are almost always written right after the name of the variable holding an instance of the object. Thus it is rather pointless to start all methods of, e.g., a Factory class with the verb "create"; rather, we can omit it and stick to the convention that its instances must be named create!

Factory create = new Factory();
Node n = create.Element();
Edge e = create.Edge();
...

For lack of a better name, I am calling this the Read-Aloud naming convention.

akuhn said...

As a search for Spartan Programming does not yield any matches in the comments so far, please let me add this:

1) as already mentioned, long and descriptive are not the same. I prefer concise variable names over long ones. "The Elements of Style" by William Strunk applies to source code as well!

4 + 5) In methods shorter than a dozen lines, ALL my local variables use one- or two-letter identifiers. Also, I use the convention of naming a newly created return value simply $ (yes, that is valid even in Java), which has the advantage that it's always clear whether a method creates an object and which object that is.

For more on this, please refer to Yossi Gil's awesome Spartan Programming conventions...
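
As a rough illustration of the $ convention (the class and method names here are invented for this example and are not taken from the Spartan Programming material itself):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical example: the class and method names are made up for illustration.
class SpartanSketch {
    // $ names the value being built and returned; s and ss are short-lived locals.
    static List<String> upperCased(List<String> ss) {
        List<String> $ = new ArrayList<>();
        for (String s : ss) {
            $.add(s.toUpperCase());
        }
        return $;
    }

    public static void main(String[] args) {
        System.out.println(upperCased(Arrays.asList("node", "edge")));
    }
}
```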

akuhn said...

I replied in more detail on my blog
The 6-th Rule of Variable Naming

PascalAschwanden said...

Java variables tend to be too long, though there's something to be said for self-commenting code.

http://www.codesplunk.com/?s=c
Programming Language Questions & Review

Anonymous said...

float averageScore is better than
float avgScores.

Why? because someone is gonna write
avrgScore and someone else is gonna write avScore and someone else is gonna write aveScore and someone else is gonna write avrgScore etc.

That's not the best example, but consider if you deal with say, localization. You need to track a location and you need to track a localized string.

Say you have a Location class, so you have
Location loc(USA);

Then somewhere you have the localized string and you do locStr = LocalizeString(loc(USA), myString).

But then the calling function, not written by you, uses strLoc to define the location name (USA), because it's of type string, it defines the location, and they like abbreviations.

She leaves to work at Google and now you have to maintain her code.
So your code has locStr to mean the localized string, and she has strLoc to mean the name of the place.

Yuck!

If she had used location or strLocation or LocationString and you had used localizedString or strLocalized (yuck, but ok) then it would be clear what's what.

loc, ref (reference, referrer), rel (relationship, relative), rec (receipt, receptor, receiver), etc.

I can think of tons of these from real experience.

Anonymous said...

"I suggest using 'ii' instead of the single-character case; this allows for easy search & replace."

If you need to search & replace your short index variable, you're doing something wrong.

Anonymous said...

I'd name temporary files or folders temp/tmp too. Typically in shell scripts:

DB_TMP=`mktemp`
echo .dump | sqlite3 database > $DB_TMP

Voltaire said...

A word about rule #5. I prefer to add "th" to the end of loop indices. Thus in my code you'll see variables named "ith", "jth", and so on.

This is because loop indices are used to specify "which one", and in English numbers followed by "th" are used to specify which one, except for numbers less than 4. If someone asks you which beer you're working on and you say "my fourth", that means you've already worked on beers 1 through 3 and haven't gotten to beer number 5.

This is very similar to the use of a loop index. If the code is executing and it's worked on array elements 1 through 3 and hasn't gotten to element 5 yet, the loop index contains 4 to indicate you're working on the fourth element of the array. Naming the loop index with a "th" suffix is just generalizing as is done in mathematics when you say things about the ith integer of a sequence or set.

Adding "th" to loop indices also makes it a lot easier to search for uses of loop indices... unless you've been working on your beers at the same time.

Anonymous said...

Hungarian notation

mikeborozdin said...

I love your points. I also use Hungarian notation on occasion, especially if I use a dynamically typed language like PHP.

neekey said...

Yeah. It is a good idea.

Anonymous said...

i, j, k, l as integer variables in loops were started in the 60s. I forget the language; it was pre-Fortran.