Function Points - Numerology for Software Developers

Archived from http://www.hacknot.info/hacknot/action/showEntry?eid=59, which is sadly no longer available.


"Where else can one get such a marvelous return in conjecture from such a modest investment of fact?" - Mark Twain

Numerology is the study of the occult meanings of numbers and their influence on human life [1]. Numerologists specialize in finding numeric relationships between otherwise disparate figures, and attributing to them some greater significance.

For instance, some claim that by adding up the component numbers in your birth date, together with the numeric equivalent of your name (where A=1, B=2, etc.), you derive a figure that, if properly interpreted, can yield insight into your personality [1].

Others consider that the recurrence of the number 19 in Islamic texts is evidence of their authorship by a higher being [2]. The Koran has 114 (6 x 19) chapters, 6,346 (19 x 334) verses and 329,156 (19 x 17,324) letters. The word "Allah" appears 2,698 (19 x 142) times. The sum of the verse numbers that mention Allah is 118,123 (19 x 6,217).

Pyramids are a favorite topic for numerologists, and there are dozens of "meaningful" numeric relationships to be found in their dimensions. For instance, the base perimeter of the Great Pyramid of Cheops is 36,515 inches - 100 times the number of days in the solar year. And so on.

We can laugh at such desperate searches for meaning, but before we laugh too hard we should consider that software development has its own brand of numerology, which we have given the grand name of Function Point Analysis (FPA).

Overview of Function Points

FPs were proposed in 1979 as a way of finding the size of a piece of software given only its functional specification. It was intended that the FP count of an application would be independent of the technology, people and methods eventually used to implement the application, focusing as it did upon the functionality the application provided to the user. Broadly speaking, basic FPs are calculated by following these steps:

  1. Divide a functional view of the system into components
  2. Classify each component as being one of five types - external input, external output, external inquiry, internal logical file or external interface file
  3. Classify the complexity of each component as low, average or high. The rules for performing this classification vary by component type.
  4. For each type of component, multiply the number of components of that type by a numeric equivalent of their complexity, e.g. low = 3, average = 4, high = 6 for external inputs. The numeric equivalents vary by component type.
  5. Sum the results of step 4 across all five component types. The total is a figure called the Unadjusted Function Point count (UFP).

You can then multiply the UFP by a Value Adjustment Factor (VAF), which is based on consideration of 14 general system characteristics, to yield the final Function Point count.
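To make the arithmetic concrete, here is a minimal sketch in Python of the broad calculation just described. It assumes the commonly published IFPUG weights for the five component types (only the external input weights of 3, 4 and 6 are quoted above) and the usual VAF formula of 0.65 plus 0.01 times the sum of the 14 general system characteristic ratings, each scored from 0 to 5; the component counts themselves are invented for illustration.

    # Illustrative sketch of basic FP counting, using the commonly published
    # IFPUG weights; the component counts below are invented.
    WEIGHTS = {
        # component type:        (low, average, high)
        "external_input":          (3, 4, 6),
        "external_output":         (4, 5, 7),
        "external_inquiry":        (3, 4, 6),
        "internal_logical_file":   (7, 10, 15),
        "external_interface_file": (5, 7, 10),
    }

    def unadjusted_fp(counts):
        """counts maps component type -> (number low, number average, number high)."""
        return sum(n * w
                   for ctype, ns in counts.items()
                   for n, w in zip(ns, WEIGHTS[ctype]))

    def adjusted_fp(ufp, gsc_ratings):
        """gsc_ratings: the 14 general system characteristics, each rated 0-5."""
        vaf = 0.65 + 0.01 * sum(gsc_ratings)   # VAF ranges from 0.65 to 1.35
        return ufp * vaf

    counts = {
        "external_input":          (5, 2, 1),
        "external_output":         (3, 1, 0),
        "external_inquiry":        (4, 0, 0),
        "internal_logical_file":   (2, 1, 0),
        "external_interface_file": (1, 0, 0),
    }
    ufp = unadjusted_fp(counts)
    print(ufp, round(adjusted_fp(ufp, [3] * 14), 2))   # 87 93.09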

I won't bore you with the excruciating specifics of the component calculations. The above gives you some idea of the nature of FP counting and its reliance upon subjective judgments. Specifically, the placement of component boundaries and the values chosen for the many weighting factors and characteristics are all determined on a subjective basis. Some of that subjectivity has been embodied in the standardized FP counting rules issued by the International Function Point Users Group (IFPUG).

So lacking have FPs been found that there has been a steady stream of proposed improvements and alternatives to them since 1979. None of these has challenged the basic FP ethos of modeling functional size as a weighted sum of arbitrarily selected attributes; they simply change the number and definition of those attributes, and the means by which they are mangled together into a final figure. The FP family tree now includes variants such as Mark II Function Points, Feature Points, 3D Function Points and COSMIC Full Function Points.

To understand why the FP and its many variants are fundamentally flawed, it is first necessary to understand the difference between measuring and rating.

Measurement vs. Rating

To measure an attribute of something is to assign numbers to it on an objective and empirical basis, so that the relationships between the numbers preserve any intuitive notions and empirical observations about that attribute [5]. For example, the meter is a measure of length, which implies that an object 2 meters long is twice as long as an object 1 meter long, that the difference between 2 and 3 meters is the same as the difference between 102 and 103 meters, and that lengths can meaningfully be added, averaged and compared.

To rate an attribute of something is to assign numbers to it on a subjective and intuitive basis. The relationships between the numbers do not preserve intuitive and empirical observations about the attribute. In contrast to the above example, consider the rating out of 10 that a reviewer gives a movie: the numbers express nothing more than the reviewer's preference ordering, so sums, differences and ratios of them carry no information about the movies themselves.

To clarify, suppose a reviewer expresses their assessment of a movie in words rather than numbers. Instead of rating a movie from 1 to 10, they rate it from "abysmal" to "magnificent". We might be tempted to think a movie that gets an 8 is twice as good as a movie that gets a 4, but we would surely not conclude that "very good" is twice as good as "disappointing". We can express a rating using any symbols we want, but just because we choose numbers for our symbols does not mean that we confer the properties of those numbers upon the attribute we are rating.

In summary: a measurement is objective and empirical, and the resulting numbers can legitimately be added, averaged and compared as ratios; a rating is subjective, and its numbers are merely convenient labels for an ordering, so arithmetic performed upon them has no defensible meaning.
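Measurement theory gives this a precise formulation: a statement about scale values is meaningful only if its truth survives every admissible transformation of the scale. For a ratio scale like length, the admissible transformations are changes of unit; for an ordinal scale like a movie rating, any strictly increasing relabeling carries exactly the same information. The following small Python sketch (mine, not the article's) shows that "twice as long" survives a change of unit, while "twice as good" does not survive a harmless relabeling of the rating scale.

    # "Twice as long" is preserved by every admissible rescaling of a ratio scale.
    def is_twice(a, b):
        return a == 2 * b

    length_a, length_b = 2.0, 1.0                      # meters
    for unit in (1.0, 100.0, 1000.0):                  # meters, centimeters, millimeters
        assert is_twice(length_a * unit, length_b * unit)

    # A rating out of 10 is merely ordinal: any strictly increasing relabeling
    # conveys exactly the same ranking information.
    rating_a, rating_b = 8, 4
    relabelings = [lambda x: x, lambda x: x * x, lambda x: x + 10]
    print([is_twice(f(rating_a), f(rating_b)) for f in relabelings])
    # [True, False, False] - so "an 8 is twice as good as a 4" is not a
    # meaningful statement about the movies; it is an artifact of the labels.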

Function Points Are a Rating, Not a Measurement

From the above, it is clear that FPs are a rating and not a measurement, due to the subjective manner in which they are derived. Hence, they cannot be manipulated mathematically. And yet the software literature is rife with examples of researchers attempting to do just that. Many researchers and reviewers continue to ignore the fundamental implications of the non-mathematical nature of the FP: that FP counts cannot meaningfully be added, subtracted or averaged; that an application of 1,000 FPs is not thereby twice the functional size of one of 500 FPs; and that ratios formed from FP counts, such as cost per FP or FPs delivered per month, inherit the same invalidity.

Given such limitations, there are very few valid uses of an application's FP count. If the FP counts of two applications differ markedly, and their contexts are sufficiently similar, then you may be justified in saying that one is functionally bigger than the other, but not by how much [3]. The notion that FPs can participate in mathematical calculations, and thereby be used for scheduling, effort and productivity measures, is without theoretical or empirical basis.
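One way to see the problem with FP-based productivity figures is that identical counts can describe utterly different systems. In the contrived example below (my own, using the same illustrative IFPUG-style weights as the earlier sketch), a data-entry application and an integration-heavy reporting application both come out at 30 unadjusted FPs, yet there is no reason to expect them to demand comparable effort, so "FPs delivered per month" computed from the two tells us little.

    # Two deliberately different systems receiving identical unadjusted FP counts.
    WEIGHTS = {
        "external_input":          (3, 4, 6),
        "external_output":         (4, 5, 7),
        "external_inquiry":        (3, 4, 6),
        "internal_logical_file":   (7, 10, 15),
        "external_interface_file": (5, 7, 10),
    }

    def ufp(counts):
        return sum(n * w for c, ns in counts.items() for n, w in zip(ns, WEIGHTS[c]))

    data_entry_app = {"external_input": (10, 0, 0)}              # ten simple input screens
    reporting_app  = {"external_output": (5, 0, 0),
                      "external_interface_file": (2, 0, 0)}      # five reports, two external interfaces

    assert ufp(data_entry_app) == ufp(reporting_app) == 30
    # Dividing each team's effort into "30 FPs" and comparing the quotients
    # treats these two workloads as interchangeable, which they plainly are not.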

Why are Function Points so Popular?

Although their use may have declined in recent years, Function Points are still quite popular. Several factors might account for their continued use despite this essential invalidity; the following sections consider two of them: the zeal of their True Believers, and the seductive simplicity of reducing functional size to a single number.

Function Points' True Believers

FPs have attracted their own league of True Believers - like many technical schools whose tenets, lacking an empirical basis, can only be defended by the emotional invective of their adherents. I encountered one such adherent recently in David Anderson, author of "Agile Project Management." Anderson made some rather pompous observations [9] on his blog as to how surprising it was that people should express disbelief regarding his claims of 5- and 10-fold increases in productivity using TDD, AM and FDD. I replied that their incredulity might stem from the boldness of his claims or the means by which he collected his data, rather than from an inherently obstreperous attitude. He indicated his productivity data was expressed in FPs per unit time! I tried explaining to him that FPs cannot be used to measure productivity, because not all FPs are created equal, as explained above. He wasn't interested. That discussion has now been deleted from his blog. He also denied me permission to reproduce the portion of it that occurred in email.

Such is the attitude I typically encounter when dealing with self-styled gurus and experts. There is much talk of science and data, but as soon as you express doubt regarding their claims, there is a quick resort to insult and posturing. Ironic, given that doubt and criticism are the basic mechanisms that give science the credibility such charlatans seek to cloak themselves in.

Why Must Functional Size be a Single Number?

The appeal, and hence the popularity, of FPs is their reduction of the complex notion of software functional size to a single number. The simplicity is attractive. But what basis is there for believing that such a single-figure expression of functional size is even possible?

Consider this analogy [3]. When you walk into a clothing store, you characterize your size using several different measures. One figure for shirt size, another for trouser size, another for shoe size and another for hat size. What if, by way of misguided reductionism, we were to try to concoct a single measure of clothing size and call it Clothing Points? We could develop all sorts of rules and regulations for counting Clothing Points, including weighting factors accounting for age, diet, race, gender, disease and so on. We might even find that, if we sufficiently controlled the influence of external factors, given the limited variations of the human form, we could eventually identify some limited context in which Clothing Points were a semi-reasonable assessment of the size of all items of clothing. We could then walk into a clothing store and say "My size is 187 Clothing Points" and get a size 187 shirt, size 187 trousers, size 187 shoes and size 187 hat. The items might even fit, although we would likely sacrifice some comfort for the expediency and convenience of having reduced four dimensions down to a single dimensionless number.

The search for a grand unified "measure" of functional size may be just as foolhardy as the quest for uni-metric clothing.

Conclusion

The continued use and acceptance of Function Point Analysis in software development should be a source of acute embarrassment to us all. It is a prime example of muddle-headed, pseudo-scientific thinking that has persisted only because of the general ignorance of measurement theory and valid experimental methodology in the development community. We need to stop fabricating and embellishing arbitrary sets of counting rules. In doing so, we treat these formulae as if they were incantations whose magic can only manifest when precisely the correct wording has been discovered, but whose inner workings must forever remain a mystery. Rather, we need to go back to basics and work towards understanding the fundamental technical dimensions that contribute to the many and varied notions of an application's functional size. How can we hope to measure something when we can't even precisely define what that something is? Empiricism holds some promise as a means to improve software development practices, but the pseudo-empiricism of Function Point Analysis is little more than numerological voodoo.

References

  1. The Skeptic's Dictionary - R. Carroll, Wiley & Sons, 2003
  2. Did Adam and Eve Have Navels? - M. Gardner, W.W. Norton & Company, 2000
  3. The Problem with Function Points - B. Kitchenham, IEEE Software, March / April 1997
  4. Preliminary Guidelines for Empirical Research in Software Engineering - B. Kitchenham et al., IEEE Trans. Software Eng., Vol. 28 No. 8, 2002
  5. Software Measurement: A Necessary Scientific Basis - N. Fenton, IEEE Trans. Software Eng., Vol. 20 No. 3, 1994
  6. A Critique of Software Defect Prediction Models - N. Fenton, M. Neil, IEEE Trans. Software Eng., Vol. 25 No. 5, 1999
  7. Statistical Guidelines for Contributors to Medical Journals - Altman, Gore, Gardner, Pocock, British Medical Journal, Vol. 286, 1983
  8. Measurement and Estimation - Burris
  9. World Class Velocity - David Anderson, blog entry, Friday June 4, 2004
  10. Why We Should Use Function Points - S. Furey, IEEE Software, March / April 1997
  11. Comparison of Function Point Counting Techniques - J. Jeffery, G. Low, M. Barnes, IEEE Trans. Software Eng., Vol. 19 No. 5, 1993