Archived from http://www.hacknot.info/hacknot/action/showEntry?eid=59, which
is sadly no longer available.

*"Where else can one get such a marvelous return in conjecture from
such a modest investment of fact?" - Mark Twain*

*Numerology* is the study of the occult meanings of numbers and
their influence on human life^{1}. Numerologists specialize in
finding numeric relationships between otherwise disparate figures, and
attributing to them some greater significance.

For instance, some claim that adding up the component numbers in your
birth date, together with the numeric equivalent of your name (where A=1,
B=2 etc.), yields a figure that, if properly interpreted, can provide
insight into your personality.^{1}

Others consider that the reoccurrence of the number 19 in Islamic texts
is evidence of their authorship by a higher being^{2}. The Koran has
114 (6 x 19) chapters, 6,346 (19 x 334) verses and 329,156 (19 x 17,324)
letters. The word "Allah" appears 2,698 (19 x 142) times. The sum of the
verse numbers that mention Allah is 118,123 (19 x 6,217).

Pyramids are a favorite topic for numerologists, and there are dozens of "meaningful" numeric relationships to be found in their dimensions. For instance, the base perimeter of the Great Pyramid of Cheops is 36,515 inches - 100 times the number of days in the solar year. And so on.

We can laugh at such desperate searches for meaning, but before we laugh too hard we should consider that software development has its own brand of numerology, which we have given the grand name of Function Point Analysis (FPA).

FPs were proposed in 1979 as a way of finding the size of a piece of software given only its functional specification. It was intended that the FP count of an application would be independent of the technology, people and methods eventually used to implement the application, focusing as it did upon the functionality the application provided to the user. Broadly speaking, basic FPs are calculated by following these steps:

1. Divide a functional view of the system into components
2. Classify each component as being one of five types - external input, external output, external inquiry, internal logical file or external interface file
3. Classify the complexity of each component as low, average or high. The rules for performing this classification vary by component type.
4. For each type of component, multiply the number of components of that type by a numeric equivalent of the complexity e.g. low = 3, average = 4, high = 6. The numeric equivalents that apply vary by component type.
5. Sum the results of step 4 across all five component types. The total is a figure called Unadjusted Function Point count (UFP)

You can then multiply the UFP by a Value Adjustment Factor (VAF), which is based on consideration of 14 general system characteristics, to yield the final Function Point count.
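
For the curious, the arithmetic can be sketched in a few lines of Python. This is only a rough illustration: the weight table and the VAF formula below are the commonly published IFPUG values, which this article does not itself give, so treat them as assumptions. The component list and the ratings are entirely hypothetical.

```python
# Illustrative sketch of the counting steps described above.
# ASSUMPTION: the weights and the VAF formula are the commonly published
# IFPUG values; they are not stated in this article.
WEIGHTS = {
    "external_input":          {"low": 3, "average": 4,  "high": 6},
    "external_output":         {"low": 4, "average": 5,  "high": 7},
    "external_inquiry":        {"low": 3, "average": 4,  "high": 6},
    "internal_logical_file":   {"low": 7, "average": 10, "high": 15},
    "external_interface_file": {"low": 5, "average": 7,  "high": 10},
}


def unadjusted_fp(components):
    """Sum the weight of every (type, complexity) pair (steps 4 and 5)."""
    return sum(WEIGHTS[ctype][complexity] for ctype, complexity in components)


def value_adjustment_factor(gsc_ratings):
    """VAF from 14 general system characteristics, each rated 0..5."""
    if len(gsc_ratings) != 14 or any(not 0 <= r <= 5 for r in gsc_ratings):
        raise ValueError("expected 14 ratings, each in the range 0..5")
    return 0.65 + 0.01 * sum(gsc_ratings)


def adjusted_fp(components, gsc_ratings):
    """Final count: UFP multiplied by the VAF."""
    return unadjusted_fp(components) * value_adjustment_factor(gsc_ratings)


# Hypothetical system: every entry below is an analyst's judgment call.
components = [
    ("external_input", "low"),
    ("external_input", "average"),
    ("external_output", "high"),
    ("external_inquiry", "low"),
    ("internal_logical_file", "average"),
    ("external_interface_file", "low"),
]
gsc = [3] * 14  # hypothetical: every characteristic given a middling rating

ufp = unadjusted_fp(components)
fp = adjusted_fp(components, gsc)
print(ufp)           # 32
print(round(fp, 2))  # 34.24
```

Note that every input to the calculation (where a component begins and ends, its type, its complexity, each of the 14 ratings) is chosen by the person doing the counting. The code only does the adding up.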

I won't bore you with the excruciating specifics of the component calculations. The above gives you some idea of the nature of FP counting and its reliance upon subjective judgments. Specifically, the placement of component boundaries and the values chosen for the many weighting factors and characteristics are all determined on a subjective basis. Some of that subjectivity has been embodied in the standardized FP counting rules that are issued by the International Function Point Users Group (IFPUG).

So lacking have FPs been found, that there has been a steady stream of proposed improvements and alternatives to them since 1979. But none of these have challenged the basic FP ethos of modeling functional size as a weighted sum of arbitrarily selected attributes. They simply change the number and definition of those attributes, and the means by which they are mangled together into a final figure. The basic chronology of the FP family tree has been:

- 1979 - Function Points (Albrecht)
- 1986 - Feature Points (Jones)
- 1988 - Mark II Function Points (Symons)
- 1989 - Data Points (Sneed)
- 1991 - 3-D Function Points (Boeing)
- 1994 - Object Points (Sneed)
- 1997 - Full Function Points (St. Pierre et al.)
- 1999 - COSMIC Full Function Points (International FP Users Group)

To understand why the FP and its many variants are fundamentally flawed,
it is first necessary to understand the difference between *measuring*
and *rating*.

To *measure* an attribute of something is to assign numbers to it
on an objective and empirical basis, so that the relationships between the
numbers preserve any intuitive notions and empirical observations about that
attribute.^{5} For example, the metric meter is a measure, which
implies:

- 4 meters is twice as long as 2 meters, because 4 is twice 2
- The difference between 9 and 10 meters is the same as the difference between 1 and 2 meters, because 10-9 = 2-1
- If you moved 4 meters in 2 seconds (at constant velocity) then you moved 2 meters in the first second and 2 meters in the last second.
- If two different people measure the same length to the nearest meter, they will get the same number

To *rate* an attribute of something is to assign numbers to it on
a subjective and intuitive basis. The relationships between the numbers do
*not* preserve the intuitive and empirical observations about the
attribute. In contrast to the above example, consider the rating out of 10
that a reviewer gives a movie:

- A movie that gets a 4 is not twice as good as a movie that gets a 2
- The difference between movies that get 9 and 10 is not the same as the difference between movies that get 1 and 2
- A 2 hour movie that gets a 6 did not rate 3 for the first hour and 3 for the second hour
- Two different people rating the same movie may award different ratings

To clarify, suppose a reviewer expresses their assessment of a movie in words rather than numbers. Instead of rating a movie from 1 - 10, they rate it from "abysmal" to "magnificent". We might be tempted to think a movie that gets an 8 is twice as good as a movie that gets a 4, but we would surely not conclude that "very good" is twice as good as "disappointing". We can express a rating using any symbols we want, but just because we choose numbers for our symbols does not mean that we confer the properties of those numbers upon the attribute we are rating.

In summary:

- A *measurement* is objective and can be manipulated mathematically
- A *rating* is subjective and cannot be manipulated mathematically
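
One way to illustrate the second point is with a small sketch. A rating fixes only an order among its symbols, so any numbering of the labels that preserves that order is as legitimate as any other. In the hypothetical example below (the films, the ratings and both numberings are invented for illustration), simply changing the numbering reverses which film has the better "average", which suggests the average was never telling us anything to begin with.

```python
from statistics import mean

# Ratings expressed as positions on a five-step scale:
# 0 "abysmal", 1 "disappointing", 2 "fair", 3 "very good", 4 "magnificent".
# Both films and all ratings are hypothetical.
film_a = [1, 1, 4]  # two "disappointing", one "magnificent"
film_b = [2, 2, 3]  # two "fair", one "very good"

# Two numberings of the same five labels. Both preserve the order.
linear = [1, 2, 3, 4, 5]
stretched = [1, 2, 3, 4, 10]  # "magnificent" placed further from the rest


def average(ratings, numbering):
    """Mean of the ratings after mapping each label to its number."""
    return mean(numbering[r] for r in ratings)


for name, numbering in (("linear", linear), ("stretched", stretched)):
    a = average(film_a, numbering)
    b = average(film_b, numbering)
    better = "A" if a > b else "B"
    print(f"{name}: film {better} has the higher average")
```

With a genuine measurement such as length, changing units from meters to feet rescales every value by the same factor and can never reverse a comparison like this.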

From the above, it is clear that FPs are a rating and not a measurement, due to the subjective manner in which they are derived. Hence, they cannot be manipulated mathematically. And yet the software literature is rife with examples of researchers attempting to do just that. Many researchers and reviewers continue to ignore the fundamental implications of the non-mathematical nature of the FP, such as:

- *You cannot measure productivity using FPs*^{3} - If a team completes an application of 250 FP in 10 weeks, their productivity is not 25 FP/week. The figure "25" has no meaning. Similarly, a given team need not take 50% longer to write an 1800 FP application than they would a 1200 FP application.
- *You cannot compare FP counts numerically*^{3} - An application of 1000 FP is not twice as big, complex or functional as an application of 500 FP. The first application is not "twice" the second in any meaningful sense.
- *You cannot compare FPs from disparate sources*^{3} - The subjectivity of FP analysis makes it sensitive to contextual variations in application domain, technology, organization and counting method.

Given such limitations, there are very few valid uses of an application's
FP count. If the FP counts of two applications differ markedly, and their
contexts are sufficiently similar, then you *may* be justified in
saying that one is functionally bigger than the other, but not by how much.^{3}
The notion that FPs can participate in mathematical calculations, and
thereby be used for scheduling, effort and productivity measures, is without
theoretical or empirical basis.

Although their use may have declined in recent years, Function Points are still quite popular. There are several factors which might account for their continued usage, despite their essential invalidity:

- The fact that other organizations use FPs is enough to encourage some to follow suit. However, we should be aware that an *argument from popularity* has no logical basis. There are many beliefs that are both widely held and false. The popularity of FPs may only be indicative of how desperately the industry would like there to be a single measure of functional size that can be calculated at the specification stage. It certainly would be desirable for such a measure to exist, but we cannot wish such a metric into existence, no matter how many others have the same wish.
- Some researchers claim to have validated function points (in their original form, or some later variant thereof). However, if you examine the details of these experiments, what you will find is pseudo-science, ignorance of basic measurement theory^{5} and statistics, and much evidence of "fishing for results". There is a lot of fitting of models to historical data, but not a lot of using those models to predict future data. This is not so surprising, for the general standard of experimentation in software is very poor, as Fenton observes.^{5} Altman makes an observation^{7} about the legion of errors that occur in medical experimentation that could apply equally well to software development: *"The main reason for the plethora of statistical errors is that the majority of statistical analyses are performed by people with an inadequate understanding of statistical methods. They are then peer reviewed by people who are generally no more knowledgeable."*
- Hope springs eternal. Rather than concede that efforts to embody functional size in a single number are misguided, it is consoling to think that FPs are "nearly there", just a few more tweaks away from being useful. Hence the many FP variants that have sprung up.
- FP enthusiasts selectively quote the "research" that is in their favor, and ignore the rest. For example, the variance between FP counts determined by different analysts is often quoted as "plus or minus 11 percent."^{10} However, other sources^{11} have reported worse figures, such as a 30% variation *within* an organization, rising to more than 30 percent *across* organizations.
- Some choose to dismiss the theoretical invalidities of FPs as irrelevant to their practical worth. Their excuses may have some appeal to the average developer, but don't withstand scrutiny. Examples of such excuses are:
  - *As long as FPs work, who cares what basis they have or don't have?* - The problem is that in general, FPs *don't* work. Even FP adherents will admit to the numerous shortcomings of FPs, and the need to constrain large numbers of contextual factors when applying them. Witness the various mutations of FP that have arisen, each attempting to address some subset of the numerous failings of FPs.
  - *It doesn't matter if you're wrong, as long as you're wrong consistently*^{8} - Unfortunately, unless you know *why* you're wrong, you have no way of knowing if you are indeed being *consistently* wrong. FPs are sensitive to a great many contextual factors. Unless you know what they are and the precise way they affect the resulting FP count, you have no way of knowing the extent to which your results have been influenced by those factors, let alone whether that influence has been consistent.
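
To get a feel for what the quoted variation figures would imply, here is a rough sketch. It assumes, purely for illustration, that analyst variation can be treated as a symmetric band around a reported count. That model and the function names are mine, not something taken from the cited studies. The 1200 and 1800 FP counts are the hypothetical ones used earlier.

```python
def count_range(nominal_fp, variation):
    """Band of counts consistent with a reported count.

    ASSUMPTION: variation is modeled as a symmetric +/- fraction.
    """
    return nominal_fp * (1 - variation), nominal_fp * (1 + variation)


def distinguishable(fp_a, fp_b, variation):
    """True if the two bands do not overlap at all."""
    lo_a, hi_a = count_range(fp_a, variation)
    lo_b, hi_b = count_range(fp_b, variation)
    return hi_a < lo_b or hi_b < lo_a


for variation in (0.11, 0.30):
    separate = distinguishable(1200, 1800, variation)
    print(f"+/-{variation:.0%}: bands separate = {separate}")
```

Under this crude model, the two counts remain separate at plus or minus 11 percent but overlap at 30 percent. In the latter case a different pair of analysts could plausibly have ranked the same two applications the other way around.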

FPs have attracted their own league of True Believers - like many
technical schools whose tenets, lacking an empirical basis, can only be
defended by the emotional invective of their adherents. I encountered one
such adherent recently in David Anderson, author of "Agile Project
Management." Anderson made some rather
pompous observations^{9} on his blog as to how surprising it
was that people should express disbelief regarding his claims to 5 and
10-fold increases in productivity using TDD, AM and FDD.
I replied that their incredulity might stem from the boldness of his claims
or the means by which he collected his data, rather than an inherently
obstreperous attitude. He indicated his productivity data was expressed in
FPs per unit time! I tried explaining to him that FPs cannot be used to
measure productivity, because not all FPs are created equal, as explained
above. He wasn't interested. That discussion has now been deleted from his
blog. He also denied me permission to reproduce that portion of it which
occurred in email.

Such is the attitude I typically encounter when dealing with self-styled gurus and experts. There is much talk of science and data, but as soon as you express doubt regarding their claims, there is a quick resort to insult and posture. Ironic, given that doubt and criticism are the basic mechanisms that give science the credibility that such charlatans seek to cloak themselves in.

The appeal, and hence the popularity, of FPs is their reduction of the complex notion of software functional size to a single number. The simplicity is attractive. But what basis is there for believing that such a single-figure expression of functional size is even possible?

Consider this analogy^{3}. When you walk into a clothing store,
you characterize your size using several different measures. One figure for
shirt size, another for trouser size, another for shoe size and another for
hat size. What if, by way of misguided reductionism, we were to try to
concoct a single measure of clothing size and call it *Clothing Points*?
We could develop all sorts of rules and regulations for counting Clothing
Points, including weighting factors accounting for age, diet, race, gender,
disease and so on. We might even find that if we sufficiently controlled the
influence of external factors, given the limited variations of the human
form, we might eventually be able to find some limited context in which
Clothing Points were a semi-reasonable assessment of the size of all items
of clothing. We could then walk into a clothing store and say "My size is
187 Clothing Points" and get a size 187 shirt, size 187 trousers, size 187
shoes and size 187 hat. The items might even fit, although we would likely
sacrifice some comfort for the expediency and convenience of having reduced
four dimensions down to a single dimensionless number.

The search for a grand unified "measure" of functional size may be just as foolhardy as the quest for uni-metric clothing.

The continued use and acceptance of Function Point Analysis in software development should be a source of acute embarrassment to us all. It is a prime example of muddle-headed, pseudo-scientific thinking, that has persisted only because of the general ignorance of measurement theory and valid experimental methodology that exists in the development community. We need to stop fabricating and embellishing arbitrary sets of counting rules. In doing so, we are treating these formulae as if they were incantations whose magic can only manifest when precisely the correct wording has been discovered, but whose inner workings must forever remain a mystery. Rather, we need to go back to basics and work towards understanding the fundamental technical dimensions that contribute to the many and varied notions of an application's functional size. How can we hope to measure something when we can't even precisely define what that something is? Empiricism holds some promise as a means to improve software development practices, but the pseudo-empiricism of Function Point Analysis is little more than numerological voodoo.

1. *The Skeptic's Dictionary* - R. Carroll, Wiley & Sons, 2003
2. *Did Adam and Eve Have Navels?* - Martin Gardner, W.W. Norton & Company, 2000
3. *The Problem with Function Points* - B. Kitchenham, IEEE Software, March / April 1997
4. *Preliminary Guidelines for Empirical Research in Software Engineering* - B. Kitchenham et al.
5. *Software Measurement: A Necessary Scientific Basis* - N. Fenton, IEEE Trans. Software Eng., Vol. 20 No. 3, 1994
6. *A Critique of Software Defect Prediction Models* - Fenton, Neil
7. *Statistical Guidelines for Contributors to Medical Journals* - Altman, Gore, Gardner, Pocock, 1983, British Medical Journal, vol. 286
8. *Measurement and Estimation* - Burris
9. *World Class Velocity* - David Anderson, blog entry, Friday June 4, 2004
10. *Why We Should Use Function Points* - S. Furey, IEEE Software, March / April 1997
11. *Comparison of Function Point Counting Techniques* - J. Jeffery, G. Low, M. Barnes, IEEE Trans. Software Eng., Vol. 19 No. 5, 1993