Cohesion

Cohesion is a measure of how well the individual lines of code within a function (subroutine, module, method, etc.) belong together. If every line of code within a function belongs there and works toward a single purpose or a single responsibility, then the function is cohesive, or can be said to have a high level of cohesion. Each line of code that does not advance the purpose of the function, as described by its name, reduces the level of cohesion by some amount.

Why is Cohesion important?
Let's start with the Single Responsibility Principle. Wikipedia has a page describing it. In short, it says that a function or class should have one and only one responsibility. For simplicity, this discussion will remain at the function level and will not delve into class responsibilities.

People can probably come to a reasonable consensus on how cohesive a given function is, but on first consideration it seems that we must understand the purpose of the function and the significance of each SLOC (source line of code) within it before we can rate it. The idea of writing code that can analyze a function and determine whether each line of code belongs there appears problematic.

However, take a look here and at some of the articles linked there. An analysis of the variables used by each SLOC, checking how many other SLOCs use the same variables, can produce a metric for relatedness, one measure of cohesion. If the first 30 SLOCs use variables from set A and the second group uses variables from set B, and if there is no overlap between A and B, then the code is not likely to be cohesive. Even if human analysis reveals that they all work toward the same essential purpose, if the two sets of code don't share variables, then one of the sets should be factored out into another module.
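The variable-overlap idea can be sketched in a few lines of Python using the standard ast module. This is a simplified illustration under stated assumptions, not the metric from the linked articles: it assumes the source string holds exactly one function, looks only at top-level statements, and ignores built-in names such as print and len. The helper names and sample functions are hypothetical.

```python
import ast
import builtins

BUILTIN_NAMES = set(dir(builtins))  # ignore print, len, sum, etc.


def statement_variable_sets(source):
    """Return one set of variable names per top-level statement.

    Assumes `source` contains exactly one function definition.
    """
    func = ast.parse(source).body[0]
    return [
        {n.id for n in ast.walk(stmt) if isinstance(n, ast.Name)} - BUILTIN_NAMES
        for stmt in func.body
    ]


def variable_groups(source):
    """Merge statements that share any variable into groups of names."""
    groups = []
    for names in statement_variable_sets(source):
        if not names:
            continue  # e.g. a docstring uses no variables
        overlapping = [g for g in groups if g & names]
        merged = set(names).union(*overlapping)
        groups = [g for g in groups if not (g & names)] + [merged]
    return groups


SPLIT_SAMPLE = '''
def report(prices, names):
    total = sum(prices)
    mean = total / len(prices)
    print(mean)
    upper = [n.upper() for n in names]
    print(", ".join(upper))
'''

COHESIVE_SAMPLE = '''
def mean(prices):
    total = sum(prices)
    return total / len(prices)
'''

print(len(variable_groups(SPLIT_SAMPLE)))     # two disjoint variable sets
print(len(variable_groups(COHESIVE_SAMPLE)))  # one set
```

More than one group suggests that the function contains blocks with no variables in common, which are candidates for factoring out. A single group does not prove cohesion; it only means this crude check found no obvious split.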

On initial thought, it seems that we can say a module is cohesive or it's not. While true, there are varying levels or types of cohesion. Yourdon and Constantine devote an entire chapter to cohesion (see the reference at the end of this page). They provide the following levels of cohesion, listed from best to worst:

Functional
This is the ideal. Every line of code is directly related to completing the purpose as stated in the function name. To quote from the book:
"If the only reasonable way of describing the module's operation is a compound sentence, or a sentence containing a comma, or a sentence containing more than one verb, then the module is probably less than functional."
As noted in my page on naming procedures, cohesion and naming are closely related.
Sequential
One line follows another, such as: get this from the user, check it for validity, save it in a variable, write it to a database, write it to a log file.
Communicational
Lines of code that seem to communicate with each other. Maybe they share common data, but not necessarily a common purpose.
Procedural
Lines of code that seem to follow an order. They don't fulfill a single responsibility, but someone thought that A should follow B should follow C, so they were lumped together.
Temporal
Associated in time. Code lumped together because someone thinks it all needs to run at the same time, such as "initialize everything".
Logical
The lines of code might be logically grouped together. For example, they might be collections of code to display or get data from the user, but they do not functionally belong together.
Coincidental
There is little to no cohesion. Maybe this was just a convenient place to put some code. Maybe it was a last minute thought and just stuck here for the moment and forgotten.
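
As a rough illustration of three of these levels, consider the following Python sketch. All function names and the dictionary keys are hypothetical, and the classification in each docstring is my own reading of the definitions above, not an example taken from the book.

```python
def startup(state):
    """Temporal: unrelated steps grouped only because they run at startup."""
    state["log"] = []        # logging setup
    state["cache"] = {}      # cache setup
    state["retries"] = 3     # network setting
    return state


def read_validate_store(raw, store):
    """Sequential: each step follows the last, but the name needs three verbs."""
    text = raw.strip()
    if not text.isdigit():
        raise ValueError(f"not a number: {raw!r}")
    store.append(int(text))
    return store


def is_valid_quantity(text):
    """Functional: one purpose, describable with a single verb."""
    return text.strip().isdigit()
```

Applying the quoted test, startup and read_validate_store can only be described with compound sentences, while is_valid_quantity can be described with one verb.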

Please note that the progression from Functional (best) to Coincidental (worst) is not linear. Functional ranks far above all the others, while Logical and Coincidental rank far below the remainder. Wikipedia has an entry on the topic.

Structured Design: Fundamentals of a Discipline of Computer Program and Systems Design, Edward Yourdon and Larry L. Constantine, Yourdon Press Computing Series, 1979, ISBN 0-13-854471-9, Chapter 7, page 105.

December 2007