A PARADIGM FOR PROGRAMMING STYLE RESEARCH

Paul W. Oman
Computer Science Dept.
University of Idaho
Moscow, Idaho 83843
(208) 885-6589
Curtis R. Cook
Computer Science Dept.
Oregon State University
Corvallis, Oregon 97331
(503) 754-3273
Abstract
Programming style guidelines and automated coding style analyzers have been developed without a solid experimental or theoretical basis. In this paper we make a distinction between typographic style characteristics and underlying structural style content and show that this distinction aids in assessing the influence of style factors. This distinction permits straightforward identification of specific style factors and a better understanding of their effect on program comprehension. The results of our studies have a direct impact on automated coding style assessment programs, programming standards, program maintainability, and code formatting tools.

INTRODUCTION

Programming style is an intuitive and elusive concept. It is highly individualistic and easily recognized, yet difficult to define or quantify. The goal of programming style is to make a program clear, easy to understand, and thereby easy to work with.

Publications on programming style often present either a set of rules for good programming [7,8,9] or descriptions of programs for automated style analysis [2,12,13]. Books on programming style usually contain a set of style rules and guidelines based on analogous writing style guidelines for English. Kernighan and Plauger's The Elements of Programming Style [7] is the best known of these books. Automated style analyzers are essentially "style grading" programs that quantify program characteristics thought to represent programming style and compute a style score as a weighted sum of the factors.

Although the authors of style guides and style analyzers may have difficulty in defining style, they have no difficulty enumerating style rules and identifying what they believe to be important programming style characteristics. Authors choose stylistic factors commonly thought to influence style with little or no supporting evidence demonstrating the importance or effect of those characteristics on program comprehension and maintainability.
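To make the weighted-sum approach concrete, the following is a minimal sketch of how such a style grader might operate. The factors, weights, and scoring here are invented for illustration; published analyzers such as those of Rees or Berry and Meekings measured many more characteristics with tuned parameters.

```python
# Minimal sketch of a weighted-sum style analyzer in the spirit of the
# "style grading" programs discussed in the text. Factor choices and
# weights are hypothetical; real graders set theirs subjectively.

def measure_factors(source: str) -> dict:
    """Measure a few simple typographic characteristics of source code."""
    lines = source.splitlines()
    nonblank = [ln for ln in lines if ln.strip()]
    return {
        "avg_line_length": sum(len(ln) for ln in nonblank) / max(len(nonblank), 1),
        "comment_ratio": sum(1 for ln in nonblank if ln.lstrip().startswith("#"))
                         / max(len(nonblank), 1),
        "blank_line_ratio": (len(lines) - len(nonblank)) / max(len(lines), 1),
    }

# Hypothetical weights; long lines penalized, comments and spacing rewarded.
WEIGHTS = {"avg_line_length": -0.2, "comment_ratio": 40.0, "blank_line_ratio": 20.0}

def style_score(source: str) -> float:
    """Compute a single style score as a weighted sum of factor measurements."""
    factors = measure_factors(source)
    return sum(WEIGHTS[name] * value for name, value in factors.items())
```

The subjectivity criticized in the text lives entirely in the `WEIGHTS` table: nothing in the method itself justifies any particular weight.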
Typically their rules are general, and occasionally contradictory, with no guidelines on how to resolve conflicts between rules. In automated style analyzers the choice of factors and corresponding weights is entirely subjective. It is not clear that these programs are measuring program style. For example, one would expect some relation between good programming style and program errors, yet Harrison and Cook [6] found no relation between Berry and Meekings' style score [2] and the error proneness of a collection of C modules. Furthermore, although many style scores are based (in part) on software complexity metrics, there is no clear understanding of the relationship between style and complexity. For example, Evangelist [5] demonstrated that application of Kernighan and
SIGPLAN Notices, Vol. 23, No. 12
Plauger's style rules had unpredictable effects on five common software complexity metrics. His results contradict Arthur's later claims [1, p. 175] that application of the Kernighan and Plauger rules always reduces complexity.

We believe that programming style is a multi-faceted concept that is not captured by a collection of rules or by a single style score. Instead we propose that programming style is best analyzed by separating stylistic factors into classes and then studying the effects of these factors on program comprehension and maintainability. We have found it useful to break programming style into two classes: those factors pertaining to typographic arrangement and those measuring the structural content of the code. Factors belonging in the typographic category include level and method of indentation, line length, detail and placement of comments, placement of blank lines, use of embedded spaces, identifier length, module length, and formats for type and data declarations. The structural style category includes factors for modularity, use of labels and gotos, use of constant definitions, use of included files, use of literals, methods of type and data declarations, use of library functions, level of nesting, control flow, information flow, operator and operand usage, and other factors related to program complexity.

Conspicuously absent from this paradigm are the more nebulous characteristics of style: the evaluation of economy, simplicity, readability, and other such traits. We recognize the importance and impact of these characteristics and suggest that controlled studies need to be directed at each of these factors in turn. We do not claim that our operational classification and corresponding list of factors completely captures programming style. Rather, our paradigm represents a modest start at formalizing an approach and methodology for research on programming style.
The primary focus of the research reported in this paper is the isolation of stylistic factors and their effects on program comprehension. We present results from a series of controlled experiments that show the benefits of our approach. We then discuss several direct applications of stylistic concerns to programming tools such as language-directed editors and pretty-printers.

What is "programming style"?

There is general agreement that programming style is an elusive concept with intuitive elements that make it difficult to define and quantify. The following illustrate the diversity in attempts to define programming style.

"The 'elegance' or 'style' of a program is sometimes considered a nebulous attribute that is somehow unquantifiable; a programmer has an instinctive feel for a 'good' or 'bad' program ..." [2, p. 80]

"Programming style brings to mind the ways that a creative programmer brings clarity, maintainability, testability, reliability, and efficiency to the coding of a module." [1, p. 173]

Programming style shares many similarities with literary writing style. Effective writing is more than observing established conventions for spelling, grammar, or sentence structure. It is the "perception and judgment" a writer exercises in selecting "from equally correct expressions, the one best suited to his material, audience, and intention" [3]. Many books about programming style have rules based on English literature style rules. For example, the form and approach used in Kernighan and Plauger's The Elements of Programming Style is based on Strunk and White's Elements of Style [16].
Programming style rules are goal directed; one rule may stress code efficiency while another addresses code readability. Often these rules are at odds with each other, but, for the most part, the underlying purpose is to produce code that is clear and easily understood without sacrificing performance. Since the purpose of employing programming style standards is to promote clarity and simplify structure, stylistic factors should be investigated with a view toward determining how specific style factors affect comprehension and complexity. Hence, there are four progressive stages of programming style analysis: (1) determining the effects of stylistic factors on comprehension and complexity, (2) establishing programming style standards (relative to local constraints) based on the results from stage 1, (3) gauging the adequacy and completeness of the standards, and (4) measuring the degree of conformity between the code and the standards. Note that it is only the last stage that suggests the need for automated style analyzers, and yet this is the area where much work has been concentrated. Also note that language-directed editors and pretty-printers operate on formatting rules for arranging code, although there are no established standards for doing so. It appears that only the first and last stages are being addressed, without proper attention to the middle steps.

Previous studies on programming style.

Two types of studies dominate programming style research: (1) investigations into the effects of program style on program comprehension, and (2) development of automated program style analyzers. There are several good representative studies looking at the effects of stylistic variants on program comprehension. Miara, et al. [11] investigated the effect of different indentation levels on students' ability to read programs and answer short questions about the programs.
They concluded that two to four space indentation was optimal, while six space indentation actually interfered with comprehension. Woodfield, Dunsmore, and Shen [17] conducted a study on the effects of comments and modularity on program comprehension. Their results show that comments and method of modularity both affect students' ability to comprehend a program as measured by a 20 question objective test.

The above studies are cited as examples demonstrating the utility of studying the impact of specific stylistic factors on program comprehension. The accepted paradigm for this type of research is controlled administration of a comprehension test across treatment groups receiving a variety of program styles. However, results from many previous studies have been clouded by improper methodology [4]. We suggest that such studies have inadequately controlled stylistic variants within and across treatment groups, and therefore have had difficulty establishing the true cause of observed differences. This may partially explain why some earlier studies on indentation, as reported in [15], are at odds with Miara's later findings. Contradictory evidence on the usefulness of mnemonic names, chunking, nesting, commenting, and other style factors can also be found in the programming style literature. Proper control can be achieved by recognizing different classes of style factors and then manipulating specific factors across treatment groups. In the following section we present two straightforward experiments where different code styles were given to groups of computer science students and significant comprehension differences were obtained.

The other major line of programming style research has concentrated on the development of automated program style analyzers. These style "graders" calculate a single style score that is a weighted sum of the counts of various program characteristics. The counts used to compute the sum are either fixed or specified by the operator.
All programs combine measures of
typographic characteristics (e.g. indentation, blank lines) with those of structure (e.g. control flow, modularity). While some of these studies have informally recognized a distinction between classes of style factors, none have studied them separately.

One of the first automated style analyzers was proposed by Rees [13], who described a Pascal source code style grader based on ten factors: average line length, comments, indentation, blank lines, embedded spaces, modularity, variety of reserved words, identifier length, variety of identifier names, and the use of labels and gotos. Each factor was quantified and assigned a positive or negative weight, with the sum of the weighted factors giving a percent score. The weights and trigger-points (maximum and minimum values that trigger the assignment of weighted scores for each factor) were established by adjusting the parameters until the automated style analyzer awarded "A" grades to good programs.

Rees' style analysis methodology was implemented on a Unix system by Rosenthal, who distributed his source code through the SIGPLAN Notices correspondence column [14]. His style grader is essentially the same as Rees'. Meekings later published a modified version of the style grader [10] and then Berry and Meekings adapted it to work on C source code [2]. The Berry and Meekings style analyzer calculates measurements on the same factors used by Rees, except that the "variety of identifiers" measure was replaced by counts of included files and a "percentage of constant definitions" measure was added. Other minor changes were made to the manner in which the metrics were calculated, but for the most part, the Berry and Meekings style analyzer perpetuates the same style assessment methodology established by Rees.
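The trigger-point mechanism can be sketched as follows: each factor earns its weight only when the measured value falls inside a [minimum, maximum] band. The factor names, bands, and weights below are invented for illustration, not taken from Rees' published parameters.

```python
# Sketch of trigger-point scoring in the style of the Rees-family graders:
# a measurement inside a factor's [low, high] band triggers that factor's
# full (possibly negative) weight; measurements outside earn nothing.
# All bands and weights here are hypothetical.

from typing import NamedTuple

class Factor(NamedTuple):
    low: float    # minimum trigger-point
    high: float   # maximum trigger-point
    weight: float # contribution when triggered; negative for penalties

def contribution(measured: float, factor: Factor) -> float:
    """Weighted contribution of one factor measurement to the score."""
    return factor.weight if factor.low <= measured <= factor.high else 0.0

def grade(measurements: dict, factors: dict) -> float:
    """Percent-style score: sum of the triggered weights."""
    return sum(contribution(measurements[name], factors[name]) for name in factors)
```

Note that "establishing the parameters by adjusting them until good programs earn 'A' grades" amounts to hand-fitting `low`, `high`, and `weight` per factor, which is exactly the subjectivity the text criticizes.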
In their implementation of a FORTRAN source code style analyzer, Redish and Smyth used 33 factors for the automated evaluation of students' programs [12]. The 33 measures can be grouped as follows: commenting (4), indentation (1), block sizes (2), statement labels and formats (7), counts of names and statements (6), array declarations (2), control flow and nesting measures (7), blank lines (1), operator count (1), operand count (1), and parameterization (1). Their AUTOMARK program, similar in purpose to those described by Rees and by Berry and Meekings, calculates a score and percent for each of thirty factors summed into a final score. Additionally, their ASSESS program is used to obtain a nonnumeric evaluation of ten style factors plus comments on indentation, commenting, and label usage. Although the list of factors used by ASSESS and AUTOMARK represents the most complete assessment of style used to date, the scoring system is primarily an adaptation of the Rees maximum and minimum trigger-point methodology, with factor weights assigned by the instructor.
Classes of Programming Style Factors.

In the analysis of previous research on programming style several things become apparent: (1) "style" is a multi-dimensional assessment, with little agreement on definition, effects, and characteristics; (2) some stylistic factors are easy to measure while others can only be operationally defined and approximated; and (3) style characteristics affect program comprehension and therefore impact program maintenance. This includes both structural and typographic factors.

To better understand programming style we have found it helpful to distinguish between typographic style and the underlying structural style of the algorithm. The typographic characteristics represent the physical layout of the code. They do not, in any way, affect the performance of the code, but they may affect the maintainability of the code. On the other hand, the structural style characteristics are dominated by the programming constructs (e.g. looping and branching structures, modularity) and affect both performance and maintainability.
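The two-class partition above can be expressed directly as data. The factor names below follow the lists given earlier in the paper; reporting two separate subscores, rather than folding everything into a single weighted sum, is the point of the distinction (the subscore arithmetic itself is our illustrative assumption).

```python
# The paper's typographic/structural partition of style factors, as data.
# Class membership follows the lists in the text; the idea of summing
# each class separately is an illustrative sketch, not a published method.

TYPOGRAPHIC = {"indentation", "line_length", "comments", "blank_lines",
               "embedded_spaces", "identifier_length", "module_length"}
STRUCTURAL = {"modularity", "gotos", "constant_definitions", "nesting_level",
              "control_flow", "information_flow", "operator_usage"}

def split_scores(measurements: dict) -> dict:
    """Report typographic and structural subscores separately, so that
    layout quality and structural quality can be assessed independently."""
    return {
        "typographic": sum(v for k, v in measurements.items() if k in TYPOGRAPHIC),
        "structural": sum(v for k, v in measurements.items() if k in STRUCTURAL),
    }
```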
STYLE AND COMPREHENSION EXPERIMENTS

The importance and influence of structural style characteristics on program comprehension is widely accepted and has been demonstrated by many experiments. Typographic style characteristics are considered much less important and influential. Further, the relationship between structural and typographic style has been obscured by results from studies that failed to properly isolate relevant stylistic factors [4,15]. The following two experiments demonstrate the necessity and utility of studying stylistic characteristics with this distinction in mind.

Comparing Kernighan & Plauger's nested IF statements.

We conducted an experiment comparing three versions of nested IF statements taken from Kernighan & Plauger's Elements of Programming Style [7, p. 123]. The three versions are listed in Figure 1. Kernighan & Plauger present version 1 as an example of an "ill-chosen layout," and suggest that simple reformatting can improve understandability. Version 2 is their typographic rearrangement of version 1; the only changes are method of indentation and use of embedded spaces. They then argue that a further transformation yields a better version analogous to a CASE statement. By combining logical conditions they eliminate one nested IF-THEN-ELSE and derive version 3. Versions 2 and 3 are written in the same typographic style (i.e. the use of indentation and embedded spaces is the same). Hence, the difference between versions 1 and 2 is purely typographic, while the difference between versions 2 and 3 is predominantly structural.

A simple comprehension test, consisting of five short answer and two subjective questions, was designed and administered to 36 junior-level computer science students. The experiment was conducted during the first ten minutes of class. Subjects were randomly assigned into three treatment groups of 12 students, each group receiving one version of the nested IF statements.
They were asked to read the IF statements, answer the five questions that followed, record the time necessary to do so, and then subjectively rate the indentation and structure of the code. Thus, the independent variable was the "style" of the nested IF statements (Kernighan and Plauger's three versions). Dependent measures for each subject were: score (0 to 5 points), time required to answer the five questions (0 to 5 minutes), and subjective ratings for indentation and structure on a 5 point forced-choice scale (1-very poor, 5-very good). The materials distributed to subjects consisted of a page of instructions followed by a listing of the IF statements and the test questions.

Average scores, times, and ratings for all three groups are shown in Table 1. In support of Kernighan & Plauger, both scores and times improve across the three versions. This is illustrated by the plot of scores and times in Figure 2. Interestingly, the subjective ratings show virtually no change across the styles. Univariate and multivariate analysis of variance showed a significant difference for the time measure (F=4.99, p
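The univariate analysis used to compare the three treatment groups is a standard one-way analysis of variance, which can be sketched as follows. The formulas are the usual between/within sum-of-squares decomposition; any data supplied to it here would be hypothetical, not the experiment's actual scores.

```python
# One-way (univariate) analysis of variance, the test used above to
# compare treatment-group means. Standard textbook decomposition:
# F = (SS_between / df_between) / (SS_within / df_within).

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of sample groups."""
    k = len(groups)                       # number of treatment groups
    n = sum(len(g) for g in groups)       # total number of subjects
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g)
                    for g, m in zip(groups, means))
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

A large F indicates that between-group variation (the style manipulation) dominates within-group variation (individual differences), which is what a significant time difference across the three IF-statement versions reflects.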