SUN, Code Conventions for the Java Programming Language, 1999 - DRAFT Annotations

What follows are my draft notes on the applicability of SUN's Code Conventions for the Java Programming Language (1997, 1999) with regard to enterprise standards and guidelines for Java development.
SUN's Code Conventions for the Java Programming Language is certainly an influential work in the area of Java standards and guidelines but it hasn't been updated since 1997-1999.  As a result, it contains a mix of enduring fundamentals for sound Java development but also includes conventions that are obsolete. Using obsolete Java conventions is akin to calculating your deductions based upon obsolete tax law - it not only isn't a good idea but is actually a very bad idea. So, writing a Java standards document based upon this then presents the challenge of winnowing out the enduring good material from the obsolete bad material.  
The other obstacle is that some of this material is really more descriptive and not really suitable for enterprise guidance on how to use Java.  In these cases the conventions are simply telling the reader technically how Java works and not providing any guidance about one practice as better than another. This information doesn't make sense for an enterprise programming standard intended to provide meaningful guidance.
What follows is an outline of the SUN conventions with my annotations about what in each section should, or should not, be included in modern enterprise guidelines on Java.  I'll attempt to point out cases where one should use, or discard, the SUN convention (i.e. almost no debate among Java professionals on this point) but also note where there is some open debate about what is the "best practice".  Where one of the 1997-1999 conventions is obsolete, I'll also try to point out the modern, replacement, practice and provide a reference if possible.  
I can detail these recommendations if needed - including providing code examples if that becomes necessary.
General Comments
  • Many of these conventions are a bit vague or arbitrary and need to be re-worded to make good enterprise guidance or something that can be enforced in a work order of some kind.  For example, the statement "...avoid the complexity of using more than three variables." in the section on the for statement is arbitrary. It is also hard to tell whether a developer who used more than three variables did so because it was necessary or because they were just being sloppy.  One way to work with this kind of language is to re-word it to say something like "In general, for statements must be limited to three variables.  By exception, they may use more than three variables subject to peer review and approval."  A little nuance here is a valuable thing - it is a good use of the enterprise effort to discriminate between what requires peer review, what requires some kind of ERB/ECB review and what requires a project/program waiver.  This is a general point and the language in this document as a whole should be revised in this way for use in an enterprise guideline or standard. 
  • In general it should not be sufficient to simply "document" variations from the conventions discussed here.  That variance should be justified, reviewed (at the appropriate scale) and then documented.
2. File Names 
    2.1 File Suffixes 
  • As written, this provides a description (not guidance) on the minimum, and somewhat archaic, technical implementation of Java - source code in ".java" files and bytecode in ".class" files. 
  • Source code managed as individual ".java" files loses lots of information that is valuable to the enterprise. Using raw ".java" files to manage source code should be prohibited.
  • Enterprise guidance on this point should require that developers use the organization's standard development tools (ClearCase, Rational Application Developer, etc. ) for storing source code and preparing Java bytecode. 
  • At a minimum, the enterprise should require source code be managed as part of IDE "projects" that can be accessed using one of the free and publicly available IDE tools such as Eclipse or NetBeans.
  • Also, people rarely deliver Java bytecode as individual ".class" files anymore.   Using ".class" files as the organizing mechanism for Java bytecode should be discouraged or prohibited. Instead zip ".class" files together into ".jar", ".war" or ".ear" files as described in SUN's J2EE Tutorial.
  • This guidance should also direct that those tools be used to prepare ".jar", ".war" and ".ear" files (not ".class" files) according to the standard development roles for enterprise Java development.  
    2.2 Common File Names 
  • This comment about using GNUmake is out of date.  Ant and commercial tools with Ant support dominate Java build practices. SUN projects (e.g. PetStore and NetBeans) use Ant now.
  • Enterprise guidance on this should direct the use of the approved enterprise Java development tool (i.e. Rational Application Developer - which includes Ant) for builds or accept the stand-alone use of Ant as an alternative. "make" and others should be prohibited.
  • The suggestion to use a README file is too vague and somewhat at odds with enterprise development needs.  Guidance in this area should direct the use of the enterprise tools for requirements management (ReqPro), system analysis and design (Rose), change management (ClearCase), etc. and for information about the built software to be put in those information stores where they have more visibility and benefit to the enterprise. 
  • The use of a README should only be for specific information that can't fit into one of the enterprise systems development tools - if that can't be defined then the README should be prohibited.
3. File Organization 
  • The recommendation for files longer than 2000 lines is arbitrary and out of date.  This restriction is also inconsistent with the scale of enterprise Java design and programming. Modern enterprise tools for developing Java (e.g. Rational Application Developer) make older guidelines like this (e.g. favoring two files of 2000 lines over one file of 4000 lines) obsolete. This should be discarded. The standard enterprise tools manage the presentation of code to the programmer - files over 2000 lines are not hard to read as they were with simple text editors.
    3.1 Java Source Files 
  • The phrase "...each Java source file contains a single public class or interface..." is obsolete.  Java now supports inner classes.  There is some debate about when they are appropriate to use (and when not), which should be addressed in an enterprise programming guide, but the convention as written remains obsolete.
  • While it is possible to put an interface class and an implementation class in the same file (as described here), most enterprises should (and do) prohibit this.  Enterprise guidance should mandate the opposite - interfaces and implementation classes in separate files - since this gives the enterprise a better level of coupling between the detailed design and the implementation.
        3.1.1 Beginning Comments 
  • Having a comment that includes the class name is redundant with Java requiring every class to have a name - see "public class Blah extends SomeClass" in the example (section 11.1).  Since having this as a comment is redundant, a needless effort, and may contribute confusion - this type of comment should be prohibited in enterprise standards.
  • Version information should be required but it should be stored in the enterprise change management tools (e.g. ClearCase) and should not be manually maintained in the source code.  This information should only be included in the source as it is revealed thru the IDE tools (e.g. Rational Developer) or can be expanded automatically in the source code (i.e. as CVS or SCCS source code tools do).
  • Date information is the same as version information.  Tracking dates of changes should be required but that information stored in the CM repository - storing it in the source code is redundant, waste of effort, and potential source of confusion.  Should only be included in source if the enterprise CM tool can insert it there automatically.
  • Enterprise guidance should include or cite specific copyright notices that are required.  These copyright notices should explicitly describe and limit the rights of the contracted developer.  They should also clearly assert the rights of the acquisition organization. Any software developed for the government should cite appropriate sections of the Federal Acquisition Regulations here.
        3.1.2 Package and Import Statements 
  • The package statement should not only be listed first but should be required.  Code should not be allowed which is not assigned a package - i.e. that pollutes the global namespace. 
  • Convention on using import should make clear that only explicit, class-level, imports are allowed (e.g. "import packagename.ClassName") and that importing whole packages ("import packagename.*") is not allowed since it creates ambiguous and difficult to trace dependencies.
  • Enterprise standards should explicitly describe the package naming conventions to be used and follow the reverse URL convention (e.g. work done for the IRS, whose website is at "www.irs.gov", would all fit within the package "gov.irs").  This direction needs to be consistent with the direction on copyright markings - e.g. one reason for code to be in the package "gov.irs.projectname" rather than the package "com.ibm.projectname" is because different rights apply. The enterprise standard should also include the general convention for assigning sub-packages to programs and projects - e.g. "gov.irs.program.project" or "com.ibm.program.project".
  • The convention of using two-letter country codes (e.g. "us.") per ISO 3166 et al should only be used if justified and clearly documented.  Otherwise prohibit this and stick to the normal convention of using "com.", "gov.", "org.", etc. as high-level package names.
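As a rough sketch of the import guidance above: the package name is the "gov.irs.program.project" example from the text, while the class AuditTrail and its methods are hypothetical names of my own. The package line is commented out only so that the sketch compiles as a single stand-alone file; real code would declare it.

```java
// package gov.irs.program.project;  // required in real code; commented out only for this stand-alone sketch

// Explicit, class-level imports: every dependency is named and easy to trace.
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Not allowed under the proposed guidance:
// import java.util.*;

/** Hypothetical class used only to illustrate the import style. */
class AuditTrail {
    private final List<String> entries = new ArrayList<String>();

    void record(String entry) {
        entries.add(entry);
    }

    List<String> entries() {
        return Collections.unmodifiableList(entries);
    }
}
```

With explicit imports, a reader can see at the top of the file exactly which classes the code depends on.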
        3.1.3 Class and Interface Declarations 
  • The convention described here is out-of-date and would require needless manual effort when using modern, standard, enterprise development tools.  The guidance should be that all code within a package follow a consistent convention for including these elements but this should not be specifically mandated.
  • The standard should not be for all source code in a project to use the same format for these declarations - this is unrealistic, creates needless work, and is unnecessary.  Since projects will (hopefully) re-use code from other projects from a variety of sources (internally developed, commercial developer, open-source, etc.), while code within a package should be formatted consistently, it is unlikely that code in separate packages will follow the same format.  Re-formatting these declarations would create build/re-build dependencies that are disruptive and it is a waste of effort since modern IDEs will represent the code logically to the programmer or analyst regardless of how the code is organized.
  • The specific convention of organizing public, then protected and then private is arbitrary and pointless. Modern Java tools will often automatically re-format these elements according to their own conventions - fighting this is a mis-use of effort. The rule should be to order these elements consistently within a package to make the code readable and maintainable but not to mandate a specific order.   
  • The specific convention of organizing methods by functionality rather than scope, accessibility, or alphabetically is also somewhat arbitrary.  Again, modern Java tools will often format the code according to their own convention and also allow presentation of the code in the IDE to be re-sorted on-demand.  Manually enforcing an arbitrary order is a waste of effort.  The guidance should be to require all classes in each package follow the same order and leave it at that.
4. Indentation 
  • Mandating spaces (instead of tabs) and the number of spaces (3, 4, 8, etc.) for indentation is a waste of effort.
    4.1 Line Length 
  • The convention on lines under 80 characters is archaic.  The standard should be that the source code can be printed on standard 8.5" by 11" sheets of paper without the printing process cropping any lines or forcing an arbitrary line wrap.  The standard should also be that the code can be displayed on the standard enterprise desktop screen (XGA resolution) without side-scrolling.  In both cases the standard should direct a minimum text size - either 10pts, 11pts or 12 pts - for legibility.
    4.2 Wrapping Lines 
  • The convention on breaking expressions that won't fit on a single line at appropriate points (comma, operator, etc.) is good.  An enterprise standard shouldn't need to include the code examples though - every reader should know what a comma is and even the most novice Java programmer knows what an operator is.
5. Comments 
  • The general guidance should be to minimize comments in code and favor instead to put this information into the enterprise system development tools for requirements management, systems analysis and design, change management, etc.  In code comments are fine for students or small-shop developers who aren't working on enterprise applications.  However, when developing enterprise applications, information has more visibility and impact if it is stored in the enterprise system development tools - putting it in code hides the information from non-technical stakeholders or creates opportunities for the business and technical visions to diverge.
  • Unless there is specific information defined for code comments (and not suitable for the enterprise system development tools), using in-code comments as described here should be discouraged.
  • The convention here that should be most aggressively followed is the one that says, "It is too easy for redundant comments to get out of  date.  In general, avoid any comments that are likely to get out of date as the code evolves...The frequency of comments sometimes reflects poor [emphasis added] quality code.  When you feel compelled to add a comment, consider rewriting the code to make it clearer."
    5.1 Implementation Comment Formats 
  • This section is more descriptive than applicable as modern guidance on using Java.
  • The enterprise guidance should be that these comments are generally created automatically by the standard enterprise tools - e.g. Rose for systems analysis and design.  The guidance should generally direct the use of those tools and prohibit fine-grained, manual, commenting outside of those enterprise tools.
        5.1.1 Block Comments 
        5.1.2 Single-Line Comments 
        5.1.3 End of Line Comments 
    5.2 Documentation Comments 
  • If (or when) the enterprise commits to uniform electronic publishing of Java docs, then standards for javadoc comments should be developed.  As long as this isn't the case, and as long as the enterprise favors other mechanisms and tools (Rational Rose, Rational Application Developer, etc.), then enforcing the use of javadoc comments is a waste of effort.  If a project wants to use them, they should use the features of their tools (e.g. Rose) that will generate them automatically - manual effort here is a waste of money.
6. Declarations 
    6.1 Number per Line 
  • The convention to put one declaration per line should be followed.
  • The convention to prohibit declarations of mixed type on the same line should be followed.
  • Conventions about the use of tabs or spaces for declarations are arbitrary, out of step with current tools, and should be discarded.
    6.2 Initialization 
  • The convention to initialize variables when they are declared should be followed.  This contributes significantly to the reliability, performance and maintainability of the code.
  • This section says "The  only reason not to initialize a variable where it's declared is if the initial value depends upon some computation occurring first".  If this is the case, declaration of the variable may be premature and the code may need to be revised to conform to the "initialize when declared" rule.  If that is not appropriate, then this should be clearly justified and documented - one of the enduring, legitimate, uses of in-code comments.
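One way to satisfy "initialize when declared" even when the value depends on a computation is to move that computation into a helper method. The following is only a sketch; InvoiceTotals and its method names are hypothetical.

```java
/** Hypothetical example; all names are illustrative only. */
class InvoiceTotals {

    // Discouraged: variables are declared first and assigned further down.
    static int totalDeclaredThenAssigned(int[] amounts, int taxPercent) {
        int subtotal;
        int tax;
        subtotal = sum(amounts);
        tax = (subtotal * taxPercent) / 100;
        return subtotal + tax;
    }

    // Preferred: the computation lives in a helper, so each variable
    // receives its value at the point of declaration.
    static int totalInitializedAtDeclaration(int[] amounts, int taxPercent) {
        int subtotal = sum(amounts);
        int tax = (subtotal * taxPercent) / 100;
        return subtotal + tax;
    }

    private static int sum(int[] amounts) {
        int total = 0;
        for (int i = 0; i < amounts.length; i++) {
            total += amounts[i];
        }
        return total;
    }
}
```

Both methods compute the same value; the second simply never leaves a variable in an unassigned state.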
    6.3 Placement 
  • The convention to put declarations, and initializations, at the beginning of a block should be followed.  This contributes significantly to the logical correctness and maintainability of the code.
  • If a declaration cannot be put at the beginning of a block, this indicates poorly organized code and the code should be re-written.
  • If the code cannot be re-written to put declarations at the beginning of a block, this should be clearly justified and documented - an appropriate use of in-code comments.
  • The convention to avoid declarations with the same name (i.e. "hiding" or "masking") a declaration visible from a higher level block should be followed.
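A small sketch of the placement rules above, using a hypothetical ShippingCalculator class. Note that Java already rejects a local variable that redeclares another local from an enclosing block, so in practice the "hiding" risk is mostly a local variable hiding a field.

```java
/** Hypothetical example; all names are illustrative only. */
class ShippingCalculator {
    private int count = 0;   // a field that a careless local declaration could hide

    int countHeavy(int[] weights, int limit) {
        // Declarations sit at the start of the block, each one initialized.
        int heavy = 0;
        int index = 0;

        while (index < weights.length) {
            // Declaration at the start of the inner block.
            int weight = weights[index];

            if (weight > limit) {
                heavy++;
            }
            index++;
        }

        // Avoided: writing "int count = heavy;" here would hide the field above.
        count = heavy;
        return count;
    }
}
```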
    6.4 Class and Interface Declarations 
  • The convention to enforce no spaces between a method name and the starting parentheses is arbitrary and should be abandoned.  The  convention should be for classes within a package to follow the same convention (be it space or no space).
  • The convention to enforce the opening brace on the same line as the declaration is arbitrary and should be abandoned.  The convention should be for classes within a package to follow the same usage - opening brace at the end of the line or on a new line.
7. Statements 
    7.1 Simple Statements 
  • The convention to put only a single statement on each line should be followed.
    7.2 Compound Statements 
  • The convention to indent enclosed statements "one level" should be followed.  Conventions about if this should use tabs or spaces (and how many spaces) are a waste of effort.  The convention should only be that this indentation be consistent (tabs, spaces, etc.) for classes within a package for legibility.
  • The convention to put the opening brace at the end of the line (rather than on a new line) is arbitrary - enforcing it would be a waste of effort and create disruptive build/re-build dependencies.  The convention should be only that placement of the opening brace be consistent (end of line or new line) for classes within a package.
    7.3 return Statements 
  • return statements should  return primitive types (int, float, etc.)
  • return statements should return objects
  • return statements should return the return values from method calls without method arguments - e.g. object.method() - since this is a common method for accessing object (or Java bean) properties.
  • return statements passing on return values from method calls with arguments - e.g. object.method(type arg, type arg) - may indicate code that is poorly designed, not implementing adequate exception handling, or not doing adequate data checking. This format should be discouraged.  This code should generally be re-designed to conform to one of the prior rules. If the code cannot be re-designed, that should be documented - an appropriate use of in-code comments.  
  • The form shown in the example return (size ? size : defaultSize) should generally be prohibited unless it is the result of performance optimization - which should be documented with in-code comments. The use of the "?" operator should generally be discouraged and replaced with statements which are easier for novice programmers to understand.
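To illustrate the last point, here is a sketch of the ternary form next to an equivalent written with plain statements. The class BufferSizing is hypothetical, and the test "size > 0" is my assumption about what the SUN example intends, since Java requires a boolean condition before the "?".

```java
/** Hypothetical example; all names are illustrative only. */
class BufferSizing {

    // Ternary form of the kind the guidance would restrict.
    // Assumption: "size > 0" stands in for the SUN example's bare "size".
    static int effectiveSizeTernary(int size, int defaultSize) {
        return (size > 0) ? size : defaultSize;
    }

    // Equivalent form using a local result variable and an if statement.
    static int effectiveSize(int size, int defaultSize) {
        int result = defaultSize;
        if (size > 0) {
            result = size;
        }
        return result;
    }
}
```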
    7.4 if, if-else, etc. Statements 
  • This section is generally descriptive and not guidance.
  • The convention to always use braces for "if" constructs should be followed.  Omitting the brace creates code that is error-prone and more difficult to maintain.
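The kind of error that omitted braces invite can be sketched as follows. AccessCheck and its methods are hypothetical names used only for illustration.

```java
/** Hypothetical example; all names are illustrative only. */
class AccessCheck {

    // Without braces, only the first statement belongs to the if.
    // The indentation of the second increment is misleading.
    static int attemptsWithoutBraces(boolean locked) {
        int attempts = 0;
        if (locked)
            attempts++;
            attempts++;   // always runs, despite the indentation
        return attempts;
    }

    // With braces, the extent of the if is explicit.
    static int attemptsWithBraces(boolean locked) {
        int attempts = 0;
        if (locked) {
            attempts++;
            attempts++;
        }
        return attempts;
    }
}
```

When locked is false, the first method still increments once while the second does not, which is exactly the kind of defect that is easy to miss in review.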
    7.5 for Statements 
  • The empty for statement should generally be discouraged.  While it is technically permitted in Java, "doing all the work in the initialization, condition and update clauses" can create code that is error-prone and difficult to understand. This format should only be used if justified, documented, reviewed, approved, etc. 
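As a sketch of the difference, the two hypothetical methods below find the first index of a value in an array. The first uses an empty for body, with all of the searching done in the condition and update clauses; the second spells the logic out in the loop body.

```java
/** Hypothetical example; all names are illustrative only. */
class ArraySearch {

    // Empty-body form: easy to misread, and the trailing ";" is easy to miss.
    static int indexOfEmptyBody(int[] values, int target) {
        int result = -1;
        int i = 0;
        for (; (i < values.length) && (values[i] != target); i++);
        if (i < values.length) {
            result = i;
        }
        return result;
    }

    // Explicit form: the work is visible in the body of the loop.
    static int indexOf(int[] values, int target) {
        int result = -1;
        for (int i = 0; i < values.length; i++) {
            if ((result == -1) && (values[i] == target)) {
                result = i;
            }
        }
        return result;
    }
}
```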
    7.6 while Statements 
  • The empty while statement is technically permitted in Java but should be discouraged.  What is being described here is a feature of the language, not necessarily a good convention.  It should only be used if justified.
    7.7 do-while Statements 
  • This describes the syntax of the do-while statement but is not a recommended practice for using this statement relative to the other logical alternatives.
  • Guidance should direct the use of the simple while over the do-while as this prevents a common source of logical errors and makes the code more maintainable.   do-while should be permitted if justified to peer or other engineering review.
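The common logical error with do-while is that the body runs once before the condition is ever checked. A minimal sketch, with a hypothetical QueueDrain class:

```java
/** Hypothetical example; all names are illustrative only. */
class QueueDrain {

    // do-while: the body runs before the first check, so an empty
    // queue (pending == 0) is still "processed" once.
    static int drainDoWhile(int pending) {
        int processed = 0;
        do {
            processed++;
            pending--;
        } while (pending > 0);
        return processed;
    }

    // while: the condition is checked first, so nothing is processed
    // when there is nothing pending.
    static int drainWhile(int pending) {
        int processed = 0;
        while (pending > 0) {
            processed++;
            pending--;
        }
        return processed;
    }
}
```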
    7.8 switch Statements  
  • The use of a break for each case should be required. Not  doing this indicates code that may be poorly designed, error-prone or difficult to maintain. 
  • The use of a default case should be required. The convention to include a break for the default case should be followed.
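A sketch of what these two rules protect against, using a hypothetical StatusLabel class. The second method omits one break and silently falls through into the next case.

```java
/** Hypothetical example; all names are illustrative only. */
class StatusLabel {

    // Every case, including default, ends with a break.
    static String describe(int code) {
        String label = "";
        switch (code) {
            case 0:
                label = "pending";
                break;
            case 1:
                label = "approved";
                break;
            default:
                label = "unknown";
                break;
        }
        return label;
    }

    // Missing break after case 0: execution falls through into case 1.
    static String describeWithFallThrough(int code) {
        String label = "";
        switch (code) {
            case 0:
                label = "pending";
            case 1:
                label = "approved";
                break;
            default:
                label = "unknown";
                break;
        }
        return label;
    }
}
```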
    7.9 try-catch Statements 
  • catch statements must not be empty but should actually do something with the exception they receive. At a minimum they must use the logging mechanism to log that an exception has occurred.
  • The use of the finally block should be required. It may never get used but ensures the code is more robust in exception handling.
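A minimal sketch of both rules together, assuming the standard java.util.logging package as "the logging mechanism" (an enterprise would substitute its own). PortParser and its methods are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical example; all names are illustrative only. */
class PortParser {
    private static final Logger LOGGER = Logger.getLogger(PortParser.class.getName());

    private int attempts = 0;

    int parse(String text, int fallback) {
        int port = fallback;
        try {
            port = Integer.parseInt(text);
        } catch (NumberFormatException e) {
            // Never empty: at a minimum, log that the exception occurred.
            LOGGER.log(Level.WARNING, "Invalid port value: " + text, e);
        } finally {
            // Runs whether or not the parse succeeded.
            attempts++;
        }
        return port;
    }

    int attempts() {
        return attempts;
    }
}
```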
8. White Space 
  • Conventions for white space (so many spaces here, so many blank lines here) are arbitrary.  Enforcing them is a waste of time.  As long as the use  of white space is consistent within a class and within classes of a package, this is generally sufficient.
  • You can have a $2,000 (or free!) tool re-format your code for whitespace.  Having whitespace guidelines to be followed by $60-100k/year developers on this issue is a waste of money.
    8.1 Blank Lines 
    8.2 Blank Spaces 
9. Naming Conventions 
  • The convention for package names to begin with high-level domain names (in reverse) has already been discussed but it should be followed.
  • In this section, SUN says "Subsequent components of the package name vary according to an organization's own internal naming conventions [emphasis added]..."  Enterprise coding standards should be explicit on this - see my suggestions above on section 3.1.2.
  • The guidance on naming interfaces should be expanded.  The current common standard practice is for interfaces to conform to one of a number of naming patterns.  One is the "-able" pattern such as the Serializable interface found in the Java API.  The other is for the interface name to incorporate a recognized pattern - e.g. a Messenger.
  • Enterprise guidance on this topic should direct the use of interface names that conform to the conventions in the current Java API ("Serializable", "Comparable", etc.) for any interfaces that have similar semantics.
  • Enterprise guidance on interfaces should also direct naming that uses names from defined patterns - Messenger, Proxy, etc. - for interfaces that follow those semantics.
  • The convention to prohibit variable names with underscore or "$" should be followed.
  • The convention to prohibit one-character variable names should be clarified - e.g. that this is allowed for simple iteration.
  • The convention for class constants to use all uppercase and the  "_" as described should be followed.
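The naming points in this section can be sketched together. All of the names below (Auditable, Messenger, ConsoleMessenger, MAX_MESSAGE_LENGTH) are hypothetical and only illustrate the patterns.

```java
/** "-able" pattern, in the style of Serializable and Comparable. */
interface Auditable {
    String auditSummary();
}

/** Name taken from a recognized pattern. */
interface Messenger {
    String deliver(String message);
}

/** Hypothetical implementation; all names are illustrative only. */
class ConsoleMessenger implements Messenger, Auditable {
    // Class constant: all uppercase, words separated by "_".
    static final int MAX_MESSAGE_LENGTH = 20;

    // Variable names: no underscore, no "$".
    private int deliveredCount = 0;

    public String deliver(String message) {
        String text = message;
        if (text.length() > MAX_MESSAGE_LENGTH) {
            text = text.substring(0, MAX_MESSAGE_LENGTH);
        }
        deliveredCount++;
        return text;
    }

    public String auditSummary() {
        return "delivered=" + deliveredCount;
    }
}
```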
10. Programming Practices 
    10.1 Providing Access to Instance and Class Variables 
  • This convention should be clarified.  The convention should be that data members are private by default and accessed thru methods. 
  • If a class is essentially a data structure (as discussed in this section) and appropriately permits access to the data members, the naming of that class should follow an enterprise convention for data classes (see section 9) - e.g. TransactionData, AddressEntity, etc. 
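A sketch of the two cases above. Account is a hypothetical class with private data accessed thru methods; TransactionData uses the data-class naming from the text, with fields that I have made up for illustration.

```java
/** Hypothetical example: data members are private and accessed thru methods. */
class Account {
    private int balance = 0;

    int getBalance() {
        return balance;
    }

    void deposit(int amount) {
        // The method can enforce rules that a public field could not.
        if (amount <= 0) {
            throw new IllegalArgumentException("amount must be positive");
        }
        balance += amount;
    }
}

/** Essentially a data structure; the "Data" suffix signals the open fields. */
class TransactionData {
    public int amount = 0;
    public String memo = "";
}
```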
    10.2 Referring to Class Variables and Methods 
  • The convention to avoid using the object to access a class method or data should be worded more strongly.  This should be generally prohibited rather than simply "avoid".
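A short sketch of the difference, with hypothetical IdGenerator and IdClient classes. Both calls compile, but only the first makes it obvious that next() is a class method.

```java
/** Hypothetical example; all names are illustrative only. */
class IdGenerator {
    private static int nextId = 1;

    static int next() {
        int id = nextId;
        nextId++;
        return id;
    }
}

class IdClient {

    // Preferred: the class name shows that shared, class-level state is used.
    static int viaClassName() {
        return IdGenerator.next();
    }

    // To be prohibited: reads as if each generator had its own counter.
    static int viaInstance() {
        IdGenerator generator = new IdGenerator();
        return generator.next();
    }
}
```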
    10.3 Constants 
    10.4 Variable Assignments 
  • The convention to avoid assigning several variables to the same value in a single statement should be more strongly worded.   This should be "prohibit" rather than simply "avoid".
  • The convention on using assignment operators where they could be mistaken for an equality operator should be clarified.  It would be clearer to state explicitly that the assignment operator should not be used within the conditions of if, while, do-while, etc. statements.
  • The prohibition on embedded assignments should be followed.
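The first and last points can be sketched side by side. AssignmentStyle is a hypothetical class; both methods compute the same result, but the second uses one assignment per statement.

```java
/** Hypothetical example; all names are illustrative only. */
class AssignmentStyle {

    // To be prohibited: chained and embedded assignments.
    static int compact(int a, int b) {
        int x;
        int y;
        int d;
        x = y = a;              // several variables assigned in one statement
        d = (x = y + b) + a;    // assignment embedded in an expression
        return d + x;
    }

    // Preferred: one assignment per statement, each easy to follow.
    static int explicit(int a, int b) {
        int y = a;
        int x = y + b;
        int d = x + a;
        return d + x;
    }
}
```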
    10.5 Miscellaneous Practices 
        10.5.1 Parentheses 
  • The convention to use parentheses to make the  logic explicit rather than rely on operator precedence should be required.
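A sketch of why this matters, using a hypothetical Eligibility class. In Java "&&" binds more tightly than "||", so the first two methods are equivalent while the third, grouped the other way, is not.

```java
/** Hypothetical example; all names are illustrative only. */
class Eligibility {

    // Relies on the reader knowing that && binds tighter than ||.
    static boolean implicitPrecedence(boolean member, boolean trial, boolean active) {
        return member || trial && active;
    }

    // Same logic with the grouping made explicit.
    static boolean explicitGrouping(boolean member, boolean trial, boolean active) {
        return member || (trial && active);
    }

    // What a reader might wrongly assume the first form means.
    static boolean otherGrouping(boolean member, boolean trial, boolean active) {
        return (member || trial) && active;
    }
}
```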
        10.5.2 Returning Values 
  • The first example violates modern, standard, use of Java - to only have one return statement.
  • The second example violates the "only one return" convention.  It also violates the modern, standard, convention to avoid the ternary operator "?" and favor using statements which are easier to understand, e.g. if-else.
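The "only one return" convention can be sketched as follows. RangeCheck and its methods are hypothetical; the first method follows the shape of an if-else with a return in each branch, the second uses a single exit point.

```java
/** Hypothetical example; all names are illustrative only. */
class RangeCheck {

    // Two return statements, one per branch.
    static boolean inRangeTwoReturns(int value, int low, int high) {
        if ((value >= low) && (value <= high)) {
            return true;
        } else {
            return false;
        }
    }

    // Single return: the result is computed, then returned once.
    static boolean inRange(int value, int low, int high) {
        boolean result = (value >= low) && (value <= high);
        return result;
    }
}
```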
        10.5.3 Expressions before the '?' in the Conditional Operator 
  • This convention is redundant with the general convention of always using parentheses to make the logic of a statement more clear and avoid relying on operator precedence.
  • This convention is unnecessary since the general practice should be to avoid the ternary operator in the first place.
        10.5.4 Special Comments  
  • Using comments as a defect tracking mechanism is fine for students and small-scale developers but inappropriate for modern enterprise developers. 
  • Since this standard was written, the use of JUnit for unit testing has become nearly ubiquitous.  Such a comment is unnecessary, since a failed unit test will indicate that a class, method, etc. needs to be fixed.
  • In an enterprise, the defects are better tracked at the class or component level in the test, change management and development systems.
- Brian
 
