During the summer of 1996 a discussion about the use of global variables appeared in comp.lang.apl. I meant to post the following article but never got around to it.
Raul Miller wrote on July 11, 1996:
Semi-globals are different. Semi-globals are typically used in APL to work around the limitation of passing only two arrays to a function, or to work around the limitation that name spaces aren't first class objects of the language.
In first-generation APLs this was certainly the case, but nested arrays make it very easy and efficient to pass any number of arrays into or out of a function. Also, I don't think there's much practical difference between true globals (localized nowhere) and semi-globals (global to some functions, but localized in some higher-level function). Although a semi-global ceases to be global at some level, all the problems of globals can be encountered with semi-globals as well. In the following discussion, I use the term "global" to mean any object not localized by the function that uses it.
Sometimes the number of arguments to a function is inconveniently large. Making some of the arguments be global variables allows you to focus attention on the more significant arguments, which can be passed as formal (left and right) arguments. For example, APL's equal function has three arguments: the two usual operands, plus the comparison tolerance. Passing the latter via the global #CT allows you to focus attention on the operands in places where = is used. This technique is also handy for arguments that don't usually change with every call. An example is "symbolic constants", such as variables that specify file names and paths.
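As a rough illustration of the comparison-tolerance idea, here is a minimal sketch in Python rather than APL. The names `comparison_tolerance` and `tolerant_equal` are hypothetical, and the default of 1e-13 is an assumed value. The rule follows the usual description of tolerant equality: two numbers are equal when they differ by no more than the tolerance times the larger of their magnitudes.

```python
# Hypothetical stand-in for APL's #CT; the value 1e-13 is an assumption.
comparison_tolerance = 1e-13


def tolerant_equal(a, b):
    """Compare two numbers using the global tolerance.

    The tolerance is relative: it is scaled by the larger magnitude
    of the two operands before being compared with their difference.
    """
    return abs(a - b) <= comparison_tolerance * max(abs(a), abs(b))


# Call sites mention only the two operands; the tolerance stays in the background.
sums_match = tolerant_equal(0.1 + 0.2, 0.3)
clearly_different = tolerant_equal(1.0, 1.001)
```

With an explicit third argument, every comparison in a program would have to carry the tolerance along, even though it rarely changes.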
Globals are also useful for pass-through communication. A function that uses a subroutine shouldn't have to know about every feature of the subroutine. Some features might logically be the responsibility of a higher-level routine, and only the higher-level routine should know about the feature. Global variables are a way of implementing direct long-distance communication between routines that have one or more functions between them on the calling stack. #PP is an example here, with the user typically being the ultimate higher-level routine. Functions that use monadic format (explicitly, or implicitly by displaying numeric values) shouldn't generally have to know about the printing precision; they shouldn't have to limit themselves by specifying a fixed value that can be altered only by changing constants within the program. The user can communicate the desired precision to {format} by setting a global variable.
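The pass-through pattern can be sketched in Python as a loose analog, since Python has no APL-style localization. All names here (`print_precision`, `format_number`, `report_total`, `localized_precision`) are hypothetical. The context manager only approximates what localizing the variable in a higher-level APL function would do: it restores the previous value when the block exits.

```python
from contextlib import contextmanager

# Hypothetical stand-in for #PP: number of significant digits to display.
print_precision = 10


def format_number(x):
    # Lowest level: the only routine that actually reads the global.
    return f"{x:.{print_precision}g}"


def report_total(values):
    # Middle level: it has no precision parameter to accept or pass along.
    return "Total: " + format_number(sum(values))


@contextmanager
def localized_precision(digits):
    # Higher level: set the global for the duration of a block,
    # then put the previous value back, even if an error occurs.
    global print_precision
    saved = print_precision
    print_precision = digits
    try:
        yield
    finally:
        print_precision = saved


with localized_precision(3):
    short_report = report_total([1.23456, 2.0])
full_report = report_total([1.23456, 2.0])
```

The middle routine never mentions the precision, so adding or removing such a setting does not change its argument list.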
Sometimes a global variable can be thought of as part of a program. For example, some tasks can be implemented more efficiently by precomputing a table and using the table to avoid needless recomputation each time the program is called. Such globals are not really much different from a global subroutine used by a program. An example of this is the SETS3 function (c.l.a, 16 Apr 96), which used a precomputed matrix of combinations. A somewhat different example is my collection of assembler routines (FastFns), which have the machine code stored in global variables. This was originally done on the APL*PLUS/PC system because it is much faster than imbedding the long numeric vector in the function, but it has other advantages as well: It's easier to update the object code, and on the APL*PLUS II/III systems, the correct signature in the first element can be installed once, when the object is loaded into the workspace, instead of having to be done each time the FastFn is called.
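A minimal Python sketch of the precomputed-table idea follows. It is not a reconstruction of SETS3. The table name `TRIPLES_INDEX_TABLE`, the function `triples`, and the fixed size of six items are all assumptions chosen for illustration.

```python
from itertools import combinations

# Built once, when the module is loaded. Each row holds the positions of
# one 3-element selection from 6 items. The name and size are hypothetical.
TRIPLES_INDEX_TABLE = list(combinations(range(6), 3))


def triples(items):
    """Return every 3-element selection from a 6-item sequence.

    The positions come from the precomputed table, so the combinations
    are not regenerated on each call.
    """
    if len(items) != 6:
        raise ValueError("this sketch's table covers exactly 6 items")
    return [tuple(items[i] for i in row) for row in TRIPLES_INDEX_TABLE]


selections = triples("ABCDEF")
```

The table behaves like part of the program: it is created once, never modified, and travels with the function that needs it.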
Still another use for globals is for private, static data used by a function. ("Static" meaning the previous value will be needed on the next call to the function.) An example is #RL. It would be a nuisance to have to specify #RL as both an argument and a result to every roll or deal operation. Although it might be nice if such data could be imbedded within a namespace for the module, it's not hard to avoid the problems that a namespace would solve by means of some simple naming conventions. And working in a flat namespace environment is more convenient than having to constantly switch from one namespace to another.
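The static-data case can be sketched in Python as follows. The names `random_link` and `roll` are hypothetical stand-ins for #RL and the roll primitive. The generator step (multiplier 16807, modulus 2147483647) and the starting seed are assumptions made for this sketch, not a claim about how any particular APL system computes its random numbers.

```python
# Hypothetical stand-in for #RL; the starting seed 16807 is an assumption.
random_link = 16807


def roll(n):
    """Return a pseudo-random integer in 1..n and advance the global seed."""
    global random_link
    # Multiplicative congruential step, assumed for illustration only.
    random_link = (random_link * 16807) % 2147483647
    return 1 + random_link % n


# Callers pass only the range; the seed is carried from call to call.
dice = [roll(6) for _ in range(3)]
```

Without the global, each call would need a form such as `value, seed = roll(n, seed)`, and every caller would have to thread the seed through its own arguments and results.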
Using globals does require a certain amount of discipline to avoid problems:
You should document their structure and contents. A logical place to put this is in the function that defines the variable. However, some globals are not defined by any program--they're set by the programmer. In such cases, a "varname{delta}DOC" variable is a reasonable place for documentation. (These DOC variables are also important for files, another type of global object.) Documentation for trivial variables, which don't really need their own separate DOC variable, can be collected in a MISC{delta}DOC variable.
You should generally document globals where they are used. The "external" documentation for a function (the comments before the first line of code) should list most globals used by the function and describe what they contain. One exception is system-wide globals that are used everywhere and are thought of more as symbolic constants than as arguments. Another exception is globals that are used for long-distance communication with other modules and are not used directly by the function in question.
Should you repeat the documentation describing the structure of a global in the functions that use it? This is an open question. Doing so means that any change in the structure will require documentation updates in more places, and it increases the likelihood that the documentation may be out-of-date. However, doing so also makes it much easier to understand what the program is doing with the variable.
When a program sets a variable that will be used globally by a subroutine, it is helpful to indicate this usage in a comment. Otherwise, a programmer looking at the code will see a variable being set with no apparent usage, and he may end up wondering whether this is a dead variable left over from some retired section of code. Or worse, if the variable is used both within the setting function and in a subroutine, the programmer might incorrectly assume it is used only locally and that he can safely change its structure without worrying about other references to it.
Names are important for globals. Long names (never single-letter names) should be used, with the length of the name being loosely related to how far the variable reaches up or down the calling stack before it's localized or referenced. (For example, a semi-global that's used strictly to communicate between one program and a subroutine needn't have that long a name.) Two techniques can help avoid name collisions: Don't use words for global names (e.g., FLAG), and prefix the name with the first few letters of the "owner" function's name. For example, a program named CAPSCREEN might use a global named CAPSCR{delta}CTABS. This prefixing convention essentially carves out a relatively private namespace for globals used by a particular program, and it also helps to link them in alphabetical listings. (By the way, these naming conventions apply to functions as well as variables.)
Even with these conventions, there are still some disadvantages to using globals. If a function has a global argument that needs to be changed from one call to the next, you can't easily build an expression such as (FOO X)+(FOO Y) that involves two calls to the function. An explicit argument would be more convenient in this case. However, many subroutines are never used in compound expressions, so this is not a real problem for them. Global results also make it difficult to build expressions, but returning multiple explicit results does not make it much easier to construct expressions. If you want to be able to use a function freely in expressions, it should return a single result.
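The difficulty with compound expressions can be illustrated by a small Python sketch. All names here (`scale_factor`, `foo_global`, `foo_explicit`, and the two `sum_with_` functions) are hypothetical and stand in for the FOO of the text.

```python
# Hypothetical global argument read by foo_global.
scale_factor = 1.0


def foo_global(x):
    # Takes its second "argument" from the global.
    return x * scale_factor


def foo_explicit(factor, x):
    # Takes the same information as an explicit argument.
    return factor * x


def sum_with_globals(x, y):
    # Two calls that need different settings cannot share one expression:
    # the global has to be reset between them.
    global scale_factor
    scale_factor = 2.0
    first = foo_global(x)
    scale_factor = 3.0
    second = foo_global(y)
    return first + second


def sum_with_arguments(x, y):
    # With explicit arguments the whole computation is one expression.
    return foo_explicit(2.0, x) + foo_explicit(3.0, y)
```

Both versions compute the same value. The global version needs four statements where the explicit version needs one, which is the cost the text describes for arguments that change from one call to the next.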
As for compilation, I think declarations should inform the compiler about the type/rank of upward globals and should be used to indicate any downward globals that are referenced or set by subroutines. (It would also be useful to be able to specify exactly which variables are used and set by each subroutine, so the compiled code doesn't have to materialize them all for every call and check them on every return.) I think it's a mistake to allow the semantics of a language to be driven too much by the needs of a compiler (e.g., altering scope rules or banning globals). APL's success stems in part from implementing what makes sense, rather than what's easy to do.
Using globals presents the programmer with some extra challenges in the goal of writing clear programs, but they are an indispensable tool in writing flexible, efficient, and maintainable applications.