The User Command Processor

by Jim Weigang


This article first appeared in the APL91 Conference Proceedings, APL Quote Quad, Vol. 21, No. 4 (August 1991). Copyright © 1991 ACM. Reproduced by permission of the Association for Computing Machinery.


Abstract

The User Command Processor is a new feature of several APL*PLUS systems that allows users to define commands, analogous to system commands, that can be executed from within any workspace. This enhancement is implemented by means of two simple changes to the APL interpreter. Coupled with a suite of two dozen predefined commands, the result is a file-based program storage and execution environment that integrates many important features not provided in standard APL systems. Using the command processor, applications of unlimited size can be developed, run, and maintained without many of the headaches that are characteristic of workspace-based systems. This paper describes the basic methods whereby the command processor operates, provides an overview of the predefined commands, shows how a new command can be defined, and illustrates how a large application can be built using the command processor.


Introduction

The command processor is based on two changes to the APL interpreter. The first is a new system function called #UCMD, which is used to execute user commands. #UCMD is a very simple function; all it does is tie a file (named UCMDS), read component 3 of the file, define the component as a function (known as the "boot function"), and execute the function. The argument to #UCMD, which contains the text of the command, is passed to the boot function as its argument. The boot function, which is written in APL, does the rest of the work required to execute the command. The #UCMD function makes it possible for a user-defined program to be called from within any workspace, an enhancement that opens a world of possibilities.
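The flow that #UCMD follows can be modeled outside APL. The Python below is only an illustrative sketch, not the interpreter's implementation: the component file is a plain dict keyed by component number, the boot function is stored as Python source, and the names `ucmd`, `UCMDS_FILE`, and `boot` are assumptions made for the example.

```python
# Illustrative model only: a "component file" is a dict of
# component number -> contents.
BOOT_SOURCE = '''
def boot(command_text):
    # A real boot function would locate and run the command; this one
    # only reports what it was asked to do.
    name, _, rest = command_text.partition(" ")
    return f"would run {name} with argument {rest!r}"
'''

UCMDS_FILE = {3: BOOT_SOURCE}  # component 3 holds the boot function


def ucmd(command_text, file=UCMDS_FILE):
    """Mimic #UCMD: read component 3, define it, call it with the command text."""
    source = file[3]                         # "tie" the file and read component 3
    namespace = {}
    exec(source, namespace)                  # define the boot function
    return namespace["boot"](command_text)   # execute it, passing the command text


print(ucmd("LIST X"))
```

Because everything beyond these few steps lives in the stored boot function, replacing component 3 changes the behavior of every command without touching `ucmd` itself.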

The second change is to make the interpreter recognize not only system commands, which start with a right parenthesis, but also user commands, which start with a right bracket. User commands are translated to a proper APL expression and executed. For example, the command ]LIST X becomes #UCMD'LIST X'. The right-bracket notation is a convenient way of entering commands without having to enclose the argument in quotes, double any quotes within the argument, and so on. As is the case with system commands, right-bracket commands are accepted only in immediate execution mode; the right-bracket form cannot be used within a function. Instead, #UCMD is used when a command has to be executed under program control.
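The translation from the right-bracket form to a #UCMD expression amounts to quoting the command text. The following Python sketch shows the idea; the function name `to_ucmd_expression` is hypothetical, and the only APL rule it relies on is that a quote inside a character constant is written twice.

```python
def to_ucmd_expression(line):
    """Translate a right-bracket command line into the equivalent #UCMD expression."""
    if not line.startswith("]"):
        raise ValueError("not a user command")
    text = line[1:]
    quoted = text.replace("'", "''")   # double any quotes inside the constant
    return "#UCMD'" + quoted + "'"


print(to_ucmd_expression("]LIST X"))
```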

The changes to the interpreter make it more convenient to execute user commands, but they are not absolutely necessary. For example, in version 9 of the APL*PLUS /PC System, the command processor is called through a user-defined {delta}UCMD function, and right-bracket commands are entered using a function key that executes {delta}UCMD {quotequad}. Building #UCMD and ]-command recognition into the interpreter just makes it possible to execute commands without having to reserve and use a function key.

Also, it is worth noting that the #UCMD/] enhancements are extremely general in nature. Virtually the entire character of the command processor as described here is determined by the boot function found in component 3 of the UCMDS file. By changing the boot function, a user can create an entirely different style of command processor. #UCMD provides a general-purpose, easily-customized gateway to functions stored in files.


Predefined Commands

The UCMDS file contains not only the boot function but also many subroutines and predefined commands. The predefined commands provide the APL user with a number of useful programmer utilities and a complete set of commands to manage functions and variables stored in command files, including:

]USAVE    saves objects in a file
]ULOAD    loads objects from a file into the workspace
]UNAMES   lists the names of functions and variables in a file
]UERASE   erases objects in a file
]UCOPY    copies objects from one file to another
]UNSAVED  compares the workspace and a file, and lists what's different
]UCOMP    compares two files, lists detailed differences
]UFILE    selects the current file or creates a new file

Command files are used to store new commands that you write and also can be used to store application code. Some of the more useful programmer utilities include:

]LIST     lists programs in the workspace, aligning comments
]GLOCALS  shows global variables used in a program
]UGROUP   finds all subroutines and variables needed to run a program
]WSLOC    finds all occurrences of a phrase in functions
]FNS      shows the names of functions in this or another workspace
]LIB      shows library information with dates and sizes
]STORAGE  shows disk free space
]SUMMARY  gives abstract for each function in the workspace

The command processor includes a command that displays documentation for commands. Users can display the names of all available commands, list an abstract for each command (as shown above), and get detailed documentation about commands.

When you execute a user command, the command processor searches a list of files for the command. The ]UFILE command is used to display or modify the file search list. When it is run without arguments, it displays the current search list:

      ]UFILE
[1]  5 MYWORK
[2]  2 UCMDS

Given the list above, the command processor will first search file MYWORK in library 5 for a command, and if the command isn't found there, it will search 2 UCMDS. This behavior allows you to store your own programs in a separate file from the UCMDS file and combine the commands in several different files by putting them all on the search list. The first file on the search list is called the "current file." This file is used by the ]ULOAD, ]USAVE and other commands unless another file is specified using the /F= option.
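The search order can be sketched as a first-match lookup. In this Python model the file contents are invented for illustration, and `FILES`, `SEARCH_LIST`, and `find_command` are hypothetical names; only the two file identifiers come from the listing above.

```python
# Hypothetical contents: each file is identified by (library, name) and
# holds a set of command names.
FILES = {
    (5, "MYWORK"): {"SORTLABELS"},
    (2, "UCMDS"): {"LIST", "UFILE", "SORTLABELS"},
}
SEARCH_LIST = [(5, "MYWORK"), (2, "UCMDS")]


def find_command(name, search_list=SEARCH_LIST, files=FILES):
    """Return the first file on the search list that defines the command, or None."""
    for file_id in search_list:
        if name in files.get(file_id, set()):
            return file_id
    return None


print(find_command("LIST"))        # found only in 2 UCMDS
print(find_command("SORTLABELS"))  # 5 MYWORK wins because it is searched first
```

A consequence of first-match lookup is that a command stored in an earlier file takes precedence over a command of the same name in a later file.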


Syntax of Commands

The predefined commands establish a general syntax for commands that provides great flexibility and allows for easy extension. The general form of a command is:

     ]LIST FN1 FN2 V1 /PRINT /F=MYPROGS
      |--' |--------' |---------------'
      Name Arguments  Options

The command name follows the right bracket and is followed by arguments. Each command may have its own syntax for arguments, but usually the argument is a list of names separated by spaces. Following the arguments are optional parameters which are identified by name. An option may be a simple flag such as /PRINT, which causes the output to be sent to the printer, or the option may have a value provided, as in /F=MYPROGS which causes programs in file MYPROGS to be listed. Options may be included or omitted to vary the behavior of a command.
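A cover function has to split the command text into these parts. The Python sketch below assumes, for simplicity, that words are separated by spaces, that any word starting with a slash is an option, and that option values contain no spaces; individual commands may well parse their arguments differently. The name `parse_command` is hypothetical.

```python
def parse_command(text):
    """Split command text into (name, arguments, options)."""
    words = text.split()
    name, rest = words[0], words[1:]
    arguments, options = [], {}
    for word in rest:
        if word.startswith("/"):
            key, sep, value = word[1:].partition("=")
            options[key] = value if sep else True   # a bare flag is stored as True
        else:
            arguments.append(word)
    return name, arguments, options


print(parse_command("LIST FN1 FN2 V1 /PRINT /F=MYPROGS"))
```

Keeping options in a name-to-value mapping is what makes the syntax easy to extend: code that does not know about an option simply never looks it up.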

Options are especially useful when it comes time to add a new feature to a command. The feature can be activated by using a new option, and the value of the option can be used to pass parameters to the new code. Because there is no limit to the number of options a command can have, you can keep adding new features to a command indefinitely. Options minimize the impact of enhancements on established users of a program. When users are given an enhanced version of a command, they won't have to change their existing programs or behavior to use the new version. They can learn about the new options only when they need to use them.


Defining A New Command

The most important feature of the command processor is that it allows you to write commands of your own. This is as simple as creating an appropriate function and saving it in a command file. The command immediately becomes accessible from within any workspace. You can define commands to initialize your printer, set function keys, edit a to-do list, activate a debugger, transfer data to a remote computer, display the sizes of variables in the workspace, plot data, or anything you'd like to do without leaving the workspace you're in.

Suppose you have a program called SORTLABELS that renumbers the line labels in a program. When it comes time to use this program, you probably will be in an application workspace that doesn't contain a copy of SORTLABELS. In this situation, running SORTLABELS typically involves a minimum of the following steps:

  1. Copy the SORTLABELS function, along with its subroutines and global variables, into the active workspace. You'll have to remember what library and workspace SORTLABELS is stored in, and you may even have to remember what subroutines are required. You must be sure to use )PCOPY to avoid erasing anything already defined in the workspace.

  2. Run SORTLABELS.

  3. Erase SORTLABELS along with its subroutines and global variables, being careful not to erase those subroutines that were already defined in the workspace before step 1.

You can make your program a great deal easier to use by defining a ]SORTLABELS command, which will allow you to sort labels from within any workspace using a single command such as:

  ]SORTLABELS MYFN1 MYFN2

Here's what's involved in defining a new command:

A user command is a monadic function whose name begins with "CMD". The text following CMD is the command name. (The CMD prefix allows the command processor to distinguish between top-level commands and subroutines in a file.) When the command is executed, the argument to the CMD function will be a character vector containing the text of the arguments and options to the command. Rather than change the name and syntax of your program, it is preferable to create a CMD cover function that calls your program. This function can parse the command line, extract options, and set up variables and arguments as required by your program. It will serve as the interface between the command form and the standard APL form of your program. For our example command, the cover function is very simple:

    {del}CMDSORTLABELS A
[1]  @Sorts the labels in functions Š
[2]  SORTLABELS A
    {del}

Next, you need to define a "group" for the command. The group is simply a character matrix listing the subroutines and global variables that are needed to run the command. The name of the group variable is "GRP" followed by the name of the command function (in this case, GRPCMDSORTLABELS). You can either define the group manually, or you can use the ]UGROUP command to create the group automatically, as in:

  ]UGROUP CMDSORTLABELS /WS

(The /WS option tells the command to work with functions in the workspace instead of the current file.) When the group is defined, you can save the command and subroutines in the current command file by using the following command:

  ]USAVE .GRPCMDSORTLABELS

(The period in front of the group name tells the command to save both the group variable and all objects named in its value.) Once the group is saved, the ]SORTLABELS command is available for use.
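Building a group automatically amounts to collecting everything reachable from the command function. The Python sketch below assumes the calling relationships are already known as a mapping; how ]UGROUP actually discovers them is not described here. The subroutine names FINDLABELS and RENUMBER are invented for the example, and the sketch leaves the command function itself out of the result, which may or may not match the real group variable.

```python
def build_group(root, calls):
    """Return every name reachable from root, excluding root itself, sorted."""
    needed, pending = set(), [root]
    while pending:
        current = pending.pop()
        for name in calls.get(current, ()):
            if name not in needed and name != root:
                needed.add(name)
                pending.append(name)   # follow this object's own references later
    return sorted(needed)


# Hypothetical calling relationships for the example command.
CALLS = {
    "CMDSORTLABELS": ["SORTLABELS"],
    "SORTLABELS": ["FINDLABELS", "RENUMBER"],
    "RENUMBER": ["FINDLABELS"],
}

print(build_group("CMDSORTLABELS", CALLS))
```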

At this point, running ]SORTLABELS requires that the command processor perform a separate file read operation to load each object in the group. (This is because each object is stored in a separate component of the command file.) For a large command having many subroutines, this can be rather time-consuming. The loading can be sped up by creating a "package" for the command. A package is a variable that contains all the objects needed to run the command. The package allows the command to be loaded using just a single file operation.

Packages can be created easily using the ]UMAKE command, as in:

  ]UMAKE CMDSORTLABELS /NOCOMMENTS

The package is saved in the file as a variable having the name PKGCMDSORTLABELS. If such a variable is defined in the file, the command processor will load the package when executing the command instead of loading the objects individually. The /NOCOMMENTS option causes comments to be removed from the functions in the package, minimizing the size of the programs that are run. (Stripping the comments in the package does not affect the original copies of the programs, which are stored separately.)

Because the command processor remembers the date and time each object was saved, it is able to determine automatically which packages contain obsolete versions of programs. The ]UMAKE command has an option to update such out-of-date packages, relieving the user of what would otherwise be an onerous task.
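The out-of-date check can be expressed as a timestamp comparison. This Python sketch assumes each package records when it was built and which objects it contains; the data layout, the integer timestamps, and the name `stale_packages` are assumptions for illustration, not the format used by the command file.

```python
def stale_packages(packages, saved_at):
    """Return the names of packages older than any object they contain."""
    stale = []
    for name, (built_at, members) in packages.items():
        if any(saved_at[member] > built_at for member in members):
            stale.append(name)
    return sorted(stale)


# Hypothetical save times (larger means later).
saved_at = {"CMDSORTLABELS": 10, "SORTLABELS": 25, "CMDLIST": 5}
packages = {
    "PKGCMDSORTLABELS": (20, ["CMDSORTLABELS", "SORTLABELS"]),  # built before SORTLABELS changed
    "PKGCMDLIST": (8, ["CMDLIST"]),                             # still current
}

print(stale_packages(packages, saved_at))
```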

It is worth noting that some programmers "solve" the problems of programs with numerous subroutines by writing huge, monolithic programs, or by defining the subroutines locally within the function. In the first case, the programmer gives up some of the most important features of a programming language--the ability to divide a problem into manageable subproblems, and the ability to call a subroutine wherever it is needed in the code. Using a subroutine to solve a subproblem means that if you find a bug in your solution, you only need to change one program. If you don't use subroutines or if you embed the subroutines in the programs, you'll have to find and change every place where the faulty solution was used. Packages provide a much cleaner solution to these problems, and also solve other problems such as how to maintain compact versions of programs without losing the commented source code.


Localization

The command processor uses dynamic localization to avoid name conflicts and ensure that commands are erased after execution. When the package for a command is loaded into the workspace, the functions and variables are defined within a subroutine that localizes the names of all objects in the package. This prevents the command programs from conflicting with programs in the workspace. You can have a program named OVER in the workspace, and a command can use a program named OVER, and no conflict will occur.

Localizing the command also ensures that the command will be erased after execution is complete. Once the localization subroutine is popped off the state indicator, whether by normal program completion or by error, the command is guaranteed to be gone.
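The effect of localization can be imitated in Python with a context manager that shadows names and restores them on exit. This is an analogy rather than a description of the APL mechanism: the workspace is modeled as a dict, and `localized` is a hypothetical helper.

```python
from contextlib import contextmanager

_MISSING = object()   # marks names that did not exist before the command ran


@contextmanager
def localized(workspace, package):
    """Define the package's objects in the workspace, then restore the prior state."""
    shadowed = {name: workspace.get(name, _MISSING) for name in package}
    workspace.update(package)              # the command's objects shadow the user's
    try:
        yield workspace
    finally:                               # runs on normal completion and on error
        for name, old in shadowed.items():
            if old is _MISSING:
                workspace.pop(name, None)  # the name did not exist before: erase it
            else:
                workspace[name] = old      # the user's own object becomes visible again


workspace = {"OVER": "user's OVER"}
with localized(workspace, {"OVER": "command's OVER", "HELPER": "subroutine"}):
    print(workspace["OVER"])               # the command sees its own OVER
print(workspace)                           # only the user's OVER remains
```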


Robust File Structure

Each command file has a tree structure that allows a set of changes (called a "transaction") to be made to the file instantaneously, with the changes either being made completely or, in the event of an error or interrupt, not at all. Transactions prevent a file from being only partially updated and containing inconsistent information (for example, two components that are supposed to have the same shape but don't, because an error or interrupt prevented the second component from being updated after the first one was).

Transactions are implemented by having a certain component (the "root component") point to the directory components in the file, which in turn point to the data components. Each component of the file is either "in-use" (part of the current tree structure) or "free" (not currently pointed at). During a transaction, data is written only to free components, and at the end of the transaction the root component is made to point to the new directories, which point to the new data. Thus, until the root component is replaced, the file is effectively unchanged (because only free components have been changed) and an interrupt will not leave the file in an inconsistent, partially updated state.
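The root-pointer technique can be demonstrated with a toy model. The Python class below is a simplified sketch under several assumptions: there is a single directory level, component 0 serves as the root, new components are always appended rather than drawn from a free list, and an interrupt is simulated with a flag. None of these details are taken from the actual command file format.

```python
class Interrupted(Exception):
    """Stands in for an error or interrupt during a transaction."""


class CommandFile:
    """Toy component file: component 0 is the root, which names the live directory."""

    def __init__(self):
        self.components = {0: None}   # the root points at no directory yet
        self.next_free = 1

    def _write_free(self, value):
        # Data is only ever written to a component outside the live tree.
        number = self.next_free
        self.next_free += 1
        self.components[number] = value
        return number

    def read_all(self):
        root = self.components[0]
        if root is None:
            return {}
        directory = self.components[root]
        return {name: self.components[n] for name, n in directory.items()}

    def transaction(self, changes, fail_before_commit=False):
        root = self.components[0]
        directory = dict(self.components[root]) if root is not None else {}
        for name, value in changes.items():
            directory[name] = self._write_free(value)   # new data, free components
        new_directory = self._write_free(directory)     # new directory, also free
        if fail_before_commit:
            raise Interrupted("stopped before the root was replaced")
        self.components[0] = new_directory              # one write commits everything


f = CommandFile()
f.transaction({"A": 1, "B": 2})
try:
    f.transaction({"A": 10, "B": 20}, fail_before_commit=True)
except Interrupted:
    pass
print(f.read_all())   # the interrupted transaction left no visible trace
```

The components written by the failed transaction are simply never pointed at, so in a fuller model they would return to the free pool.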

The command processor also addresses a problem that afflicts component files on the APL*PLUS /PC and APL*PLUS II /386 systems. Unusable space is created in a file when a large array is written to a small component. (This is called the "exploding #FREPLACE problem.") The transaction subroutines circumvent this problem by recording the size of each component and not writing an array to a component that is too small to contain it. When an array is written to the file, the subroutines pick the free component that holds it with the least wasted space, or, if no such component exists, they append the array as a new component. This technique of tracking component sizes tends to eliminate the nonstop growth that afflicts other heavily-used APL files. With use, a command file tends to grow a certain percentage larger than its minimum compacted size, and then hover at that size, growing only slowly with additional use. As a consequence, the annoying and time-consuming #FDUP operation that must be applied regularly to other APL files can be avoided with command files.
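Choosing the free component with the least wasted space is a best-fit selection. In this Python sketch the recorded sizes are held in a dict from component number to capacity; the units, the data layout, and the name `choose_component` are assumptions for illustration.

```python
def choose_component(size_needed, free_components):
    """Pick the free component wasting the least space, or None to append a new one."""
    fitting = [(capacity - size_needed, number)
               for number, capacity in free_components.items()
               if capacity >= size_needed]      # never use a component that is too small
    if not fitting:
        return None                             # caller appends a new component
    return min(fitting)[1]                      # smallest waste wins


free = {4: 100, 7: 250, 9: 120}                 # hypothetical component sizes
print(choose_component(110, free))              # component 9 wastes the least
print(choose_component(300, free))              # nothing fits, so append
```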


Large Applications

Development of an APL application typically starts with functions saved in a workspace. As the application grows, the programs eventually may outgrow the capacity of the workspace. When this happens, several different courses of action are possible.

One popular solution is to split the application into two or more workspaces. Either the user will be expected to load the workspaces manually, or the application will automatically hop from workspace to workspace as required. Neither of these solutions is very attractive. In the first case the user is inconvenienced, and even if the workspace switching is automatic, maintaining the application becomes a much more difficult task. The workspaces probably will have some functions in common, and making sure that each workspace has the latest version of every function is an unpleasant chore. If a change is not propagated to all workspaces, a bug that was fixed once can come back later when an obsolete version of a program is modified and becomes the most recent version.

Another solution is to store functions in files and read them into the workspace when needed. Often, this is done using a system written specifically for the application. This can add substantially to the cost of developing the application, and because the functions-on-file system is unique to the application, the application will be harder to maintain and less likely to work neatly with other utilities.

The user command processor offers a practical and ready-to-use system for efficiently managing functions stored in files. In a command processor-based application, the programs are stored in a single command file instead of being stored in multiple workspaces. The ]UGROUP and ]UMAKE commands are used to prepare a package for each command of the application. The packages are saved in a separate file from the source code, and only the package file is delivered to the customers. The application workspace contains just the kernel functions necessary to execute commands stored on file. When the workspace is loaded, the startup program called by #LX reads the directory components of the package file. This allows the application to find where a package is stored in the file without using any additional file operations. Within the application, an EVAL function is used to execute commands stored on file. EVAL uses the directory information to find the appropriate package, reads and defines the package, runs the command, and finally erases the package from the workspace. Although this sounds complicated, it can be executed very quickly. Large commands consisting of many subroutines can be loaded in a fraction of a second. The author has used the command processor to construct two large applications (one involving 10,000 lines of code) with very satisfactory results.


Development History

The command processor is the culmination of the author's struggle with numerous applications that outgrew the bounds of the workspace. A separate functions-on-file system was developed for each application, with no compatibility between the file formats. The command processor is like a refined version of these applications with the application-specific commands removed, leaving only the general-purpose framework upon which other applications can be developed. The present system is derived from the APLIPS image processing system, written in 1983 for Melvin F. Janowitz, and the ADAPS data analysis system, written in 1984 for Mike Sutherland. The idea of right-bracket commands that can be executed in any workspace originated with Clark Wiedmann, who used such a system to execute utility operations while developing the APL compiler for STSC. Clark also spurred the development of the first version of the command processor for the mainframe APL*PLUS system in 1986. The current version of the command processor benefitted considerably from being used as a platform for the PM profitability model written for Citicorp Diners Club in 1989. Many of the concepts incorporated in the command processor are borrowed from operating systems the author has used, such as MS-DOS, VM, and Unix.


Summary

The command processor is a breakthrough in providing APL users with convenient access to programs. It saves the user from having to remember where programs are stored, and it automates the process of loading programs and cleaning up after running them. It encourages the documentation of programs by eliminating space taken up by comments when programs are run. By providing a flexible and extensible syntax for commands as well as on-line help, it makes programs easier to learn and use. The predefined commands provide many programmer utilities not built into the interpreter. Large applications can be constructed using the command processor as a framework. From the casual programmer to the serious application developer, the command processor provides something of value to virtually every APL programmer.


References

[1] ADAPS User's Guide, Adaptive Data Systems, 129 Ludlow Rd., Chicopee, MA, (1988).

[2] APL*PLUS /PC User Manual, chap. 15, STSC Inc., Rockville, MD, (1990).

[3] APL*PLUS II /386 Utilities Manual, chap. 5, STSC Inc., Rockville, MD, (1990).

[4] J. Weigang, PM Workspace Reference Manual, Citicorp Diners Club, Englewood, CO, (1990).


