Comments on the ALP Architecture Design Document Draft of April 30, 1999

Frank Manola

Object Services and Consulting, Inc. 
fmanola@objs.com


9 May 1999


Overall comments:

Overall, this document contains much useful material, and seems to be a good start.  However, there is also a lot of work to be done.  There is much material that is confusing, and could be made much clearer.  The organization needs improvement too.  There is a lot of repetitious material, sometimes using different terminology (which can get confusing).  Sometimes there is no apparent "thread" or story line.  More of a top-down organization would help.  Throughout, there needs to be better cross-referencing, e.g., from places where overviews are provided to later sections where further details are provided.  I think a great deal of improvement could be made by having someone with a "higher-level" perspective go through what is here and appropriately tweak the various detailed sections, and add additional explanatory material.  It would also be helpful to have some material describing the purpose of this document, its intended audience, and its intended relationship to other relevant documents, e.g., the Plugin Developers' Guide,  and any other related material.   (Note that, without any material describing the intended purpose of the document, I've had to use my own interpretation of what should be in an "Architecture Design Document", which need not be the same as that of the authors, so my comments should be interpreted in that light).
 

Detailed comments:

Section 1.1.1:  In the second paragraph, the discussion and actions described are characterized as applying to "crisis or non-routine change".  Is this intended to suggest that ordinary operations of a business don't require the socialization, multi-system integration, information flow up and down the hierarchy, etc. discussed here?  Is ALP only intended to apply to non-routine or crisis situations?  It seems to me that ALP's motivation could be described without appearing to partition the world into "routine" and "crisis" operations, since ALP's capabilities could apply to all of them.  For example, many business operations that appear "routine" involve many if not all of the steps described here, although often to smaller extents (I could certainly describe the task of providing a new customer with telephone service, and fitting that task into the ongoing activities of maintaining telephone service for current customers, in terms similar to these).

It would also be helpful if the material could indicate whether the actions described ("socialization", etc.) are (a) perfectly reasonable under the circumstances, and ALP is going to perform them or help support them better, or (b) necessary due to current systems and technology, and ALP is going to replace those activities.  (In other words, why does the document describe these activities?)  The description also contains a fair amount of what might be considered "jargon" which doesn't seem strictly necessary;  e.g. "synchronous enterprise", "socialization begins among the chief players", "stand up operational, tactical plans".

The fifth paragraph ("Solutions to this problem...") and beyond do not, strictly speaking, discuss the general business problem ALP is trying to solve, but rather (a) explain why prior attempts haven't been successful and (b) describe what the characteristics of an "elegant solution" would be.  These are good things to know, but it would be helpful (a) to separate these things from the discussion of the business problem itself, and (b) to tie them more explicitly to ALP.  For example, describe how ALP avoids the problems encountered by previous attempts to build systems to solve these problems.  Describe how ALP exhibits the characteristics of the elegant solution described here (or at least assert explicitly that it does).

Section 1.1.2:  This section is rather skimpy in terms of explaining the overall model of how an ALP cluster works.  For example, the model described in the second paragraph is asserted as if it is an accepted general model, but is it a known cognitive model of problem solving that has been well-studied (where are the references?), or is this just an ad hoc model?  A more thorough explanation of this model would lay the groundwork for a much better understanding of the rest of the document.  The final paragraph of the section asserts that the concept of relationships is an important notion in ALP's cognitive model, but it goes on to simply assert that there are two types of relationships.  It seems as if a more thorough discussion would be appropriate (or at least a forward reference to Section 2.1.4, where these relationships are discussed in the specific context of the system).

Section 1.1.3:  Where's the content of this section?

Section 1.2.1:  The fourth paragraph mentions that the components that make up a society are constantly changing.  Where is this ability to change (and ALP's support for it) discussed in the rest of the document?   The text says that "Figure 1-1 illustrates the components which are utilized to construct a society using ALP", but there is no mapping provided from the graphics in the figure to the concepts described in the text.  For example, which things are clusters?  Which are plugins?  What sorts of relationships do the lines between the ovals represent?  What's the difference between a solid line and a dashed line?

Section 1.2.2:  The "Plan" in Figure 1-2 should be labelled "LogPlan" to correspond more directly to the text.  Also, how about explicitly relating the different types of plugins to the cognitive model described in Section 1.1.2?

Is Figure 1-3 supposed to contain numbers corresponding to the nine steps listed in the text?  If so, they don't appear (note:   I looked at the figures in both Office 97 on Windows 95 and Office 98 on a Mac;  in this figure, some of the boxes don't appear when viewed on the Mac;  I will note discrepancies with other figures when they come up).

Regarding the nine steps of Cluster activity:

step 1:  I think it would be clearer to say "The incoming task "wakes up" an Expander Plugin that matches the task".

step 6:  this says "The Allocator informs the LogPlan of the allocations now associated with each task."  Is this the same as "The Allocator inserts the allocations now associated with each task into the LogPlan"?

step 8:  "The Assessor Plugin is informed..."  How is it informed?  Isn't this the same mechanism that is described as "waking up" in earlier steps?  If so, why the terminology change?

step 9:  "The Assessor sends a notification of task resolution..."  Is simply allocating resources to a task considered "resolution"?  After all, the task hasn't necessarily been performed yet.

Section 1.2.2.2:  At the end of the section, is a cluster associated with one JVM?  What is the relationship between hardware platforms and nodes?

Section 1.2.3:  In the second paragraph, "Upon receipt of a directive that concerns a specific PlugIn..."  How does the LogPlan determine what directives concern what PlugIns?  Also, the text mentions "the Plan API", a detail (of how the Plugins communicate with the LogPlan) that is unnecessary at this point.

What are the architectural implications of having all this communication via the LogPlan?  Is there a claim that this generalizes to handle all business problems (as seems to be asserted earlier) whether they involve logistics or not?  (or is it claimed that all business planning to handle crises really boils down to "logistics" in the end?)

Allocators are described as "recieving" [sic] "task messages".  What are "task messages", as distinct from the other types of communication described here?

Aggregators are described as combining "many tasks into one or more tasks at a higher level of abstraction."  This should be described differently (the task "move 15 tanks"  created by aggregating "move 10 tanks" and "move 5 tanks" isn't a task at a higher level of abstraction;  it's at the same level of abstraction, with an aggregated quantity).
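
To make the distinction concrete, here is a minimal sketch of what I take "aggregation" to mean in the tank example.  The `Task` fields (`verb`, `item`, `quantity`) and the `aggregate` function are my own hypothetical names, not anything taken from the ALP document or its API.  The point is only that the result is another task of exactly the same kind, with a summed quantity.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Task:
    # Hypothetical task shape, invented for illustration only.
    verb: str
    item: str
    quantity: int


def aggregate(tasks: List[Task]) -> Task:
    """Combine tasks that share a verb and item into one task with the summed quantity."""
    if not tasks:
        raise ValueError("nothing to aggregate")
    first = tasks[0]
    if any((t.verb, t.item) != (first.verb, first.item) for t in tasks):
        raise ValueError("can only aggregate tasks with the same verb and item")
    return Task(first.verb, first.item, sum(t.quantity for t in tasks))


# The example from the comment above: two "move tanks" tasks become one.
combined = aggregate([Task("move", "tank", 10), Task("move", "tank", 5)])
print(combined)
```

The input and output here have the same type and the same verb and item; nothing about the result is more abstract than its inputs, which is why the document's wording seems off to me.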

Section 1.2.4:  The beginning of this section illustrates a problem which sometimes occurs elsewhere as well:  it reads as if it were written by a different person, and so uses different terminology and concepts from the rest of the document.  For example, the "physical" concepts we've seen already are things like "nodes" and perhaps "hardware platforms", and the "logical" concepts are clusters and plugins.  How do the concepts of "installation" and "server", used in Figure 1-5, fit into that picture?  What is a "COGAAR Component" (i.e., what ALP concepts do the ovals in Figure 1-5 represent)?  Is an "installation" (in the figure) the same as a "workstation" (used in the text)?

Section 1.3.1:  In the description of the various ADTs (which generally could be much improved):

MessageTransport:  "defines the content of the logistics communication"?  The term "MessageTransport" sounds like it would be a transport mechanism (i.e., some form of message infrastructure), but this description sounds like it is defining (or also defining) the actual formats of the messages as well.  Is that true?

Distributor:  The second sentence seems to say that an asset is a type of task.  Is that true?  "The Distributor ADT is a mechanism to update a plugin..."  Does it update a plugin, or notify it?  What is the relationship of a Distributor to a cluster?  One-to-one?

Subscription:  Since the second sentence here is the same as the second sentence under Distributor, it seems as if the Distributor should be explicitly mentioned here somewhere (and conversely Subscription should be explicitly mentioned under Distributor).  For example, the text could include an explicit statement along the following lines (if this is what really happens):  a plugin creates a subscription and registers it with the Distributor.  The subscription contains a description of information that the plugin is interested in.  When new data is added to the LogPlan, the Distributor matches that data with the registered subscriptions, and, for any matches, notifies the associated plugins.
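
The interaction I am guessing at above could be sketched roughly as follows.  This is only my reading, under the assumption that a subscription amounts to a predicate over LogPlan objects; the class and method names (`Subscription`, `Distributor.register`, `Distributor.publish`, `LogPlan.add`) and the dictionary-shaped plan objects are hypothetical and are not taken from the ALP document or its API.

```python
from typing import Callable, Dict, List

PlanObject = Dict[str, str]  # hypothetical stand-in for whatever the LogPlan stores


class Subscription:
    """Describes the information one plugin is interested in (assumed: a predicate)."""

    def __init__(self, plugin_name: str, predicate: Callable[[PlanObject], bool]):
        self.plugin_name = plugin_name
        self.predicate = predicate
        self.pending: List[PlanObject] = []  # objects delivered to the plugin so far


class Distributor:
    """Matches new LogPlan data against registered subscriptions."""

    def __init__(self) -> None:
        self._subscriptions: List[Subscription] = []

    def register(self, subscription: Subscription) -> None:
        self._subscriptions.append(subscription)

    def publish(self, obj: PlanObject) -> List[str]:
        """Deliver obj to every matching subscription; return the plugins notified."""
        notified = []
        for sub in self._subscriptions:
            if sub.predicate(obj):
                sub.pending.append(obj)
                notified.append(sub.plugin_name)
        return notified


class LogPlan:
    """Stores plan objects and hands each new one to the Distributor."""

    def __init__(self, distributor: Distributor) -> None:
        self.objects: List[PlanObject] = []
        self._distributor = distributor

    def add(self, obj: PlanObject) -> List[str]:
        self.objects.append(obj)
        return self._distributor.publish(obj)


distributor = Distributor()
plan = LogPlan(distributor)
allocator_sub = Subscription("allocator", lambda o: o.get("kind") == "task")
distributor.register(allocator_sub)

notified_for_task = plan.add({"kind": "task", "verb": "move"})
notified_for_other = plan.add({"kind": "allocation"})  # no subscription matches this
print(notified_for_task, notified_for_other)
```

Even a sketch this small forces answers to questions the document leaves open: who owns the subscription, whether the plugin is "updated" or merely "notified", and what happens to data that matches no subscription (here it is simply stored in the plan and nobody is told).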

"A Subscription is part of a Plugin"  Is it "part of" a Plugin, or "created by" a Plugin, or "associated with" a Plugin?

"A Subscription registers for a task"  Is an allocation a task?

Notification:  This says that a notification is a type of directive sent between clusters.  But the descriptions of Distributor and Subscription talk about plugins being notified when certain things happen.  Is a notification in the sense meant here used for notifying plugins?  If not, perhaps some different term could be used for one or the other to avoid confusion.

Workflow:  "A Workflow references an Expansion."  "A Workflow references an Aggregation."  Are these both always the case?  Or is it always one or the other?

Logistics plan:  "The Logistics plan receives and sends task information to a Plugin through the distributor using subscriptions."  It would be helpful to have a more complete description (somewhere) of the interaction between the LogPlan, the Distributor, and Plugins.  I would particularly note that while diagrams of clusters usually explicitly show the LogPlan and Plugins, they rarely (if ever) mention the existence of a Distributor.  What's the reason for this?  If they were shown, where would they go?  For example, there is no distributor in Figure 1-7, which is supposed to describe the intra cluster data flow.  If it's important to describe the distributor as a "primary object" in Section 1.3.1, why isn't it used in describing cluster operations in Section 1.3.2?

Logic Provider:  It's not clear what this does (and on reading further in the document it's still not awfully clear).  Also, like distributors, logic providers don't seem to be shown in cluster diagrams.  Why?

Section 1.3.2.1:  Second sentence:  "Using the numbers within the diagram to follow the flow:"  There are no numbers within the diagram.  [More specifically, there are no numbers in the figure when displayed using either Office 97 under Win95 or Office 98 on the Mac.  On the Mac, some of the boxes also fail to display.]  Also, what is the "Registry"?  [More precisely, since the Registry isn't discussed until Section 2.1.2, either include a forward reference, or don't mention it here.]

Figure 1-7 and its associated steps seem to overlap Figure 1-3 and its associated steps to a considerable extent.  Having these multiple descriptions both introduces considerable redundancy and, due to the somewhat different wordings and diagrams, creates the potential for inconsistencies to creep in (if they aren't there already).

step 3:  "Prior to committing the Task to the LogPlan, the Cluster checks to see which Plugins have a Subscription for that Task."  What part of the Cluster checks?  Why does it do this?  Note that the Distributor isn't mentioned.  Note also that the next step has the LogPlan, not the Cluster, sending the task to a subscribing Plugin.

step 4:  What if no plugin has subscribed to the new task?

step 9:  What is the "Cluster infrastructure"?

Section 1.3.2.2:  Figure 1-8 seems rather "abstract".  More labels might help (e.g., what do those arrows under "Inter-Node LDM Object Bus" represent?  Why not just label the ovals "Cluster"?)   [Note:  on the Mac, some of the boxes in this figure don't display properly].

Section 2:  It would help to have some material introducing this section, and explaining what its specific purpose is.  The subsections seem to differ greatly in their levels of abstraction.  In particular, in many sections there is material that seems awfully low-level for a discussion of "architecture design topics".  This material could be usefully removed and replaced with material that more thoroughly discusses architecture and design issues.

Section 2.1.1:  This section is the first one illustrating the comment above about low-level material.  The command line syntax for starting a node seems too low a level of detail for an architecture design document.  More appropriate would seem to be information on the contents of the configuration files and what alterations in node operation they can control (e.g., include this material rather than asking the reader to "view a node.bat file to understand further tha possible settings for a node instantiation").

Section 2.1.2:  I can't understand this section.  It sounds like the Registry is some kind of name server, but the description is too unclear for it to be helpful.

Section 2.1.3:  Why is Cluster Object Factory described here, and not either in Section 2.2, or in Section 2.2.5?

Section 2.1.4:  The discussion of messaging in this section is very helpful and "architecture-relevant".  However, it's not clear why it appears in Section 2.1, which is supposedly about nodes, and not in Section 2.2.

Section 2.1.4.1:  "On startup, the cluster contacts its superior and provides an enumeration of all its capabilities.  Basically, it transfers a copy of itself as an AssetAssignment message."  Does it really transfer a literal copy of the complete cluster?  Isn't there some way provided to describe the capabilities of a cluster without actually copying the code (and if "a copy of itself" means something else, what does it mean)?

"While an organization can have only one superior for a given time period, superior-subordinate relationships are time phased, as are other relationships."  What mechanism is provided for changing these relationships, and where is it described?

What are the ALPINE OrgDataPlugin, OrgReportPlugin, and GLSAllocatorPlugIn?  Do they do anything other than what is described here?  Is there one of these in every cluster?  (Note that this is the first mention of these components).

Section 2.2:  In general, much of the material in this section seems to be organized and written in terms of programming artifacts (descriptions of various base classes, algorithms, etc.) rather than "architectural design topics".  Much of this is appropriate if the intended audience is primarily developers, but at the same time it seems that more of a "big picture" of some of these topics could be presented before getting into the details.

Section 2.2.1:  Most of the material in this section seems too low level for an architecture description (e.g., "The initialization file must be the same name as that set as the node.bat parameter name").  What is the significance of the various states in the GenericStateModel?

Section 2.2.2:  This seems like useful material, but it needs to be explained much better.  Since Logic Providers are characterized as "infrastructure plugins", it seems as if it would be useful to have an overview of the structure of the infrastructure, and how these components fit into it.

Section 2.2.3:  What is "hysteresis" as used here?

Section 2.2.4:  This section is very hard to understand.  The text should explicitly reference Figures 2-7 and 2-8 where it is appropriate.

Section 2.3:  Some of the material from subsections of this section, e.g., descriptions of assets, workflows, directives, and tasks, is pretty essential in understanding material presented much earlier.  While it's obviously impossible to present everything at the beginning, it seems as if some of these basic concepts could be described much earlier (in concrete ways, as they are here), and this would contribute a great deal to understanding more complex material concerning interactions among components.  At the very least, there should be forward references to this material from places where the concepts are introduced (but not explained) earlier.

Section 2.3.4.1:  Neither Figure 2-17 nor Figure 2-18 displays properly in Office 98 (they are OK in Office 97).