Dynamic Database Panel II
Notes from Meeting #2

Craig Thompson, OBJS
SRI, Menlo Park, CA
April 8-9, 1997

[These are not minutes of the meeting - they are my notes - so they do not provide a complete record of what we covered at the meeting.]

Agenda*

Executive Summary
Attendees
DDB Goals
Background Presentations

RADIUS
GIS Products
OMG/OGC Standards Tutorial
Object Services Architecture for Distributed Middleware and Database Services

Scenario-based Views of DDB

Feature-level Change Detection
IFSAR Change Detection
Entity Level Dynamics
Intelligence Analyst's Aid
Database Scenario re Updating the Hydrology Model
Distribution Scenario
GIS Community Scenario
JFACC meets Dynamic Database

Architectural Issue Discussions

Representation Layer
Consistency Management
Exoskeleton DBMS - Distribution of Function and Data
Design Decisions

Proposed Outline
Wrap-up
Conclusions and Puzzles

* I left around 2:30pm on April 9 so these notes do not cover the last hours of the meeting.

Executive Summary

Tom Burns (DARPA) went over his briefing foils describing the goals of the DDB program. He made some changes based on the meeting. We heard background presentations on RADIUS, the state of practice in the GIS industry today, the state of relevant standards principally OMG and OGC, and on object services architectures. We talked about a series of scenarios that capture views of the DDB. Then we brainstormed on architecture issues related to DDB, principally consistency management in a distributed information environment where many traditional database functions are exported to a network, noting this is a higher risk architecture than a central DBMS since we understand it less well but that it seems to model our understanding of the situation better. Tenney proposed a strawman outline for a DDB annotated briefing.

The meeting helped to scope the DDB problem. My take - initially, DDB will provide a situation model for tactical environments, covering both geospatial and thematic information, but not covering the whole world as I had first thought (though it might grow up to cover the world via some federation-of-regions approach similar to how the MIDB and tracking databases interoperate with others of their own kind). As envisioned, DDB appears complementary to the NIMA-OGC-industry story as I understand it so far. It will have to support multiple coordinate systems but prefers some standard ones (maybe start with IUE-RADIUS's) and can add more. It is open and extensible to multiple representation schemes but comes with a growing standard class or object library covering signal-feature-entities of common interest (start with TBD - MIDB or Common Schema? - not discussed). Also, it may be that different data repositories store different views of objects, that there is no central view, but instead there are tie points or a kind of skeleton object model for connecting representations. Though it will probably be initially centralized as a DBMS, DDB will eventually be throttleable between central and distributed forms providing more or fewer database guarantees and functions (consistency, uncertainty management, security, replication, versioning, …) and populated as needed according to some need-based cost model. A two track approach was suggested to get started: build a simpler somewhat centralized DDB to stabilize some of the modeling issues but at the same time work on an exoskeleton architecture approach to develop the technology for a DDB distributed in data and function and use this to grow the central approach.

My action items (subject to clarification with Tenney) are

provide more detail in the form of annotations on the Services Architecture briefing,
flesh out the DDB distributed function architecture
work with Allan Doyle on tech transition recommendations

Attendees

Bob Bolles, SRI
Tom Burns, DARPA
Bill Fabian, SAIC
Scott Fouse, ISX
John Lowrance, SRI
Dave Maier, OGI
Allan Doyle, BBN
Aaron Heller, SRI
Bob Tenney, ALPHATECH
Victor Tom, AAEC
Craig Thompson, OBJS

Day I - Tuesday, April 8, 1997

Dynamic Database Goals, Tom Burns, DARPA

Tom went over a foil set which he will try to get up on a web site (for now, paper only). This is a 10 year program. It needs an architecture that can grow over 10 years.

Q: relationship between NIMA and DDB? A: initial focus is tactical, theater. Tom is meeting with NIMA soon to work on drawing boundaries.

Q: does DDB integrate only within geospatial or all thematic/situation info. A: both, creates model of the situation.

Q: if I made a bad assumption, can I recover? A: yes, and must reinitialize other sources that had bad derived information from bad assumptions. Interesting metric: how fast can you make system consistent again. Think of DDB as set of spreadsheets. Uncertainty ripples through.

Comment: Logically integrates across domains, e.g., hydrology model and transportation model, don't put power grid in a river, wires that disappear go under bridge.

During the meeting, Tom reworded some of the DDB goals based on the presentations.

Background Presentations

These presentations help set the context of DDB.

Radius, Aaron Heller, SRI

(Tenney introduced Aaron, first mentioning that Tenney's been looking at schemas and has the MIDB all sources DBMS C4I schema document, which is 600 pages.)

Basic idea of RADIUS was model-supported exploitation. One example is: if VIP parking lot fills up then look at nearby display site to see high tech demo. If we have overlays on maps then we can locate hot spots easier. Quick Look, a production subset of RADIUS, is in place in National Exploitation Lab. Another example is cables in field to see if there is ground disturbance to see if oil rig is going up. Each site has local vertical coordinate system (done manually now) based on 6-8 images. Multiple looks at map from different camera parameters. Registration involves finding N data points that are same in several pictures and using rays to connect the points. Q: do line segments have attributes like vertical wall? Yes. If I drag one line, do nearby lines know? Sort of. They bit off the real problems like errors and error propagation. There is missing fusion re buildings having overhangs, rivers flow down hills, other semantic information.

On coordinate systems: GEO 3D geocentric - Local 3D -projection- sensor 2D coordinate system. 2D Objects and Images and the display are transforms from the sensor 2D. GEO maps to various coord systems like Lat/Long, State Plane, UTM via non-linear transforms. What do I do when I have a single image showing two sites? There are different models of the sphere. USGS uses North American Datum 1927; GPS uses WGS84. States have state plane coordinate systems. Every object (e.g., building) has a coordinate system and there is a transform to the site coordinate system. There are transform pairs for bi-directional mappings.
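A minimal sketch of the transform-pair idea, assuming each pair carries a forward and an inverse function and that pairs can be chained (object to site, site to a local grid). The class name TransformPair, the helper translation, and the offsets are all hypothetical; simple translations stand in for the non-linear transforms mentioned above.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Point = Tuple[float, float, float]


@dataclass(frozen=True)
class TransformPair:
    """A bidirectional mapping between two coordinate systems."""
    forward: Callable[[Point], Point]
    inverse: Callable[[Point], Point]

    def then(self, other: "TransformPair") -> "TransformPair":
        """Compose: apply self first, then other. The inverse runs in reverse order."""
        return TransformPair(
            forward=lambda p: other.forward(self.forward(p)),
            inverse=lambda p: self.inverse(other.inverse(p)),
        )


def translation(dx: float, dy: float, dz: float) -> TransformPair:
    """Stand-in transform: a pure offset (real site transforms are more complex)."""
    return TransformPair(
        forward=lambda p: (p[0] + dx, p[1] + dy, p[2] + dz),
        inverse=lambda p: (p[0] - dx, p[1] - dy, p[2] - dz),
    )


# Hypothetical offsets: building origin within the site, site origin within a local grid.
building_to_site = translation(120.0, 45.0, 0.0)
site_to_local = translation(1000.0, 2000.0, 30.0)
building_to_local = building_to_site.then(site_to_local)

corner = (5.0, 5.0, 10.0)                   # a point in building coordinates
in_local = building_to_local.forward(corner)
back = building_to_local.inverse(in_local)  # round trip back to building coordinates
```

Keeping the inverse alongside the forward function means a composed chain is automatically bidirectional, which is the property the notes attribute to transform pairs.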

Q: should DDB require one or a small number of coordinate systems -or- be catholic and allow any number? RADIUS tried to make general and simple coordinate systems. Transportation system measures locations in miles along a road. Was there a search requirement? Might use a single coordinate system for coarse search, with detailed filtering as post-processing. So index returns candidates and not hits. Drawing complexity associated with objects, does bounding box in a move. Similarly when I zoom. More work on level of detail. USGS has book on coordinate systems -- Universal Transverse Mercator. Add random amounts of metadata that does not fit schema, e.g., how do I know road was asphalt? I read it in the New York Times.
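A rough sketch of the "index returns candidates and not hits" point: a coarse bounding-box test produces candidates and an exact point-in-polygon test (ray casting) filters them afterward. The object names and shapes are made up for illustration; nothing here is RADIUS code.

```python
from typing import Dict, List, Tuple

Point2D = Tuple[float, float]
Polygon = List[Point2D]


def bbox_of(poly: Polygon) -> Tuple[float, float, float, float]:
    """Axis-aligned bounding box (xmin, ymin, xmax, ymax) of a polygon."""
    xs = [x for x, _ in poly]
    ys = [y for _, y in poly]
    return (min(xs), min(ys), max(xs), max(ys))


def bbox_contains(box: Tuple[float, float, float, float], pt: Point2D) -> bool:
    xmin, ymin, xmax, ymax = box
    return xmin <= pt[0] <= xmax and ymin <= pt[1] <= ymax


def point_in_polygon(pt: Point2D, poly: Polygon) -> bool:
    """Exact test by ray casting: count edge crossings to the right of pt."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def search(objects: Dict[str, Polygon], pt: Point2D):
    """Coarse pass yields candidates; the detailed pass keeps only true hits."""
    candidates = [name for name, poly in objects.items()
                  if bbox_contains(bbox_of(poly), pt)]
    hits = [name for name in candidates if point_in_polygon(pt, objects[name])]
    return candidates, hits


objects = {
    "triangle_lot": [(0, 0), (10, 0), (0, 10)],
    "square_building": [(5, 5), (9, 5), (9, 9), (5, 9)],
}
candidates, hits = search(objects, (8, 8))
```

Here the point (8, 8) lies inside the triangle's bounding box but outside the triangle itself, so the triangle is a candidate and not a hit.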

Imagery sensor model for mensuration and for absolute positioning. Government has developed the former, Ruler, can measure points on world pretty accurately. GDE, Autometrics, … have own sensor models - proprietary.

RADIUS does not have an explicit model of site showing this parking lot is connected to this road. SEDRIS Synthetic environments at Tech. In demo, you blow out bridge and autonomous jeep knows to find another route. Builds on a tiling. They have another multi-page object model. They model gunship as connection of its parts. They are defining file format as well as object model. Faceted models (triangles) are harder to do search and reasoning on. So representation must address statistics, uncertainty, many coordinate systems. Action Item: Aaron will send us pointers to 5-6 documents on RADIUS - part of IUE workshop. There will be a RADIUS book.

Theme: must support multiple representations and multiple coordinate systems. OGC is taking a more universal coordinate system approach. Is it universal or yet another coordinate system?

GIS Products, Allan Doyle, BBN

Industrial GIS community is somewhat closed and has a reasonably clear idea of its own scope. For a good overview, see Fundamentals of Spatial Info Systems by Laurini and Thompson, Academic Press, 1992. Also, see handout ACM survey article April 97 by Shekar et al.

GIS categories include automated mapping, thematic mapping, map overlay modeling, spatial statistics (accident rate per 1 mile segment of highway), spatial analysis, spatial query, spatial browsing, spatial reasoning, geocoding (given address find lat/long).

Evolution of GIS Products. Traditional all-purpose GIS in 1985, now Product Family GIS still vendor-proprietary and by 1999 Componentware GIS expected. Open GIS community has concept of a bunch of communities with special interests. If I can talk about structures of communities then I can talk across boundaries.

Hard problems are

conflation - combining map info from different sources - e.g., having the streets line up
generalization - going from fine to coarser resolution. Show road and railroad, but for a human to see the detail you must now move a road that's 10 feet from the railroad so that it looks to be 500 feet away. Labels move around, etc.
multi-resolution
topology maintenance
distributed databases - in the small but not across NIMA and USGS. Also mentions how to track things like oil rigs that move when crossing DBMS boundaries.
continuous versus discrete phenomena - where is line between pine and deciduous trees
semantic interoperability
models including hypothesis/what-if/uncertainty - no one else is folding this in
use of existing databases with potentially overlapping information as first class contributors to the situation model. Commercial data might be more accurate than govt data if cleaned for commercial use, trust boundaries.

Players in marketplace:

Categories: Traditional, Newer Entries, Notables
Products include:

ESRI (Environmental Systems Research Institute) - www.esri.com $200M company - full featured but moving to desk top and embedded. ARC/INFO flagship product, ArcView is subset for visualization, MapObjects is embeddable into MS Windows programs via OLE/COM, WWW offering via Java or GIF images from ARC/INFO engine. Also has SDE Spatial Data Engine, designed to do spatial indexing well.
Intergraph www.intergraph.com, is $1B, much in CAD. Working mostly on Windows. Web offering is GeoMedia using ActiveCGM. Competing with SGI
MapInfo - $30M - light weight - product is in Excel. Have good geocoding engine.
GenaMap - no edges to maps, seamless, can generate new spatial views by DBMS view notion via spatial filters. Older models mapped San Francisco and you fell off the edge of that world.
Smallworld - www.smallworld-us.com - first OO GIS, uses optimistic versioning, like ObjectStore but can use other DBMS'. Concept of multiple worlds for same object. Same object might be in multiple worlds. Lock on European utility communities.
Laser-Scan - www.laser-scan.com - sounds like Smallworld
Autodesk - www.autodesk.com - biggest CAD vendor - moving into GIS. They are into 3D wire-frames. Good dynamic range and precision. Particular bolt on a ship.
Advanced Visual Systems - n-dimensional visualizations

Mini Tutorial on OMG (and OGC), Craig Thompson, see presentation

OMG is a consortium of 600+ companies with a history from 1989-present focused on open component-based distributed middleware. The presentation covered the environment of DDB including other DARPA programs, other DoD/Govt agencies like NIMA, industry products (Allan's talk), and industry standards focused on OGC (a little) and OMG (a lot). The presentation then drilled down on OMG:

the structure of the organization,
the OMA architecture (bus plus services) and design principles,
specific adopted standards that populate the OMG architecture (with links to each) and roadmap directions,
how DARPA can influence OMG, and
some details on time, location, certainty, and security.

Messages were:

technology must be tied to social organizations
OMG provides a good blueprint for componentware middleware including many traditional DBMS functions
there is much more work to be done before this kind of architecture "covers" the DDB problem (must populate OMG framework with additional services, must do more R&D to prove out the OMG approach) so two way partnering is needed.

Services Architecture Layer for Distributed Middleware and Database Services, Craig Thompson, see presentation, (covered on Day II)

Traditional DBMS systems encapsulate a fixed functionality under the hood. The more functionality, the more heavyweight. Wanted is to have our cake and eat it too - an extensible DBMS/framework architecture that is both an open extensible toolkit and a working system. The former, so it is extensible, the latter, so it is usable out of the box. Services architectures attempt to open systems by replacing calling tree with bus and services hanging off so wiring of services can be done later. OMG does this but so does Microsoft (e.g., OLE DB). Talk provides motivation for why you want a fundamentally open architecture, some approaches people have taken, how OMG fits this picture and some of the promises, roadmap, issues, and risks of the approach.

Scenario-based Views of DDB

We looked at the DDB problem from many user scenario points of view to try to extract views of the architecture.

Feature-level Change Detection, Victor Tom, AAEC

In SAIP, entities are target-sized objects. The database contains locations and features of target objects. New detections are target sized. Working in the desert, there are many rocks. Some of them that move are tanks. Change detection is performed at the feature level. Also talked about registration problems where plane flying one way using SAR registers all data points a little off to the side. Need to capture lines and also uncertainty if this data will be used for change detection later on. Need image but also terrain map that says, this is a rocky field area. Might not go all the way to noting the thing was a rock. After about four looks, the uncertainty gets small enough to claim you are seeing real changes. Lots of domain specific knowledge - power grid viewed broadside via radar gives you blooming like effect. Same with railways. Segments of metal polyline broadside to you are pronounced. Accidental alignment is not rare.
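One way to read the "about four looks" remark, under the assumption (not stated at the meeting) that looks are independent with equal error: averaging N looks shrinks the standard deviation by a factor of sqrt(N), so four looks halve it. The function names and numbers below are illustrative only.

```python
import math


def sigma_after(sigma_single: float, looks: int) -> float:
    """Std. dev. of the average of `looks` independent, equal-variance looks."""
    return sigma_single / math.sqrt(looks)


def looks_needed(sigma_single: float, sigma_target: float) -> int:
    """Smallest number of looks whose average reaches the target std. dev."""
    return math.ceil((sigma_single / sigma_target) ** 2)


# Hypothetical numbers: a 2.0 m single-look error and a 1.0 m target.
halved = sigma_after(2.0, 4)
needed = looks_needed(2.0, 1.0)
```

Correlated errors, such as the systematic SAR registration offset mentioned above, would not average out this way.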

DBMS problem - For 40x40km, they use 1M objects stored on RAIDs in Object Store and 4 servers with two analyst workstations. For this block of data you look in four databases. 300 objects fetched from DBMS per second. Use quad trees in x-y. Had Object Store people on site for weeks understanding the problem. They use SGI machine to process SAR images. 300 detections come at them per second. About a dozen features (floating points) per detection for dot on SAR being compared to the existing database. Rate of real changes is much smaller. Collection pre-planned beforehand.
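A small sketch of a point quadtree in x-y, the kind of index mentioned above, assuming a square region that splits into four quadrants when a node fills up. This is a generic textbook structure, not the SAIP implementation; the capacity, depth limit, and sample detections are invented.

```python
class QuadTree:
    """Point quadtree over a half-open square region [x0, x1) x [y0, y1)."""

    def __init__(self, x0, y0, x1, y1, capacity=4, depth=0, max_depth=16):
        self.bounds = (x0, y0, x1, y1)
        self.capacity = capacity
        self.depth = depth
        self.max_depth = max_depth
        self.points = []      # (x, y, payload) tuples held at this node
        self.children = None  # four sub-quadrants once the node has split

    def insert(self, x, y, payload):
        x0, y0, x1, y1 = self.bounds
        if not (x0 <= x < x1 and y0 <= y < y1):
            return False
        if self.children is None:
            # Store here if there is room, or if we may not split any further.
            if len(self.points) < self.capacity or self.depth >= self.max_depth:
                self.points.append((x, y, payload))
                return True
            self._split()
        return any(child.insert(x, y, payload) for child in self.children)

    def _split(self):
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        args = (self.capacity, self.depth + 1, self.max_depth)
        self.children = [
            QuadTree(x0, y0, mx, my, *args), QuadTree(mx, y0, x1, my, *args),
            QuadTree(x0, my, mx, y1, *args), QuadTree(mx, my, x1, y1, *args),
        ]
        old, self.points = self.points, []
        for px, py, payload in old:  # push existing points down into the quadrants
            self.insert(px, py, payload)

    def query(self, qx0, qy0, qx1, qy1):
        """Return all stored points inside the closed query rectangle."""
        x0, y0, x1, y1 = self.bounds
        if qx1 < x0 or qx0 >= x1 or qy1 < y0 or qy0 >= y1:
            return []  # query rectangle does not touch this node
        found = [p for p in self.points
                 if qx0 <= p[0] <= qx1 and qy0 <= p[1] <= qy1]
        if self.children is not None:
            for child in self.children:
                found.extend(child.query(qx0, qy0, qx1, qy1))
        return found


# A 40 km x 40 km block in meters, with a few made-up detections.
tree = QuadTree(0, 0, 40000, 40000, capacity=2)
for x, y, name in [(100, 100, "rock-1"), (150, 120, "rock-2"),
                   (30000, 30000, "tank-1"), (200, 180, "rock-3"),
                   (39000, 1000, "truck-1")]:
    tree.insert(x, y, name)

nearby = tree.query(0, 0, 500, 500)
```

Each new detection would be compared only against the points returned for its neighborhood, not against all 1M objects.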

Allan: feels like ATM with switches, data flow machine with loops. Reallocate processing chunks to certain areas. Close to sensor there is a lot of data push. Pre-position low level data analysis.

IFSAR Change Detection, Bob Bolles, SRI

IFSAR produces 3D - similar idea to using stereo, uses two SAR images and composes to locate points more directly in 3D. But to do this reasonably, you need to know about a large number of artifacts. Similar to all data, just at high data rates. Change detection via process like the following conventional steps (signal to feature to entity) at 1 m accuracy in 3D.

data acquisition - SARa, SARb, INS, GPS
image formation - phase unwrapping
image annotation - via shadows, front porch, veg
registration - absolute or relative coordinates
differencing - significant change
region formation - group changes into regions
region classification - add or subtract
reporting - what is significant
database updating - fill in, change, stats
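The differencing, region formation, and region classification steps above can be sketched on toy elevation grids, assuming the two grids are already registered cell to cell. The threshold, the 4-connected grouping rule, and the sample heights are assumptions for illustration, not the SRI algorithm.

```python
from collections import deque


def detect_changes(before, after, threshold):
    """Difference two registered height grids and group changes into regions."""
    rows, cols = len(before), len(before[0])
    diff = [[after[r][c] - before[r][c] for c in range(cols)] for r in range(rows)]
    changed = [[abs(diff[r][c]) > threshold for c in range(cols)] for r in range(rows)]

    seen = set()
    regions = []
    for r in range(rows):
        for c in range(cols):
            if not changed[r][c] or (r, c) in seen:
                continue
            positive = diff[r][c] > 0
            queue = deque([(r, c)])
            seen.add((r, c))
            cells = []
            while queue:  # breadth-first flood fill over 4-connected neighbors
                cr, cc = queue.popleft()
                cells.append((cr, cc))
                for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and changed[nr][nc]
                            and (nr, nc) not in seen
                            and (diff[nr][nc] > 0) == positive):
                        seen.add((nr, nc))
                        queue.append((nr, nc))
            regions.append({"kind": "add" if positive else "subtract", "cells": cells})
    return regions


# Toy heights in meters: a 2x2 structure disappears, a new object appears,
# and one cell carries small noise that stays below the threshold.
before = [[5, 5, 0, 0],
          [5, 5, 0, 0],
          [0, 0, 0, 0],
          [0, 0, 0, 0]]
after = [[0, 0, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0.2, 0],
         [0, 0, 0, 3]]

regions = detect_changes(before, after, threshold=1.0)
summary = [(region["kind"], len(region["cells"])) for region in regions]
```

The reporting and database updating steps would then work from the region list, not from individual cells.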

Many analyses want the raw data and don't want pre-analyzed data since each layer makes mistakes and creates artifacts. Start with INS, GPS and uncertainty estimates. (Reminds me of much simpler Stanford work on hierarchical recursive descent of computing diffs between two documents.) DTED is human product with people worrying about tree level and vegetation penetrating radar. Plan to put one on Predator UAV. There are big shadow areas in SAR since missions come in low (10-20°) and don't see into shadow areas behind buildings. So hide tanks behind buildings. Far side is better for registration than leading edge of buildings. Later I fly by from another angle. Some things I have seen twice and others once and some things no times. In built up area, there are many shadows and many bright points. Recommends (tongue not entirely in cheek) that DDB should focus on Iowa and Kansas first, NY later.

Comment: existence of DBMS will change way we collect data. For instance, some aerial data plus SAR gives better effects than SAR only. Use multiple phenomenology. Next thing you run into is doctrinal changes.

Entity-level Dynamics, Bob Tenney

Many kinds of moving entities like vehicles, some good and some bad, some change terrain, some background like refugees, some combat, aggregated objects like convoys, some do warfighting, some engineering, can be aggregated into missions, operational scenario. Maybe 50-100,000 vehicles with reports of movement every minute for one analyst. MTI finds things with Doppler movement down line of sight vectors. Geometry of uncertainty is ellipse 20m by 300m. So convoys moving right at you are harder to see. Ground order of battle MTI analyst. Connects dots to become tracks. Inputs are MTI reports, sensor model, vehicle model, and MTI tracklets. Everyone working on this has their own points and lines and estimates of these and these schemas are built into algorithms. Much vehicular traffic goes along roads so if you know that road comes toward you or moves away then you can use this info with the MTI info and predict signal noise. Usually assumes roads are given, as polylines. Many ways to represent activity graphs (Petri nets, …). Then there are graphs with potential patterns. Inventory-like patterns: x(t) = x(t-1) + arrivals(t) - departures(t). MTI assumes roads are known and given. Does care about obscuration. Vehicle level fusion. What happens when two vehicles come to intersection? One stops. Semantics. Have to worry about double counting.
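The inventory-like pattern above, where the count at one step is the previous count plus arrivals minus departures, can be sketched in a few lines. The function name and the sample counts are hypothetical.

```python
def inventory_series(initial, arrivals, departures):
    """Running count: x(t) = x(t-1) + arrivals at t - departures at t."""
    counts = [initial]
    for arrived, departed in zip(arrivals, departures):
        counts.append(counts[-1] + arrived - departed)
    return counts


# Made-up example: vehicles at a depot over three reporting intervals.
counts = inventory_series(10, arrivals=[3, 0, 5], departures=[1, 4, 0])
```

If the same vehicle is reported twice as an arrival, the count drifts upward, which is one place the double-counting worry shows up.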

Intelligence Analyst's Aid, John Lowrance, SRI

This problem starts with somewhat analyzed data and tries to do evidential analysis fusing and providing certainties from ELINT, IMINT, and other INT data. Automated version of this is like automated version of manual. His inputs are cartographic, text and CNN reports, and is focusing on evidential reasoning, how to go from facts to conclusions. What are grand challenges for Uncertainty Management:

DDB must be capable of multiple representations of space, time, belief. Must be able to index into this. So we need an overarching representation. (questionable if is one)
when you introduce a new representation, you need to introduce standard services, mapping to overarching representation. Also, pairwise representation conversions. Transforms from one form to another. Best sequence of information preserving transformations.
DDB will store all situational information - what's happening and not just where objects are
Pedigrees are essential to uncertainty management. What are sources, what fusion algorithm, how credible the results, are you double counting. Cascading. Pedigree goes to most recent sources and can go back. If knowing how products are produced then you can tell if you are double counting. Q: Is this true re opaque operations and aggregate operations? A: record entities returned from such operations
The DDB is a database of evidence and opinion. Entries do not change, not updated, monotonic. So you add a new argument that overrides. (Semantic Garbage Collection might be possible). External agents ascribe truth to entries in the DDB, the DDB does not. The specification provides a Truth Maintenance Service (derivation history with propagated uncertainty). This supports argumentation from many angles and so there can be multiple conflicting statements. No blessed model of the truth. Tenney comments: consistency is more important than correctness - JCS tells us this in some publication. Response: A well constructed argument contains dissenting facts and why they are there.
Want to index over space, time, type of entity, and probability. Uncertainty can cut across any and all attributes. Hard to see how to overlay uncertainty index over relational model. What objects might have been in this area in the last four days. So, if I have all info for a 30x30km area every hour for a month, then queries are about tell me what entity did. Uncertainty on entity (rock or truck), loc here or 3m away, time is 20 minutes off. Give me expected number of vehicles that passed through volume over this time. Examples are two tracks crossing or did they meet and go apart? Keep license plate? Relax the model, only need to know it's a tank, not what kind, …
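A sketch of how pedigrees could expose double counting, assuming each product records which items it was derived from. Two products that trace back to the same original source should not be fused as if they were independent. The item names are invented, and the dictionary form of the pedigree is an assumption.

```python
def base_sources(item, derived_from):
    """Follow the pedigree back to the original (underived) sources of an item."""
    parents = derived_from.get(item, [])
    if not parents:
        return {item}
    result = set()
    for parent in parents:
        result |= base_sources(parent, derived_from)
    return result


def double_counted(inputs, derived_from):
    """Sources that feed more than one of the inputs about to be fused."""
    first_user = {}
    shared = set()
    for item in inputs:
        for source in base_sources(item, derived_from):
            if source in first_user and first_user[source] != item:
                shared.add(source)
            first_user.setdefault(source, item)
    return shared


# Hypothetical pedigree: product -> the items it was derived from.
derived_from = {
    "track_report": ["mti_pass_1", "mti_pass_2"],
    "convoy_estimate": ["mti_pass_2", "imint_frame_7"],
}

overlap = double_counted(["track_report", "convoy_estimate"], derived_from)
```

For opaque or aggregate operations, the answer given above (record the entities returned) would correspond to registering the operation's outputs as sources in their own right.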

Comment: World divided into lumpers versus splitters (and a very few like me who want to explain how to have your cake and eat it too) - GM wants virtual office so GM gets gears from small suppliers, more flexibility if product line shifts. Want virtual object dbms so you can add in new function. Absorption model allows some functions into the DBMS boundary. Boundary will vary over time. Push them out until they need to come in.

Database Scenario re Updating the Hydrology Model, Dave Maier

Hydrology info collected over two years, some products are summary data on how river changes, need pedigree info to go along with data products, some info is archived into regional summary products, some info is archived and some is lost. Might send data eagerly (push), send notification of replicas, or pull from other subsystems. Subscription fulfillment. Filtering - I only want new data if river moved its banks. Repair - we sent you last week's data by mistake. Rollback, compensating update. Flagging, tagging, or notification to tell someone who used the data that something is wrong. So mark it as bad and in spreadsheet way, the information is recalculated. Purging in self-cleaning oven model (discrete repairs) and frost-free refrigerator (runs procedure continuously). Pollution, dilution model. Good data after that dilutes the bad data I got last week. What kinds of pollutants will be diluted and what won't. Comment: Multi-source fusion dilutes out - designed to get rid of errors from single sources. Comment: Sensitivity analysis finds that some errors do not matter.
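A sketch of the flagging step in the spreadsheet analogy, assuming the system keeps a record of which products used which data. When one item is marked bad, everything downstream of it is flagged for recalculation. The product names and the used_by table are hypothetical.

```python
from collections import deque


def flag_downstream(bad_item, used_by):
    """Return every product that directly or indirectly used the bad item."""
    flagged = []
    seen = {bad_item}
    queue = deque([bad_item])
    while queue:
        current = queue.popleft()
        for consumer in used_by.get(current, []):
            if consumer not in seen:
                seen.add(consumer)
                flagged.append(consumer)
                queue.append(consumer)
    return flagged


# Hypothetical usage record: item -> products that consumed it.
used_by = {
    "gauge_week_14": ["river_bank_model"],
    "river_bank_model": ["regional_summary", "crossing_sites"],
    "regional_summary": ["theater_briefing"],
}

to_recompute = flag_downstream("gauge_week_14", used_by)
```

Running this once per repair corresponds to the self-cleaning oven model; running it continuously as corrections arrive is closer to the frost-free refrigerator.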

Some discussion on security and survivability, pedigrees, security exoskeletons and endoskeletons. Craig comment: this is similar question to the one, where are the DBMS boundaries and what functionality is in/out. Rubberband functionality into system only if DBMS is open-architected.

Distribution Scenario, Craig Thompson, see presentation

If sensors, collection sites, expertise analysis and behavior, and the many consumers of DDB data are not co-located, then the problem is distributed. For each of the other scenarios, the descriptions left out the system steps that hide the distribution (and security) of the data they are analyzing (good - that means distribution is somewhat orthogonal up to latency). One might use a graph to model DDB data flow. There are many representations and they need coercing so maybe we never put the data into a central logical form - maybe we just use it to compile down mappings from representation A to C via defined mappings from A to B and B to C where B is the general mapping. The functions of the architecture are distributed; security and data storage are distributed.
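A sketch of compiling an A-to-C mapping from A-to-B and B-to-C, assuming B is a general hub representation that every other representation can map to and from. The registry class and the distance units used as stand-in representations are illustrative assumptions.

```python
class RepresentationRegistry:
    """Each representation registers only its mappings to and from the hub."""

    def __init__(self):
        self._to_hub = {}
        self._from_hub = {}

    def register(self, name, to_hub, from_hub):
        self._to_hub[name] = to_hub
        self._from_hub[name] = from_hub

    def compile(self, source, target):
        """Build a direct source-to-target converter by composing via the hub."""
        to_hub = self._to_hub[source]
        from_hub = self._from_hub[target]
        return lambda value: from_hub(to_hub(value))


registry = RepresentationRegistry()
# Stand-in representations: distances along a road, with meters as the hub form.
registry.register("miles", lambda v: v * 1609.344, lambda m: m / 1609.344)
registry.register("km", lambda v: v * 1000.0, lambda m: m / 1000.0)

miles_to_km = registry.compile("miles", "km")
converted = miles_to_km(10.0)
```

With a hub, N representations need about 2N mappings, not a mapping for every pair, and data never has to be stored in the hub form itself.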

GIS Community Scenario, Allan Doyle

NIMA plans to issue CBDs to industry to collect spatial grids. Commercial use in intelligent transportation systems. We need to wed the DDB to commercial sources as well. Pedigrees that reach back into commercial systems might not be as trusted and might provide opaque pedigrees (Craig: or they might be more trusted since they have removed errors from data). Discussion: Allan: An issue is doctrinal lines and which command takes responsibility for DDB. Craig: one possibility is DDB provides for a federation architecture like the tracking DBMS did and that NIMA can be viewed as the central authority and owner of a strategic DDB that is the sum of tactical DDBs and does archiving. DDB as manager of many data sources and operation are managed.

JFACC meets Dynamic Database Scenario, Scott Fouse (covered on Day II)

Given a plan, you execute it (hit targets) and campaign assessment captures "how did we do". Talked about tying into query with an economic model to see what cheapest plan is. Foils show off Plan vs Actual and green for good and red for not done. Yellow is a need and plan but haven't done it yet. The plans themselves are stored persistently in the DBMS.

Day II - Wednesday, April 9, 1997

Views of DDB Architecture

DDB Representation on Whiteboard

During the Tuesday Scenarios discussion, Bob Tenney captured the following list of things that have to be represented in DDB:

Signal Features - edges, vertices, points, noun clauses, sensor coordinates
Physical Features - soil properties, vegetation properties, thermal properties, fields, forests, multiresolution, sensor footprints and viewing geometry
Terrain (2.5D/1.5D) - Elevation, resolution, global WGS coordinates
Cultural (3D) - buildings, roads, parking lots, trees, adjacency, connectivity, names, multiresolution, deterministic clutter, local cartesian coordinates (body, part)
Vehicle/People - car, truck, tank, railcar, airplane, + physical characteristics, names and kinematics
Force/Function - procedures, functional relations, logistics flows

System - coverages, indexes, configuration, data and component location, component connections, users, schedules, dates, products, knowledge, domains.

Craig: a puzzle is, what sort of representation model do you want for DDB? Objects, of course! But with objects, they have fixed numbers of attributes since they are an abstraction of a real world thing from one point of view. Over the life of a system there are always more attributes you might want for certain purposes so should objects have multiple representations. Over lunch: Maier mentioned ecology experiment with many scientists and experiments -- they use a skeleton data model to try to tie results together so all have something of a common model of tree-ids and heights that are in the grid but many differ in the data they collect about trees. Someone else called this tie points.

Consistency Management

Maier: Master-copy consistency either via explicit replicates or implicit in products. Domain-domain (structures touch the ground). With centralized control, the DDB knows every use of the data. An alternative is that clients that check out data take responsibility for consistency management. The external apps must become responsible for processing changes and exceptions. So, might start DDB with master-copy, then move to domain-domain (with 2 domains), then expand to more domains. Provides a consistency kernel. Tenney asks if you can overlay the terrain, entity model, and IFSAR and how to resolve conflicts consistent with imagery. Bill talks about master-clone - only masters can propagate change, clones can replicate self and choose for app-specific reasons to vary. Clone is autonomous. Can nominate change to master.
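A sketch of the master-clone idea as described above: only the master propagates changes, a clone may vary locally for its own reasons, and a clone can nominate a change to the master. Class and method names are hypothetical, and the tree-line values are made up.

```python
class Master:
    """Holds the authoritative value; the only party that propagates change."""

    def __init__(self, value):
        self.value = value
        self.version = 1
        self.clones = []
        self.nominations = []

    def make_clone(self):
        clone = Clone(self)
        self.clones.append(clone)
        return clone

    def update(self, value):
        self.value = value
        self.version += 1
        for clone in self.clones:
            clone.notify(self.version, value)

    def accept(self, index):
        """Adopt a nominated change and propagate it to the clones."""
        self.update(self.nominations.pop(index))


class Clone:
    """Autonomous copy: may follow the master or diverge for local reasons."""

    def __init__(self, master):
        self.master = master
        self.value = master.value
        self.based_on = master.version
        self.diverged = False

    def vary(self, value):
        """Local, app-specific change; never pushed to anyone else."""
        self.value = value
        self.diverged = True

    def nominate(self, value):
        """Propose a change; the master decides whether to adopt it."""
        self.master.nominations.append(value)

    def notify(self, version, value):
        if not self.diverged:  # a diverged clone keeps its own value
            self.value = value
            self.based_on = version


master = Master("tree line at 120m")
a = master.make_clone()
b = master.make_clone()
b.vary("tree line at 125m (local survey)")
a.nominate("tree line at 118m")
master.accept(0)
```

After the nomination is accepted, clone a follows the master while clone b keeps its local variant and still records which master version it was based on.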

Craig: John Beetam's dissertation at Stanford about 15 years ago covered a general framework for update in a multiple representation problem where it provided bookkeeping of changes made to one representation (CAD logic level) that needed to be made in a corresponding representation (layout) and some could be done automatically and some had to be done manually. That means the data is inconsistent between the two representations for a time, you can track that it is inconsistent, and you know the changes that were made to one representation that must be (eventually) accounted for in the other.
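A sketch of that kind of bookkeeping, assuming changes made to one representation are logged until they are reflected in the other, with some applied automatically and others left for manual work. This illustrates the idea only and is not taken from the dissertation; all names and sample changes are hypothetical.

```python
class RepresentationLink:
    """Tracks changes made in representation A that B has not yet absorbed."""

    def __init__(self):
        self.pending = []

    def record(self, change, automatic):
        self.pending.append({"change": change, "automatic": automatic})

    def is_consistent(self):
        return not self.pending

    def apply_automatic(self, apply):
        """Apply what can be automated; return the changes left for manual work."""
        remaining = []
        for entry in self.pending:
            if entry["automatic"]:
                apply(entry["change"])
            else:
                remaining.append(entry)
        self.pending = remaining
        return [entry["change"] for entry in remaining]

    def resolve(self, change):
        """Mark a manually handled change as done."""
        self.pending = [e for e in self.pending if e["change"] != change]


link = RepresentationLink()
link.record("rename net N1 to CLK", automatic=True)
link.record("add gate G7", automatic=False)

applied = []
manual = link.apply_automatic(applied.append)
still_inconsistent = not link.is_consistent()
link.resolve("add gate G7")
```

The pending list is the explicit record that the two representations disagree and of exactly what is owed.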

Exoskeleton DBMS - Distribution of Function and Data

Craig to David M: what you are doing is "turning the DBMS inside out" by defining some protocols (rich family with policies) that an external agent must use to maintain consistency among data sources even if not under the central DBMS control. The same sort of thing can be said about security, versioning, replication, persistence, …. Also, uncertainty boundaries. This is a general approach to thinking about an exoskeleton DBMS.

Universe is partitioned in various ways. Universe to Surface, Signal, Transport, the latter maps to vehicles, routes, roads. Given <geographic region BY {Equipment, Transport, Surface}> matrix, what happens if DBMS is central versus distributed with consistency maintenance? Could have one big DBMS or nine regional functional ones. Now add a split of some DBMSs providing functions f1..fn and others fi..fj where functions overlap. Do you again split the problem into the cartesian product of databases (splitter) or have one heavyweight maximal DBMS (lumper) or something in between?

Theological issue - there will be many gods. Agent architecture (or services architecture) for DDB - agent knows about domain knowledge, mutual models, evidence updates, as consistency relations among agents. What is distinction between god and minion? Both have facts and responsibilities. Now what is the partitioning? At the extreme, there are a bunch of agents. I know about roads or I need to know about roads. Pedigree points back and forward to support drill down and change propagation. Might want to look at distributed truth maintenance (Lesser). OR community has looked at this as well.

In the middle, the data may be more passive and a large conductor (super-agent). A local DBMS. Bob's Consistency Service knows about buildings and grounds. Allan's Consistency Service, if you subscribe, lets you know about tree lines. One god or many, a pantheon of control. California owns the master who is responsible for propagation. Others in Palo Alto are responsible for proposing updates. An upper level controller might just control the boundaries where lower levels control within their subarea. If there is a consistency service then the things want to be controlled. New UAVs that fly into buildings create new consistency management puzzles -- so there will always be new needs to add to the DDB.

Good memory: at one point Tenney draws circle (agent) with little circles inside for some of its functions. Craig: points out that Tenney is making agent in his own image since the picture sort of looks like him.

Simpler near-term design with one god, then move to a more distributed environment. Must be immediate value. Maybe there is a centralized schema exercise and a parallel path to demonstrate consistency. What does it take to add a fourth model?

Consistency specifications must take into account temporal dimensions so you might ask for the 1985 map of Palo Alto. One puzzle is, in your request, did you leave out consistency specifications because you did not understand something about the data.
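A sketch of the temporal side of such a request, assuming map versions are stored with the year they took effect and a query asks for the version in force as of a given year. The class name and contents are placeholders.

```python
import bisect


class VersionedMap:
    """Stores versions by effective year and answers as-of queries."""

    def __init__(self):
        self._years = []
        self._versions = []

    def add(self, year, content):
        i = bisect.bisect_left(self._years, year)
        self._years.insert(i, year)
        self._versions.insert(i, content)

    def as_of(self, year):
        """Latest version whose effective year is not after the requested year."""
        i = bisect.bisect_right(self._years, year)
        if i == 0:
            return None  # nothing on record that early
        return self._versions[i - 1]


palo_alto = VersionedMap()
palo_alto.add(1990, "survey B")
palo_alto.add(1980, "survey A")
palo_alto.add(1997, "survey C")

map_1985 = palo_alto.as_of(1985)
```

A request with no year attached is the ambiguous case raised above: the system cannot tell whether the requester wants the latest version or simply did not know versions existed.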

If you add an economic model so customers will "pay" for the gathering and querying of the info they need, then you populate the model on an as-needed basis.

Tenney: Unanticipated queries to build on-the-fly products. Is the schema set up up front or as part of query? Not sure what we decided but we did decide that we'd put more effort into answering queries from top brass than from less important sources.

Design Decisions

Coordinate systems

single universal - NOT
few -> reference -> global WGS84, tangent Cartesian
any -> local processing
I think: choose preferred set, but allow user defined coordinate systems
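
One way to read "few -> reference" plus user-defined systems is a registry in which every coordinate system only has to know how to convert to and from one reference frame. The sketch below assumes WGS84 Earth-centered Cartesian (ECEF) as that reference. The registry class and the "site_grid" system are hypothetical, and "site_grid" is a plain translation used for illustration, not a true tangent-plane frame.

```python
import math
from typing import Callable, Dict, Tuple

Point = Tuple[float, float, float]
Transform = Callable[[Point], Point]

# WGS84 ellipsoid constants.
WGS84_A = 6378137.0                    # semi-major axis, metres
WGS84_F = 1.0 / 298.257223563          # flattening
WGS84_E2 = WGS84_F * (2.0 - WGS84_F)   # first eccentricity squared


def geodetic_to_ecef(lat_deg: float, lon_deg: float, h: float) -> Point:
    """Convert WGS84 latitude/longitude/height to ECEF metres."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    n = WGS84_A / math.sqrt(1.0 - WGS84_E2 * math.sin(lat) ** 2)
    x = (n + h) * math.cos(lat) * math.cos(lon)
    y = (n + h) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - WGS84_E2) + h) * math.sin(lat)
    return (x, y, z)


class CoordinateRegistry:
    """Hypothetical registry: each system converts to/from the reference."""

    def __init__(self) -> None:
        self._to_ref: Dict[str, Transform] = {}
        self._from_ref: Dict[str, Transform] = {}

    def register(self, name: str, to_ref: Transform, from_ref: Transform) -> None:
        self._to_ref[name] = to_ref
        self._from_ref[name] = from_ref

    def convert(self, point: Point, src: str, dst: str) -> Point:
        """Go src -> reference -> dst, so N systems need only 2N transforms."""
        return self._from_ref[dst](self._to_ref[src](point))


registry = CoordinateRegistry()
registry.register("ecef", lambda p: p, lambda p: p)

# User-defined system: ECEF shifted so that a chosen origin becomes (0, 0, 0).
origin = geodetic_to_ecef(0.0, 0.0, 0.0)
registry.register(
    "site_grid",
    to_ref=lambda p: (p[0] + origin[0], p[1] + origin[1], p[2] + origin[2]),
    from_ref=lambda p: (p[0] - origin[0], p[1] - origin[1], p[2] - origin[2]),
)

local = registry.convert(geodetic_to_ecef(0.0, 0.0, 100.0), "ecef", "site_grid")
```

Routing every conversion through the reference keeps the number of transforms linear in the number of systems, at the cost of accumulating error across two hops.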

Representation

single universal - NOT
few -> reference indexed covering features to forces

skeleton schema + difference schemas - tie points via be-number
populate

any and all - local only, not necessarily indexed or linked
I think: must support several (though few/no DBMS systems do this now) - would not hurt to start with IDL
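
A minimal sketch of the skeleton schema idea, assuming the tie point is a shared identifier (the notes suggest the be-number) that links otherwise independent representations held in different repositories. The class names, repository names, and keys are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SkeletonObject:
    """Holds only the tie point and pointers to the full representations."""
    tie_id: str                                           # shared identifier
    views: Dict[str, str] = field(default_factory=dict)   # repository -> local key


class SkeletonIndex:
    """Hypothetical index that connects views; it stores no feature data."""

    def __init__(self) -> None:
        self._objects: Dict[str, SkeletonObject] = {}

    def link(self, tie_id: str, repository: str, local_key: str) -> None:
        obj = self._objects.setdefault(tie_id, SkeletonObject(tie_id))
        obj.views[repository] = local_key

    def views_of(self, tie_id: str) -> Dict[str, str]:
        obj = self._objects.get(tie_id)
        return dict(obj.views) if obj else {}


index = SkeletonIndex()
index.link("tie-0001", "imagery_store", "site_model/42")
index.link("tie-0001", "order_of_battle", "unit/7")
linked = index.views_of("tie-0001")
unknown = index.views_of("tie-9999")
```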

Uncertainty

similar to coordinate systems, discrete or continuous. There are multiple kinds of uncertainty policy. Going from Bayesian to bounded scheme loses information. This means we are supporting multiple schemes here too.
note: it is not normal in a traditional DBMS to store versions, plans, inconsistencies, ... so we are generalizing the DBMS paradigm since we must store all of these persistently.
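
The claim that going from a Bayesian to a bounded scheme loses information can be illustrated with a small sketch. It assumes a discrete distribution over a numeric value and a bounded scheme that keeps only the lowest and highest possible value. The function name and the sample numbers are made up for illustration.

```python
from typing import Dict, Tuple


def to_bounds(dist: Dict[float, float]) -> Tuple[float, float]:
    """Collapse a discrete distribution {value: probability} to (low, high)."""
    support = [value for value, prob in dist.items() if prob > 0.0]
    return (min(support), max(support))


# Two quite different beliefs about the same quantity (illustrative numbers).
confident = {8.0: 0.05, 10.0: 0.90, 12.0: 0.05}
vague = {8.0: 0.40, 10.0: 0.20, 12.0: 0.40}

bounds_confident = to_bounds(confident)
bounds_vague = to_bounds(vague)
same_bounds = bounds_confident == bounds_vague
```

Both distributions collapse to the same bounds, so the conversion cannot be undone. That is one argument for storing each value in its native uncertainty scheme rather than forcing a single one.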

Architecture

definitized by June - impossible with accelerated schedule
planned and staged - milestones and baseline + evolution process - still hard
initial condition for centralized plan (18-24 months) moving to a more distributed model
I think: reasonable, suggest two track, one central and one on "distributed" system architecture

Archiving

everything
policy-definable - retention rules - the general approach, subsumes the other two
data + pedigrees
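
The remark that the policy-definable approach subsumes the other two can be sketched by treating a retention policy as a predicate over records. "Archive everything" and "data + pedigrees" then become two particular policies. The Record fields and policy names below are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Record:
    kind: str       # e.g. "data", "pedigree", "scratch" (illustrative kinds)
    age_days: int


RetentionPolicy = Callable[[Record], bool]


def keep_everything(record: Record) -> bool:
    return True


def keep_data_and_pedigrees(record: Record) -> bool:
    return record.kind in {"data", "pedigree"}


def keep_recent(max_age_days: int) -> RetentionPolicy:
    """Build a rule-based policy; any other retention rule fits the same shape."""
    return lambda record: record.age_days <= max_age_days


def archive(records: Iterable[Record], policy: RetentionPolicy) -> List[Record]:
    """Return the records that the policy says to retain."""
    return [r for r in records if policy(r)]


records = [Record("data", 10), Record("pedigree", 400), Record("scratch", 2)]
all_kept = archive(records, keep_everything)
core_kept = archive(records, keep_data_and_pedigrees)
recent_kept = archive(records, keep_recent(30))
```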

Suggestion for near-term - two tracks

as if the problem is centralized - focused on building prototype
as if the problem is decentralized - focused on defining distribution architecture

Proposal for Final Briefing

Tenney proposed the following outline (which I only captured part of). He will post it to the web site as a .ppt template.

Introduction - Vision for Year 2010+

problem
solution
approach

Baseline architecture

knowledge architecture
domains including signal features, lines of comm, equipment, forces and functions,
system - resources, topology, demands
Domain Services/Functional Architecture
Initialization, registration/matching, coordinate conversion, change analysis, evidence management, index maintenance, error recovery
…
domain functions - consistency constraint, models
system functions - version management, indexes, resource types

Capability Evolution

centralized w deep domain
distributed w deep coordination
unified

Environment - software and hardware
Evaluation and Transition
Community Support
Appendix - technical

Wrap-up

Next meeting - May 7-8 at ISX offices in Washington D.C.

Q: if others ask to be involved, what is the answer? Tom's answer is to filter.

Conclusions and Puzzles*

* as influenced by the two meetings I have attended.

On some of the topics we discussed:

coordinate systems - select a few most general as preferred, support user defined coordinate models, rely on RADIUS/IUE for a starter set, talk to OGC about decision. Maybe gather the knowledgeable people from RADIUS, IUE, OGC, SEDRIS, USGS, probably a dozen more places. Issues include: does all data get natively represented in the preferred coordinate systems or can it natively reside in domain-specific coordinate systems; if ever needed in another coordinate system, can one compile the mapping to the new coordinate system; loss of information and error propagation across transforms; indexing in an environment with many coordinate systems.
schemas and representation - there are many to choose from - we cannot ignore MIDB (all 600 pages), which provides a general situation description, or Tracker DBMS or the ISO Common Schema that is evolving. Maybe we again need to gather the most knowledgeable parties into a set of workshops. There are several issues: not only what entities to represent but also what representation scheme(s) to use (one or many, objects or relations, if objects then which religion: IDL, Java, C++, ..., or several). Then there are the puzzles about skeleton/tie object models and representing relationships including those across object models or DBMSs to help capture cross-system constraints.
GIS and spatial operations, image understanding operations, ... - Allan's talk gave one list of puzzles but this is a semantically rich area. One puzzle I will add from the systems perspective is: a wide range of capabilities are contained in commercial and research systems that manipulate images and other spatial representations. If the data is in data source one governed by operator set one but operators from another set are needed to process some aspect of the data (the normal case now, where the data is in one GIS product stovepipe and operations of interest reside in another), then how can I get the data to the operators the cheapest way? Will the mapping across representations (disassociating the data representation and storage from the operator sets) provide the right indirection?
distributed services/agent architecture - This is the long list of system and database functions we know we want the DDB to be able to do for us, both up front and evolving over time - persistence, concurrency, ... rules, uncertainty, .... Starting with a closed central model seems to represent comfort since we know more about building these. The distributed variant represents the greatest risk and a huge payoff well beyond DDB if we can understand how to build systems that can throttle from being many little systems to a maximal system -- in the general case, this is the stovepipe antidote but the problem can be simplified to discrete subproblems. We have some strong hints that this is possible but there will be lots of engineering experiments along the way to better understand how to do this. A good investment for DARPA -- revolutionary if it succeeds, with high DoD and industry payback.
modeling consistency, uncertainty, possible worlds, all past states - We should be designing these in from the start. They can affect the query language, indices, etc. Not the only thing to think about up front - other things to think about up front are cost models and QoS in queries, optimization in distributed systems, etc. These are a set of subproblems that tie both into the representation and into the distributed services/agent infrastructure architecture.

On the possible structure of a DDB program:

Structure of program - define a two track program

Track #1 - focus on getting the following understood in a centralized DBMS world -- coordinate systems, schemas and representation problems, GIS and spatial operations, image understanding operations, and modeling consistency, uncertainty, possible worlds, all past states
Track #2 - focus on the distributed services/agent architecture

Define initial definitional and scoping phase of 2-3 years to define a common architecture and prototypes for DDB that consists of

architectural or system view of DDB (Track #2)

DDB must provide consistent situation model of tactical battlespace based on all-source data including framework for

geospatial information
thematic information

provide means of tying the views below together into an integrated system with open interfaces, so result is evolvable, scaleable, secure, survivable, understandable
system must be constructed modularly as components with well defined interfaces
define architectural principles for component composition, federation, and evolution resulting in survivable system
provide view of system as data or information flow framework, modeling how we do the problem today and how we might do it in future

image analysis architectural layer (Track #1)

provide signal to entity framework consisting of data types based on kinds of information and family of algorithms that operate on them

representation architectural layer (Track #1)

scope is situation modeling in tactical battlespace
identify spatial representations/schemas, provide transforms and mapping among these, recommend mappings (universal or dominant).
provide support for schema evolution, weak schemas for semi-structured information
catalog domain entities of interest at multiple levels of detail
provide uncertainty modeling, argumentation, pedigrees

database, middleware, and comm architectural layer (Track #2)

defines collection of services for distribution, replication, persistence, versioning, transactions, security, payment, ownership, firewalls, …
a goal is to make these generic, with the results demonstrated in the DDB environment
results must be portable across a range of environments, depending on Java/ActiveX/CORBA/web infrastructures or similar future technologies as appropriate.

integration demonstrations and technology transfer (Track #1 and #2)

apply results of DARPA technology programs to DDB problem (I*3, IC&V, IM, IUE, …)

define or apply I*3 information - information mediation architectures, queries, agents, relaxation, ontologies, query languages
define or apply Information Management - modeling, views, metadata and repository architectures
define or apply Collaboration technologies - provide components for sharing information, modeling process, modeling data and workflow
define or apply Visualization technologies - how user can state queries, see results, and interface to map displays

tech transfer models of how developed technologies will affect

other DARPA and DoD application programs (JTF, JFACC, ALP, GENOA, …, NIMA, ...)
industry products (possibly via OGC)
industry componentware standards (OGC, OMG, others?)
