Craig Thompson, OBJS
SRI, Menlo Park, CA
April 8-9, 1997
[These are not minutes of the meeting - they are my notes - so they do not provide a complete record of what we covered at the meeting.]
* I left around 2:30pm on April 9 so these notes do not cover the last hours of the meeting.
Tom Burns (DARPA) went over his briefing foils describing the goals of the DDB program. He made some changes based on the meeting. We heard background presentations on RADIUS, the state of practice in the GIS industry today, the state of relevant standards principally OMG and OGC, and on object services architectures. We talked about a series of scenarios that capture views of the DDB. Then we brainstormed on architecture issues related to DDB, principally consistency management in a distributed information environment where many traditional database functions are exported to a network, noting this is a higher risk architecture than a central DBMS since we understand it less well but that it seems to model our understanding of the situation better. Tenney proposed a strawman outline for a DDB annotated briefing.
The meeting helped to scope the DDB problem. My take - initially, DDB will provide a situation model for tactical environments, covering both geospatial and thematic information, but not covering the whole world as I had first thought (though it might grow up to cover the world via some federation-of-regions approach similar to how the MIDB and tracking databases interoperate with others of their own kind). As envisioned, DDB appears complementary to the NIMA-OGC-industry story as I understand it so far. It will have to support multiple coordinate systems but prefers some standard ones (maybe start with IUE-RADIUS's) and can add more. It is open and extensible to multiple representation schemes but comes with a growing standard class or object library covering signal-feature-entities of common interest (start with TBD - MIDB or Common Schema? - not discussed). Also, it may be that different data repositories store different views of objects, that there is no central view, but instead there are tie points or a kind of skeleton object model for connecting representations. Though it will probably be initially centralized as a DBMS, DDB will eventually be throttleable between central and distributed forms providing more or fewer database guarantees and functions (consistency, uncertainty management, security, replication, versioning, …) and populated as needed according to some need-based cost model. A two track approach was suggested to get started: build a simpler somewhat centralized DDB to stabilize some of the modeling issues but at the same time work on an exoskeleton architecture approach to develop the technology for a DDB distributed in data and function and use this to grow the central approach.
My action items (subject to clarification with Tenney) are
Tom went over a foil set which he will try to get up on a web site (for now, paper only). This is a 10 year program. It needs an architecture that can grow over 10 years.
Q: relationship between NIMA and DDB? A: initial focus is tactical, theater. Tom is meeting with NIMA soon to work on drawing boundaries.
Q: does DDB integrate only within geospatial information, or all thematic/situation information? A: both; it creates a model of the situation.
Q: if I made a bad assumption, can I recover? A: yes, and the system must re-initialize other sources whose derived information rests on the bad assumption. Interesting metric: how fast can you make the system consistent again? Think of DDB as a set of spreadsheets; uncertainty ripples through.
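Tom's spreadsheet analogy can be made concrete. The following is a minimal sketch, assuming derived facts simply record which facts they were derived from; `DependencyGraph`, `add_fact`, and `retract` are hypothetical names, not part of any DDB design discussed. Retracting a bad assumption returns everything downstream that must be recomputed or re-initialized, and the size of that set is one crude input to the "how fast can you make the system consistent again" metric.

```python
from collections import defaultdict, deque


class DependencyGraph:
    """Tracks which derived facts depend on which inputs (hypothetical sketch)."""

    def __init__(self):
        self.dependents = defaultdict(set)  # fact -> facts derived from it
        self.valid = set()

    def add_fact(self, name, derived_from=()):
        self.valid.add(name)
        for source in derived_from:
            self.dependents[source].add(name)

    def retract(self, name):
        """Invalidate a fact and, transitively, everything derived from it.

        Returns the set of facts that must be recomputed or re-initialized."""
        stale = set()
        queue = deque([name])
        while queue:
            fact = queue.popleft()
            if fact in stale:
                continue
            stale.add(fact)
            self.valid.discard(fact)
            queue.extend(self.dependents[fact])
        return stale


g = DependencyGraph()
g.add_fact("bridge_intact")                      # an assumption
g.add_fact("route_A_open", ["bridge_intact"])
g.add_fact("convoy_eta", ["route_A_open"])
g.add_fact("river_level")                        # unrelated, stays valid
print(sorted(g.retract("bridge_intact")))
# ['bridge_intact', 'convoy_eta', 'route_A_open']
```

In a distributed DDB the dependency edges would themselves be spread across sources, which is where the distributed truth-maintenance question raised later comes in.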
Comment: DDB logically integrates across domains, e.g., a hydrology model and a transportation model: don't put the power grid in a river; wires that disappear go under a bridge.
During the meeting, Tom reworded some of the DDB goals based on the presentations.
These presentations help set the context of DDB.
(Tenney introduced Aaron, first mentioning that Tenney's been looking at schemas and has the MIDB all sources DBMS C4I schema document, which is 600 pages.)
Basic idea of RADIUS was model-supported exploitation. One example is: if the VIP parking lot fills up, then look at the nearby display site to see a high-tech demo. If we have overlays on maps, then we can locate hot spots more easily. Quick Look, a production subset of RADIUS, is in place in the National Exploitation Lab. Another example is looking for cables in a field or ground disturbance to see if an oil rig is going up. Each site has a local vertical coordinate system (done manually now) based on 6-8 images. Multiple looks at the map from different camera parameters. Registration involves finding N data points that are the same in several pictures and using rays to connect the points. Q: do line segments have attributes like "vertical wall"? Yes. If I drag one line, do nearby lines know? Sort of. They bit off the real problems like errors and error propagation. Still missing is fusion with semantic information, e.g., that buildings have overhangs and rivers flow downhill.
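The ray step of registration can be illustrated with the standard midpoint method for triangulating a point seen in two images, assuming each camera's center and viewing ray to the point are already known (the hard part in practice). This is not the RADIUS code; the function names are hypothetical. The gap between the two rays is a simple residual that could feed the error-propagation problems mentioned above.

```python
def _sub(p, q):
    return tuple(a - b for a, b in zip(p, q))


def _add(p, q):
    return tuple(a + b for a, b in zip(p, q))


def _scale(p, k):
    return tuple(a * k for a in p)


def _dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def triangulate_midpoint(c1, d1, c2, d2, eps=1e-12):
    """Estimate the 3D point seen along two camera rays.

    Rays are c1 + s*d1 and c2 + t*d2. Returns the midpoint of their closest
    approach and the gap between the rays (a rough registration-error signal)."""
    w0 = _sub(c1, c2)
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w0), _dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < eps:
        raise ValueError("rays are (nearly) parallel")
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    q1 = _add(c1, _scale(d1, s))   # closest point on ray 1
    q2 = _add(c2, _scale(d2, t))   # closest point on ray 2
    gap = _dot(_sub(q1, q2), _sub(q1, q2)) ** 0.5
    return _scale(_add(q1, q2), 0.5), gap


# Two hypothetical cameras that both see the point (1, 2, 3).
point, gap = triangulate_midpoint((0, 0, 0), (1, 2, 3), (10, 0, 0), (-9, 2, 3))
print(point, gap)  # (1.0, 2.0, 3.0) 0.0
```

With real, noisy measurements the rays do not intersect, and the gap grows with the registration error.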
On coordinate systems: GEO 3D geocentric -> Local 3D -> (projection) -> sensor 2D coordinate system. 2D objects, images, and the display are transforms from the sensor 2D system. GEO maps to various coordinate systems like Lat/Long, State Plane, and UTM via non-linear transforms. What do I do when a single image shows two sites? There are different models of the earth's shape (datums): USGS uses the North American Datum of 1927 (NAD27), and GPS uses WGS84. States have State Plane coordinate systems. Every object (e.g., a building) has its own coordinate system, and there is a transform to the site coordinate system. There are transform pairs for bi-directional mappings.
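To make the transform-pair idea concrete, here is a minimal sketch of one pair: WGS84 geodetic latitude/longitude/height to geocentric X, Y, Z and back. The forward formulas and ellipsoid constants are the standard WGS84 ones; the inverse uses a simple fixed-point iteration, chosen for brevity (not valid exactly at the poles). Converting between datums such as NAD27 and WGS84 needs an additional datum transformation that is not shown here.

```python
import math

# WGS84 ellipsoid constants (semi-major axis in metres, flattening).
WGS84_A = 6378137.0
WGS84_F = 1 / 298.257223563
WGS84_E2 = WGS84_F * (2 - WGS84_F)  # first eccentricity squared


def geodetic_to_ecef(lat_deg, lon_deg, h):
    """Forward transform: WGS84 lat/long/height -> geocentric X, Y, Z (metres)."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    n = WGS84_A / math.sqrt(1 - WGS84_E2 * math.sin(lat) ** 2)  # prime vertical radius
    x = (n + h) * math.cos(lat) * math.cos(lon)
    y = (n + h) * math.cos(lat) * math.sin(lon)
    z = (n * (1 - WGS84_E2) + h) * math.sin(lat)
    return x, y, z


def ecef_to_geodetic(x, y, z, iterations=10):
    """Inverse transform by fixed-point iteration (not valid exactly at the poles)."""
    lon = math.atan2(y, x)
    p = math.hypot(x, y)
    lat = math.atan2(z, p * (1 - WGS84_E2))  # initial guess assumes h = 0
    h = 0.0
    for _ in range(iterations):
        n = WGS84_A / math.sqrt(1 - WGS84_E2 * math.sin(lat) ** 2)
        h = p / math.cos(lat) - n
        lat = math.atan2(z, p * (1 - WGS84_E2 * n / (n + h)))
    return math.degrees(lat), math.degrees(lon), h


xyz = geodetic_to_ecef(37.45, -122.18, 30.0)  # approximate point near Menlo Park
lat, lon, h = ecef_to_geodetic(*xyz)
print(xyz)
print(lat, lon, h)  # should reproduce the inputs to within floating-point noise
```

Keeping both directions of each transform together, and testing them by round trip, is one cheap way to catch mismatched pairs.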
Q: should DDB require one or a small number of coordinate systems, or be catholic and allow any number? RADIUS tried to make coordinate systems general and simple. Transportation systems measure locations in miles along a road. Was there a search requirement? Might use a single coordinate system for coarse search, with detailed filtering as post-processing, so the index returns candidates and not hits. Drawing complexity is associated with objects; the display draws just a bounding box during a move, and similarly when I zoom. More work is needed on level of detail. USGS has a book on coordinate systems (e.g., Universal Transverse Mercator). Must be able to add arbitrary amounts of metadata that do not fit the schema, e.g., how do I know the road was asphalt? I read it in the New York Times.
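The "index returns candidates and not hits" idea is the classic filter-and-refine pattern. A minimal sketch, assuming all objects have already been converted to one common planar coordinate system for the coarse step; the uniform grid, `GridIndex`, and `within_radius` are illustrative choices, not something proposed at the meeting.

```python
import math
from collections import defaultdict


class GridIndex:
    """Uniform grid over one common planar coordinate system (coarse search only)."""

    def __init__(self, cell_size):
        self.cell = cell_size
        self.cells = defaultdict(list)

    def _key(self, x, y):
        return (math.floor(x / self.cell), math.floor(y / self.cell))

    def insert(self, obj_id, x, y):
        self.cells[self._key(x, y)].append((obj_id, x, y))

    def candidates(self, xmin, ymin, xmax, ymax):
        """Everything in grid cells touching the box: a superset of the true hits."""
        kx0, ky0 = self._key(xmin, ymin)
        kx1, ky1 = self._key(xmax, ymax)
        for kx in range(kx0, kx1 + 1):
            for ky in range(ky0, ky1 + 1):
                yield from self.cells.get((kx, ky), [])


def within_radius(index, cx, cy, r):
    """Filter with the index, then refine with the exact geometric test."""
    cands = list(index.candidates(cx - r, cy - r, cx + r, cy + r))
    hits = [oid for oid, x, y in cands if math.hypot(x - cx, y - cy) <= r]
    return cands, hits


idx = GridIndex(cell_size=10.0)
for oid, x, y in [("a", 1, 1), ("b", 9, 9), ("c", 15, 2), ("d", 50, 50)]:
    idx.insert(oid, x, y)
cands, hits = within_radius(idx, 0.0, 0.0, 5.0)
print(len(cands), hits)  # 2 ['a']
```

The refine step is where a richer model could come back in, e.g., an exact test carried out in an object's own coordinate system.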
Imagery sensor models are needed for mensuration and for absolute positioning. The government has developed the former, Ruler, which can measure points on the world quite accurately. GDE, Autometrics, … have their own proprietary sensor models.
RADIUS does not have an explicit model of a site showing that this parking lot is connected to this road. SEDRIS (synthetic environments) at Tech: in the demo, you blow out a bridge and an autonomous jeep knows to find another route. It builds on a tiling. They have another multi-page object model. They model a gunship as a connection of its parts. They are defining a file format as well as an object model. Faceted models (triangles) are harder to search and reason about. So the representation must address statistics, uncertainty, and many coordinate systems. Action Item: Aaron will send us pointers to 5-6 documents on RADIUS (part of the IUE workshop). There will be a RADIUS book.
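The jeep demo depends on exactly the kind of explicit connectivity model that RADIUS lacks. A minimal sketch, assuming a hypothetical road network whose segments carry ids so that a blown bridge can be marked closed. Breadth-first search minimizes the number of segments, so a real system would more likely use a weighted search (e.g., Dijkstra) over segment lengths.

```python
from collections import deque


def shortest_route(edges, start, goal, closed=frozenset()):
    """Breadth-first search over an undirected road network.

    edges: iterable of (node_a, node_b, segment_id); closed: segment ids known to be out."""
    adj = {}
    for a, b, seg in edges:
        if seg in closed:
            continue
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    prev = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:       # walk predecessors back to the start
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None                            # no route exists


# Hypothetical network: a short route over a bridge and a longer detour via a ford.
ROADS = [
    ("camp", "bridge_n", "road1"), ("bridge_n", "bridge_s", "bridge"),
    ("bridge_s", "town", "road2"), ("camp", "ford", "road3"),
    ("ford", "hill", "road4"), ("hill", "pass", "road5"), ("pass", "town", "road6"),
]
print(shortest_route(ROADS, "camp", "town"))
print(shortest_route(ROADS, "camp", "town", closed={"bridge"}))
```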
Theme: must support multiple representations and multiple coordinate systems. OGC is taking a more universal coordinate system approach. Is it universal or yet another coordinate system?
Industrial GIS community is somewhat closed and has a reasonably clear idea of its own scope. For a good overview, see Fundamentals of Spatial Information Systems by Laurini and Thompson, Academic Press, 1992. Also, see the handout ACM survey article (April 97) by Shekhar et al.
GIS categories include automated mapping, thematic mapping, map overlay modeling, spatial statistics (accident rate per 1 mile segment of highway), spatial analysis, spatial query, spatial browsing, spatial reasoning, geocoding (given address find lat/long).
Evolution of GIS Products. Traditional all-purpose GIS in 1985, now Product Family GIS still vendor-proprietary and by 1999 Componentware GIS expected. Open GIS community has concept of a bunch of communities with special interests. If I can talk about structures of communities then I can talk across boundaries.
Hard problems are
Players in marketplace:
OMG is a consortium of 600+ companies with a history from 1989-present focused on open component-based distributed middleware. The presentation covered the environment of DDB including other DARPA programs, other DoD/Govt agencies like NIMA, industry products (Allan's talk), and industry standards focused on OGC (a little) and OMG (a lot). The presentation then drilled down on OMG:
Traditional DBMS systems encapsulate a fixed functionality under the
hood. The more functionality, the more heavyweight. Wanted is to have our cake and eat
it too: an extensible DBMS/framework architecture that is both an open
extensible toolkit and a working system. The former so it is extensible,
the latter so it is usable out of the box. Services architectures attempt
to open systems by replacing the calling tree with a bus that services hang
off, so the wiring of services can be done later. OMG does this, but so does
Microsoft (e.g., OLE DB). The talk provides motivation for why you want a fundamentally
open architecture, some approaches people have taken, how OMG fits this
picture, and some of the promises, roadmap, issues, and risks of the approach.
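The bus-with-services-hanging-off idea can be sketched as a registry that resolves services by name at call time, so the wiring can change after clients are written. This is a toy under stated assumptions (in-process, Python callables); it is not the OMG CORBA or OLE DB API, and the service names are hypothetical.

```python
from typing import Any, Callable, Dict


class ServiceBus:
    """Toy service registry: callers name a service, the bus resolves it at call time."""

    def __init__(self) -> None:
        self._services: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str, impl: Callable[..., Any]) -> None:
        self._services[name] = impl  # re-registering rewires existing callers

    def call(self, name: str, *args: Any, **kwargs: Any) -> Any:
        impl = self._services.get(name)
        if impl is None:
            raise LookupError(f"no service registered for {name!r}")
        return impl(*args, **kwargs)


bus = ServiceBus()
store: Dict[str, Any] = {}
bus.register("persistence.put", store.__setitem__)
bus.register("persistence.get", store.get)

history = []


def versioned_put(key, value):
    history.append((key, value))  # versioning added without touching callers
    store[key] = value


bus.call("persistence.put", "site42", {"kind": "parking lot"})
bus.register("persistence.put", versioned_put)  # wiring changed after the fact
bus.call("persistence.put", "site42", {"kind": "parking lot", "full": True})
print(bus.call("persistence.get", "site42"), len(history))
# {'kind': 'parking lot', 'full': True} 1
```

The calling code never changed while versioning was wired in, which is the point of late wiring; a real ORB adds distribution, typing, and discovery on top.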
We looked at the DDB problem from many user scenario points of view to try to extract views of the architecture.
In SAIP, entities are target-sized objects. The database contains locations and features of target objects. New detections are target-sized. Working in the desert, there are many rocks; some of the ones that move are tanks. Change detection is performed at the feature level. Also talked about registration problems where a plane flying one way with SAR registers all data points a little off to the side. Need to capture lines, and also uncertainty, if this data will be used for change detection later on. Need the image but also a terrain map that says this is a rocky field area. Might not go all the way to noting the thing was a rock. After about four looks, the uncertainty gets small enough to claim you are seeing real changes. Lots of domain-specific knowledge: a power grid viewed broadside via radar gives you a blooming-like effect. Same with railways. Segments of metal polyline broadside to you are pronounced. Accidental alignment is not rare.
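The "after about four looks" remark matches what inverse-variance fusion predicts for independent, unbiased looks with equal variance: the fused standard deviation falls as σ/√n, so four looks halve it. A minimal sketch with hypothetical position estimates (metres, σ = 6 m per look); `fuse` is an illustrative name, not SAIP's method.

```python
import math


def fuse(looks):
    """Inverse-variance weighted fusion of independent, unbiased estimates.

    looks: list of (estimate, variance). Returns (fused estimate, fused variance)."""
    weights = [1.0 / var for _, var in looks]
    fused_var = 1.0 / sum(weights)
    fused = fused_var * sum(w * x for w, (x, _) in zip(weights, looks))
    return fused, fused_var


looks = [(100.0, 36.0), (104.0, 36.0), (98.0, 36.0), (102.0, 36.0)]
for n in range(1, len(looks) + 1):
    est, var = fuse(looks[:n])
    print(n, round(est, 2), round(math.sqrt(var), 2))
```

The printed spreads are 6, about 4.24, about 3.46, and 3 m. The flight-direction registration offset described above is systematic rather than random, so it would not average out this way; it has to be estimated and removed separately.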
DBMS problem - For a 40x40 km area, they use 1M objects stored on RAIDs in ObjectStore, with 4 servers and two analyst workstations. For this block of data you look in four databases. 300 objects are fetched from the DBMS per second. They use quad trees in x-y. They had ObjectStore people on site for weeks understanding the problem. They use an SGI machine to process SAR images. 300 detections come at them per second, with about a dozen features (floating-point values) per detection for each dot on the SAR image, compared against the existing database. The rate of real changes is much smaller. Collection is planned beforehand.
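SAIP reportedly used quad trees in x-y inside ObjectStore. The following is a generic point quadtree with range queries, assuming a 40 km x 40 km tile in metres and hypothetical detections on a 1 km grid; it is not the SAIP or ObjectStore index, and the capacity and depth limits are arbitrary.

```python
class QuadTree:
    """Point quadtree over a rectangle [x0, x1) x [y0, y1) with range queries."""

    def __init__(self, x0, y0, x1, y1, capacity=8, depth=0, max_depth=12):
        self.bounds = (x0, y0, x1, y1)
        self.capacity = capacity
        self.depth = depth
        self.max_depth = max_depth   # stops endless splitting on duplicate points
        self.points = []             # (x, y, item) stored at this leaf
        self.children = None         # four sub-quadrants once split

    def _contains(self, x, y):
        x0, y0, x1, y1 = self.bounds
        return x0 <= x < x1 and y0 <= y < y1

    def insert(self, x, y, item):
        if not self._contains(x, y):
            return False
        if self.children is None:
            if len(self.points) < self.capacity or self.depth >= self.max_depth:
                self.points.append((x, y, item))
                return True
            self._split()
        return self._insert_into_children(x, y, item)

    def _insert_into_children(self, x, y, item):
        for child in self.children:
            if child.insert(x, y, item):
                return True
        return False

    def _split(self):
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        args = (self.capacity, self.depth + 1, self.max_depth)
        self.children = [QuadTree(x0, y0, mx, my, *args), QuadTree(mx, y0, x1, my, *args),
                         QuadTree(x0, my, mx, y1, *args), QuadTree(mx, my, x1, y1, *args)]
        old, self.points = self.points, []
        for px, py, it in old:           # push existing points down one level
            self._insert_into_children(px, py, it)

    def query(self, qx0, qy0, qx1, qy1, out=None):
        """Collect items with qx0 <= x <= qx1 and qy0 <= y <= qy1."""
        out = [] if out is None else out
        x0, y0, x1, y1 = self.bounds
        if qx1 < x0 or qx0 >= x1 or qy1 < y0 or qy0 >= y1:
            return out                   # query box misses this node entirely
        out.extend(it for px, py, it in self.points
                   if qx0 <= px <= qx1 and qy0 <= py <= qy1)
        for child in self.children or ():
            child.query(qx0, qy0, qx1, qy1, out)
        return out


tile = QuadTree(0, 0, 40_000, 40_000)  # 40 km x 40 km in metres
for i in range(40):
    for j in range(40):
        tile.insert(i * 1000 + 500, j * 1000 + 500, (i, j))
print(sorted(tile.query(10_000, 10_000, 11_999, 11_999)))
# [(10, 10), (10, 11), (11, 10), (11, 11)]
```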
Allan: feels like ATM with switches, data flow machine with loops. Reallocate processing chunks to certain areas. Close to sensor there is a lot of data push. Pre-position low level data analysis.
IFSAR produces 3D data: a similar idea to stereo, it uses two SAR images and composes them to locate points more directly in 3D. But to do this reasonably, you need to know about a large number of artifacts. Similar to all data, just at high data rates. Change detection proceeds via the conventional signal-to-feature-to-entity steps, at 1 m accuracy in 3D.
Many analyses want the raw data and don't want pre-analyzed data, since each layer makes mistakes and creates artifacts. Start with INS, GPS, and uncertainty estimates. (Reminds me of much simpler Stanford work on hierarchical recursive descent for computing diffs between two documents.) DTED is a human product, with people worrying about tree level and vegetation-penetrating radar. Plan to put one on the Predator UAV. There are big shadow areas in SAR since missions come in low (10-20°) and don't see into shadow areas behind buildings, so tanks can hide behind buildings. The far side is better for registration than the leading edge of buildings. Later I fly by from another angle. Some things I have seen twice, others once, and some things no times. In built-up areas, there are many shadows and many bright points. Recommends (tongue not entirely in cheek) that DDB should focus on Iowa and Kansas first, NY later.
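The shadow problem is easy to quantify. On flat terrain, a vertical obstacle of height h seen at grazing angle θ casts a radar shadow of ground length h / tan θ. A minimal sketch, assuming flat ground and a hypothetical 10 m building, for the 10-20° approach angles mentioned:

```python
import math


def radar_shadow_length(height_m, grazing_deg):
    """Ground length of the radar shadow behind a vertical obstacle on flat terrain."""
    return height_m / math.tan(math.radians(grazing_deg))


for angle in (10, 15, 20):
    print(angle, round(radar_shadow_length(10.0, angle), 1))
# 10 56.7
# 15 37.3
# 20 27.5
```

So at these low angles a 10 m building hides roughly 27-57 m of ground behind it, which is plenty of room for a tank.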
Comment: existence of DBMS will change way we collect data. For instance, some aerial data plus SAR gives better effects than SAR only. Use multiple phenomenology. Next thing you run into is doctrinal changes.
Many kinds of moving entities: vehicles, some good and some bad; some change terrain; some are background like refugees; some are combat; aggregated objects like convoys; some do warfighting, some engineering; they can be aggregated into missions and operational scenarios. Maybe 50,000-100,000 vehicles, with reports of movement every minute, for one analyst. MTI finds things with Doppler movement along line-of-sight vectors. The geometry of uncertainty is an ellipse 20 m by 300 m, so convoys moving right at you are harder to see. The ground order of battle MTI analyst connects dots to form tracks. Inputs are MTI reports, sensor model, vehicle model, and MTI tracklets. Everyone working on this has their own points and lines and estimates of these, and these schemas are built into algorithms. Much vehicular traffic goes along roads, so if you know that a road comes toward you or moves away, you can use this info with the MTI info and predict signal noise. Usually assumes roads are given, as polylines. Many ways to represent activity graphs (Petri nets, …). Then there are graphs with potential patterns, e.g., inventory-like patterns x(t) = x(t-1) + arrivals(t) - departures(t). MTI assumes roads are known and given. Does care about obscuration. Vehicle-level fusion. What happens when two vehicles come to an intersection? One stops. Semantics. Have to worry about double counting.
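The anisotropic 20 m x 300 m ellipse matters when connecting dots into tracks. A minimal sketch of elliptical gating with greedy nearest-neighbour association, assuming the ellipse axes are 1σ values with the small one along the sensor line of sight (the notes do not say which axis is which), and a chi-square gate of 9.21 (about 99% for 2 degrees of freedom); all names are hypothetical.

```python
import math

GATE = 9.21  # chi-square threshold, 2 degrees of freedom, about 99%


def mahalanobis_sq(dx, dy, los_angle, sigma_range=20.0, sigma_cross=300.0):
    """Squared normalized distance of an offset under an oriented error ellipse."""
    c, s = math.cos(los_angle), math.sin(los_angle)
    u = dx * c + dy * s      # offset along the sensor line of sight
    v = -dx * s + dy * c     # offset across the line of sight
    return (u / sigma_range) ** 2 + (v / sigma_cross) ** 2


def associate(report, tracks, los_angle):
    """Return the id of the nearest track inside the gate, or None (start a new track)."""
    best, best_d2 = None, GATE
    for track_id, (tx, ty) in tracks.items():
        d2 = mahalanobis_sq(report[0] - tx, report[1] - ty, los_angle)
        if d2 <= best_d2:
            best, best_d2 = track_id, d2
    return best


tracks = {"A": (0.0, 0.0), "B": (0.0, 500.0)}
print(associate((10.0, 200.0), tracks, los_angle=0.0))   # A: 200 m across LOS is fine
print(associate((100.0, 0.0), tracks, los_angle=0.0))    # None: 100 m along LOS is not
```

A report that falls outside every gate would start a new track, so a gate that is too tight (or an ellipse oriented wrongly) splits one vehicle into two tracks, which is one way double counting arises. Greedy per-report association also ignores conflicts between reports, which real trackers handle with global assignment.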
This problem starts with somewhat analyzed data and tries to do evidential analysis, fusing and providing certainties from ELINT, IMINT, and other INT data. The automated version of this mimics the manual version. His inputs are cartographic, text, and CNN reports, and he is focusing on evidential reasoning: how to go from facts to conclusions. What are the grand challenges for uncertainty management:
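The notes do not say which evidential formalism was meant; Dempster-Shafer theory is one common choice. A minimal sketch of Dempster's rule of combination, with hypothetical masses: ELINT sees an emitter that could be a SAM site or a decoy, and IMINT sees a launcher-sized vehicle that could be a SAM or a truck.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function maps frozenset hypotheses to belief mass summing to 1."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        both = a & b
        if both:
            combined[both] = combined.get(both, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass assigned to contradictory pairs
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {h: w / (1.0 - conflict) for h, w in combined.items()}


SAM, DECOY, TRUCK = "SAM", "decoy", "truck"
theta = frozenset({SAM, DECOY, TRUCK})  # frame of discernment
elint = {frozenset({SAM, DECOY}): 0.8, theta: 0.2}
imint = {frozenset({SAM, TRUCK}): 0.7, theta: 0.3}
fused = dempster_combine(elint, imint)
print(round(fused[frozenset({SAM})], 2))  # 0.56
```

Combining the two puts 0.56 of the mass on SAM alone, with the rest left on broader hypotheses. When sources conflict, the rule renormalizes the conflicting mass away, which is a known weak point of the rule when conflict is high.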
Comment: World divided into lumpers versus splitters (and a very few like me who want to explain how to have your cake and eat it too) - GM wants virtual office so GM gets gears from small suppliers, more flexibility if product line shifts. Want virtual object dbms so you can add in new function. Absorption model allows some functions into the DBMS boundary. Boundary will vary over time. Push them out until they need to come in.
Hydrology info collected over two years, some products are summary data on how river changes, need pedigree info to go along with data products, some info is archived into regional summary products, some info is archived and some is lost. Might send data eagerly (push), send notification of replicas, or pull from other subsystems. Subscription fulfillment. Filtering - I only want new data if river moved its banks. Repair - we sent you last week's data by mistake. Rollback, compensating update. Flagging, tagging, or notification to tell someone who used the data that something is wrong. So mark it as bad and in spreadsheet way, the information is recalculated. Purging in self-cleaning oven model (discrete repairs) and frost-free refrigerator (runs procedure continuously). Pollution, dilution model. Good data after that dilutes the bad data I got last week. What kinds of pollutants will be diluted and what won't. Comment: Multi-source fusion dilutes out - designed to get rid of errors from single sources. Comment: Sensitivity analysis finds that some errors do not matter.
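Several of the mechanisms listed (push, subscription filtering, repair of bad deliveries, flagging past users) fit in a small publish/subscribe sketch. This assumes in-memory products keyed by name; `ProductPublisher`, `bank_moved`, and the 5 m threshold are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Subscription:
    name: str
    wants: Callable[[dict, dict], bool]  # (previous value, new value) -> deliver?
    inbox: List[tuple] = field(default_factory=list)


class ProductPublisher:
    """Push-style publisher with filtering and after-the-fact repair notices."""

    def __init__(self) -> None:
        self.current: Dict[str, dict] = {}
        self.subscriptions: List[Subscription] = []
        self.recipients: Dict[str, Set[str]] = {}  # product key -> who received it

    def subscribe(self, sub: Subscription) -> None:
        self.subscriptions.append(sub)

    def publish(self, key: str, value: dict) -> None:
        # Simplification: the filter compares with the latest stored value,
        # not with the last value each subscriber actually saw.
        previous = self.current.get(key, {})
        self.current[key] = value
        for sub in self.subscriptions:
            if sub.wants(previous, value):
                sub.inbox.append(("update", key, value))
                self.recipients.setdefault(key, set()).add(sub.name)

    def repair(self, key: str, corrected: dict) -> None:
        """'We sent you bad data': flag it to every past recipient and resend."""
        self.current[key] = corrected
        for sub in self.subscriptions:
            if sub.name in self.recipients.get(key, set()):
                sub.inbox.append(("retract-and-replace", key, corrected))


def bank_moved(previous, new, tolerance_m=5.0):
    """Deliver only if this is the first report or the river moved its banks."""
    return "bank_x" not in previous or abs(new["bank_x"] - previous["bank_x"]) > tolerance_m


flood = Subscription("flood_model", bank_moved)
pub = ProductPublisher()
pub.subscribe(flood)
pub.publish("river_7", {"bank_x": 100.0})  # delivered: first report
pub.publish("river_7", {"bank_x": 102.0})  # filtered out: moved only 2 m
pub.repair("river_7", {"bank_x": 90.0})    # the product was wrong
print([kind for kind, _, _ in flood.inbox])  # ['update', 'retract-and-replace']
```

Repair notices deliberately bypass the subscriber's filter: anyone who received a product needs to hear that it was bad, even if the correction is small.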
Some discussion on security and survivability, pedigrees, security exoskeletons and endoskeletons. Craig comment: this is similar question to the one, where are the DBMS boundaries and what functionality is in/out. Rubberband functionality into system only if DBMS is open-architected.
If sensors, collection sites, expertise analysis and behavior, and the many consumers of DDB data are not co-located, then the problem is distributed. For each of the other scenarios, the descriptions left out the system steps that hide the distribution (and security) of the data being analyzed (good - that means distribution is somewhat orthogonal, up to latency). One might use a graph to model DDB data flow. There are many representations and they need coercing, so maybe we never put the data into a central logical form; maybe we just use it to compile down mappings from representation A to C via defined mappings from A to B and from B to C, where B is the general representation. The functions of the architecture are distributed; security and data storage are distributed.
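Compiling A-to-C mappings through a general representation B can be sketched with a toy example: positions along one straight road, recorded as mileposts, kilometre posts, or metres, with metres as the pivot B. The representations and function names are hypothetical; the point is the structure.

```python
METRES_PER_MILE = 1609.344  # international statute mile (exact)

# Each representation needs only two adapters: to and from the pivot
# (here, metres along the road).
TO_PIVOT = {
    "milepost": lambda v: v * METRES_PER_MILE,
    "km_post": lambda v: v * 1000.0,
    "metres": lambda v: v,
}
FROM_PIVOT = {
    "milepost": lambda m: m / METRES_PER_MILE,
    "km_post": lambda m: m / 1000.0,
    "metres": lambda m: m,
}


def compile_mapping(src, dst):
    """Compile A -> C as (B -> C) after (A -> B), where B is the pivot."""
    to_b, from_b = TO_PIVOT[src], FROM_PIVOT[dst]
    return lambda value: from_b(to_b(value))


def translate(value, src, dst):
    return compile_mapping(src, dst)(value)


print(round(translate(2.0, "milepost", "km_post"), 6))  # 3.218688
```

With n representations this needs 2n adapters instead of n(n-1) direct mappings. The cost is that anything B cannot express is lost on the way through, so B has to be at least as rich as whatever must survive the translation.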
NIMA plans to issue CBDs to industry to collect spatial grids. Commercial use in intelligent transportation systems. We need to wed the DDB to commercial sources as well. Pedigrees that reach back into commercial systems might not be as trusted and might provide opaque pedigrees (Craig: or they might be more trusted since they have removed errors from data). Discussion: Allan: An issue is doctrinal lines and which command takes responsibility for DDB. Craig: one possibility is that DDB provides for a federation architecture like the tracking DBMS did, and that NIMA can be viewed as the central authority and owner of a strategic DDB that is the sum of tactical DDBs and does archiving: DDB as a manager of many data sources whose operations are managed.
Given a plan, you execute it (hit targets) and campaign assessment captures "how did we do". Talked about tying into query with an economic model to see what cheapest plan is. Foils show off Plan vs Actual and green for good and red for not done. Yellow is a need and plan but haven't done it yet. The plans themselves are stored persistently in the DBMS.
During the Tuesday Scenarios discussion, Bob Tenney captured the following list of things that have to be represented in DDB:
System - coverages, indexes, configuration, data and component location, component connections, users, schedules, dates, products, knowledge, domains.
Craig: a puzzle is, what sort of representation model do you want for DDB? Objects, of course! But objects have a fixed number of attributes, since each is an abstraction of a real-world thing from one point of view. Over the life of a system there are always more attributes you might want for certain purposes, so should objects have multiple representations? Over lunch: Maier mentioned an ecology experiment with many scientists and experiments -- they use a skeleton data model to try to tie results together, so all have something of a common model of tree-ids and heights that are in the grid, but many differ in the data they collect about trees. Someone else called this tie points.
Maier: Master-copy consistency either via explicit replicates or implicit in products. Domain-domain (structures touch the ground). With centralized control, the DDB knows every use of the data. An alternative is that clients that check out data take responsibility for consistency management. The external apps must become responsible for processing changes and exceptions. So, might start DDB with master-copy, then move to domain-domain (with 2 domains), then expand to more domains. Provides a consistency kernel. Tenney asks if you can overlay the terrain, entity model, and IFSAR and how to resolve conflicts consistent with imagery. Bill talks about master-clone - only masters can propagate change, clones can replicate self and choose for app-specific reasons to vary. Clone is autonomous. Can nominate change to master.
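Bill's master-clone variant can be sketched as follows, assuming a simple in-memory key/value model; `Master`, `Clone`, `nominate`, and `accept` are hypothetical names, not an existing API. It shows the rule that only the master propagates, while a clone keeps autonomous local variations and can only nominate changes.

```python
class Master:
    """Authoritative copy: the only place a change can be propagated from."""

    def __init__(self, data):
        self.data = dict(data)
        self.version = 0
        self.clones = []
        self.nominations = []  # (clone, key, value) proposals awaiting a decision

    def make_clone(self):
        clone = Clone(self)
        self.clones.append(clone)
        return clone

    def nominate(self, clone, key, value):
        self.nominations.append((clone, key, value))

    def accept(self, index):
        """Accept one nominated change and push it to every clone."""
        _, key, value = self.nominations.pop(index)
        self.data[key] = value
        self.version += 1
        for clone in self.clones:
            clone.receive(key, value, self.version)


class Clone:
    """Autonomous copy: may vary locally, but can only nominate shared changes."""

    def __init__(self, master):
        self.master = master
        self.base = dict(master.data)  # last state received from the master
        self.local = {}                # app-specific variations, never propagated
        self.version = master.version

    def get(self, key):
        return self.local.get(key, self.base.get(key))

    def vary(self, key, value):
        self.local[key] = value

    def nominate(self, key, value):
        self.master.nominate(self, key, value)

    def receive(self, key, value, version):
        self.base[key] = value
        self.version = version


master = Master({"bridge_17": "intact"})
planner, analyst = master.make_clone(), master.make_clone()
planner.vary("bridge_17", "assume destroyed")  # what-if, stays local
analyst.nominate("bridge_17", "damaged")       # proposal only
master.accept(0)
print(analyst.get("bridge_17"), "|", planner.get("bridge_17"))
# damaged | assume destroyed
```

Here a clone's local variation keeps shadowing newer master values; an application would have to decide whether that is the right behaviour.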
Craig: John Beetam's dissertation at Stanford about 15 years ago covered a general framework for update in a multiple-representation problem, where it provided bookkeeping of changes made to one representation (CAD logic level) that needed to be made in a corresponding representation (layout); some could be done automatically and some had to be done manually. That means the data is inconsistent between the two representations for a time, but you can track that it is inconsistent, and you know the changes made to one representation that must (eventually) be accounted for in the other.
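The bookkeeping described here can be sketched as a log of pending changes between a source and a target representation, split into those that can be applied automatically and those that need manual work. This is an illustrative sketch with hypothetical names, not the design from the dissertation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PendingChange:
    change_id: int
    description: str
    automatic: bool  # can the counterpart edit be derived mechanically?


class RepresentationLink:
    """Edits made to a source representation (e.g. logic) that still have to be
    reflected in a target representation (e.g. layout)."""

    def __init__(self) -> None:
        self.pending: List[PendingChange] = []
        self._next_id = 1

    def record(self, description: str, automatic: bool) -> int:
        change_id = self._next_id
        self.pending.append(PendingChange(change_id, description, automatic))
        self._next_id += 1
        return change_id

    def apply_automatic(self, apply: Callable[[PendingChange], None]) -> int:
        done = [c for c in self.pending if c.automatic]
        for change in done:
            apply(change)
        self.pending = [c for c in self.pending if not c.automatic]
        return len(done)

    def resolve_manually(self, change_id: int) -> None:
        self.pending = [c for c in self.pending if c.change_id != change_id]

    def is_consistent(self) -> bool:
        return not self.pending  # known-inconsistent while anything is pending


link = RepresentationLink()
link.record("rename net clk -> clk_main", automatic=True)
gate = link.record("add gate U7", automatic=False)
link.apply_automatic(lambda c: print("auto-applied:", c.description))
print(link.is_consistent(), [c.description for c in link.pending])
link.resolve_manually(gate)
print(link.is_consistent())
```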
Craig to David M: what you are doing is "turning the DBMS inside out" by defining some protocols (rich family with policies) that an external agent must use to maintain consistency among data sources even if not under the central DBMS control. The same sort of thing can be said about security, versioning, replication, persistence, …. Also, uncertainty boundaries. This is a general approach to thinking about an exoskeleton DBMS.
Universe is partitioned in various ways: Universe into Surface, Signal, and Transport, the latter mapping to vehicles, routes, and roads. Given a {geographic region} × {Equipment, Transport, Surface} matrix, what happens if the DBMS is central versus distributed with consistency maintenance? Could have one big DBMS or nine regional functional ones. Now add a split where some DBMSs provide functions f1..fn and others fi..fj, where the functions overlap. Do you again split the problem into the cartesian product of databases (splitter), have one heavyweight maximal DBMS (lumper), or something in between?
Theological issue - there will be many gods. Agent architecture (or services architecture) for DDB - an agent knows about domain knowledge, mutual models, and evidence updates, as consistency relations among agents. What is the distinction between god and minion? Both have facts and responsibilities. Now what is the partitioning? At the extreme, there are a bunch of agents: I know about roads, or I need to know about roads. Pedigree points back and forward to support drill-down and change propagation. Might want to look at distributed truth maintenance (Lesser). The OR community has looked at this as well.
In the middle, the data may be more passive and a large conductor (super-agent). A local DBMS. Bob's Consistency Service knows about buildings and grounds. Allan's Consistency Service, if you subscribe, lets you know about tree lines. One god or many, a pantheon of control. California owns the master who is responsible for propagation. Others in Palo Alto are responsible for proposing updates. An upper level controller might just control the boundaries where lower levels control within their subarea. If there is a consistency service then the things want to be controlled. New UAVs that fly into buildings create new consistency management puzzles -- so there will always be new needs to add to the DDB.
Good memory: at one point Tenney draws circle (agent) with little circles inside for some of its functions. Craig: points out that Tenney is making agent in his own image since the picture sort of looks like him.
Simpler near-term design with one god, then move to a more distributed environment. Must be immediate value. Maybe there is a centralized schema exercise and a parallel path to demonstrate consistency. What does it take to add a fourth model?
Consistency specifications must take into account temporal dimensions, so you might ask for the 1985 map of Palo Alto. One puzzle: if your request leaves out consistency specifications, is that because you did not understand something about the data?
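A temporal consistency specification such as "the 1985 map" implies that every layer is read as of the same date. A minimal sketch, assuming each layer keeps its versions keyed by the date they became valid; `VersionedLayer` and the layer contents are hypothetical placeholders.

```python
import bisect
from datetime import date


class VersionedLayer:
    """Keeps every version of a map layer keyed by the date it became valid."""

    def __init__(self) -> None:
        self._dates = []
        self._versions = []

    def record(self, valid_from: date, version) -> None:
        i = bisect.bisect_right(self._dates, valid_from)  # keep dates sorted
        self._dates.insert(i, valid_from)
        self._versions.insert(i, version)

    def as_of(self, when: date):
        """Return the version valid on `when`, or None if nothing existed yet."""
        i = bisect.bisect_right(self._dates, when)
        return self._versions[i - 1] if i else None


roads, buildings = VersionedLayer(), VersionedLayer()
roads.record(date(1980, 1, 1), "roads v1")
roads.record(date(1992, 1, 1), "roads v2")
buildings.record(date(1984, 6, 1), "buildings v1")
buildings.record(date(1996, 3, 1), "buildings v2")

when = date(1985, 1, 1)
print({name: layer.as_of(when) for name, layer in [("roads", roads), ("buildings", buildings)]})
# {'roads': 'roads v1', 'buildings': 'buildings v1'}
```

Reading every layer through the same `as_of` date is what keeps the composed "1985 map" internally consistent; mixing dates across layers is exactly the kind of unstated specification the puzzle above is about.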
Add an economic model so customers will "pay" for the gathering and querying of the info they might need; you populate the model on an as-needed basis.
Tenney: Unanticipated queries to build on-the-fly products. Is the schema set up up front or as part of query? Not sure what we decided but we did decide that we'd put more effort into answering queries from top brass than from less important sources.
Suggestion for near-term - two tracks
Tenney proposed the following outline (which I only captured part of). He will post it to the web site as a .ppt template.
Next meeting - May 7-8 at ISX offices in Washington D.C.
Q: if others ask to be involved, what is the answer? Tom's answer is to filter.
* as influenced by the two meetings I have attended.
On some of the topics we discussed:
On the possible structure of a DDB program: