Craig Thompson, OBJS
ISX Offices, Washington D.C.
12-13 March 1997
[These are not minutes of the meeting - they are my notes - so they do not provide a complete record of what we covered at the meeting.]
This was meeting #1 of four meetings. The end result of the panel will be a briefing presented to Tom Burns (DARPA/ISO). The briefing aims to provide a high level architecture for an envisioned Dynamic Database (DDB) that will provide a central information repository for geographic data sources needed by the DoD. More bluntly, the briefing will focus on how to build the DDB.
Part of this meeting was briefings - I learned a lot. So did other panel members, this in spite of the fact that most had already sat on a six month panel on DDB in the spring/summer. Now, a large BAA (to fund a 100-150 person effort) is expected in June 1997, planned awards in January 1998 and first demos expected 18 months after, with an overall program life of 15 years.
My principal action items before the next meeting:
DDB II meeting #2 will be 8-9 April at SRI in Menlo Park.
For the record, there is no "organizational conflict of interest" if panel members serve and then respond to the BAA. This is true because panel members are only recommending an architecture and not helping to write the BAA solicitation, which is happening entirely within DARPA. The panel will report a week or more before the CBD announcement, that is, before Tom Burns puts pen to paper to write the BAA.
The Second DDB Panel is now funded and we can charge to it.
Present at this meeting:
Tom provided succinct overall vision foils:
Bob introduced the problem as "to build a computer model of the entire planet at finest grain anyone would ever want" - some tongue in cheek but not if you assume it was driven by a cost model.
Bob went over the briefing of the First DDB Panel (large .ps file) completed last spring/summer. The briefing has since been given to Admiral Danton (NIMA), I*3 (in November; I heard it then), the Naval Studies Board, and others. The initial architecture task has been funded ($500K) over three months, and that is the task of the Second DDB Panel (the current one): to come up with an initial functional architecture and understand the Grand Challenges it is meant to solve. Here are notes I took at the I*3 meeting in November 1996.
Dynamic Database, Bob Tenney (Alphabase, email@example.com)
Concepts for a Dynamic Database, a study done last spring and summer, is accessible from the ISO web site; a DARPA program is the likely outcome. Key people were Paul Maim, Bob Bolles, Dave DeWitt, Steve Flank, Scott Fouse, John Gilmore, Dave Gunning, John Leon, Dave Maier, John Lowrance, Tom Strat, Nik Subotic, and Bob Tenney (chair).
This is not a traditional DBMS - it is much more. Data and change are everywhere on the net. This must not be a static thing built to solve one problem. It is a living map (one view); a Dynamic DBMS (still DBMS-centric) holds all these views. Sometimes you want resolution at 1' at some location, hence fisheye views. Render 3D. Some information is pre-assembled by agents, including people. Logical and physical roads. The actual report is on the ISO web page someplace.
Multiple layers, scales, and representations; avoid great gobs of imagery. Imagery is massively available, so avoid bulk stores of data. Identify uninterpreted sections. Pedigrees help explain what is going on. Provide data services to BADD and DMIF.
DBMS systems do not do anything (actively); what makes one useful is a set of services. Where do services come from (query, store, protect, …)? A requirement is that a service be compatible with its use domain, to avoid O(N) versus O(log N) behavior. Some operations decompose queries and also assemble responses (local queries). The Dynamic DBMS looks like another server on a bus. In the middle is a dynamic situation model of terrain, vehicles, and causal relationships. He showed a foil with lots of boxes and I/O connectors. Want to loosen up data push and pull. DDB stores terrain; DDB II does mediation too - moving applications into the DBMS, expanding over time.
Technical issues: what goes into the DDB is 3D models, terrain, entities, … feeding a dynamic situation model - a lot of spatial stuff. It turns out you never know anything exactly; you only have estimates, so uncertainty estimates must be attached to everything. Compute a metric for how much data is needed for many purposes: roughly 10 megabytes for a square kilometer. This is a living model, constantly updated from sensor data, drawing on a legion of sources: commercial, DMA/NIMA, sensors. Big problem. Busy problem. Too big for one computer, so it is distributed. The mediation component is a query processor, search engine, and a way to assemble responses; it needs 3D data over time and space, so it is very geospatial. Example query: I want to put a force here in this valley - what radar sites do I have to worry about? There must be some way to publish products and change them daily without recompiling the whole system.
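The 10 MB per square kilometer figure above invites some back-of-envelope sizing. A minimal sketch (the theater dimensions are illustrative round numbers, not from the briefing):

```python
# Back-of-envelope sizing using the ~10 MB per square kilometer figure
# quoted above. The 500 km x 500 km theater below is an illustrative
# assumption, not a number from the briefing.

MB_PER_SQ_KM = 10  # rough figure from the briefing

def storage_estimate_gb(area_sq_km: float) -> float:
    """Return the estimated storage footprint in gigabytes."""
    return area_sq_km * MB_PER_SQ_KM / 1024

# A 500 km x 500 km theater of operations:
theater_gb = storage_estimate_gb(500 * 500)
print(f"250,000 sq km theater: ~{theater_gb:,.0f} GB")
```

At this rate a single theater already runs into terabytes, which supports the point that the model is too big for one computer and must be distributed.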
Bob mentioned JTF data server and map server have been relatively successful as compared to web server and situation server built on it.
Bob mentioned two views of DDB: (a) a DBMS at the center of the picture, or (b) mediation only, no storage - go back to the original sources for storage, keep just a cache, store rather than compute. We are to come up with a conceptual architecture (boxology), functional in nature, not so much a physical architecture (specific numbers of replicas) or an operational one (in terms of military operations).
Mentioned Quick Start, a funded feasibility effort to put together an aggressive narrow-path DDB demo based on some accessible and unclassified Landsat and SAR data stored in an OODB, doing some simple change detection. Some pieces might be Wisconsin Paradise + BBN Mapserver, Global Infotek Harvest/Tsimmis + Paradise.
GCCS is the DISA Global Command and Control System. The Track Database described is operational and keeps track of sighting histories of physical objects like planes, ships, subs, tanks, not quite people, documents, … while trying to avoid tracking seagulls, waves, and cars not of interest. Measurements being imprecise, there is uncertainty with each measurement, commonly kept as ellipses. Decision aids help extrapolate where an entity was in the unobserved past. There might be 11,000 tracks in the Mediterranean, each with perhaps 1000 observations (<lat, long, time, elint attributes, acoustic parameters, ...>), but elint (electronic intelligence) is getting better, so we can see more, so the DBMS size will increase. The update rate for a sub is every two days but for a plane is every few seconds.
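The track record described above - a sighting history where each observation carries a position, a time, and an uncertainty ellipse - can be sketched as a pair of simple data types. All field and class names here are illustrative assumptions, not the actual GCCS schema:

```python
from dataclasses import dataclass, field

# Illustrative sketch of a track as described above: a history of
# observations, each with a position, time, and uncertainty ellipse.
# Names are assumptions for illustration, not the GCCS schema.

@dataclass
class ErrorEllipse:
    semi_major_m: float      # meters
    semi_minor_m: float
    orientation_deg: float

@dataclass
class Observation:
    lat: float
    lon: float
    time: float              # epoch seconds
    ellipse: ErrorEllipse
    elint_attrs: dict = field(default_factory=dict)

@dataclass
class Track:
    uid: str
    observations: list = field(default_factory=list)

    def latest(self) -> Observation:
        """Most recent sighting in the history."""
        return max(self.observations, key=lambda o: o.time)

t = Track(uid="S000000001")
t.observations.append(Observation(36.0, 15.0, 100.0, ErrorEllipse(500, 200, 45)))
t.observations.append(Observation(36.1, 15.2, 7300.0, ErrorEllipse(800, 300, 50)))
print(t.latest().lat)  # 36.1
```

At 11,000 tracks times ~1000 observations each, even this minimal record format makes clear why the database grows as sensors improve.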
Histories are maintained differently for different objects. May not keep every detection. Might keep other attributes like ship class. The Maneuvering Targets Statistical Tracker (MTST) reasons that an entity is a ship versus a tank, but not a frigate versus a carrier. There may be several tracks per entity, since different radars might see different signatures. Tracks are given UIDs consisting of a type code + a 9-digit counter; the count is saved across crashes. If UID a123 turns out to be the same as b456, the two are merged and one UID dominates. If an analyst cannot tell whether there are one or two entities, they are kept as two entities (a manual action).
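The UID scheme above (type code + 9-digit counter, saved across crashes) and the merge rule (one UID dominates) can be sketched as follows. The counter-file format and all names are assumptions for illustration:

```python
import os
import tempfile

# Sketch of the UID scheme described above: a type code plus a 9-digit
# counter that survives restarts by persisting to disk, and a merge in
# which one UID dominates. The file format is an assumption.

class UidAllocator:
    def __init__(self, counter_file: str):
        self.counter_file = counter_file

    def next_uid(self, type_code: str) -> str:
        count = 0
        if os.path.exists(self.counter_file):
            with open(self.counter_file) as f:
                count = int(f.read() or 0)
        count += 1
        with open(self.counter_file, "w") as f:
            f.write(str(count))          # count is saved across crashes
        return f"{type_code}{count:09d}"

def merge_tracks(tracks: dict, keep: str, absorb: str) -> None:
    """Two tracks turn out to be the same entity; `keep` dominates."""
    tracks[keep].extend(tracks.pop(absorb))

path = os.path.join(tempfile.mkdtemp(), "uid.counter")
alloc = UidAllocator(path)
a, b = alloc.next_uid("a"), alloc.next_uid("b")
print(a, b)  # a000000001 b000000002
```

Persisting the counter before returning the UID is what guarantees a restart never reissues an identifier, at the cost of one disk write per allocation.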
The GCCS Track database is distributed -- each site holds the data it is interested in. Data is stored in flat files with the most recent kept in RAM. Queries run via scan. Ambiguities are flagged (oh, it went behind a hill). Smart push-pull filters on the push and pull sides of different DBMS copies route entities of interest to the DBMSs that want them. Filters can involve geography, entity type, treaty area, air picture, and areas of interest. There are negotiations between analysts and DBMSs, e.g., "you must have missed a report."
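The smart push-pull filtering just described - replicas subscribing with predicates over geography, entity type, and so on, with only matching updates routed to them - can be sketched minimally. All names here are illustrative assumptions:

```python
# Minimal sketch of smart push/pull filtering as described above: each
# replica registers predicates (geography, entity type, ...) and only
# matching entity updates are routed to it. Names are assumptions.

def in_box(lat_min, lat_max, lon_min, lon_max):
    """Geographic area-of-interest filter."""
    return lambda e: lat_min <= e["lat"] <= lat_max and lon_min <= e["lon"] <= lon_max

def of_type(*types):
    """Entity-type filter."""
    return lambda e: e["type"] in types

class Router:
    def __init__(self):
        self.subscribers = []   # (filters, inbox) pairs

    def subscribe(self, *filters):
        inbox = []
        self.subscribers.append((filters, inbox))
        return inbox

    def push(self, entity):
        # Route the update only to replicas whose filters all match.
        for filters, inbox in self.subscribers:
            if all(f(entity) for f in filters):
                inbox.append(entity)

router = Router()
med_ships = router.subscribe(in_box(30, 46, -6, 36), of_type("ship", "sub"))
router.push({"type": "ship", "lat": 36.0, "lon": 15.0})   # matches
router.push({"type": "plane", "lat": 36.0, "lon": 15.0})  # filtered out
print(len(med_ships))  # 1
```

The same predicate machinery serves both directions: on the push side it decides what to forward, and on the pull side it decides what a replica asks for.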
There is a track owner, and permissions are required for you to delete, update, correct, merge, ... a track I own. Updates are broadcast. SOP (standard operating procedure) is to make requests via email. Not all nodes have TCP/IP. A ship (entity) has a current awareness zone. CINCPAC or CINCLANT carry certain entities (geographically). Avoiding seagulls and sailboats: somewhere someone is making a manual decision about radar blips, that is, there is a human filter on the sensor. Useful acronym: DIW, meaning dead in the water.
The Joint Task Force commander breaks up his problem via air, land, and sea geosectors but can divide it other ways too. Information is rolled up to the JTF commander; this is a matter of setting up a filter, called battlefield rollup. Databases are arranged hierarchically, not peer-to-peer, though you can send obnotes that way; most casting is up/down. COP = Common Operational Picture governs information flow. There is a Top COP. Don't think so much about local disk as local replication. DISN SIPRNET talks across to the supported CINC, supporting CINCs in Europe, agencies, CJTF Atlantic, Coalition S. America, NMCC SVCs.
Broadcast to child/parent is event by event and can be queried. Top COP has system configuration and pass through of updates. If jammed and you lose some updates, then go to parent node for sitrep.
The Tracking Database is tied into MIDB (see below). Periodically the system must be cleaned up. In GCCS, if you delete something it is expunged. "Yesterday" is ambiguous with respect to rollback: is "yesterday" what I now know about it, or what I knew then? There is no time tagging. How much can we afford -- it's a question of purpose. For instance, training is not the purpose; C4I is the mission: knowing what's going on now.
John Thomas was chief architect for NIMA but now Steve Carrol and Ron Burns (government) share that role. He reminded us he is not a NIMA spokesman. He provides system engineering support for the Central Imagery Office portion of NIMA. Note: Shel Sutton (MITRE), my co-chair on OMG Internet SIG, is a NIMA-CIO architect and chairs the Open GIS Consortium imaging working group as well as the OMG GIS SIG.
In the Gulf War, we needed imagery but did not get it. NIMA is the locus now, has the moral authority and money, around $2B per year. Contractors have too much and too little voice. The imagery problem is an information systems problem -- a 10 gigabyte data flow. The cycle is: collect, distribute, exploit, store. All is push. In the mid-80s all was film. Now collection is mostly digital but the other functions are still mostly film. The long-term storage requirement is in the 100s of petabytes. Updates run terabytes/day, 1000-10,000 transactions/min, 3000 files per day. Pipes are still too small, so analysts work on small image subsets. Also, old habits die hard. Want: a repository integrated geospatially using best quality of service. Customers add value -- trusted co-producers. Not using one coordinate system but many. USGS/DMA driving WSGS92 latest release of data. Also need a common time reference. Need to catalog metadata. Mentions FGDC (Federal Geographic Data Committee) Version 0.x.
New architecture is block-like, in a prototype stage, based on services. Want seamless grid and roaming-like dataflow. Note: I asked if they tracked Motorola Iridium, which does this for cellular -- they are aware. DAG Access to library recently won by Harris, building client + database access. Analysts want web-like response time. Architecture covers types of data, flows, customer profile, catalog server, and metadata server. The architecture is interface driven and an extension of the OMG OMA. Services are boundaries, parameterized with size, frequency, retention. An implementation must be compliant to an element boundary but can do 17 other things. Change in emphasis from what it is to what framework it is compliant to. Want ubiquity via a defined DII architecture. Working on prototyping a good version 0.x to evolve from.
Amy is MIDB program manager. Her predecessor was Bruce Thompson.
MIDB is an acronym of an acronym, never fully expanded -- something like Modern Imaging Product Architecture DBMS. The predecessor to MIDB was text based and Model 204-based.
Congress funds two separate areas: General Defense Intelligence and tactical operations. A big challenge is to cross these funding lines seamlessly.
MIDB is a relational (Sybase) database containing general defense intelligence (military strategic) information on the order of battle down to the unit level for active countries or the army or division level for other countries. MIDB is an all source DBMS and is used especially as a tactical feed, to populate and then feed to tactical C4I operations. The structure and most of the content is unclassified. Access is limited to those with a need to know, like intelligence analysts. MIDB supports a specific number of operational functions like planning, exploitation, … MIDB contains 95% of the general military intelligence data. Inputs come from SIGINT, HUMINT, MAGINT, MASINT (signal, human, ?, ? INTelligence).
MIDB contains a huge variety of data but not imagery or tactical information (like mission planning, another command does the latter) -- including air, ground, space, where are the sensors, who talks to who, airfield information, lines of communication, nuclear industry raw materials, processing, chemical industry, chem-bio industry, electrical industry, military production, who sells to who, medical, hospital, vets, military geography, orphanages, safe houses, population centers, growth rate, air field intelligence, lighting on runway, characteristics and performance, merchant ships of interest, strike assessment, length, width, elevation but not imagery. They do not duplicate NIMA. Not all queries are geospatial - might just use country code CC = Zaire. Contains no data on U.S. persons or places (by law). On the other hand, they can feed CAD tools to do 3D walk-throughs of buildings. Q: Is Zaire in there? A: cannot say. Provides metadata to access other DBMSs. Metadata includes how long is that field. There may be a lot or a little stored about eye color.
MIDB is populated from many source DBMSs and an army of intelligence analysts reading newspapers. Analysts not only read MIDB but also write to it so it is a central resource on their desktop, integrated into the way they work. There is a menu-driven GUI. Analysts typically want to get at any or all parts of the database. Still working to get tactical level to send information back up to national level. DoD data dictionary strategy is based on a common DoD schema, people push data up the hierarchy. MIDB has 27 main entities and 300 tables. There are many taps into data sources - many stored queries, graphs, timeline analyses, visual flows (bauxite mining, aluminum plant, airplane production). There are many flows: image - exploit - report - MIDB; image via analyst to MIDB, more.
Data has certainty factors, is entered by people, covers every country, is in the gigabyte range, and many operations are batch mode. MIDB is a heavily edited DBMS. MIDB represents coordinates in 1000ths of seconds (against future need), though users cannot tell whether a stored value is accurate to 1 or 3 decimal places. Error ranges are kept as circles or ellipses, and numeric ranges for plant capacity. There are logic edits to avoid typos, but an analyst can override or further refine. Location data is small compared to mapping databases. Has very wide tables. Does not store polygons - there is another table with polygons. If it is known that some entity is at location A or B, then it is stored at A with confidence = tenuous, and likewise at B. Annotations (relationships) can hang off any field, so a table is tied via a TIE field to other tables. Here is where you add extra-schema information like "its got cathedral ceilings." The other table is double-linked back to implement bi-directional relationships.
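Two of the conventions above can be sketched concretely: coordinates stored as integer thousandths of an arc-second, and TIE annotations linked in both directions. The record layout here is an illustrative assumption, not the actual MIDB schema:

```python
# Sketch of two MIDB conventions noted above: coordinates as integer
# thousandths of an arc-second, and TIE annotations double-linked
# between tables. Layout is an illustrative assumption, not the schema.

MAS_PER_DEGREE = 3600 * 1000  # thousandths of arc-seconds per degree

def encode_lat(degrees: float) -> int:
    """Store a latitude as an integer count of milli-arc-seconds."""
    return round(degrees * MAS_PER_DEGREE)

def decode_lat(mas: int) -> float:
    return mas / MAS_PER_DEGREE

facility = {"id": "F1", "lat_mas": encode_lat(38.8895), "ties": []}

# A TIE row carrying extra-schema information, linked forward...
tie = {"from": "F1", "to": "NOTE7", "text": "its got cathedral ceilings"}
facility["ties"].append(tie)
# ...and double-linked back from the annotation table:
note = {"id": "NOTE7", "back": "F1"}

print(decode_lat(facility["lat_mas"]))  # 38.8895
```

The integer encoding stores more precision than most sources can justify, which matches the note that a reader cannot tell from the stored value how accurate it really is.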
MIDB is implemented with Sybase system 11. DeWitt says it does not do joins well, has tiny pages, is 2-3 years behind Oracle on handling spatial data, is limited to 16 constraints per table where Amy wishes she had 80, might not be a major DBMS player in 3 years. They have experimental ports to Oracle, not a huge problem. One reason for Sybase is the intel community uses it but the S&T? community uses Oracle. There are actually around 20 semi-consistent replicas of the MIDB database at different production centers located geographically. Exchange of updates is weekly outside your theatre but daily and hourly updates are possible. Every user is a producer. Someone owns the data but others can nominate changes based on observations like "I saw nine tanks and you list only six." This information is stored as unofficial until accepted by owner but the ID of the observer is also stored (server id, user id). There is also a history table.
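The update-nomination flow just described - non-owners record observations as unofficial along with their server and user IDs, and only the data owner can accept them - can be sketched as follows. All names are illustrative assumptions:

```python
# Sketch of the nomination flow described above: anyone can nominate a
# change ("I saw nine tanks and you list only six"), stored as
# unofficial with the observer's IDs, until the owner accepts it.
# Names are assumptions, not the MIDB implementation.

class Record:
    def __init__(self, owner, value):
        self.owner = owner
        self.value = value
        self.nominations = []   # [server_id, user_id, proposed, status]
        self.history = []       # prior official values

    def nominate(self, server_id, user_id, proposed):
        self.nominations.append([server_id, user_id, proposed, "unofficial"])

    def accept(self, actor, index):
        if actor != self.owner:
            raise PermissionError("only the owner may accept a change")
        nom = self.nominations[index]
        self.history.append(self.value)   # history table keeps old value
        self.value = nom[2]
        nom[3] = "official"

rec = Record(owner="DIA", value="6 tanks")
rec.nominate("srv42", "analyst7", "9 tanks")
rec.accept("DIA", 0)
print(rec.value)  # 9 tanks
```

Keeping the observer's server and user IDs with each nomination gives the owner the provenance needed to decide, and the history list mirrors the history table mentioned above.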
MIDB has a legacy constraint to tie to data sources containing text, so some of the design that might at first seem stupid is motivated by this requirement. There is one interface control document. The MIDB 2.0 schema became available in Feb 96; it took 2-3 years to go from MIDB 1.0 to 2.0. Previously, the Air Order of Battle had 75 specialized products. MIDB makes schema changes once a year, but more often if country boundaries change. Schema changes break some apps - traumatic. They try to anticipate future requirements up to three years out - for instance, adding satellite fields.
Users ping MIDB on timeliness, population and querying.
Amy mentioned MIG, Judy Albright (DISA), as an R&D effort at the tactical level, not national but occasionally national. MIDB is used to establish base data in the field. How to get data to the field - straight data load or an intermediary? The MIDB format is identical in each theater. If the field will update MIDB in a disciplined way, then MIDB can be the mother of all databases. Amy mentioned there are lots of political obstacles before we can have a global geospatial DDB. Commands have a dominant vote.
Sami recommends reading Advanced Battlespace Information Systems. There are several thousand probes into DoD from outside each year, 80% benign, some targeting specific information. The Cuckoo's Egg is worth reading regarding German hackers for hire. We know adversaries do traffic analysis of what is sent at what rate to whom, as well as encryption attempts. Many problems:
The overall model is least-cost, greatest-damage attacks. There is a balance between prevention and detection. The talent pool of attackers is better than the one inside.
Shows ISO architecture.
For your massive DBMS, how do you make it secure? Integrity on applications, signature on data, general authorization scheme (The Open Group's Adage, with ITO investment). Intel framework for API - CDSA. Track information assurance framework. JTF pedigree (source attributes, time attributes, …) to what level of granularity.
DII COE lists lots of software you need to play the game, and it is evolving. That is the point of departure. Also, be compliant not to today's COE 3.0 but to 4.0 or 5.0. Genesis has set the current standard.
Information architecture decouples application
Mel Wagner (firstname.lastname@example.org, 301-227-3470 /69 /14, ask for a list of relevant documents) is JMTK Program Manager, working on the part of the DII COE GCCS project that involves accessing data of interest from some data source and bringing it to the screen. Allan Doyle mentioned MCG&I as the body overseeing JMTK. DII COE has eight levels of compliance. Give up your GIS system and use the Joint Mapping Toolkit. JMTK provides a common view of the battlespace, standardizes import, display, and manipulation of digital geospatial information, and replaces legacy mapping functions for DII/COE Mission Applications. The Web etc. are emerging requirements.
Contributing components are
Q: Can you display 3D models and fly through a scene? A: Not now; NIMA sponsors stand-alone, niche users. Upstream sophisticated requirements. They are a segment in the DII COE distribution. Uses UNIX and X-Windows, planning a move to NT within a year, DEC Alpha (?), IBM AIX (?).
Provides ways to control finished image displays (roam, zoom, intensity, …). METOC (meteorological and oceanography). Must accommodate OO over time. Tailor subsets and shrink to task. Migrate to COTS as it meets COE requirements. COE development strategies with incremental deliveries. JMTK V3.0 currently available.
COP is the Common Operational Picture. It supports fusion of the results of all sorts of apps: JOPES planning, internal community, air tasking order, transportation and logistics, medical, … See CJCSI 3150.08, dated 1 Jan 96 (Chairman of the Joint Chiefs of Staff Instruction). The puzzle is that there are stovepipes being brought together in a system of systems (no longer called that). A further puzzle is the common semantics among these fused systems. Currently, it does not interact with commercial ArcInfo, Terrain, … visualization tools. Allow processing inside the remote system. SOA is a 1 meter accuracy target.
--------------------------------------------- JMTK API ---------------------------------------------
(Visual Functions API talks to Analytical Functions API)
Spatial DBMS API whose implementation
talks via common file formats and read modules to foreign data sources
[NIMA Data, Analysis results, and Other MCG&I Data]
Products described -- several vector products available on CD. Still completing coverage of the world. $5000/map sheet, 9000 Digital Nautical Charts. Many commercial sources. How do you relate to the Navy? Also Gridded DTED at many levels, digital bathymetry at multiple levels, and raster products.
Short list of functions. Convert units, … terrain masking, point-to-point line of sight, store metadata (who produced, when, accuracy, security, …), ids of objects, … Migration plans for Navy, Army, Air Force. Identifying critical missing capabilities: overlay management different, declutter, mil std symbology ….
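One of the functions listed above, point-to-point line of sight with terrain masking, has a simple core. A minimal sketch over a toy one-dimensional terrain profile (real JMTK works on gridded DTED; the function name and sample heights are illustrative assumptions):

```python
# Sketch of a point-to-point line-of-sight / terrain-masking test of
# the kind JMTK provides, over a toy 1-D terrain profile sampled at
# equal spacing. Simplified illustration, not the JMTK algorithm.

def line_of_sight(profile, h_observer, h_target):
    """profile: terrain heights sampled between observer and target.
    Returns True if no terrain sample rises above the sight line."""
    n = len(profile) + 1                      # segments between endpoints
    for i, ground in enumerate(profile, start=1):
        # Height of the straight sight line above the i-th sample.
        sight = h_observer + (h_target - h_observer) * i / n
        if ground > sight:
            return False                      # terrain masks the target
    return True

print(line_of_sight([10, 20, 15], 50, 5))   # True: ridge stays below the line
print(line_of_sight([10, 60, 15], 50, 5))   # False: 60 m hill masks the view
```

Terrain masking (what area a radar cannot see) is the same test run outward from one point over many bearings, which is why the two functions appear together in the toolkit.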
NIMA is suffering some current integration pains as CIO, DIA come together.
We had a round robin discussion of areas of concern.
Next meeting 8-9 April at SRI in Menlo Park. Red Cottage Inn, $55/nite.
Action Items (delivery is PowerPoint 4.0 as exchange format due to Mac users among us):
People to invite to future meetings
Some things for me to think about