Craig Thompson, OBJS
ISX Offices, Washington D.C.
12-13 March 1997
[These are not minutes of the meeting - they are my notes - so they do not provide a complete record of what we covered at the meeting.]
This was meeting #1 of four meetings. The end result of the panel will be a briefing presented to Tom Burns (DARPA/ISO). The briefing aims to provide a high level architecture for an envisioned Dynamic Database (DDB) that will provide a central information repository for geographic data sources needed by the DoD. More bluntly, the briefing will focus on how to build the DDB.
Part of this meeting was briefings - I learned a lot. So did other panel members, this in spite of the fact that most had already sat on a six month panel on DDB in the spring/summer. Now, a large BAA (to fund a 100-150 person effort) is expected in June 1997, planned awards in January 1998 and first demos expected 18 months after, with an overall program life of 15 years.
My principal action items before the next meeting:
DDB II meeting #2 will be 8-9 April at SRI in Menlo Park.
For the record, there is no "organizational conflict of interest" if panel members serve and then respond to the BAA. This is true because panel members are only recommending an architecture and not helping to write the BAA solicitation, which is happening entirely within DARPA. The panel will report a week or more before the CBD announcement, that is, before Tom Burns puts pen to paper to write the BAA.
The Second DDB Panel is now funded and we can charge to it.
Present at this meeting:
Tom provided succinct overall vision foils:
Bob introduced the problem as "to build a computer model of the entire planet at finest grain anyone would ever want" - some tongue in cheek but not if you assume it was driven by a cost model.
Bob went over the briefing of the First DDB Panel (large .ps file) completed last spring/summer. The briefing has since been given to Admiral Danton (NIMA), I*3 (in November; I heard it then), the Naval Studies Board, and others. The initial architecture task has been funded ($500K) over three months, and that is the task of the Second DDB Panel (the current one): to come up with an initial functional architecture and understand the Grand Challenges it is meant to solve. Here are notes I took at the I*3 meeting in November 1996.
Dynamic Database, Bob Tenney (Alphabase, email@example.com)
Concepts for a Dynamic Database, a study done last spring and summer, is accessible from the ISO web site; a DARPA program is the likely outcome. Key people were Paul Maim, Bob Bolles, Dave DeWitt, Steve Flank, Scott Fouse, John Gilmore, Dave Gunning, John Leon, Dave Maier, John Lowrance, Tom Strat, Nik Subotic, and Bob Tenney (chair).
This is not a traditional DBMS - it is much more. Data and change are everywhere on the net. This must not be a static thing built to solve one problem. It is a living map (one view); a Dynamic DBMS (still DBMS-centric) holds all these views. Sometimes you want resolution at 1' at some location, hence fisheye views. Render 3D. Some information is pre-assembled by agents, including people. Logical and physical roads. The actual report is on the ISO web page someplace.
Multiple layers, scales, and representations; avoid great gobs of imagery. Imagery is massively available, so avoid bulk stores of data. Identify uninterpreted sections. Pedigrees help explain what is going on. Provide data services to BADD and DMIF.
DBMS systems do not do anything (actively); what makes one useful is a set of services. Where do services come from (query, store, protect, …)? A requirement is that a service be compatible with its use domain, to avoid O(N) versus O(log N) behavior. Some operations decompose queries and also assemble responses (local queries). The Dynamic DBMS looks like another server on a bus. In the middle is a dynamic situation model of terrain, vehicles, and causal relationships. He showed a foil with lots of boxes and I/O connectors. Want to loosen up data push and pull. DDB stores terrain; DDB II does mediation too - moving applications into the DBMS, expanding over time.
Technical issues: what goes into the DDB is 3D models, terrain, entities, … feeding a dynamic situation model - a lot of spatial stuff. It turns out you never know anything exactly; you only have estimates, so uncertainty estimates must be attached to everything. Compute a metric for how much data is needed for many purposes: roughly 10 megabytes for a square kilometer. This is a living model, constantly updated from sensor data, drawing on a legion of sources: commercial, DMA/NIMA, sensors. Big problem. Busy problem. Too big for one computer, so it is distributed. The mediation component is a query processor, search engine, and a way to assemble responses; it needs 3D data over time and space, so it is very geospatial. Example query: I want to put a force here in this valley - what radar sites do I have to worry about? There must be some way to publish products and change them daily without recompiling the whole system.
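The 10 MB per square kilometer figure above invites some back-of-envelope sizing. A minimal sketch (the theater dimensions are illustrative round numbers, not from the briefing):

```python
# Back-of-envelope sizing using the ~10 MB per square kilometer figure
# quoted above. The 500 km x 500 km theater below is an illustrative
# assumption, not a number from the briefing.

MB_PER_SQ_KM = 10  # rough figure from the briefing

def storage_estimate_gb(area_sq_km: float) -> float:
    """Return the estimated storage footprint in gigabytes."""
    return area_sq_km * MB_PER_SQ_KM / 1024

# A 500 km x 500 km theater of operations:
theater_gb = storage_estimate_gb(500 * 500)
print(f"250,000 sq km theater: ~{theater_gb:,.0f} GB")
```

At this rate a single theater already runs into terabytes, which supports the point that the model is too big for one computer and must be distributed.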
Bob mentioned JTF data server and map server have been relatively successful as compared to web server and situation server built on it.
Bob mentioned two views of DDB: (a) a DBMS at the center of the picture, or (b) mediation only, no storage - go back to the original sources for storage, keep just a cache, store rather than compute. We are to come up with a conceptual architecture (boxology), functional in nature, not so much a physical architecture (specific numbers of replicas) or an operational one (in terms of military operations).
Mentioned Quick Start, a funded feasibility effort to put together an aggressive narrow-path DDB demo based on some accessible and unclassified Landsat and SAR data stored in an OODB, doing some simple change detection. Some pieces might be Wisconsin Paradise + BBN Mapserver, Global Infotek Harvest/Tsimmis + Paradise.
GCCS is the DISA Global Command and Control System. The Track Database described is operational and keeps track of sighting histories of physical objects like planes, ships, subs, tanks, not quite people, documents, … while trying to avoid tracking seagulls, waves, and cars not of interest. Measurements being imprecise, there is uncertainty with each measurement, commonly kept as ellipses. Decision aids help extrapolate where an entity was in the unobserved past. There might be 11,000 tracks in the Mediterranean, each with perhaps 1000 observations (<lat, long, time, elint attributes, acoustic parameters, ...>), but elint (electronic intelligence) is getting better, so we can see more, so the DBMS size will increase. The update rate for a sub is every two days but for a plane is every few seconds.
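The track record described above - a sighting history where each observation carries a position, a time, and an uncertainty ellipse - can be sketched as a pair of simple data types. All field and class names here are illustrative assumptions, not the actual GCCS schema:

```python
from dataclasses import dataclass, field

# Illustrative sketch of a track as described above: a history of
# observations, each with a position, time, and uncertainty ellipse.
# Names are assumptions for illustration, not the GCCS schema.

@dataclass
class ErrorEllipse:
    semi_major_m: float      # meters
    semi_minor_m: float
    orientation_deg: float

@dataclass
class Observation:
    lat: float
    lon: float
    time: float              # epoch seconds
    ellipse: ErrorEllipse
    elint_attrs: dict = field(default_factory=dict)

@dataclass
class Track:
    uid: str
    observations: list = field(default_factory=list)

    def latest(self) -> Observation:
        """Most recent sighting in the history."""
        return max(self.observations, key=lambda o: o.time)

t = Track(uid="S000000001")
t.observations.append(Observation(36.0, 15.0, 100.0, ErrorEllipse(500, 200, 45)))
t.observations.append(Observation(36.1, 15.2, 7300.0, ErrorEllipse(800, 300, 50)))
print(t.latest().lat)  # 36.1
```

At 11,000 tracks times ~1000 observations each, even this minimal record format makes clear why the database grows as sensors improve.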
Histories are maintained differently for different objects. May not keep every detection. Might keep other attributes like ship class. The Maneuvering Targets Statistical Tracker (MTST) reasons that an entity is a ship versus a tank, but not a frigate versus a carrier. There may be several tracks per entity, since different radars might see different signatures. Tracks are given UIDs consisting of a type code + a 9-digit counter; the count is saved across crashes. If UID a123 turns out to be the same as b456, the two are merged and one UID dominates. If an analyst cannot tell whether there are one or two entities, they are kept as two entities (a manual action).
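The UID scheme above (type code + 9-digit counter, saved across crashes) and the merge rule (one UID dominates) can be sketched as follows. The counter-file format and all names are assumptions for illustration:

```python
import os
import tempfile

# Sketch of the UID scheme described above: a type code plus a 9-digit
# counter that survives restarts by persisting to disk, and a merge in
# which one UID dominates. The file format is an assumption.

class UidAllocator:
    def __init__(self, counter_file: str):
        self.counter_file = counter_file

    def next_uid(self, type_code: str) -> str:
        count = 0
        if os.path.exists(self.counter_file):
            with open(self.counter_file) as f:
                count = int(f.read() or 0)
        count += 1
        with open(self.counter_file, "w") as f:
            f.write(str(count))          # count is saved across crashes
        return f"{type_code}{count:09d}"

def merge_tracks(tracks: dict, keep: str, absorb: str) -> None:
    """Two tracks turn out to be the same entity; `keep` dominates."""
    tracks[keep].extend(tracks.pop(absorb))

path = os.path.join(tempfile.mkdtemp(), "uid.counter")
alloc = UidAllocator(path)
a, b = alloc.next_uid("a"), alloc.next_uid("b")
print(a, b)  # a000000001 b000000002
```

Persisting the counter before returning the UID is what guarantees a restart never reissues an identifier, at the cost of one disk write per allocation.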
The GCCS Track database is distributed -- each site holds the data it is interested in. Data is stored in flat files with the most recent kept in RAM. Queries run via scan. Ambiguities are flagged (oh, it went behind a hill). Smart push-pull filters on the push and pull sides of different DBMS copies route entities of interest to the DBMSs that want them. Filters can involve geography, entity type, treaty area, air picture, and areas of interest. There are negotiations between analysts and DBMSs, e.g., "you must have missed a report."
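The smart push-pull filtering just described - replicas subscribing with predicates over geography, entity type, and so on, with only matching updates routed to them - can be sketched minimally. All names here are illustrative assumptions:

```python
# Minimal sketch of smart push/pull filtering as described above: each
# replica registers predicates (geography, entity type, ...) and only
# matching entity updates are routed to it. Names are assumptions.

def in_box(lat_min, lat_max, lon_min, lon_max):
    """Geographic area-of-interest filter."""
    return lambda e: lat_min <= e["lat"] <= lat_max and lon_min <= e["lon"] <= lon_max

def of_type(*types):
    """Entity-type filter."""
    return lambda e: e["type"] in types

class Router:
    def __init__(self):
        self.subscribers = []   # (filters, inbox) pairs

    def subscribe(self, *filters):
        inbox = []
        self.subscribers.append((filters, inbox))
        return inbox

    def push(self, entity):
        # Route the update only to replicas whose filters all match.
        for filters, inbox in self.subscribers:
            if all(f(entity) for f in filters):
                inbox.append(entity)

router = Router()
med_ships = router.subscribe(in_box(30, 46, -6, 36), of_type("ship", "sub"))
router.push({"type": "ship", "lat": 36.0, "lon": 15.0})   # matches
router.push({"type": "plane", "lat": 36.0, "lon": 15.0})  # filtered out
print(len(med_ships))  # 1
```

The same predicate machinery serves both directions: on the push side it decides what to forward, and on the pull side it decides what a replica asks for.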
There is a track owner, and permissions are required for you to delete, update, correct, merge, ... a track I own. Updates are broadcast. SOP (standard operating procedure) is to make requests via email. Not all nodes have TCP/IP. A ship (entity) has a current awareness zone. CINCPAC or CINCLANT carry certain entities (geographically). Avoiding seagulls and sailboats: somewhere someone is making a manual decision about radar blips, that is, there is a human filter on the sensor. Useful acronym: DIW, meaning dead in the water.
The Joint Task Force commander breaks up his problem via air, land, and sea geosectors but can divide it other ways too. Information is rolled up to the JTF commander; this is a matter of setting up a filter, called battlefield rollup. Databases are arranged hierarchically, not peer-to-peer, though you can send obnotes that way; most casting is up/down. COP = Common Operational Picture governs information flow. There is a Top COP. Don't think so much about local disk as local replication. DISN SIPRNET talks across to the supported CINC, supporting CINCs in Europe, agencies, CJTF Atlantic, Coalition S. America, NMCC SVCs.
Broadcast to child/parent is event by event and can be queried. Top COP has system configuration and pass through of updates. If jammed and you lose some updates, then go to parent node for sitrep.
The Tracking Database is tied into MIDB (see below). Periodically the system must be cleaned up. In GCCS, if you delete something it is expunged. "Yesterday" is ambiguous with respect to rollback: is "yesterday" what I now know about it, or what I knew then? There is no time tagging. How much can we afford -- it's a question of purpose. For instance, training is not the purpose; C4I is the mission: knowing what's going on now.
John Thomas was chief architect for NIMA but now Steve Carrol and Ron Burns (government) share that role. He reminded us he is not a NIMA spokesman. He provides system engineering support for the Central Imagery Office portion of NIMA. Note: Shel Sutton (MITRE), my co-chair on OMG Internet SIG, is a NIMA-CIO architect and chairs the Open GIS Consortium imaging working group as well as the OMG GIS SIG.
In the Gulf War, we needed imagery but did not get it. NIMA is the locus now, has the moral authority and money, around $2B per year. Contractors have too much and too little voice. The imagery problem is an information systems problem -- a 10 gigabyte data flow. The cycle is: collect, distribute, exploit, store. All is push. In the mid-80s all was film. Now collection is mostly digital but the other functions are still mostly film. The long-term storage requirement is in the 100s of petabytes. Updates run terabytes/day, 1000-10,000 transactions/min, 3000 files per day. Pipes are still too small, so analysts work on small image subsets. Also, old habits die hard. Want: a repository integrated geospatially using best quality of service. Customers add value -- trusted co-producers. Not using one coordinate system but many. USGS/DMA driving WSGS92 latest release of data. Also need a common time reference. Need to catalog metadata. Mentions FGDC (Federal Geographic Data Committee) Version 0.x.
New architecture is block-like, in a prototype stage, based on services. Want seamless grid and roaming-like dataflow. Note: I asked if they tracked Motorola Iridium, which does this for cellular -- they are aware. DAG Access to library recently won by Harris, building client + database access. Analysts want web-like response time. Architecture covers types of data, flows, customer profile, catalog server, and metadata server. The architecture is interface driven and an extension of the OMG OMA. Services are boundaries, parameterized with size, frequency, retention. An implementation must be compliant to an element boundary but can do 17 other things. Change in emphasis from what it is to what framework it is compliant to. Want ubiquity via a defined DII architecture. Working on prototyping a good version 0.x to evolve from.
Amy is MIDB program manager. Her predecessor was Bruce Thompson.
MIDB is an acronym of an acronym, never fully expanded -- something like Modern Imaging Product Architecture DBMS. The predecessor to MIDB was text based and Model 204-based.
Congress funds two separate areas: General Defense Intelligence and tactical operations. A big challenge is to cross these funding lines seamlessly.
MIDB is a relational (Sybase) database containing general defense intelligence (military strategic) information on the order of battle down to the unit level for active countries or the army or division level for other countries. MIDB is an all source DBMS and is used especially as a tactical feed, to populate and then feed to tactical C4I operations. The structure and most of the content is unclassified. Access is limited to those with a need to know, like intelligence analysts. MIDB supports a specific number of operational functions like planning, exploitation, … MIDB contains 95% of the general military intelligence data. Inputs come from SIGINT, HUMINT, MAGINT, MASINT (signal, human, ?, ? INTelligence).
MIDB contains a huge variety of data but not imagery or tactical information (like mission planning, another command does the latter) -- including air, ground, space, where are the sensors, who talks to who, airfield information, lines of communication, nuclear industry raw materials, processing, chemical industry, chem-bio industry, electrical industry, military production, who sells to who, medical, hospital, vets, military geography, orphanages, safe houses, population centers, growth rate, air field intelligence, lighting on runway, characteristics and performance, merchant ships of interest, strike assessment, length, width, elevation but not imagery. They do not duplicate NIMA. Not all queries are geospatial - might just use country code CC = Zaire. Contains no data on U.S. persons or places (by law). On the other hand, they can feed CAD tools to do 3D walk-throughs of buildings. Q: Is Zaire in there? A: cannot say. Provides metadata to access other DBMSs. Metadata includes how long is that field. There may be a lot or a little stored about eye color.
MIDB is populated from many source DBMSs and an army of intelligence analysts reading newspapers. Analysts not only read MIDB but also write to it so it is a central resource on their desktop, integrated into the way they work. There is a menu-driven GUI. Analysts typically want to get at any or all parts of the database. Still working to get tactical level to send information back up to national level. DoD data dictionary strategy is based on a common DoD schema, people push data up the hierarchy. MIDB has 27 main entities and 300 tables. There are many taps into data sources - many stored queries, graphs, timeline analyses, visual flows (bauxite mining, aluminum plant, airplane production). There are many flows: image - exploit - report - MIDB; image via analyst to MIDB, more.
Data has certainty factors, is entered by people, covers every country, is in the gigabyte range, and many operations are batch mode. MIDB is a heavily edited DBMS. MIDB represents coordinates in 1000ths of seconds (against future need), though users cannot tell whether a stored value is accurate to 1 or 3 decimal places. Error ranges are kept as circles or ellipses, and numeric ranges for plant capacity. There are logic edits to avoid typos, but an analyst can override or further refine. Location data is small compared to mapping databases. Has very wide tables. Does not store polygons - there is another table with polygons. If it is known that some entity is at location A or B, then it is stored at A with confidence = tenuous, and likewise at B. Annotations (relationships) can hang off any field, so a table is tied via a TIE field to other tables. Here is where you add extra-schema information like "its got cathedral ceilings." The other table is double-linked back to implement bi-directional relationships.
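Two of the conventions above can be sketched concretely: coordinates stored as integer thousandths of an arc-second, and TIE annotations linked in both directions. The record layout here is an illustrative assumption, not the actual MIDB schema:

```python
# Sketch of two MIDB conventions noted above: coordinates as integer
# thousandths of an arc-second, and TIE annotations double-linked
# between tables. Layout is an illustrative assumption, not the schema.

MAS_PER_DEGREE = 3600 * 1000  # thousandths of arc-seconds per degree

def encode_lat(degrees: float) -> int:
    """Store a latitude as an integer count of milli-arc-seconds."""
    return round(degrees * MAS_PER_DEGREE)

def decode_lat(mas: int) -> float:
    return mas / MAS_PER_DEGREE

facility = {"id": "F1", "lat_mas": encode_lat(38.8895), "ties": []}

# A TIE row carrying extra-schema information, linked forward...
tie = {"from": "F1", "to": "NOTE7", "text": "its got cathedral ceilings"}
facility["ties"].append(tie)
# ...and double-linked back from the annotation table:
note = {"id": "NOTE7", "back": "F1"}

print(decode_lat(facility["lat_mas"]))  # 38.8895
```

The integer encoding stores more precision than most sources can justify, which matches the note that a reader cannot tell from the stored value how accurate it really is.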
MIDB is implemented with Sybase system 11. DeWitt says it does not do joins well, has tiny pages, is 2-3 years behind Oracle on handling spatial data, is limited to 16 constraints per table where Amy wishes she had 80, might not be a major DBMS player in 3 years. They have experimental ports to Oracle, not a huge problem. One reason for Sybase is the intel community uses it but the S&T? community uses Oracle. There are actually around 20 semi-consistent replicas of the MIDB database at different production centers located geographically. Exchange of updates is weekly outside your theatre but daily and hourly updates are possible. Every user is a producer. Someone owns the data but others can nominate changes based on observations like "I saw nine tanks and you list only six." This information is stored as unofficial until accepted by owner but the ID of the observer is also stored (server id, user id). There is also a history table.
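The update-nomination flow just described - non-owners record observations as unofficial along with their server and user IDs, and only the data owner can accept them - can be sketched as follows. All names are illustrative assumptions:

```python
# Sketch of the nomination flow described above: anyone can nominate a
# change ("I saw nine tanks and you list only six"), stored as
# unofficial with the observer's IDs, until the owner accepts it.
# Names are assumptions, not the MIDB implementation.

class Record:
    def __init__(self, owner, value):
        self.owner = owner
        self.value = value
        self.nominations = []   # [server_id, user_id, proposed, status]
        self.history = []       # prior official values

    def nominate(self, server_id, user_id, proposed):
        self.nominations.append([server_id, user_id, proposed, "unofficial"])

    def accept(self, actor, index):
        if actor != self.owner:
            raise PermissionError("only the owner may accept a change")
        nom = self.nominations[index]
        self.history.append(self.value)   # history table keeps old value
        self.value = nom[2]
        nom[3] = "official"

rec = Record(owner="DIA", value="6 tanks")
rec.nominate("srv42", "analyst7", "9 tanks")
rec.accept("DIA", 0)
print(rec.value)  # 9 tanks
```

Keeping the observer's server and user IDs with each nomination gives the owner the provenance needed to decide, and the history list mirrors the history table mentioned above.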
MIDB has a legacy constraint to tie to data sources containing text, so some of the design that might at first seem stupid is motivated by this requirement. There is one interface control document. The MIDB 2.0 schema became available in Feb 96; it took 2-3 years to go from MIDB 1.0 to 2.0. Previously, the Air Order of Battle had 75 specialized products. MIDB makes schema changes once a year, but more often if country boundaries change. Schema changes break some apps - traumatic. They try to anticipate future requirements up to three years out - for instance, adding satellite fields.
Users ping MIDB on timeliness, population and querying.
Amy mentioned MIG, Judy Albright (DISA), as an R&D effort at the tactical level, not national but occasionally national. MIDB is used to establish base data in the field. How to get data to the field - straight data load or an intermediary? The MIDB format is identical in each theater. If the field will update MIDB in a disciplined way, then MIDB can be the mother of all databases. Amy mentioned there are lots of political obstacles before we can have a global geospatial DDB. Commands have a dominant vote.
Sami recommends reading Advanced Battlespace Information Systems. There are several thousand probes into DoD from outside each year, 80% benign, some targeting specific information. The Cuckoo's Egg is worth reading regarding German hackers for hire. We know adversaries do traffic analysis of what is sent at what rate to whom, as well as encryption attempts. Many problems:
The overall model is least-cost, greatest-damage attacks. There is a balance between prevention and detection. The talent pool of attackers is better than the one inside.
Shows ISO architecture.
For your massive DBMS, how do you make it secure? Integrity on applications, signature on data, general authorization scheme (The Open Group's Adage, with ITO investment). Intel framework for API - CDSA. Track information assurance framework. JTF pedigree (source attributes, time attributes, …) to what level of granularity.
DII COE lists lots of software you need to play the game, and it is evolving. That is the point of departure. Also, be compliant not to today's COE 3.0 but to 4.0 or 5.0. Genesis has set the current standard.
Information architecture decouples application
Mel Wagner (firstname.lastname@example.org, 301-227-3470 /69 /14, ask for a list of relevant documents) is JMTK Program Manager, working on the part of the DII COE GCCS project that involves accessing data of interest from some data source and bringing it to the screen. Allan Doyle mentioned MCG&I as the body overseeing JMTK. DII COE has eight levels of compliance. Give up your GIS system and use the Joint Mapping Toolkit. JMTK provides a common view of the battlespace, standardizes import, display, and manipulation of digital geospatial information, and replaces legacy mapping functions for DII/COE Mission Applications. The Web etc. are emerging requirements.
Contributing components are
Q: Can you display 3D models and fly through a scene? A: Not now; NIMA sponsors stand-alone, niche users. Upstream sophisticated requirements. They are a segment in the DII COE distribution. Uses UNIX and X-Windows, planning a move to NT within a year, DEC Alpha (?), IBM AIX (?).
Provides ways to control finished image displays (roam, zoom, intensity, …). METOC (meteorological and oceanography). Must accommodate OO over time. Tailor subsets and shrink to task. Migrate to COTS as it meets COE requirements. COE development strategies with incremental deliveries. JMTK V3.0 currently available.
COP is the Common Operational Picture. It supports fusion of the results of all sorts of apps: JOPES planning, internal community, air tasking order, transportation and logistics, medical, … See CJCSI 3150.08, dated 1 Jan 96 (Chairman of the Joint Chiefs of Staff Instruction). The puzzle is that there are stovepipes being brought together in a system of systems (no longer called that). A further puzzle is the common semantics among these fused systems. Currently, it does not interact with commercial ArcInfo, Terrain, … visualization tools. Allow processing inside the remote system. SOA is a 1 meter accuracy target.
--------------------------------------------- JMTK API ---------------------------------------------
(Visual Functions API talks to Analytical Functions API)
Spatial DBMS API whose implementation
talks via common file formats and read modules to foreign data sources
[NIMA Data, Analysis results, and Other MCG&I Data]
Products described -- several vector products available on CD. Still completing coverage of the world. $5000/map sheet, 9000 Digital Nautical Charts. Many commercial sources. How do you relate to the Navy? Also Gridded DTED at many levels, digital bathymetry at multiple levels, and raster products.
Short list of functions. Convert units, … terrain masking, point-to-point line of sight, store metadata (who produced, when, accuracy, security, …), ids of objects, … Migration plans for Navy, Army, Air Force. Identifying critical missing capabilities: overlay management different, declutter, mil std symbology ….
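One of the functions listed above, point-to-point line of sight with terrain masking, has a simple core. A minimal sketch over a toy one-dimensional terrain profile (real JMTK works on gridded DTED; the function name and sample heights are illustrative assumptions):

```python
# Sketch of a point-to-point line-of-sight / terrain-masking test of
# the kind JMTK provides, over a toy 1-D terrain profile sampled at
# equal spacing. Simplified illustration, not the JMTK algorithm.

def line_of_sight(profile, h_observer, h_target):
    """profile: terrain heights sampled between observer and target.
    Returns True if no terrain sample rises above the sight line."""
    n = len(profile) + 1                      # segments between endpoints
    for i, ground in enumerate(profile, start=1):
        # Height of the straight sight line above the i-th sample.
        sight = h_observer + (h_target - h_observer) * i / n
        if ground > sight:
            return False                      # terrain masks the target
    return True

print(line_of_sight([10, 20, 15], 50, 5))   # True: ridge stays below the line
print(line_of_sight([10, 60, 15], 50, 5))   # False: 60 m hill masks the view
```

Terrain masking (what area a radar cannot see) is the same test run outward from one point over many bearings, which is why the two functions appear together in the toolkit.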
NIMA is suffering some current integration pains as CIO, DIA come together.
We had a round robin discussion of areas of concern.
Next meeting 8-9 April at SRI in Menlo Park. Red Cottage Inn, $55/nite.
Action Items (delivery is PowerPoint 4.0 as exchange format due to Mac users among us):
People to invite to future meetings
Some things for me to think about