Dynamic Database Panel II
Notes from Meeting #3

Craig Thompson, OBJS
ISX, Washington D.C.
May 7-8, 1997 
[These are not minutes of the meeting - they are notes - so they do not provide a complete record of what we covered at the meeting.] 

Executive Summary

We discussed the DDB program and how to structure it, and heard presentations on terrain modeling, NASA GIS DBMS, Hughes image DBMS, and distributed interactive simulation.  My homework is any additions to the section on community involvement and a new section on federation architectures.




[Unedited notes follow.]  <missed first 30 minutes of meeting>

Quickstart update - Quickstart is collecting data at sites. 1 m data collection though sensors could go to 1 ft but for sensors deployed. Could also be Landsat for $4K. Lincoln is in charge and is calibrating the area - analyst estimate of ground truth. Like to do SAR data at night. Normal case is you have some maps and various resolutions. Minus foliage. DMIF Topic 2 has dynamic database.

Program Status Briefing (Tom Burns, reviewed by Tenney)

Tenney will give us a copy of the current briefing, which has been given to Wishner (who has since left), Bob Douglas (taking his place), [Gunning reports to Garvey now]. Their audience's reaction: what is being done in DDB the program and DDB the vision? Where is it relying on other programs? Some issues re words like model (reference model). What-if-ing in future and how to align past with present. Significance goes up hierarchy - statistically significant, physically significant, militarily significant. Also includes expected change: I expect X to occur and when it does, tell me. Militarily significant implies what you are planning to do. How to allocate tasking and functions across sites. Policy specified outside but we need to represent them. Focus. Is this fundamental to DMIF - yes, in a microcosm. Oriented to taking stuff from text formatted messages.

Pixels to Planning Vignette, Scott Fouse

Airport consists of models of hangars, planes, taxi-way and runway. RADIUS located tail of aircraft coming out of hangar (though car looks like plane). Planning = determine what elements of Airport to affect. Determine how = precision strike. Execution = SOF Team approaches airport by first dropping in at insertion point then using cover of terrain. Will illuminate targets at airport for precision munition delivery. Assessment = has objective been met, i.e., are airport operations disabled. Line of sight = pre-computed or on the fly on demand. Migrate into dbms. Pre-compute for control tower or radar placement. Safe areas from known emplacements. Timeliness issue. To cache or pre-compute. How does DDB interact with mission rehearsal simulations? Simulation folks want part of this. Line between prediction and simulation is fuzzy. Another model is to predict weapon effects.

One of Larry Lynn's motivations in DDB is to preserve the value of sensors - implies doing exploitation part first. Visible stuff, thermal stuff. Pixels one place, camera models somewhere else, site models somewhere else.

Alan Doyle boxology model - observables fusion hypoth expansion expectations tasking tasks session scheduling observables. Overlay while with probability. Subtype with fusion. Backchain if some info changes and propagating change. Tenney simplifies this to data/operations/control. Asks why not use Oracle 8. Alan says consider the architecture logically. No centralized control, but instead explode into subtasks.

Content is an orthogonal picture.

Distribution is too. Craig points out that federated simulation is a good model for kernel dbms'. If you organize the parts right the system can be self organizing. Configure yourself around system so nearby ones can take over. That survivability robustness has a lot of appeal. Decentralized control moves you away from a central DBMS, possibly including Oracle. Partitioning, workflow, task management. Extensibility joints. Survivability. Geospatial reference frames to add data types to. Skeleton schema. Not a global federation schema. Want to represent inconsistent stuff in consistent format.

Modeling includes uncertainty. Inter-community communication via mediation. DDB community.

Carl Cargill books. Microsoft is starting with your car. Oracle with your server. Windows 95 running where your radio does. Apps, built-in functions, storage is distributed.

Communities are content providers and/or specification providers.

Simulation (rendering, faceted) - fundamentally different representation. QoS. Microsoft matches names. Many representations. Build in robustness in the face of heterogeneity. Storage layer vs. generalized service layer vs. apps layer. Filters that take many formats and convert to particular format. Transforming into common format. Common format changes. Does not lose info.

Lunch discussion with Doyle and Tenney

We need to break program into separable parts (modeling and representation management, algorithms and sensor processing, info management that's decentralized).

After Lunch

Tenney discusses - if I write BAA tomorrow - 4 components: modeling, algorithm processing, info management (workflow, decentralized, ), integration. Also mini-grants on open issues. Also tech transition and community involvement.

Models is an overloaded word. Models of IR signature of tank. Models of vehicles need fuel. DMIF is doing some of these. Info management includes storage, mediation, control, and task models. Partitioning of situation estimate - site X granularity X layer (sf, pf, 2.5p, 1.5p, ) X time

Clusters in ALP contain planner, scheduler, and can be federated. Led to a discussion on consistency and control by Maier.

God - break up protocols into pieces. Global Object Directorate. Can use control regime to guarantee consistency. Owned, tightly managed, consistency feedback, delegated are progressively weaker. Looser dominion by god means dominions have more responsibility.

God1 - owned regime. God watching consistency management. God is watching, wraps different dbms'. Comment on time - do you delete Tuesday's tank on Wednesday? How to apply these notions to pedigree (Maier's later notes, apparently sent). Knows directly about terrain data representation. Might not be isolated from changes in representations.

God2 is no longer omnipotent but is omnipresent and omniscient. Looks for inconsistencies and sends adjustment commands. Knows only update API. Terrain and Building.

How is consistency related to multiple hypotheses and probabilities? People want to know about change - when is the model inconsistent with what is observed? Federated change detection capability. If god is omniscient then he can decide among inconsistencies. Not trying to explain all forms of federation. Security and archiving federations.

God3 is omniscient but not omnipresent. There is inconsistency notification. System adjusts so parts are smarter. Dominions must know about each other. Possibly a separation of high level vs lower level control. Adjustment negotiation. The I know it when I see it god. Style of management - you two are inconsistent so make yourselves consistent. Separate detection from resolution. Cannot talk about consistency in advance. Always will run into inconsistencies after the fact.
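
The God3 idea of separating detection from resolution can be sketched in a few lines of Python. This is a hypothetical illustration only, not anything presented at the meeting: the dominion names, the (x, y) position claims, the distance tolerance, and the functions detect_inconsistencies and notify are all assumptions. The monitor only reports that two dominions disagree about an object; it never edits their data.

```python
from itertools import combinations
from math import hypot


def detect_inconsistencies(claims, tolerance_m):
    """Compare every pair of dominions on the objects they both report.

    claims maps dominion name -> {object_id: (x, y)} in a shared frame
    (an assumed data layout). Returns notifications of the form
    (object_id, dominion_a, dominion_b, gap_m). Nothing is modified.
    """
    notices = []
    for a, b in combinations(sorted(claims), 2):
        shared = claims[a].keys() & claims[b].keys()
        for obj in sorted(shared):
            (xa, ya), (xb, yb) = claims[a][obj], claims[b][obj]
            gap = hypot(xa - xb, ya - yb)
            if gap > tolerance_m:
                notices.append((obj, a, b, gap))
    return notices


def notify(notices, handlers):
    """Deliver each notice to both dominions; resolution is their job."""
    for obj, a, b, gap in notices:
        for name in (a, b):
            handlers[name](obj, a, b, gap)
```

Usage would be to call detect_inconsistencies on the current claims and pass the result to notify with one callback per dominion ("you two are inconsistent so make yourselves consistent"). How the two dominions then negotiate an adjustment is deliberately left out.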

There is another collection of pictures wrt one vs. many gods.

God4 - delegate consistency monitoring to dominion. God ceases to exist or has limited role. Maybe does arbitration. Existence proof?

Two simulations, one in more detail, master-slave and I run an approximation. See IEEE Spectrum article on simulation.

Grades of god can be projected to other areas. Start with things tightly coupled. Later carve these into more independent models. Going from god1 to god4 you get evolvability and maintainability.

Research issues are how to define and resolve inconsistencies.

How important is consistency vs. correctness? Can assert fused result as "either this and that". Should we build god2 before god4?

Would god 4 cycle? Consistency by consensus and proofs of convergence. Must converge pairwise for polytheistic. How to control convergence, QoS, optimization. Problem if there are no overlapping hypotheses. 2 to 3 dominions so do you have 1 god or 3 pairwise gods. Bayesian runs into this. Identify things that are more decomposable vs. tightly coupled.

Distribution across layers (I'm road guy and you are truck guy) (I'm the 2D guy and you are 3D)

Must be consistency within dominions. If dominion1 and 2 are different geographic regions then I have boundary conditions. If they are the same geography and different times, there is much more inconsistency. Bringing up new dominions from scratch.

Consequences of propagating. Vehicle tracking at details or at high level. So go between boundaries.

Example of dominions: images, roads, tracks, user sites re ground order of battle across universe, in detail on east side and on west side, site models. Register, geo-locate, translate across rep detail, all have sensor models, can all be moved into ground coordinate systems. Correlation in pixel spaces. Warp, reinterpret and re-sample. Build transforms and leave data in raw form or in common rep system. Flow of control might be from data to objects or from objects to data. Each site must register to other sites.

Talked with Maier re inter-app protocols like DBMS transactions, security, DIS, ALP.

Images to Features via Extract. Register both. Match on SAR features. Correlate features to test modules. Update models. Some SAR features are polygons of homogeneous reflectivity. Correlation gives us some physical objects. Match on physical objects. Texture models are hypotheses. Whole class of things like texture. Land use guys will correlate bits of image with dozens of land use probabilities. Could scale this up in year one.

Exploitation side is same process. Features are shadow regions, etc. Modes are constructing state and later using it and updating it. Bootstrapping is more involved. Sites modeling a 3D. Things in site models are semantically consistent. RADIUS replaced the IA shoebox. One side says this is a grassy area and the other says this is a parade ground (use is a behavioral property of site). Physical site model versus functional/behavioral model. IA mainly cares about functional model and not physical. Faceted representations stored in file system but Lockheed mapped these to Sybase. There was some disutility. Was 3 orders of magnitude slower. There are semantic constraints - two rivers cannot cross each other. Roads must be drivable. Rivers cannot go uphill.

Initialize roads and maintain consistency between sites and roads, physical objects and roads. Roads + MTI models yield tracks. Looking long term on MTI you will get systematic updates on roads.

How do three sites maintain views? Ground order of Battle East. Replication at each place of a subset of objects selected by pull query (select and merge). Time, geographic, type focus. Is gob universal built from COPs or from raw data? GCCS did top cop. Designee who is theater wide cop. Conops issue in that *** might not take one-* recommendation. E COP talks to W cop. Can *** or anyone else drill down? Also jump-up-and-down excitement. Conops comes up with common picture. Replicated down to divisions. There is a centralized maintainer at a level for everyone. *'s send out recon to see if local picture = on high picture. An inconsistency detection. Are we perpetuating this? Keep local adjustments.

Is the boxology a good start - yes, it provides a roadmap and you can build meaningful subsets.

Show geographic partitioning, services/control partitioning, mirror sites, types like tracks. Can you redo partitioning on the fly? Replication unless update rate is too high. Replicate maps since their rate of change is slow. Master-clone. Work on consistency at master. Replicas need not be kept in sync if aspects are irrelevant. Temporal granularity. System services are associated with each bubble. Workflow manager. Demonstrate reconfiguration of this. Experiment A is a top cop. Experiment B is local picture and you derive top cop view from this. Reconfig experiments. Fault tolerant experiments as well, where something is taken out.

NASA EOSDIS, Ron Williamson, Hughes

Part of Mission to Planet Earth. 1600 gigabytes per day. Will archive 1.5 petabytes of data over 20 years. Nine centers with core competence - land, socio-econ, air-sea, trace gases, snow and ice, upper atmosphere. There are still political and social barriers. ?Trend in ozone, El Nino, global warming. NASA has T1 links upgraded to T3. Wants ATM. Now is TCP/IP. FTP and using Andrew, Silicons bulk data transfer. 3D model of pressure profiles (no fixed grid, changeable). Working with Oracle, Sybase, Illustra, Object Design. Mostly R-trees and quadtrees. Goddard doing spherical quadtrees. Using DCE and not CORBA which was not mature enough 4 years ago to build 800M program on. Www.transarc.com is involved. TRMM CERES for rain and lightning. AM-1 soon. Wide area 5m resolution but focus on some satellites might be < 1 meter. USGS is independent from NASA and has their own system. Scientists get access if they make their data public. Quality of info. Refereeing process support. Provides pedigree.

Levels 1-4 go from sensor to oriented common data then temporal models. Driven by raw data and buffering. A couple of terabytes raw per day, with 10X that processed. Scientists gather the data and drive how often products are needed. Working on 100 foot cube now, want to get down to below 10 foot cubes. Biggest costs are storage costs now. There is no normal form across the system for space and time. Want a canonical distribution format: HDF-EOS, hierarchical data format for earth observing system (self describing). Scientists are giving you raw data and filters. Might translate to .gif, .tif, ECS web site. HDF has been around for years. Scientists use CDF, HDF translates self-describing data structures. URL is http://edhs1.gsfc.nasa.gov. Use a rule-based planning and scheduling system. A level 3 product might need data from all nine sites from many bases. To get this product, combine these, and to get a lower fidelity solution. Or several low level one higher level if the high quality satellite is not available right now. It's a distributed planning system across all nine sites. Using publish and subscribe. Sybase and stored procedures and triggers. Active notification to users. Q from anywhere is give me cloud top height over Washington DC on April 1 97. Working on standardizing OQL. First deliveries are Dec97. Relaxing queries: work at U Maryland. Also on data profiling to estimate size of response. Retasking when events occur like volcanoes. Large metadata repository. Common grid system so not working with apples and oranges. Different algorithms for pressure in 3D. Hughes does not develop. Mediation between collected data and query. Do you carry uncertainty forward? Yes, sensitivity, metadata slots. Hard to come up with core metadata model. 6 attributes common and 1000+ optional. 10,000 users. Can request query/product history/algorithms. All archive is robotic tape so it's nearline. They schedule processing engines to produce products.
Level 0, 1a, 1b, 2a mandatory products.

Separate clearing house from FGDC. Also feed Global Change Master Directory. Langley and Goddard for weather models at poles and rest. Government moves some products to standard processing. Query distributed data management system for products. DCE Kerberos and PKE.

Technology problems: for very large DBMS, trying to use COTS vendors' OO approach. Organized multilayer approach. Earth Science Data Types (ESDTs). Then map this to Computer Science Data Types (CSDTs). Points, lines, voxels, spatial containment, not within. No vendor satisfies all requirements. OODBs fell on face (3 years ago) on DBA. But engines were good. Ended up focusing on Sybase and reevaluating Informix/Illustra. Test: DBMS has 30M records. Spatial/temporal records. Blobs but most data in file archive. Wish list is large objects in database. 200 GB in DBMS up to 20 petabytes in file system.

The Controlled Image Base (CIB), Tony B..., Hughes

Their division has done work on DMA/NIMA. Primary bread and butter is operating and maintaining NIMA equipment. CIB award to Hughes for full production. CIB is single image of entire land mass, seamless, all one layer at 5m optical. Building standard CIB product. Unclassified. Compressed at 8:1. Accurate to 23 meters with 90% accuracy for every pixel. Orthorectification. Korea, Bosnia, some of China, coverage today. Drape this image data over DTED level 2. 80% of world in 2 years. 3M record Oracle database. Defined product sets and come via tape or CD-ROM. Can roll/pan/zoom seamlessly over related areas. Pixel is 8 bits. Cambridge Research did fly-through software. CIB Quicklook is 20 sq km with 5 minute processing time (3 X 3 cells of 1500m squares) for overlay and insert into new environment. Base CIB is low res available on line. Quicklook is right now. Clouds and all. Put onto base map. Can use to fly through imagery. CIB is adopted (?) by Open GIS as one layer in the geospatial fusion stack. CIB provides mission readiness, Quicklook provides mission responsiveness. Used Oracle and GOTS compression. Registration and stitching images is proprietary. Have huge algorithm set for radiometric balancing so there are no seams and all image boundary artifacts are gone.

Videotape of Powerscene. Can preview mission planning rehearsal in faster than real time. Target familiarization. DMA DTED. CIB provides texturing overlaid at varying resolutions. Can overlay various heads-up displays. Can overlay target markers. Can overlay 3D models that are fixed or moving. Wireframe domes of key airspace. Runs on Silicon Graphics. Successful at Aviano since commanders, pilots, DMA, contractor, Silicon Graphics all work together. Moving this to PC-based toward market-based direction. Cambridge put in 3D models.

User Needs for Rapid Terrain Modeling, Ed Wright, Camber

Background is topo engineering and uncertainty and impact on operational decisions. Challenges, issues, and solutions. Some solutions are continuous variables and categorical variables. Was a topo engineer. MS in geodesy. Problem in Gulf War was no maps. Supports NIMA. Working on George Mason grad program. First challenge at Ft Bragg is rapid response. A few hours, a few days, longer. Doctrinally it's 18 hours. From deadstop to immediate. Availability of data products. DTED 66% coverage. GOOD FOIL. WES is waterways mobility station. M&S training is low priority, except when it is mission rehearsal, at which point it becomes operational. M&S requires an order of magnitude more detail. Not just for pretty picture. Becoming part of concept for Force 21. Good domain model is sensitive to minute changes in real world, very sensitive to data. Mission planning and rehearsal want a good predictive model. Good domain model is not good mission planning predictive model. Example given - guy leaves cigarette on chair. Small changes of distance to curtain give large variation in speed of fire. As we move to OOTW, we get many new requirements. Military operations in built-up areas (MOBA) want to know where doorknob is. Uncertainty is the difference: knowledge needed to carry out mission minus the knowledge available to decision maker. Knowledge available is increasing but info we need to make decision is increasing too, in a political situation where no casualties are acceptable due to CNN. One problem is shift to Rapid Generation, changing to rapidly developing datasets based on less certain data. Model is sensor, data generation, dbms that is not full, then analysis for uses, then display and reproduction. NIMA will be CIB, DTED and digital JOG (1:250,000) available off-the-shelf foundation data. NIMA 6 mo goes to 1 m, now the mindset is to reduce this to 18 hours. Time consuming human editing for correlation methods. Feature data generation is fundamentally hard due to computer vision.
That amounts to Automatic Target Recognition (ATR) for 2000 overlapping entity types. Fusion is many data types and formats and uses and all of different qualities. Shows nice foil of DMA, TEC, Command, Subcommand, all simultaneously. Only works if we have good rules for tracking quality. Doctrinally data gets passed back to producers but does not always happen (we did not produce it, so ...). Probably has utility to others. DBMS management challenge - in M&S ddb means new bridge, rain, turbulent data. Common picture requires all get common view. If you start with assumption that you send around deltas and start with 10m image base, replace with 1m and change is 100 times size. Displays are cool but can mask underlying uncertainty. Endless quest for certainty versus fog of battle. Line of sight (LOS) (green you can see and red you cannot). How accurate? When error propagates then you get a probability model. There aren't many people doing this. Monte Carlo. Probability. Got to know relative accuracy of the data. Least squares, adjustment, error propagation, error ellipses. How about categorical data? Use Bayesian networks. Will we get networks of Crays to compute uncertainty? World wide Bayesian network. Could do coarsening. Lots of things to do once you have the framework. Traditional product is green-yellow-red for go-nogo. So more colors show richer differentiation for go no go and areas of relative risk. Mobility evaluation. If no go, why? Slope, trees, many are uncertain variables. One stem diameter and spacing. Bayesian network helps to determine why the outcome. Nice slope, soil, moisture. Can back propagate. Most likely error is in slope maps (only 75% accurate). Soil strength could come back automatically. So could report of dampness. Other types of uncertainty are fuzzy boundary, logical consistency. If I tighten, which things do I need to know the most, so task sensors. Good way to locate expectations.
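
A minimal Monte Carlo sketch of the line-of-sight point above, assuming a 1-D terrain profile of elevation posts with independent Gaussian elevation error. The profile layout, the 2 m observer and target heights, the sigma value, and the function names are all hypothetical and were not part of the talk. It shows how a green/red LOS answer turns into a probability once elevation error is propagated.

```python
import random


def line_of_sight(profile, observer_h=2.0, target_h=2.0):
    """True if no intermediate post rises above the straight sight line.

    profile is a list of terrain elevations (meters) at equally spaced
    posts; the observer stands on the first post, the target on the last.
    """
    n = len(profile) - 1
    z0 = profile[0] + observer_h
    z1 = profile[-1] + target_h
    for i in range(1, n):
        sight = z0 + (z1 - z0) * i / n
        if profile[i] > sight:
            return False
    return True


def los_probability(profile, sigma_m, trials=2000, seed=0):
    """Fraction of noisy terrain realizations in which LOS exists."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    hits = 0
    for _ in range(trials):
        noisy = [z + rng.gauss(0.0, sigma_m) for z in profile]
        if line_of_sight(noisy):
            hits += 1
    return hits / trials
```

With sigma_m = 0 the answer collapses to the traditional yes/no; with a nonzero sigma the same profile yields a value between 0 and 1 that could drive a graded color display instead of green/red.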
No one else is doing Bayesian on spatial data but others are doing some related work. Measure quality as data is collected, record as metadata. Apps should read metadata, determine if it's good enough. User must be trained. Must propagate uncertainty. Registration - CIB is only accurate to 50 m CE 90%. When crisis occurs there will be requirements for 15 m then 3 m then GPS survey to 1 cm. Outlines scheme for updating older coordinate data on the fly. CIB comes with 8 bits per pixel.

NIMA Rapid Mapping exercises. If we train on best mapped areas, we are not training the way we fight. Start from scratch. How much data can you provide in 18 hours, 12 days. Another opportunity to insert DDB technology. Multiple views based on multiple levels of consistency. Bayesian networks can combine info from various domains.
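
A toy Bayesian network for the go/no-go mobility evaluation discussed in this talk, worked by brute-force enumeration. Every number except the 75% slope-map accuracy quoted in the notes is a made-up placeholder, and the variable names (steep, wet, go, map_says_steep) and the functions joint and posterior are assumptions for illustration. It shows how "if no go, why?" becomes a posterior query over the uncertain inputs.

```python
from itertools import product

P_STEEP = 0.3        # hypothetical prior that the slope is truly steep
P_WET = 0.4          # hypothetical prior that the soil is wet
MAP_ACCURACY = 0.75  # slope maps "only 75% accurate" (from the notes)
# Hypothetical P(go | steep, wet)
P_GO = {
    (False, False): 0.95,
    (False, True): 0.60,
    (True, False): 0.40,
    (True, True): 0.05,
}


def joint(steep, wet, go, map_says_steep):
    """Joint probability of one complete assignment of the four variables."""
    p = P_STEEP if steep else 1.0 - P_STEEP
    p *= P_WET if wet else 1.0 - P_WET
    p *= P_GO[(steep, wet)] if go else 1.0 - P_GO[(steep, wet)]
    p *= MAP_ACCURACY if map_says_steep == steep else 1.0 - MAP_ACCURACY
    return p


def posterior(query, evidence):
    """P(query is True | evidence), summing over all consistent worlds."""
    names = ["steep", "wet", "go", "map_says_steep"]
    num = den = 0.0
    for values in product([False, True], repeat=len(names)):
        world = dict(zip(names, values))
        if any(world[k] != v for k, v in evidence.items()):
            continue
        p = joint(**world)
        den += p
        if world[query]:
            num += p
    return num / den
```

For example, posterior("steep", {"go": False}) asks how likely a steep slope is given a no-go result, and posterior("steep", {"map_says_steep": True}) shows how far a 75%-accurate map reading should move the belief. Enumeration only works for a handful of variables; a real network would need proper inference.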

DIS Protocols, Keith Green, IDA

Tenney says we should use Keith as an interactive user manual for DIS (Distributed Interactive Simulation). DIS grew out of SIMNET. Network can be LAN or WAN. Each host represents battlefield entity, component of BE, or collection of BE (company of tanks). Each BE knows about others by protocol of pdu's (protocol data units) via udp multicast (really use udp broadcast mostly). Different types of pdus. One kind is ES pdu (entity state pdu) which contains uid for each BE, what I am, where I am, my orientation, smoke plume, damage, ... Where is tricky since there are many representations (flat earth). Every host computer has private (local copy) representation of the domain. Many simulators represent 10X10km or 1X1km or 60X60km low res. One representation is the world is flat and Cartesian. The DIS standard says to use round earth (WGS84). SIMNET said flat earth. Most maps do not correspond to WGS84. Current work on Global Coordinate System which maps to 400 local Cartesian "interesting places" and worry about seams. Different simulations use different resolutions. But Hughes and BBN have different simulators. Consistency? If tank is 10 feet in air, then put people in areas of common agreement. A lot of simplistic consistency issues. 30m vs 60m inconsistencies on who thinks they can see whom. Big issue - in George Lukes and DMSO's lap via SEDRIS data model. Articulation of oriented parts. In SIMNET, one sends out a fire pdu (I am firing at you), impact pdu (I hit you). You compute damage and send out status change pdu. Now fire, detonate, impact, status via request data and send data. Old days - status was how much ammo I have left, how much electricity. Now want to ask for any internal variable. Right now, simulation apps are humongous and people are rebelling against the new more general request data for arbitrary attribute. Vendors get together once a year to find out they have different understandings of messages.
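
To make the round-earth versus flat-earth point concrete, here is a small sketch of the standard conversion from WGS84 latitude/longitude/height to Earth-centered Cartesian coordinates, plus a local east-north-up frame around a reference point (one way to get a "local Cartesian" patch). The WGS84 constants are the published ellipsoid values; the function names are my own, and nothing here is taken from the DIS standard text or from any simulator discussed at the meeting.

```python
import math

WGS84_A = 6378137.0                    # semi-major axis, meters
WGS84_F = 1.0 / 298.257223563          # flattening
WGS84_E2 = WGS84_F * (2.0 - WGS84_F)   # first eccentricity squared


def geodetic_to_ecef(lat_deg, lon_deg, h_m):
    """WGS84 geodetic (degrees, meters) to Earth-centered x, y, z in meters."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    # Prime-vertical radius of curvature at this latitude
    n = WGS84_A / math.sqrt(1.0 - WGS84_E2 * math.sin(lat) ** 2)
    x = (n + h_m) * math.cos(lat) * math.cos(lon)
    y = (n + h_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - WGS84_E2) + h_m) * math.sin(lat)
    return x, y, z


def ecef_to_enu(x, y, z, ref_lat_deg, ref_lon_deg, ref_h_m):
    """Express an ECEF point in a flat east-north-up frame at the reference."""
    x0, y0, z0 = geodetic_to_ecef(ref_lat_deg, ref_lon_deg, ref_h_m)
    dx, dy, dz = x - x0, y - y0, z - z0
    lat, lon = math.radians(ref_lat_deg), math.radians(ref_lon_deg)
    sl, cl = math.sin(lat), math.cos(lat)
    so, co = math.sin(lon), math.cos(lon)
    east = -so * dx + co * dy
    north = -sl * co * dx - sl * so * dy + cl * dz
    up = cl * co * dx + cl * so * dy + sl * dz
    return east, north, up
```

Two hosts that pick different reference points get different local frames, which is one source of the seam and "tank 10 feet in the air" problems mentioned above.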
Unicast is to one, multicast to a subset, broadcast is everyone. Use Ethernet, a broadcast medium. Can multicast via broadcast. Broadcast is bad: your Ethernet card hands every packet to the OS, which then has to decide not to use it. He's talking standard Ethernet, not switched Ethernet - they just got Cisco router to experiment with.

HLA - mainly has read papers so understands how it might work. Two ways to do simulations. DIS philosophy is high granularity. HLA is coarser grain. Electronic dice roll on whether company sees other objects in battlespace. And range in between. Heterogeneous simulation where part does one thing and another part does it another way. BBS is another aggregate level model. HLA provides a framework on how simulation should work. All participants in HLA must establish a common object model that can be detailed or coarse. Federated object model has up front publish and subscribe object model. Dead reckoning has to do with consistency. Ground truth does too. Location pdus are 150 bytes. Could stream location pdus whenever a change occurs. Instead they use dead reckoning algorithms. All moving entities track actual location and a model of itself based on dead reckoning. Based on rate, time, distance, and velocity in xyz. Can predict delta-t location. When delta gets greater than threshold (1 meter) then send another packet. Also every 5 seconds. If no packets in 12 seconds then I drop out of simulation. 5000 entities. Can change timeout in large scale simulations. Some simulators have different models - eight different dead reckoning models. Also dead reckoning on orientations. Ground truth is reality according to simulation. Simulation entity does not necessarily know ground truth. Synchronization is not the DIS strong point - so two pilots see each other and think the other is his wingman since each simulation time is a little behind the other. There is one key owner simulation who gets to say what ground truth is. Uses Persistent Object Protocol, dates back to early days. DIS protocols will not grow more except in HLA framework. DIS uses Euler angles and another simulation uses other approaches in dead reckoning. Mentioned that if some parts of simulation die, then query is made to network to find free hosts to do load balancing via dynamic scheduling.
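
The dead reckoning rule described above can be sketched as follows. This is a minimal first-order (constant velocity) version only; the notes mention eight different dead reckoning models and none is reproduced here. The 1 meter threshold, 5 second heartbeat, and 12 second timeout come from the notes; the EntityState class and the function names are hypothetical.

```python
from dataclasses import dataclass

THRESHOLD_M = 1.0   # resend when the shared model is off by more than 1 meter
HEARTBEAT_S = 5.0   # also resend every 5 seconds
TIMEOUT_S = 12.0    # an entity silent for 12 seconds drops out


@dataclass
class EntityState:
    """Last state an entity put on the network."""
    t: float
    pos: tuple
    vel: tuple


def extrapolate(state, t):
    """First-order dead reckoning: position = p0 + v * dt."""
    dt = t - state.t
    return tuple(p + v * dt for p, v in zip(state.pos, state.vel))


def distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def should_send(last_sent, true_pos, t):
    """Sender side: compare what others predict with where I really am."""
    drift = distance(extrapolate(last_sent, t), true_pos)
    return drift > THRESHOLD_M or (t - last_sent.t) >= HEARTBEAT_S


def is_timed_out(last_received, t):
    """Receiver side: drop entities that have gone quiet."""
    return (t - last_received.t) > TIMEOUT_S


def simulate(true_positions, velocities, dt=1.0):
    """Return the times at which an update would go out on the wire."""
    last = EntityState(0.0, true_positions[0], velocities[0])
    sent = [0.0]
    for i in range(1, len(true_positions)):
        t = i * dt
        if should_send(last, true_positions[i], t):
            last = EntityState(t, true_positions[i], velocities[i])
            sent.append(t)
    return sent
```

An entity moving in a straight line at constant speed only sends heartbeats, while one that stops or turns sends as soon as the prediction error passes the threshold. That is the bandwidth saving over streaming a 150 byte location pdu on every change.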
Might create new object on the fly like a missile. Persistent Object Protocol is a way to recreate objects if they die. George Lukes is developing features for standard way to blow up building. Can show this at IDA show. Only works in certain situations. If I make a crater, will other simulators see it? Many simulators will drive over crater still since they have no means of updating the wire frame terrain. DIS versus Distributed Interactive Estimation (DIE?). There's an assumption that someone knows truth. Cannot ask red force. What happens if I add uncertainty to the system? I know truth about a hypothesis. Problems for DIS are bandwidth speed, processor power. RTI lets you subscribe to what you need as opposed to giving you everything. You need 30 bits and I send you 100*150 bytes. "I'm still a tank, I'm still a tank." Up close versus far away. May not need to know. RTI is allowing people to send out multiple sets of info, 3 kinds of pdus with less info in some. I'm a tracked vehicle (not an M-1). So I send you hi res and someone else low res. It's not DIS that allows relaxed info. How would you do this in uncertain world? In uncertain world, you don't know your precise state. A difference between CORBA and publish and subscribe? Can do P&S with CORBA. Worry about logical connectiveness and how to get info to flow. How does uncertainty accumulate over time? If dominions know when to send info to other dominions. When is it a significant change to the requester? To what extent are there major qualitative differences in situation assessment versus simulation worlds? DIS is entity-based simulation. Specific entities versus fuzzy entities. Is ground truth inside the system or outside the system? Simulation entities have sensor models (eyes versus laser range finders).
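
The "subscribe to what you need" idea can be illustrated with a toy in-process publish/subscribe broker in which each subscriber names the fields it wants. This is only a sketch of the concept; it is not the RTI or any HLA/DIS API, and the Broker class, the topic name, and the field names are all hypothetical.

```python
from collections import defaultdict


class Broker:
    """Toy publish/subscribe hub with per-subscriber field selection."""

    def __init__(self):
        self._subs = defaultdict(list)  # topic -> [(fields, callback), ...]

    def subscribe(self, topic, fields, callback):
        """Register interest in only the named fields of a topic."""
        self._subs[topic].append((tuple(fields), callback))

    def publish(self, topic, update):
        """Send each subscriber just the fields it asked for."""
        delivered = 0
        for fields, callback in self._subs[topic]:
            callback({k: update[k] for k in fields if k in update})
            delivered += 1
        return delivered
```

A nearby observer might subscribe to position, orientation, and damage, while a distant one subscribes only to the coarse kind ("tracked vehicle"), so the far subscriber never receives the high-resolution state.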

Plan for Completion

Alan will send Craig his presentation (if he remembers -- he's going on vacation to Disney World).

Craig: who is DDB customer? How is it done today? What are the main research problems? More than one scenario? Boundary with other DARPA programs and value added? Domain modeling? System services? Indexing (uncertainty indexing)