Oiling the information age - 23/02/97
by Dan Tebbutt, Australian Consolidated Press
23/02/97 -- Meta-Content Format might just be the drill fitting we need to tap the information age's submerged oil reserves.
Information is the oil of the future. It will inject lifeblood into impending economic engines and eventually become such a routine part of human life that we will wonder how we ever lived without information on tap. Crude global pipes already pump this new commodity, yet our ability to consume the resource effectively and efficiently remains in the Model T era.
Personal computing to date has been obsessed with individual productivity - word processing, spreadsheets, database access and so on. If anything, the PC revolution took us backwards in the quest for automated collaboration and information sharing: notwithstanding arcane interfaces, many minicomputers and mainframes offered better 'groupware' than today's PC LANs.
Information Feast or Famine?
One high hurdle is the disjointed nature of our information sources. Data is locked into word-processing files, email, Web pages, spreadsheets and numerous databases with incongruous data structures and limited ability to cross-communicate. HTML promises a universal file format, just as SQL pledged a universal database language, but both leave problems unresolved.
Today's computers are no better than banks of digital filing cabinets; PCs do a wonderful job of storing data but do little to extract information value. Information overload is the problem, and enterprises are desperate for solutions - witness the millions poured into early data-mining attempts.
"We are swamped with information and yet we spend most of our time looking for information," said R Guha, a principal scientist at Apple Research Labs. "There's something clearly wrong here. It's like if we want food: I vaguely know there's a cafeteria somewhere so I go down to the ground floor. It's pitch dark and I can't see. I'm going to randomly walk around until I bump into food. I bump into a lot of things yet I go looking for more things to bump into thinking it's food."
Guha, who has a background in artificial intelligence, has spent several years focused on knowledge representation, attempting to codify the way humans describe the world into a form that can be digitised. For a year at Apple, he's looked particularly at metadata, the 'bits about bits' that explain how information is structured and described. His research has led to an architecture known as Meta-Content Model (MCM) that addresses many information-age dilemmas, and he's created an implementation called Meta-Content Format (MCF).
What is Meta-Content Format?
"Imagine, if you will, that content is like puppets, and the 'bits about bits' are like puppet strings," Guha explained. "Today we have incredibly primitive puppet strings and they are diverse in the sense that it's chaos. It's a lot like trying to control puppets where one string is made of straw, the other one doesn't exist and the third is a 10-gauge steel cable. MCF uses fairly powerful technology developed in AI to lend uniformity and standards to the way we control [metadata]."
Prevailing meta-content exists in many fragments, usually in application-specific formats and clients, but MCF promises a lingua franca. "MCF is a language for describing structure. There's this MCF library to which anyone can communicate their structure. Anybody can come and write viewers, and you can use any of these viewers - an outline viewer, a tree viewer, whatever." Interest in early versions of MCF is running hot, with over 100,000 beta downloads in the first few months from mcf.research.apple.com and hundreds of sites supporting the multiplatform fly-through viewer.
MCF is less a product than a method for breaking down barriers between information structures. Once data repositories (databases, Web sites, email, forms or any other structured knowledge) are described in MCF it opens windows to explore knowledge links and weave disparate data threads into an information tapestry.
To be precise, MCF allows universal representation and access of meta-content. It is emphatically not a database standard or language. Database architects and MIS managers will not be asked to disturb current infrastructures constructed with specialised tools like HTML, SMTP, relational database management systems and multifarious proprietary formats. Indeed, MCF relies on structures and access standards like ODBC and JDBC to get at data. MCF adds an extra layer of intelligence between the user interface and middleware/backend routines. "It assumes the existence of something like ODBC to do the final step," said Guha. "You don't have to worry about whether you're talking to Informix or Oracle."
What problems does it address?
Technologies that make wonderful lab rats can prove conspicuously useless in real-world applications. APC asked Guha what problems MCF can answer.
Integrating data from different buckets into a single information pond is MCF's principal achievement, he suggested. People shouldn't need to access numerous sources to find answers: "Today information is organised by protocol, but really we want to deal with it by concept, using our own vocabulary." It's this concept-defining step that MCF seeks to enumerate explicitly. The result is 'conceptual steering' of aggregate knowledge-bases, including that mother-lode of learning, the Internet.
In enterprise contexts, MCF offers specific, long-sought benefits. Its ability to traverse and mesh disparate databases offers a potent tonic for enterprises eager to better tap the wells of information coagulating on corporate servers.
MCF is a 'glue layer' for accessing 'legacy data' held in multiple formats and sources. Guha argues that it's important to use a common, extensible and open format such as MCF since "what you're using today are the legacy systems of 2001. Someone called it 'future-proofing'. You never know how the world is going to be tomorrow or what you're going to have."
Web deployments are a shining example of MCF's potential. "A Web site has a structure but it's largely invisible," said MCF's product manager, Hardie Tankersley. "MCF allows Webmasters to visualise and express the site structure to users."
Connecting people with information is MCF's trump card. "Information currently comes in through a particular application or protocol," Tankersley said, "but that's not the way people want to use it."
How MCF works
MCF is deceptively simple and open to multiple implementations. "The hard part about what we've done here is not what's encoded; it's framing it in a way that's useful and compelling," said Guha. "Our model can incorporate other kinds of formats because it has a structure that can handle it without having it encoded in MCF."
The critical step is to describe the source data's schema, namely the data fields and their meanings. MCF's flat text file is the preferred format for such descriptions, but Guha said alternatives like X.500, Microsoft's SiteMap, Netscape's SOIF or Novell Directory Services could also fulfil the task in limited deployments by translating MCM.
MCF files can be hand-written, but Guha expects that they will be generated automatically by Web or database management tools. No change to the source data structure is required: MCM simply asks that data meanings be expressed in a readable way, with MCF offering a transparent answer. MCF does for data structures what HTML does for file formats.
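To make the idea of an automatically generated description concrete, here is a minimal Python sketch that reads a table's fields from a database and writes them out as flat text. It is an illustration only: the `unit:`, `field:` and `type:` labels, the `describe_table` function and the `articles` table are all invented for this example and are not the actual MCF file syntax, and SQLite simply stands in for whatever database a real tool would inspect.

```python
import sqlite3


def describe_table(conn, table):
    """Return a flat-text description of one table's fields.

    The 'unit:/field:/type:' layout is hypothetical, made up for
    this sketch; it is not the real MCF file format.
    """
    # PRAGMA table_info yields one row per column:
    # (cid, name, type, notnull, dflt_value, pk)
    rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    lines = [f"unit: {table}"]
    for _cid, name, col_type, *_rest in rows:
        lines.append(f"field: {name}")
        lines.append(f"type: {col_type}")
    return "\n".join(lines)


# A throwaway in-memory database plays the part of the source data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (title TEXT, author TEXT, published TEXT)")

description = describe_table(conn, "articles")
print(description)
```

Note that the table itself is never altered; the description is produced alongside it, which is the point the article makes about leaving source data structures untouched.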
The model can be applied to any organised data. During development, Guha built examples accessing SQL, HTML and gopher backend formats, but there's no obstacle to supporting any proprietary data pattern - relational databases, object databases, Lotus Notes databases, email post offices, structured word processing archives, in-house database systems. Obviously, opening proprietary files to MCF requires vendor cooperation and Apple is actively pursuing partnerships.
Existing access tools are a vital adjunct to MCF, and protocols like ODBC, SQL, JDBC and HTTP are used to access information. "It assumes the existence of a standard way to access the data," said Guha.
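The division of labour described here, a description layer on top and ordinary access protocols underneath, might be sketched roughly as follows in Python. Everything named in it is an assumption made for illustration: the `catalogue` structure, the `lookup` function, the sample `phones` table and the `FAKE_WEB` dictionary (which stands in for an HTTP fetch so the example runs without a network) are hypothetical and are not part of MCF.

```python
import sqlite3

# Stand-in for a Web server; a real layer would fetch the URL over HTTP.
FAKE_WEB = {"http://example.com/staff/kim": "Kim's home page"}


def fetch_sql(conn, query):
    """Run a query through the existing database interface."""
    return [row[0] for row in conn.execute(query)]


def fetch_web(url):
    """Pretend to retrieve a page through the existing Web protocol."""
    return [FAKE_WEB[url]]


def lookup(concept, catalogue, conn):
    """Gather everything filed under one concept, whatever its source."""
    results = []
    for protocol, locator in catalogue.get(concept, []):
        if protocol == "sql":
            results.extend(fetch_sql(conn, locator))
        elif protocol == "web":
            results.extend(fetch_web(locator))
    return results


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE phones (person TEXT, number TEXT)")
conn.execute("INSERT INTO phones VALUES ('Kim', '555-0100')")

# Hypothetical catalogue: it records where each item lives and which
# access method reaches it, but holds none of the data itself.
catalogue = {
    "Kim": [
        ("sql", "SELECT number FROM phones WHERE person = 'Kim'"),
        ("web", "http://example.com/staff/kim"),
    ]
}

found = lookup("Kim", catalogue, conn)
print(found)
```

The caller asks by concept ("Kim") and never needs to know that one answer came from a database and the other from a Web page.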
MCF annotation can be added to any data structure - text, email, databases, sounds, graphics, video, virtual worlds - but more importantly it separates the user interface from the data format. If data semiotics are expressed in an open format like MCF, any front end can interpret and represent the meaning while data is accessed through existing protocols like HTTP and SQL. Thus, for a broad array of back ends MCF allows an open choice of user interfaces, such as:
- today's 3D fly-through and columnar lists
- logical groupings by date, creator, size, priority or other factor
- sensory groupings by colour, icon, shape, sound or geography
- proactive interfaces that alert the user when a conditional meaning is fulfilled
- advanced interaction through VR goggles or other futuristic interfaces
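The separation of interface from data format can be illustrated with a small Python sketch in which one set of metadata records feeds two unrelated views. The records, their field names (`name`, `parent`, `creator`) and both viewer functions are assumptions invented for this example; real MCF viewers such as the 3D fly-through are far richer.

```python
from collections import defaultdict

# Hypothetical metadata records describing three documents.
records = [
    {"name": "Budget 1997", "parent": "Finance", "creator": "Kim"},
    {"name": "Payroll", "parent": "Finance", "creator": "Lee"},
    {"name": "Press kit", "parent": "Marketing", "creator": "Kim"},
]


def outline_view(items):
    """Render the records as an indented outline, grouped by parent."""
    by_parent = defaultdict(list)
    for item in items:
        by_parent[item["parent"]].append(item["name"])
    lines = []
    for parent in sorted(by_parent):
        lines.append(parent)
        lines.extend("  " + name for name in sorted(by_parent[parent]))
    return "\n".join(lines)


def grouped_view(items, key):
    """Group record names by any metadata field, such as creator."""
    groups = defaultdict(list)
    for item in items:
        groups[item[key]].append(item["name"])
    return {k: sorted(v) for k, v in groups.items()}


outline = outline_view(records)
by_creator = grouped_view(records, "creator")
print(outline)
print(by_creator)
```

Neither function knows or cares where the records came from, so a new view can be added without touching the underlying data.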
"There is no preferred view," said Guha, who foresees tremendous opportunities for specialised UI developers in an open field. Apple's HotSauce 3D fly-through offers a new view of the Web. Although the current plug-in interface is simple, Apple is openly encouraging third-party developers to explore exciting UI possibilities. Content vendors could develop a viewer specifically attuned to present their content, or they could offer any third-party interface. For example, Yahoo generated MCF descriptions of its site so users can 'fly-though' the popular Web catalogue, but it would not take much imagination to aspire to better 3D implementations featuring graphics, sounds and other hooks.
"We're trying to create a platform," said Tankersley. He said he hopes Australian developers will explore the opportunity to build MCF products, following numerous local sites which have already generated Web descriptions.
From Prototype to Popularity
Sun's Java programming language set the benchmark for Internet diffusion of a new technology. Acceptance and free-for-all distribution online contributed to widespread enthusiasm and provoked an unprecedented licensing orgy of Java. While Apple is not expecting a repeat with MCF, the company is learning from the Java model and has made large parts of the technology public domain. MCF is geared for cross-platform deployment, the first targets being Mac OS, 32-bit Windows and a Java implementation for Unix. An OS/2 version is under discussion, and Apple's Newton team is also investigating MCF. Guha also expects considerable interest from developers of embedded-systems platforms like PlayStation and Pippin. Mac OS will eventually integrate MCF, although this is unlikely to be in 1997. Apple would consider licensing the technology to other OS vendors, although it really sits above OS services and much of MCF consists of freely-available standards anyway. Apple is evangelising MCF in standards bodies such as the W3 Consortium, Internet Engineering Task Force and Object Management Group.
Stage one of MCF's product cycle is the give-away HotSauce plugins that introduced Mac and Windows users to MCF's potential. This process is continuing via CD collections, but the key driver is Web site adoption - and it's already spreading too fast for the MCF team to keep its listing complete. Major 'HotSauced' sites include Yahoo, C|Net and the Australasian Legal Information Institute.
Stage two was the launch of the HotSauce software developer's kit (SDK), a branded package allowing developers to explore MCF applications. The initial SDK is free, although Apple will consider making it a commercial product as it matures. "We hope [the SDK] will ignite imagination in some of our partners and get this technology in the marketplace," said Tankersley.
Meanwhile, Apple seeded MCF among close partners, and commitments quickly came from several companies, including Netscape, NetObjects, Yahoo!, Everyware and NetCarta (recently acquired by Microsoft). Other companies believed to be evaluating MCF include Oracle and Novell. A database tool codenamed BabelFish was demonstrated at Comdex, but Tankersley implied this would be likely to surface through database partners.