Latest revision as of 19:10, 30 September 2020

Meta Content Framework



R.V.Guha
Apple Computer

This paper provides a description of the Meta Content Framework (MCF), version 0.95.
Note: if you are just looking for a specification of the file format, you can skip to the section "The MCF File Format". If you are just interested in creating an mcf description of your site that visitors can fly through, you might want to start from https://web.archive.org/web/19970703020302/http://mcf.research.apple.com/hs/NXSpace.html.

Goals of the MCF

The goal of MCF is to provide an adequate system for representing a wide range of information about content. The content targeted includes web pages, gopher and ftp files, desktop files, email and structured (i.e., relational and object oriented) databases, etc.

The corresponding meta-content includes indices such as Yahoo!, web site descriptions (which include maps of web sites together with other information about the pages on the web site), gopher and ftp directory structures, email headers, data dictionaries, etc. The following diagram illustrates this.

[Figure: Mcfg.gif]


 



Foundations of the MCF

The MCF has its origins in knowledge representation systems such as https://web.archive.org/web/19970703020302/http://www.cyc.com/tech.html#cycl CycL, KRL and https://web.archive.org/web/19970703020302/http://logic.stanford.edu/kif/kif.html KIF, and in advanced database models (the relational object model) such as those in SQL3. The version of MCF described in this document does not have the expressiveness of all of these languages, but hopefully some future version will include the best of these languages.

The expressiveness has intentionally been limited in version 0.95 of the MCF, primarily for ease of use and for reasons related to computational complexity. It should be noted that even this version of MCF is significantly more dynamically extensible than most database languages.

MCF is not intended to be an extension of markup languages such as HTML. While it is possible and often useful to embed meta-content within HTML files, we believe that for many purposes, it would be better to extract out and independently represent this meta-content. MCF is intended to be a format for this representation. In fact, we expect a lot of meta content to be embedded in content and extracted automatically by robots that use the MCF to represent the results of their activities. In this spirit, MCF should be able to represent the meta content that proposals such as the https://web.archive.org/web/19970703020302/http://www.oclc.org:5046/conferences/metadata/metadata.html Dublin Core aim to cover.





The Focus of MCF

Though we do need an interchange syntax, the syntax itself is distinct from MCF. The same MCF content may be transcribed using different standard syntaxes (such as SOIF, SiteMap, MARC, etc.) and MCF parsers should be able to read all these different standard syntaxes.


So, for example, we are in the process of defining an alternate syntax for MCF based on SGML and this syntax might be more appropriate when the meta data is embedded with HTML. We could consider the proposed SiteMap format as an alternate syntax for a very limited subset of MCF.

We do, however, describe a preferred syntax, the MCF File Format, that is capable of exploiting the expressive power of MCF. The main reason for introducing yet another file format is to have an interchange format that is not beholden to legacy applications and that can track changes in the expressiveness of MCF.


What is important is the conceptual framework behind MCF and agreement on the meaning of the actual terms used to describe the content.

The conceptual framework behind MCF, the Meta Content Model, is simple yet powerful. There is a set of objects with attributes and relations between them (technically speaking, this is a first order model). Some of these objects denote content objects such as web pages, desktop files, etc. Some others might denote content entities such as newsgroup threads. Yet others might denote physical objects such as people, companies, etc. Content is typically about people, companies, etc., and if there is no way of referring to these, one cannot possibly do a good job of representing information about the content.

Specifically, we have:

  • A set of objects. E.g.,
    • the web page whose URL is "mcf.research.apple.com"
    • the person whose social security number is 550-91-6732 and name is Fred Smith.
    • the HotSauce plugin application.
    • a predicate whose name is "author" which is described in the file whose url is "...".
  • An important subset of objects are predicates/relations. E.g.,
    • the predicate whose name is author, whose first argument is a content object and whose second argument is an agent.
    • the predicate whose name is lastRevisionDate, whose first argument is a document and whose second argument is a date specifying the date of last revision. It is very important to note that this predicate has to be used consistently everywhere for MCF to really work.
    • the ternary relation lastModifiedByOn, whose first argument is a document, second argument is an agent and third argument is the date, which may be denoted by a string (NB: wherever possible, as in the case of dates, MCF will try to use existing standards.)
  • Another subset of these objects is called Layers. The layers are arranged in a total order.

An assertion (or tuple) is the basic unit: it states either that a relation holds between a certain set of objects or that an object has a certain property. An assertion is an n-tuple (typically a triple), consisting of a slot, an ordered list of n-1 object references, and a layer. Each assertion also has a true/false value associated with it. Assertions are said to be true/false in the layer associated with them. An assertion that is true/false in a layer is also true/false in all the superior layers, unless one of those also contains the assertion with a different true/false value.

Since the layers themselves are units, the relation between the layers themselves is expressed as assertions. These assertions are in the BaseLayer, a special layer that is at the bottom of the total order.
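The way truth values carry upward through the layers can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the layer names other than BaseLayer, the dictionary-based store, and the function name truth_value are all hypothetical, and the sketch reads the rule above as "the nearest layer at or below the one you are looking from decides the value".

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: an assertion is (slot, arg1, arg2, ...).
Assertion = Tuple[str, ...]

# Layers from the bottom of the total order to the top.
# Only BaseLayer is named in the text; the other names are made up.
LAYER_ORDER: List[str] = ["BaseLayer", "SiteLayer", "UserLayer"]


def truth_value(store: Dict[str, Dict[Assertion, bool]],
                assertion: Assertion,
                layer: str) -> Optional[bool]:
    """Return the value of `assertion` as seen from `layer`, or None if unknown.

    Walks from `layer` down towards BaseLayer and returns the value stated
    by the first layer that contains the assertion.
    """
    top = LAYER_ORDER.index(layer)
    for name in reversed(LAYER_ORDER[: top + 1]):
        value = store.get(name, {}).get(assertion)
        if value is not None:
            return value
    return None


store = {
    "BaseLayer": {("author", "doc1", "guha"): True},
    "SiteLayer": {("author", "doc1", "guha"): False},  # overrides BaseLayer
}
print(truth_value(store, ("author", "doc1", "guha"), "UserLayer"))
```

Here the value stated in SiteLayer shadows the one in BaseLayer for SiteLayer and everything above it, while BaseLayer itself still sees its own value.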

A chunk of MCF (in whichever syntax) is typically a set of assertions. In the preferred syntax (the MCF File Format), the assertions are grouped together based on their first argument and all the assertions in a file are assumed to be in the same layer.

It is important to note that predicates/relations themselves are objects. This allows us to extend the vocabulary within MCF itself. This is both a blessing and a curse. It obviously makes it very easy to extend MCF for many different purposes. Applications which don't recognize the semantics of a new predicate can simply ignore it. The downside is of course that different authors of MCF can extend it in potentially incompatible ways. To alleviate this problem, we propose some https://web.archive.org/web/19970703020302/http://mcf.research.apple.com/hs/vocab.html basic terms that can be used to describe web hierarchies such as Yahoo! (see Basic MCF Vocabulary).

In the next section, we describe the MCF File Format, a preferred format for representing MCF.






The MCF File Format

MCF files contain descriptions of meta-content objects also referred to as "units". A unit consists of the following.

  • a unit identifier.
  • some number of predicates (also sometimes referred to as slots), each with one or more values
    • depending on the slot, there may be exactly one or more than one value
    • the value(s) may be strings, numbers, etc. or they may be references to other objects. The syntax for object references is given later in this document. A longer term, better solution for object references is described at MCF - The Problem of Reference.
    • slot values are always sets, i.e., there is no significance to the order of values or to the number of times a value occurs. The combination of the unit, slot name and a slot value can be abstracted as a tuple in database terms or as a ground atomic formula in logic terms.
    • there is no minimal set of slots that an object should have, though specific applications may require certain slots to be present for certain kinds of objects.
    • in the case of predicates which take more than 2 arguments, the second argument onwards are enclosed in square brackets: [...].
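The set semantics of slot values, and the reading of each (unit, slot, value) combination as a tuple, might be pictured as follows. This is a minimal sketch, not part of the format: the function name unit_to_tuples and the example unit are assumptions made for illustration.

```python
from typing import Dict, Iterable, Set, Tuple


def unit_to_tuples(unit_id: str,
                   slots: Dict[str, Iterable[str]]) -> Set[Tuple[str, str, str]]:
    """Flatten one unit into a set of (slot, unit, value) tuples.

    Because the result is a set, repeated values and the order in which
    values were written make no difference.
    """
    return {(slot, unit_id, value)
            for slot, values in slots.items()
            for value in values}


# Two spellings of the same (hypothetical) unit: different order, one repeat.
first = unit_to_tuples("doc1", {"author": ["Fred", "Ann", "Fred"]})
second = unit_to_tuples("doc1", {"author": ["Ann", "Fred"]})
print(first == second)
```

Both spellings produce the same two tuples, which is what "slot values are always sets" amounts to in practice.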

MCF is an interchange format and does not make any assumptions about how information in this format is used by applications.

MCF Files and Units

Conceptually, the Web is a large graph where the pages are the nodes and hyperlinks are arcs between these nodes. Similarly, MCF defines a graph where units are the nodes and relations between units are the arcs. Since we have many slots, we get a much richer space with labelled arcs. The most general relations correspond to the notion of a directed arc and are represented by the predicate parent and its inverse child.

Each mcf file defines a sub-graph (typically a sub-hierarchy.) The file itself corresponds to a unit. The file may define one or more layers of the hierarchy under it.

If an object in a certain mcf file does not explicitly specify a parent, the parent will default to the object whose identifier is the url of that mcf file. The immediate children of the file's topic node should either not specify any parents slot or provide the url of the file as the value for the parents slot. The first approach is better because it allows the file to be moved around more easily.
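The defaulting rule for parents could be sketched like this. The slot name "parent" and the function name effective_parents are assumptions for the sake of the example (the text uses both "parent" and "parents"), as is the representation of a unit as a dictionary of slot names to value lists.

```python
from typing import Dict, List, Set


def effective_parents(unit_slots: Dict[str, List[str]], file_url: str) -> Set[str]:
    """Return the parents of a unit, defaulting to the enclosing file's url.

    `unit_slots` maps slot names to lists of values; the slot name
    "parent" is an assumption made for this sketch.
    """
    parents = unit_slots.get("parent")
    return set(parents) if parents else {file_url}


# A unit with no parent slot falls back to the (hypothetical) file url.
print(effective_parents({"name": ['"Products"']}, "http://www.foo.com/site.mcf"))
```

A unit that omits the slot keeps working when the file is moved, because the default is computed from wherever the file currently lives.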

The mime type for MCF is text/mcf. The urls for MCF files typically have the suffix "mcf".



MCF Syntax

An MCF file contains a set of headers followed by a list of mcf object descriptions. The headers may specify other mcf files that are logically included within that file. This is useful where a single file (or set of files) defines the predicates and units commonly used across a set of MCF files.

Each object description starts on a new line with the token "unit:". An object description ends either when a new object description is encountered or when the end of the file is reached. The end of the file may be the end of the physical file or the end of the logical file. The logical end of the file is specified by the token end-file: appearing on a new line.

An mcf object description has the following syntax.
unit: < unit identifier >
< slot-name > : < value 1 > < value 2 >...
< slot-name > : < value 1 > < value 2 >...
.
.
.

Lines starting with the character ';' are comment lines.
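A reader for the unit descriptions described above might look roughly like the following. This is a sketch under assumptions, not a reference parser: it ignores headers and bracketed values, treats the first ':' on a line as the end of the slot name, tolerates leading whitespace before ';' and before tokens, and the sample file contents are invented.

```python
import re
from typing import Dict, List, Optional

# A value token is either an optionally '#'-prefixed quoted string
# or a run of non-whitespace characters.
TOKEN = re.compile(r'#?"[^"]*"|\S+')


def parse_units(text: str) -> Dict[str, Dict[str, List[str]]]:
    """Parse unit descriptions into {unit id: {slot name: [raw value tokens]}}."""
    units: Dict[str, Dict[str, List[str]]] = {}
    current: Optional[Dict[str, List[str]]] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):
            continue                      # blank line or comment line
        if line == "end-file:":
            break                         # logical end of the file
        name, sep, rest = line.partition(":")
        if not sep:
            continue                      # not a slot line; skipped in this sketch
        name = name.strip()
        if name == "unit":
            current = units.setdefault(rest.strip().strip('"'), {})
        elif current is not None:
            current.setdefault(name, []).extend(TOKEN.findall(rest))
    return units


SAMPLE = '''\
; a hypothetical two-unit file
unit: "http://www.foo.com/"
name: "Foo Home"
child: #"http://www.foo.com/a.html" #baz

unit: baz
name: "Baz" "Baz"
end-file:
unit: ignored
'''

for unit_id, slots in parse_units(SAMPLE).items():
    print(unit_id, slots)
```

The values are kept as raw tokens, quotes included, so that a later step can still tell strings from references and symbols.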

In this document, we will use the notation s(u, v1) to refer to the assertion denoted by the entry v1 occurring on the slot s of the unit u.

Unit Identifiers

Unit identifiers are strings. Identifiers for content objects (such as web pages) are their urls. The identifier for a unit is not necessarily the same as its name. Different units (i.e., units with different identifiers) may have the same name. Predicates are the only exception to this rule: their names are the same as their identifiers.



The unit identifier for non-content objects (such as subject categories) can be pretty much any string. However, if you want to refer to them outside of the file they are defined in, the identifier also needs to specify the location of the definition. In this case, you can use segmented identifiers, with segments separated by the character '#', such as "www.foo.com/another-taxonomy.mcf#baz", where the entire string is the identifier of an object that is defined in the file www.foo.com/another-taxonomy.mcf.
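Splitting a segmented identifier into the defining file and the local part can be sketched as below. The function name split_identifier is hypothetical, and splitting at the last '#' is an assumption; the text only shows an example with a single separator.

```python
from typing import Optional, Tuple


def split_identifier(identifier: str) -> Tuple[Optional[str], str]:
    """Split "file#local" into (file, local); return (None, identifier) if no '#'."""
    file_part, sep, local = identifier.rpartition("#")
    if not sep:
        return None, identifier
    return file_part, local


print(split_identifier("www.foo.com/another-taxonomy.mcf#baz"))
print(split_identifier("baz"))
```

Note that the full string remains the identifier of the object; the split only tells a reader where to look for the definition.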


Slots

Slot names are restricted to non-white space characters. A list of slot values is semantically equivalent to a set. So, the order of values and the number of times a value occurs does not carry any significance.

It is further assumed that the unit for a predicate appears before the first use of the predicate. Of course, we have to start somewhere, and so we will treat a base set of predicates as predefined. These predicates are described https://web.archive.org/web/19970703020302/http://mcf.research.apple.com/hs/BasicSlots.html here (or see Basic MCF Slots).

Object References

#"id" is a reference to the object whose unique identifier is id. In some cases, we can get away with just using "id" because we are expecting references to objects (and not strings). However, to avoid future cases of potential ambiguity between the string "id" and a reference to the object whose identifier is "id", we introduce this syntax. MCF parsers are free to tolerate and resolve this kind of ambiguity.

If the identifier does not have any whitespace character, the quotation marks can be dropped so that we can write just #id instead of #"id". A longer term, better solution for object references is described here.
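Telling references apart from strings, numbers and symbols could be done along these lines. This is a sketch only: the function name classify, the tag names, and the decimal-number pattern are assumptions, and it does not attempt the tolerant resolution of bare "id" values that parsers are permitted to perform.

```python
import re
from typing import Tuple, Union

NUMBER = re.compile(r"[+-]?\d+(\.\d+)?")


def _unquote(text: str) -> str:
    """Drop one pair of surrounding double quotes, if present."""
    if len(text) >= 2 and text[0] == '"' and text[-1] == '"':
        return text[1:-1]
    return text


def classify(token: str) -> Tuple[str, Union[str, float]]:
    """Classify one raw slot value token as ref, string, number or symbol."""
    if token.startswith("#"):
        return ("ref", _unquote(token[1:]))      # #id or #"id with spaces"
    if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
        return ("string", token[1:-1])
    if NUMBER.fullmatch(token):
        return ("number", float(token))
    return ("symbol", token)


for raw in ['#"my id"', "#baz", '"baz"', "0.95", "baz"]:
    print(raw, classify(raw))
```

The explicit '#' prefix is what lets the string "baz" and the reference #baz be distinguished without knowing what the slot expects.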

Headers

Headers are similar to meta-content object descriptions in that they are a sequence of slots and values. Headers really provide meta-meta-content.

The header slots currently used are,

  • MCFVersion: a decimal number.
  • fileLayer: the layer that the contents of this file belong to. Defaults to the most local layer.
  • include: a list of urls for the other mcf files that are logically included in this file.
  • tocOf: if the file is a table of contents for a web site, then this slot contains the url for that site.

In addition, the headers can include any of the slots (and values) for the object corresponding to that file, e.g., the slots name and description.

The headers begin with the token begin-headers: and end with the token end-headers:. If the token unit: is encountered before the token end-headers: is encountered, an end-headers: token is assumed. Any characters appearing before a begin-headers: token or unit: token are ignored.
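The header rules above (ignore everything before the headers, and treat a unit: token as an implicit end-headers:) can be sketched as follows. The function name split_headers and the sample text are hypothetical, and the sketch assumes that the tokens appear at the start of a line, which is stricter than "any characters" in the rule.

```python
import re
from typing import Dict, List, Tuple

TOKEN = re.compile(r'#?"[^"]*"|\S+')


def split_headers(text: str) -> Tuple[Dict[str, List[str]], str]:
    """Return (header slots, remaining text starting at the first unit)."""
    headers: Dict[str, List[str]] = {}
    lines = text.splitlines()
    in_headers = False
    body_start = len(lines)
    for i, raw in enumerate(lines):
        line = raw.strip()
        if not in_headers:
            if line.startswith("begin-headers:"):
                in_headers = True
            elif line.startswith("unit:"):
                body_start = i            # file without headers
                break
            continue                      # anything else before the headers is ignored
        if line.startswith("end-headers:"):
            body_start = i + 1
            break
        if line.startswith("unit:"):
            body_start = i                # implicit end-headers:
            break
        if not line or line.startswith(";"):
            continue
        name, sep, rest = line.partition(":")
        if sep:
            headers.setdefault(name.strip(), []).extend(TOKEN.findall(rest))
    return headers, "\n".join(lines[body_start:])


SAMPLE = '''\
junk before headers
begin-headers:
MCFVersion: 0.95
include: http://www.foo.com/base.mcf http://www.foo.com/more.mcf
unit: first
name: "First"
'''

headers, body = split_headers(SAMPLE)
print(headers)
print(body)
```

In the sample there is no end-headers: line, so the headers end where the first unit: line begins.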






Standardized Vocabulary

Each application can use its own vocabulary (in addition to the https://web.archive.org/web/19970703020302/http://mcf.research.apple.com/hs/BasicSlots.html built in vocabulary that is assumed to exist), though it would be highly desirable to use the standard slots wherever possible. Please see https://web.archive.org/web/19970703020302/http://mcf.research.apple.com/hs/vocab.html here for a growing list of standard vocabulary. If you need a predicate or category not in this list, please write to us suggesting additions.

Please follow this link to see the https://web.archive.org/web/19970703020302/http://mcf.research.apple.com/hs/hsspecific.html HotSauce FlyThru specific slots.





Example

Please follow https://web.archive.org/web/19970703020302/http://mcf.research.apple.com/hs/example.html this link for an example of the use of MCF.









Appendix A: BNF for the MCF file format


< mcf file > -> < headers > < unit list > end-file:
< headers > -> begin-headers: < linebreak > < slots > end-headers: < linebreak >
< unit list > -> < unit > < unit list > | < unit >

< unit > -> unit: < unit identifier > < linebreak > < slots >

< slots > -> < slot > < slots > | < slot >
< slot > -> < slot name > : < slot values > < linebreak >
< slot values> -> < white space > < slot value > | < slot values > | < t-value > | < q-value >

< slot name > -> < symbol >:
< slot value > -> < unit reference > | < string > | < number > | < symbol >

< t-value > -> [ < slot value > < slot value > ]
< q-value > -> [ < slot value > < slot value > < slot value > ]
< unit identifier > -> < string >

< unit reference > -> # < unit identifier >
< linebreak > -> any sequence of standard linebreak characters (including '\r' and '\n')
< white space > -> any sequence of standard white space characters (including '\t' and ' ')
< string > -> character sequence starting and ending with '"'
< symbol > -> any sequence of characters without any intervening whitespace characters.

Related Articles

  • Basic MCF Vocabulary
  • Basic MCF Slots
  • Towards a theory of meta-content