OWL

Jurjen Bokma

September 2007

Links

Protege is a Java-based Ontology Editor
OWL is an ontology language based on description logics and derived from DAML+OIL, which in turn is built on top of XML and RDF. An accessible Wine advice server implementation exists that uses JTP and serves as a web-based reasoning system.
SPARQL is a query language for RDF
theFigtrees.net has a SPARQL FAQ with links to implementations.
The “Semantic Reasoner” entry of Wikipedia has a list of available semantic reasoners with a (limited) feature comparison.
W3.org has a FAQ on the Semantic Web
There is an extensive list of tools available for implementing parts of the Semantic Web at esw.w3.org.
Chimaera might be of help in “creating and maintaining distributed ontologies on the web”. In doing so, the DAML Ontology Library also might come in handy.
- Joseki is an HTTP engine with SPARQL support, Java-based and offering an HTTP service on port 2020. Its site states nothing about OWL though, just SPARQL and RDF.
- ARQ is a query engine for Jena that supports SPARQL. Jena in turn is a semantic framework for Java, and it does support OWL (as well as RDF, RDFS, and SPARQL). In addition, it is Open Source.
- Virtuoso is another Open Source HTTP SPARQL server. Not much is stated about its reasoning capacities, though. And it looks a bit convoluted, being an SQL server, a web server, a webdav server and SPARQL server all in one. Not much of “do one thing and do it well”, and nothing about OWL...
- Pellet is an Open Source OWL DL reasoner in Java. Among others, it does have a command line interface and a DIG server, but nothing is stated about HTTP.
- Minerva is part of the IBM Integrated Ontology Development Toolkit, and it seems nice (Open Source and Eclipse and all that). It is reported (by IBM) to be 10-20 times slower on Apache Derby than on IBM's own DB2. It does support SPARQL, although I am not sure whether we can talk HTTP to it.
- Sesame is an open source framework for storage, inferencing and querying of RDF data. It does have SPARQL support in its newest version. Anton Jansen provided this link. He uses Sesame in combination with OWLIM (see next item), and also with ELMO (at the Sesame site), which maps Java classes to the OWL concepts.
- OWLIM is a high-performance semantic repository developed in Java (this link also from Anton). In its System Documentation there is also a performance analysis.
- There is also this Scalability report on triple store applications.
- When looking for a stash to store ontology data in, analogous to a database, one googles for "semantic repository".
In Protege, if you get the message "An error related to DOT has occurred", you need the graphviz package. Under Debian, it can simply be installed with: apt-get install graphviz

Using Protege (4.0 alpha) as an editor, just to get acquainted with the matter, I tried to develop a little ontology of my own. The idea behind it is to have an ontology from which configuration files can be derived. As an example, the config file for a DHCP server will contain the IP numbers of name servers, paths to files on a TFTP server, MAC addresses of network cards of computers, etc. DHCP configuration files look much alike from server to server, yet they differ in the specific IP numbers, MAC addresses etc. that are mentioned. We could easily generate such a file for our customer by asking them questions like "What is the IP number of your TFTP server?" or "What are the MAC addresses and intended IP numbers of all network cards in the same VLAN the DHCP server is in, grouped by subnet and, within these groups, grouped by OS and boot method?" In order to store the answers to these questions, we could create a database.
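To make the idea of generating a configuration file from a handful of answers concrete, here is a minimal sketch in Python. It assumes an ISC-dhcpd-style syntax, which the notes above do not prescribe, and the Host class, the render_dhcp_config function and all addresses are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """One network card as the customer might describe it (hypothetical model)."""
    name: str
    mac: str
    ip: str


def render_dhcp_config(name_servers, tftp_server, boot_file, hosts):
    """Render an ISC-dhcpd-style fragment from the customer's answers."""
    lines = [
        "option domain-name-servers " + ", ".join(name_servers) + ";",
        "next-server " + tftp_server + ";",
        'filename "' + boot_file + '";',
    ]
    # One host entry per network card: only the specific values differ.
    for host in hosts:
        lines.append(
            "host %s { hardware ethernet %s; fixed-address %s; }"
            % (host.name, host.mac, host.ip)
        )
    return "\n".join(lines) + "\n"


# Example answers (documentation addresses, not real servers).
config = render_dhcp_config(
    name_servers=["192.0.2.1", "192.0.2.2"],
    tftp_server="192.0.2.10",
    boot_file="pxelinux.0",
    hosts=[Host("ws1", "00:16:3e:00:00:01", "192.0.2.101")],
)
print(config)
```

The template is the same for every customer, and only the answers change. That is why the interesting question is where to store those answers.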

However, by their nature some of the concepts we try to handle here are better suited to classes as used in object-oriented programming languages than to the rigidity of databases. If the ratio of the number of tables to the number of rows per table is any measure of efficiency, a database scores poorly here: we would need many tables with only a few rows each. Add the effort to design it so that data can be added later, the effort to enforce business logic, and the effort to create and maintain the user interface(s), and this approach seems doomed to futility. Ontologies, on the other hand, are well suited to an approach that resembles the classes of an object-oriented programming language. An ontology can easily handle IP numbers that stem from a list of servers not further specified as well as IP numbers that come from a list of network cards, treating them as IP numbers in both cases, while still distinguishing between IP numbers that we know as a property of a network card and IP numbers that we know nothing more of.

An additional advantage of ontologies is that they offer data sharing across multiple repositories. If our customer wants to expand upon our knowledge base, they are free to do so, and we can even reimport their knowledge base back into our own, possibly designating it as read-only (see managing imports in protege-owl and also SeparatingClassesAndInstances).

The above links were reached from the Protege WikiHomePage, which is a useful source of information, as is the W3C page of the Semantic Web Best Practices and Deployment Working Group. A nice graphical representation of the complexity of some semantic web tools and technologies is the Naive OWL Fragments Map (scroll to bottom of page).

The first issue I ran into is that of user-defined types. The list of available types in OWL is not as rich as that of most RDBs, and certainly not as easily extensible as in object-oriented programming languages. Where in my PostgreSQL databases I can define a column as being of datatype MAC address or IP number (be it IPv4 or IPv6), I cannot do so in an OWL ontology. There is an effort to expand and extend the available data types in user-defined datatypes in protege, but it is of limited scale and it uses annotation properties of RDF. These are not visited by the reasoner, so the extended data types merely tag along instead of becoming part of the system. Right now, intricate data types cannot be created in an ontology (just as they cannot be in databases), and a mapping from interface data types to ontology data types is still necessary, as is the implementation of some business logic and data restrictions in that interface. We cannot define something as an IP number: we must store it as a simple string, use it as a string where we can, and convert it to an actual IP number in our interface when we need to do IP number arithmetic. But as I discovered, we may not want these intricate data types anyway...
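As an illustration of doing the conversion in the interface, the following sketch uses Python's standard ipaddress module. The stored values are example data, and Python as the interface language is an assumption, not something these notes settle on.

```python
import ipaddress

# Values as they might come out of the ontology: plain strings (example data).
stored_ip = "192.0.2.130"
stored_netmask = "255.255.255.192"

# Convert to real IP number objects only where arithmetic is needed.
ip = ipaddress.IPv4Address(stored_ip)
mask = ipaddress.IPv4Address(stored_netmask)

# Bitwise AND of address and netmask gives the network address.
network_address = ipaddress.IPv4Address(int(ip) & int(mask))

# Convert back to a string before handing the result to the ontology again.
network_as_string = str(network_address)
print(network_as_string)  # 192.0.2.128
```

The ontology never sees anything but strings. Validation comes for free as well, since ipaddress.IPv4Address raises a ValueError for a malformed string.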

Another matter is that of unique identifiers. Protege can designate a property as “inverse functional”, which means that two instances with equal values for this property are inferred to be the same instance. (A “functional” property, by contrast, only states that each instance has at most one value for it.) Whether this can be used in the same fashion as a “primary key” in a database needs some more reading and thought on my part. And then there are multi-column primary keys in databases, which allegedly are modeled with “Combined Inverse Functional Properties” in OWL. I haven't found a good source of information on this yet.
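The key-like behaviour described above (equal values imply the same instance, which OWL calls an inverse functional property) can be mimicked in a few lines of Python. This is only a toy illustration of the inference, not how a reasoner works. The merge_by_key function, the property names and the data are all hypothetical, and the second call merely imitates what a combined key would do.

```python
from collections import defaultdict


def merge_by_key(individuals, key_properties):
    """Group individual names that share the same values for all key properties."""
    groups = defaultdict(list)
    for name, properties in individuals.items():
        key = tuple(properties[p] for p in key_properties)
        groups[key].append(name)
    # Each inner list holds names that would be inferred to denote one individual.
    return sorted(sorted(names) for names in groups.values())


individuals = {
    "nic_a": {"mac": "00:16:3e:00:00:01", "host": "ws1"},
    "nic_b": {"mac": "00:16:3e:00:00:01", "host": "ws1"},
    "nic_c": {"mac": "00:16:3e:00:00:02", "host": "ws1"},
}

# Single-property key, comparable to a one-column primary key.
same_by_mac = merge_by_key(individuals, ["mac"])

# Combined key, comparable to a multi-column primary key.
same_by_mac_and_host = merge_by_key(individuals, ["mac", "host"])

print(same_by_mac)  # [['nic_a', 'nic_b'], ['nic_c']]
```

Note the difference from a database: a primary key rejects the second row, whereas here the two names are simply concluded to refer to the same thing.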

One more thing that is highly desirable to a systems administrator is knowing what information is present in what configuration files. As an example: if the IP number of our TFTP server changes, the DHCP configuration file will have to reflect this change. This might seem obvious, but often it isn't trivial to figure out all the places where a single change has to be applied. So we would like to have a system that doesn't only generate configuration files, but also “knows” what information it needs to generate them. ^[14] If we are to pull this off, we might have to resort to an OWL Full ontology, which cannot be handled by most reasoners, and for which no reasoner can be guaranteed to finish in finite time. Or we can use one of the tricks described in the W3C draft Representing Classes As Property Values on the Semantic Web. In any case, we can reason about classes as values, or we can reason about classes, but we cannot reason directly about the values of properties, and current reasoners cannot reason about instances either. So we probably shouldn't care much about whether we can do AND operations on IP numbers here. That is to be done in the interface.
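What such a system would have to know can be sketched outside of OWL as a plain lookup from configuration files to the facts they are built from. The file names, fact names and the files_affected_by function below are hypothetical. In the ontology this mapping would be expressed as relations between classes rather than a Python dictionary.

```python
# Hypothetical mapping from each generated file to the facts it is built from.
REQUIRED_FACTS = {
    "dhcpd.conf": {"tftp_server_ip", "name_server_ips", "nic_mac_addresses"},
    "tftpd.conf": {"tftp_server_ip", "tftp_root_path"},
    "resolv.conf": {"name_server_ips"},
}


def files_affected_by(changed_fact, required_facts=REQUIRED_FACTS):
    """Return the config files that must be regenerated when a fact changes."""
    return sorted(
        filename
        for filename, facts in required_facts.items()
        if changed_fact in facts
    )


# If the IP number of the TFTP server changes, which files need regenerating?
affected = files_affected_by("tftp_server_ip")
print(affected)  # ['dhcpd.conf', 'tftpd.conf']
```

The lookup itself never touches an actual IP number. It only relates kinds of information to files, which is the sort of question the ontology is meant to answer.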

^[14] Another thing that we would like to keep track of is the set of interdependencies between machines, services and configuration files, e.g. what other services will stop if our DHCP config is broken. Although we can derive some of that information from a system that knows what information is needed for what config files, we can never know which interdependencies are present in the actual system but missing from the knowledge base, so we can never fully rely on such a system.

