Semantic Web

This is the paper I presented at a paper presentation competition during Quest, an event organised by the ACM chapter of SVNIT.

The Current Scenario

Before beginning, let's first understand why there is a need for the Semantic Web. The Internet has grown exponentially since its inception, and the amount of information stored on it is tremendous. But with such a tremendous amount of information, processing it has become tedious. The main reason for this problem is the way pages are rendered.

External Agents Can't Access Information Easily
The way information is represented makes it difficult for an external agent to analyze it and draw any conclusions from it.

Inefficient Searching
The search engines we use have algorithms that match web pages against keywords. This approach produces quantitative results, but the quality of those results is poor: since only keywords are matched, the semantics of the search query are usually not reflected in the results.

HTML’s Limitations
HTML, i.e. Hypertext Markup Language, is the language used by millions of web pages. HTML is a wonderful language for describing how a web page should look, but it gives little information about the page and its various elements. Thus it is very difficult for a machine to extract any meaning from the content.

ENTER SEMANTIC WEB

With the Semantic Web it is possible for machines (agents) to understand the information contained in web resources. Agents can interpret, analyze, and understand the information, and can hence be useful in decision making.

Once the Semantic Web exists, it can describe what each piece of information is about and give semantic meaning to the content. This means that searching becomes easier, and it also encourages resources formed by combining multiple other resources, so users can find exactly what they are looking for. Organizations that provide web applications can clearly describe their applications to machines; using web-based software agents, one can dynamically find these services and use them to one's benefit or in collaboration with other services.


The Approaches Towards Semantic Web

  • Super Intelligent Agents
  • Semantic Publishing
  • Intelligent Agents + Semantic Publishing

Super Intelligent Agents

The first approach towards a Semantic Web would be the use of super intelligent agents, i.e. agents that can read huge chunks of non-semantically published data, analyze the data, and also understand it. Artificial Intelligence researchers have been trying to develop such agents with some success, but the problem remains that the results obtained after information processing by these super intelligent agents are highly unreliable. Several complex AI techniques are used for the development of these super agents. A super intelligent agent must be able to do:

  • Natural Language Processing
    o Sentiment Analysis
    o Interpreting vagueness using fuzzy logic (see the sketch after this list)
  • Image Processing
  • Pattern Recognition
  • Relation Identification
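
Here is the sketch promised above, a minimal example of fuzzy membership in Python; the term "tall" and its thresholds are invented purely for illustration. Instead of a true/false answer, the vague term maps to a degree between 0 and 1.

    def tall_membership(height_cm: float) -> float:
        # Degree to which a height counts as "tall", between 0 and 1.
        if height_cm <= 160:
            return 0.0
        if height_cm >= 190:
            return 1.0
        return (height_cm - 160) / 30  # linear ramp between the two anchors

    for h in (155, 172, 185, 195):
        print(h, "->", round(tall_membership(h), 2))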

The development of such a super intelligent agent, and then expecting it to produce reliable results, is a task that is not impossible but definitely very tedious. The feasibility of such an agent is also questionable.
Example: the Wolfram computational knowledge engine (Wolfram|Alpha).

Semantic Publishing

As was discussed earlier, web pages are rendered using a language called HTML. HTML is great for describing how an element should be displayed but is very poor at describing meta-information about that element.

By semantic publishing it is meant that the creator/publisher of the information uses standardized means to publish the data. These standards provide a way for computers to understand the structure, and even the meaning, of the published information, making information search and data integration more efficient.

To facilitate communication between different agents and resources on the web, semantic interoperability is required. Whereas syntactic interoperability is only about parsing data correctly, semantic interoperability requires mapping between terms, which in turn requires content analysis. It also involves identifying and defining relationships amongst resources.

Metadata
For semantic publishing it is crucial to have metadata (data about data). That is, there should be some data describing what the data is, its relations with other data, and so on. This makes it easy for an agent to detect patterns among data.
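
As a loose illustration (the field names and URL below are made up), even a simple key-value record is metadata that an agent can inspect without parsing the page itself:

    # Hypothetical metadata record describing a web page (data about data).
    page_metadata = {
        "title": "Semantic Web",
        "language": "en",
        "topics": ["semantic web", "RDF", "OWL"],
        "related_to": "http://example.org/intro-to-rdf",  # placeholder URL
    }

    # An agent can reason over the record directly, e.g. to relate pages.
    print("RDF" in page_metadata["topics"])  # True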

XML

(eXtensible Markup Language)
XML is a metalanguage for markup: it does not have a fixed set of tags but allows user-created tags.
With XML it is possible to create structured web documents. It renders human-readable and understandable documents, but it does not explicitly describe what its elements mean, making it difficult for agents to understand. It is still a better option than HTML.
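
A minimal sketch with Python's standard library (the <book> vocabulary here is invented): the tags are user-created and the structure is machine-parsable, but nothing conveys what the tags mean.

    import xml.etree.ElementTree as ET

    # User-created tags: XML fixes no vocabulary, so a publisher is free
    # to invent <book>, <title>, <author>, and so on.
    doc = """<book>
      <title>A Translation Approach to Portable Ontology Specifications</title>
      <author>Tom Gruber</author>
    </book>"""

    root = ET.fromstring(doc)
    print(root.find("title").text)  # the structure parses cleanly...
    # ...but no agent can tell from the markup what a <book> "is".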

RDF

(Resource Description Framework)
RDF is basically a data model. It is used as a general method for describing or modeling information that is implemented in web resources, usually using an XML-based syntax. Information is expressed as subject-predicate-object triples.
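
A minimal sketch of this data model using the rdflib Python library (the namespace and resource names here are invented): every statement is a subject-predicate-object triple.

    from rdflib import Graph, Literal, Namespace

    EX = Namespace("http://example.org/")  # hypothetical vocabulary

    g = Graph()
    # Each statement is a (subject, predicate, object) triple.
    g.add((EX.page1, EX.topic, Literal("Semantic Web")))
    g.add((EX.page1, EX.citedBy, EX.page2))

    # The same data model serialized in the XML-based syntax mentioned above.
    print(g.serialize(format="xml"))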

OWL
(Web Ontology Language)
“An ontology is an explicit specification of a conceptualization.”

—Tom Gruber, A Translation Approach to Portable Ontology Specifications

The analysis of information requires formal and explicit specifications of domain models, which define the terms used and their relationships. Such formal domain models are sometimes called ontologies. Ontologies define data models in terms of classes, subclasses, and properties.

OWL facilitates greater machine interpretability and understandability of Web content than that supported by XML, RDF, and RDF Schema, by providing additional vocabulary along with formal semantics.
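
As a hedged sketch (the vocabulary below is invented, not a standard one), a tiny ontology of classes, a subclass, and a property can be written in Turtle syntax and loaded with rdflib:

    from rdflib import Graph

    # A tiny, made-up ontology: classes, a subclass, and a property,
    # written with the RDFS/OWL vocabulary in Turtle syntax.
    ontology = """
    @prefix ex:   <http://example.org/> .
    @prefix owl:  <http://www.w3.org/2002/07/owl#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

    ex:Agent       a owl:Class .
    ex:SearchBot   a owl:Class ; rdfs:subClassOf ex:Agent .
    ex:WebResource a owl:Class .
    ex:queries     a owl:ObjectProperty ;
                   rdfs:domain ex:Agent ;
                   rdfs:range  ex:WebResource .
    """

    g = Graph()
    g.parse(data=ontology, format="turtle")
    print(len(g), "triples loaded")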

A layered approach

Downward compatibility: An agent that understands a particular layer (say, OWL) must also understand the layers below it (RDF, XML).
Partial upward understandability: An agent should try to take partial advantage of higher layers by interpreting the knowledge it can understand and neglecting elements beyond its schema.

The advantage of semantic publishing is that it is a standard, and is thus universally acceptable. So we can have agents extracting data out of semantically published pages with relative ease. There are also standard query languages like SPARQL that can be used to query semantically generated pages.
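
For instance, a small SPARQL query run with rdflib (the data and vocabulary are again invented) matches a graph pattern rather than keywords:

    from rdflib import Graph, Literal, Namespace

    EX = Namespace("http://example.org/")
    g = Graph()
    g.add((EX.page1, EX.topic, Literal("Semantic Web")))
    g.add((EX.page2, EX.topic, Literal("HTML")))

    # Ask for every page whose topic is exactly "Semantic Web".
    results = g.query("""
        PREFIX ex: <http://example.org/>
        SELECT ?page WHERE { ?page ex:topic "Semantic Web" . }
    """)
    for row in results:
        print(row.page)  # http://example.org/page1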

Semantic Publishing + Intelligent Agents

The ideal solution for a Semantic Web is to ensure that the published content is semantically generated. Also, the agents should not only be intelligent enough to analyze and understand semantically published web pages, but should also try to interpret pages that are published non-semantically.

Basically, with semantically published data, agents will be able to make use of the following (a toy sketch follows the list):

  • Metadata: to extract information from web resources and identify them.
  • Ontologies: to interpret information and communicate with other agents.
  • Logic: to draw conclusions from the analysis.
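
Here is the promised toy sketch combining the three (the vocabulary is invented): from one ontology axiom and one piece of metadata, a SPARQL property path lets the agent draw a conclusion that is stated nowhere explicitly.

    from rdflib import Graph

    data = """
    @prefix ex:   <http://example.org/> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

    ex:SearchBot rdfs:subClassOf ex:Agent .   # the ontology
    ex:crawler42 a ex:SearchBot .             # metadata about one resource
    """

    g = Graph()
    g.parse(data=data, format="turtle")

    # Logic: follow rdf:type and then any chain of rdfs:subClassOf links.
    q = """
        PREFIX ex:   <http://example.org/>
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT ?x WHERE { ?x a/rdfs:subClassOf* ex:Agent . }
    """
    for row in g.query(q):
        print(row.x)  # ex:crawler42, although no triple states this directly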

The conclusions obtained can be used for a variety of purposes, like searching and decision making.
Also, the combination of semantic publishing and intelligent agents enables the extraction of information from various resources, which can in turn be combined into a new web resource.


Phenotropic Approach

Phenotropic Programming
Phenotropic programming is a very abstract concept of programming; it is still in its very initial stages and is yet to be fully conceived. Jaron Lanier, a famous computer scientist known for his pioneering work in the field of virtual reality, is credited with incepting the idea of phenotropic programming.
The word phenotropic loosely translates to "surfaces relating to each other". So, basically, various modules or independent programs interact with each other and learn about each other through these interactions.

For Semantic Web

Concepts of phenotropic programming can be used to develop super intelligent agents capable of analyzing and understanding non-semantically rendered information.

Also, phenotropic programs are designed in such a way that pattern recognition, rather than syntactic parsing, is used to connect the program's components. Hence the advantage of the phenotropic approach over a protocol-based approach is that approximate matching is possible, i.e. exact matching is not necessary.
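
As a loose illustration of approximate rather than exact matching (real phenotropic systems would be far richer; Python's difflib merely stands in for pattern recognition here, and the field names are made up):

    from difflib import SequenceMatcher

    # Approximate matching: a rigid protocol rejects anything but an exact
    # field name, while similarity scoring tolerates near misses.
    def best_match(wanted, offered):
        scored = [(name, SequenceMatcher(None, wanted, name).ratio())
                  for name in offered]
        return max(scored, key=lambda pair: pair[1])

    print(best_match("publication_date", ["pub_date", "author", "page_title"]))
    # picks "pub_date" with a similarity of about 0.67: not exact, but usable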

So basically we can have layers of web pages/applications that work seamlessly with each other, i.e. there is parallel communication by way of surfaces. Various web pages thus develop insights, understand each other, and communicate with each other. This type of interaction can help in achieving semantics over the web.

If phenotropic programming is developed, we might see applications and web resources interacting with each other at a user-interface level instead of using APIs to facilitate communication.
