
Semantic Web

October 3rd, 2010

This is the paper I presented in a paper presentation competition in Quest, an event organised by the ACM chapter of SVNIT.

The Current Scenario

Before beginning, let's first understand why there is a need for a semantic web. The Internet has grown exponentially since its inception, and the amount of information stored on it is tremendous. But with so much information, processing it has become tedious. The main reason for this problem is the way pages are rendered.

External Agents Can't Access Information Easily
The way information is represented makes it difficult for an external agent to analyze it and draw any conclusion from it.

Inefficient Searching
The search engines that we use have algorithms that search web pages based on keywords. This approach produces a large quantity of results, but their quality is poor. Since only keywords are matched, the semantics of the search query are usually not reflected in the results.
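To make the keyword problem concrete, here is a minimal sketch in Python. The documents and the synonym table are made up for illustration, and the synonym table is only a crude stand-in for real semantic knowledge; no actual search engine works exactly like this.

```python
# Toy corpus; the documents are invented for this example.
DOCUMENTS = {
    "doc1": "cheap car rental in Surat",
    "doc2": "affordable automobile hire near Surat station",
    "doc3": "history of the railway carriage",
}

# Hypothetical hand-written synonym table standing in for semantic knowledge.
SYNONYMS = {
    "car": {"car", "automobile"},
    "rental": {"rental", "hire"},
}


def keyword_search(query, documents):
    """Return ids of documents that contain every query word literally."""
    words = query.lower().split()
    return sorted(
        doc_id
        for doc_id, text in documents.items()
        if all(word in text.lower().split() for word in words)
    )


def concept_search(query, documents, synonyms):
    """Return ids of documents containing each query word or one of its synonyms."""
    words = query.lower().split()
    results = []
    for doc_id, text in documents.items():
        tokens = set(text.lower().split())
        # A word with no entry in the table only matches itself.
        if all(tokens & synonyms.get(word, {word}) for word in words):
            results.append(doc_id)
    return sorted(results)


print(keyword_search("car rental", DOCUMENTS))            # ['doc1']
print(concept_search("car rental", DOCUMENTS, SYNONYMS))  # ['doc1', 'doc2']
```

The literal search misses the second document even though it is about the same thing, which is exactly the kind of gap the semantic web is meant to close.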

HTML’s Limitations
HTML, i.e. Hypertext Markup Language, is the language used by millions of web pages. HTML is a wonderful language for describing how a web page should look, but it gives little information about the page and its various elements. Thus it is very difficult for a machine to understand what a page is about.

ENTER SEMANTIC WEB

With the semantic web, it is possible for machines (agents) to understand the information contained in web resources. Agents can interpret, analyze and understand the information, and hence can be useful in decision making.

Once the Semantic Web exists, it can describe what each piece of information is about and give semantic meaning to each content item. Searching would become easier, so users can find exactly what they are looking for, and resources formed by combining multiple other resources would be encouraged. Organizations that provide web applications can clearly describe their applications to machines; using Web-based software agents, one can dynamically find these services and use them to one’s benefit or in collaboration with other services.


The Approaches Towards Semantic Web

  • Super Intelligent Agents
  • Semantic Publishing
  • Intelligent Agents + Semantic Publishing

Super Intelligent Agents

The first approach towards a semantic web would be the use of super intelligent agents, i.e. agents that can read huge chunks of non-semantically published data, analyze it, and understand it. Various Artificial Intelligence scientists have been trying to develop such agents, with some success. But the problem remains that the results these agents produce are highly unreliable. Several complex AI techniques are used in the development of these super agents. A super intelligent agent must be able to do:

  • Natural Language Processing
    o Sentiment Analysis
    o Interpreting vagueness using fuzzy logic.
  • Image Processing
  • Pattern Recognition
  • Relation Identification

Developing such a super intelligent agent, and getting it to produce reliable results, is a task that is not impossible but definitely very tedious. The feasibility of such an agent is also in question.
Example: Wolfram computational engine

Semantic Publishing

As discussed earlier, web pages are rendered using a language called HTML. HTML is great for describing how an element should be displayed but very poor at describing meta information about that element.

Semantic publishing means that the creator/publisher of the information uses standardized means to publish the data. These standardized means provide a way for computers to understand the structure and even the meaning of the published information, making information search and data integration more efficient.

To facilitate communication between different agents and resources on the web, semantic interoperability is required. Syntactic interoperability is only about parsing data correctly. Semantic interoperability goes further: it requires mapping between terms, which in turn requires content analysis. It also involves identifying and defining relationships amongst resources.

Metadata
For semantic publishing it is crucial to have metadata (data about data). That is, there should be some data describing what the data is, its relations with other data, etc. This makes it easy for an agent to detect patterns among data.

XML

(eXtensible Markup Language)
XML is a metalanguage for markup: it does not have a fixed set of tags but allows user-created tags.
With XML it is possible to create structured web documents. It produces human-readable and understandable documents, but it doesn't explicitly describe what its elements mean, making it difficult for agents to understand them. It is still a better option than HTML.
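As a small illustration of user-created tags, the sketch below parses an XML document with Python's standard xml.etree.ElementTree module. The tag names (students, student, name, college) and the sample data are invented for this example.

```python
import xml.etree.ElementTree as ET

# The tag names and the data are invented for this example.
XML_DOCUMENT = """
<students>
  <student>
    <name>Asha</name>
    <college>SVNIT</college>
  </student>
  <student>
    <name>Ravi</name>
    <college>SVNIT</college>
  </student>
</students>
"""


def student_names(xml_text):
    """Parse the document and return the text of each <name> inside a <student>."""
    root = ET.fromstring(xml_text.strip())
    return [student.findtext("name") for student in root.findall("student")]


print(student_names(XML_DOCUMENT))  # ['Asha', 'Ravi']
```

The parser recovers the structure easily, but nothing in the document tells an agent that "name" refers to a person's name. The program only works because its author already knew what the tags meant.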

RDF

(Resource Description Framework)
RDF is basically a data model. It is a general method for describing or modeling information held in web resources, usually written in an XML-based syntax.
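RDF statements take the form of subject-predicate-object triples. The sketch below models that idea with plain Python tuples rather than a real RDF library; the example.org identifiers and the property names are placeholders, not an actual vocabulary.

```python
EX = "http://example.org/"  # placeholder namespace, assumed for this example

# Each statement is a (subject, predicate, object) triple.
TRIPLES = {
    (EX + "SemanticWebPaper", EX + "presentedAt", EX + "Quest"),
    (EX + "Quest", EX + "organisedBy", EX + "ACMChapterSVNIT"),
    (EX + "SemanticWebPaper", EX + "topic", "Semantic Web"),
}


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t
        for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# Everything that is stated about the paper.
for s, p, o in match(TRIPLES, subject=EX + "SemanticWebPaper"):
    print(s, p, o)
```

Matching a pattern with wildcards against a set of triples is, very roughly, what a query language over RDF data does.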

OWL
Web Ontology Language
“An ontology is an explicit specification of a conceptualization.”

—Tom Gruber, A Translation Approach to Portable Ontology Specifications

The analysis of information requires formal and explicit specifications of domain models, which define the terms used and their relationships. Such formal domain models are sometimes called ontologies. Ontologies define data models in terms of classes, subclasses, and properties.
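A minimal sketch of such a domain model, using plain Python dictionaries instead of a real ontology language: the class and property names are hypothetical, chosen only to show how classes, subclasses and properties let an agent infer facts that were never stated directly.

```python
# Hypothetical mini-ontology: each class maps to its direct superclass.
SUBCLASS_OF = {
    "GraduateStudent": "Student",
    "Student": "Person",
    "Professor": "Person",
}

# Each property maps to the class it applies to; the names are illustrative.
PROPERTY_DOMAIN = {
    "enrolledIn": "Student",
    "hasName": "Person",
}


def ancestors(cls):
    """Return all superclasses of cls, nearest first."""
    chain = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def is_a(cls, other):
    """True if cls is the same class as other or one of its subclasses."""
    return cls == other or other in ancestors(cls)


def applicable_properties(cls):
    """Properties that cls has directly or inherits from its superclasses."""
    return sorted(p for p, domain in PROPERTY_DOMAIN.items() if is_a(cls, domain))


print(ancestors("GraduateStudent"))              # ['Student', 'Person']
print(applicable_properties("GraduateStudent"))  # ['enrolledIn', 'hasName']
print(applicable_properties("Professor"))        # ['hasName']
```

Nowhere is it written that a graduate student has a name; the agent derives it from the subclass chain.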

OWL facilitates greater machine interpretability and understandability of Web content than that supported by XML, RDF, and RDF Schema by providing additional vocabulary along with formal semantics.

A layered approach

Downward compatibility: An agent that understands a particular layer must understand the layers below it.
Partial upward understandability: An agent should try to take partial advantage of higher layers by interpreting the knowledge it can understand and ignoring elements beyond its schema.

The advantage of semantic publishing is that it is standardized, and thus universally acceptable. So we can have agents extracting data out of semantically published pages with relative ease. There are also standard query languages like SPARQL that can be used to query semantically published pages.

Semantic Publishing + Intelligent Agents

The ideal solution for a semantic web is to ensure that the content published is semantically generated. Also the agents should not only be intelligent enough to analyze and understand semantically published web pages, but they should also try and interpret pages that are published non-semantically.

Basically, with semantically published data, agents will be able to make use of:

  • Metadata: To extract information from web resources and identify them.
  • Ontologies: To interpret information and communicate with other agents.
  • Logic: To draw conclusions from the analysis.

The conclusions obtained can be used for a variety of purposes like searching, decision making, etc.
Also, the combination of semantic publishing and intelligent agents enables the extraction of information from various resources, which can in turn be combined into a new web resource.


Phenotropic Approach

Phenotropic Programming
Phenotropic programming is a very abstract concept of programming; it's still in its initial stages and is yet to be fully worked out. Jaron Lanier, a computer scientist known for his pioneering work in the field of virtual reality, originated the idea of phenotropic programming.
The word Phenotropic loosely translates to “surfaces relating to each other”. So basically various modules or independent programs interact with each other and learn about each other through these interactions.

For Semantic Web

Concepts of phenotropic programming can be used to develop super intelligent agents capable of analyzing and understanding non-semantically rendered information.

Also, phenotropic programs are designed in such a way that pattern recognition, instead of syntactic parsing, is used to render the program. Hence the advantage of the phenotropic approach over a protocol-based approach is that approximate matching is possible, i.e. exact matching is not necessary.
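The difference between exact and approximate matching can be sketched with Python's standard difflib module. Plain string similarity is only a crude stand-in for the kind of pattern recognition phenotropic programming envisions, and the operation names and the 0.8 cutoff are assumptions made for this example.

```python
import difflib

# Hypothetical operations that one component exposes; the names are made up.
KNOWN_OPERATIONS = ["get_temperature", "get_humidity", "set_volume"]


def exact_lookup(name):
    """Protocol-style lookup: the name must match character for character."""
    return name if name in KNOWN_OPERATIONS else None


def approximate_lookup(name, cutoff=0.8):
    """Pattern-style lookup: accept the closest known name above a similarity cutoff."""
    matches = difflib.get_close_matches(name, KNOWN_OPERATIONS, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(exact_lookup("get_tempurature"))        # None
print(approximate_lookup("get_tempurature"))  # get_temperature
```

A single misspelt character breaks the exact lookup, while the approximate one still finds the intended operation and rejects names that are not close to anything it knows.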

So basically we can have layers of web pages/applications that work seamlessly with each other, i.e. there is parallel communication by way of surfaces. Thus various web pages develop insights about, understand, and communicate with each other. This type of interaction can help in bringing semantics to the web.

If phenotropic programming is developed, we might see applications and web resources interacting with each other at a user interface level instead of using APIs to facilitate communication.

Open source Facebook launching soon

August 31st, 2010

Diaspora is a sort of open source version of Facebook. Yeah, I am sure it sounds cool…I mean it's gotta be cool…it's OPEN SOURCE! Well, nah! I am not that excited. It's time the open source community started encouraging more innovative and interesting ideas instead of building polished clones of the same old ideas.

PS: I am an open source enthusiast. I publish open source designs and code :)

New Design: Eclipse

July 20th, 2010

The design I had been using for this blog is now out. It's free and open source; you can download it from Opendesigns. It is the third in the Ganesh## series B)


Preview | Download

The WordPress theme will be up soon :)

Programming using a language? Think again!

July 16th, 2010

I want to concentrate the things I do, not the magical rules of the language, like starting with public void something something something to say, “print hello world.” I just want to say, “print this!” I don’t want all the surrounding magic keywords. I just want to concentrate on the task. That’s the basic idea.

- Yukihiro Matsumoto (Matz) founder of Ruby

We have all been part of programming language debates: Ruby vs Python, PHP vs Perl, C vs C++ and so on… but I guess one thing that's common is that all of these are programming languages. Sure, they differ in paradigm, in syntax and in a lot of other things. But one thing remains common: we are expressing the algorithm in the form of written code. That remains the same in all the languages.

When we write code in a programming language…what we do is write simplified code, abiding by certain principles, that in turn is converted into complex code (machine language code) by the compiler. So instead of writing this supposedly simple code, I was thinking about other simpler and more efficient alternatives.

Basically there must be an alternative to writing code. I mean another way to express and execute an algorithm.

One of them is Visual Programming. In visual programming, programming is done using a graphical approach. Let's see what Wikipedia has to say:

A visual programming language (VPL) is any programming language that lets users create programs by manipulating program elements graphically rather than by specifying them textually… Many VPLs are based on the idea of “boxes and arrows,” where boxes or other screen objects are treated as entities, connected by arrows, lines or arcs which represent relations.

I was first introduced to visual programming three years ago in the form of Mindscript. Thinking about an alternative to programming languages, I recently started experimenting with it (with little success). It seems to be in an abandoned state right now, considering the last update was four years ago. There is clearly a lack of documentation and support for this project, else it could have flourished (could have had wings and flown…OMG!)


It's like a flowchart being executable. Wouldn't that be awesome (:P)? I know a lot of people (including me) prefer code over flowcharts, but that might be because we haven't made many flowcharts whereas we've written thousands of lines of code.
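Just to show what an "executable flowchart" could mean, here is a rough sketch in Python where the boxes are entries in a dictionary and the arrows are the names of the next box. The node format is invented for this sketch and is not how Mindscript stores its programs.

```python
# Boxes are nodes and arrows are the "next", "yes" and "no" links.
# This chart adds up the numbers from 1 to n.
FLOWCHART = {
    "start": {
        "kind": "action",
        "run": lambda s: s.update(total=0, i=1),
        "next": "check",
    },
    "check": {
        "kind": "decision",
        "test": lambda s: s["i"] <= s["n"],
        "yes": "add",
        "no": "end",
    },
    "add": {
        "kind": "action",
        "run": lambda s: s.update(total=s["total"] + s["i"], i=s["i"] + 1),
        "next": "check",
    },
    "end": {"kind": "stop"},
}


def run_flowchart(chart, state, node="start"):
    """Follow the arrows from box to box until a stop box is reached."""
    while chart[node]["kind"] != "stop":
        box = chart[node]
        if box["kind"] == "action":
            box["run"](state)
            node = box["next"]
        else:  # decision box: choose the outgoing arrow based on the test
            node = box["yes"] if box["test"](state) else box["no"]
    return state


print(run_flowchart(FLOWCHART, {"n": 5})["total"])  # 15
```

A visual tool would let you draw the four boxes and the arrows between them instead of typing the dictionary, but what gets executed is the same thing.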

Sure, I don't see visual programming replacing programming languages anytime soon, but I do believe that we should give it a shot. Also, we should look for other means of programming. And yeah, I'd love to hear from you (cliched?) so please comment!