Semantic Web: Who should we tag today?

I've been doing some further reading surrounding the semantic web. I feel FeedTagger has something important to offer and I am trying to work out how and why. Read on for my thoughts on what will make the semantic web tick...

At the moment Technorati allows bloggers to specify tags that relate to particular posts they make. Sure, it's a good start, but Technorati is approaching the semantic web backwards. The publisher is describing the content of the post, which is no different from the meta tags that have been around for years. I'm more interested in how other people describe the content.

A tagger of the semantic web should be an external entity looking in. I can't trust the author of an article when he says it's written about 'microsoft', but I can trust Google when it has established the context of the page in relation to the other 100 pages that link to it. Even better, I can trust 1000 separate individuals who have all tagged the article as 'microsoft', 'longhorn'. A certain level of trust about the content of an article can be established when others tag a source. This is not possible when relying on the author alone.

An Irish blogger made an interesting comment backing up my theory. He says:
"I think the best part of the Technorati tagging system is the harvest done of other sources such as del.icio.us, furl and Flickr. Whenever I click into I find the most useful stuff coming from the self-titled tags on Flickr and the social bookmark systems to be better than the hand-rolled tags made specifically for Technorati."

If we take 1000 people and they all attempt to describe the semantics of a particular object or data set, a computer has a wealth of knowledge previously unavailable. Not only do we know the article is about 'microsoft', we also have tens, maybe hundreds, of other keywords that can help establish relationships between this article and other articles with similar keywords. Similar results can be achieved by establishing context à la Google, but when it comes to tagging, nothing beats a human.
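
As a rough illustration of how a computer might use tags from many people, here is a minimal Python sketch. The tagger names, the sample tags and the functions `consensus_tags` and `shared_tags` are all made up for the example; this is not how FeedTagger or Technorati actually work.

```python
from collections import Counter

# Hypothetical data: the tags each person applied to the same article.
taggings = {
    "alice": ["microsoft", "longhorn"],
    "bob": ["microsoft", "windows"],
    "carol": ["microsoft", "longhorn", "os"],
}


def consensus_tags(taggings, min_share=0.5):
    """Return the tags applied by at least `min_share` of the taggers."""
    # set() ensures one person repeating a tag only counts once.
    counts = Counter(tag for tags in taggings.values() for tag in set(tags))
    threshold = min_share * len(taggings)
    return sorted(tag for tag, n in counts.items() if n >= threshold)


def shared_tags(tags_a, tags_b):
    """Return the tags two articles have in common, a crude relatedness signal."""
    return sorted(set(tags_a) & set(tags_b))


print(consensus_tags(taggings))  # ['longhorn', 'microsoft']
```

The more independent taggers agree on a tag, the more weight it carries, which is exactly what a single author tagging their own post cannot provide.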

Tagging our semantic web now boils down to:- "How do we get 1000 people to all describe the same object?"

People are naturally lazy; they will only go to the effort of doing something if they see some benefit in doing so. We need to develop interactive web applications that offer incentives to users for applying semantics to information. The main incentive is ease of use in finding information they have previously sought or information they want to track regularly.

del.icio.us has done this well. Flickr supports tagging, but its semantics will never be as powerful because the author is the tagger. There's not enough incentive for other users to go around and tag other people's photographs. I'm moving towards applying semantics to RSS/Atom feeds. What else is out there that we can apply some meaning to?

FeedTagger: Progress Update

After reluctantly taking FeedTagger offline due to numerous hosting issues, I've been fairly quiet. The good news, though, is that I've been very busy preparing FeedTagger for re-launch with a brand new interface and many improvements.

I have been generously offered a week for free in an Adelaide data centre to test server load and bandwidth issues. This should provide some solid statistics and debugging information. To give you a rough idea, in the short time FeedTagger was operational there were over 10,000 unique feeds added. Some rough maths highlights my problem:
(10,000 feeds) x (25KB average) x (24 updates/day) = 6GB incoming traffic/day!
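
The same back-of-envelope sum as a small Python sketch, which makes it easy to plug in other numbers. The function name `daily_traffic_gb` is made up for illustration, and it assumes decimal units (1GB = 1,000,000KB).

```python
FEEDS = 10_000          # unique feeds in the system
AVG_FEED_KB = 25        # average feed size in KB
UPDATES_PER_DAY = 24    # one fetch per feed per hour


def daily_traffic_gb(feeds, avg_kb, updates_per_day):
    """Incoming traffic per day in GB, using 1 GB = 1,000,000 KB."""
    return feeds * avg_kb * updates_per_day / 1_000_000


print(daily_traffic_gb(FEEDS, AVG_FEED_KB, UPDATES_PER_DAY))  # 6.0
```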

For this reason alone FeedTagger v2 will have adverts provided by Google's Adsense program. I've tried to make them prominent enough so people will click them, but non-intrusive at the same time.

As for a timeline for when FeedTagger will be back up? It's hard to say. I'm trying to get it up ASAP, although I'm cautious not to make it available before it's really ready. Hopefully within a week, two weeks at the latest.

To give you an idea of the frontend changes I've taken 3 screenshots:
The backend has had a bunch of improvements, with all the feed processing now being done with Python - a screenshot isn't going to highlight these though :)


Feedtagger colour scheme and layout

Feedtagger is currently using a layout and colour scheme surprisingly similar to this blog. In fact I originally used this blog layout as a template to speed up prototyping the feedtagger concept.
Most of the feedback I have received has supported the layout, but I received one email from a reader who prefers black text on a white background, much like a newspaper. I think he has a valid point; there are many attractions to the simple layouts and colour schemes employed by Google.
I have no qualms about re-designing feedtagger - it will be a good opportunity to start using more screen real estate. I am concerned however that:
a) Existing users accustomed to the current look will be turned off Feedtagger by a new design (especially so early in the piece)
b) I'm not much of a designer. I know what I want when it comes to establishing a good UI, but making it look good is not a strong point. Hence why I used an existing template in the first place.

There seem to be several options:
  1. Do a complete re-design and make sure it doesn't look too shabby
  2. Mould the existing design more towards the look I want in progressive steps (hard if major colour changes occur)
  3. Make the whole site (at least the feed display) a template much like blogger. Attempt to offer my new design and the old design as two default templates, while allowing users to build their own.
Option (3) is probably the best solution, but involves the most work. I'll keep working through my (ever growing) feature implementation list and ponder it a little more, I think.

Any other suggestions?


Slashdot Effect

My recent project, feedtagger.com, is generating an alarming amount of content very quickly. I am in the process of establishing how/if I can try to get some publicity without the whole system falling over.
The main issues that need to be resolved include:
  • Managing the thousands of feeds as they're entered into the system and updated every hour
  • Serving up very database heavy content (every search/tag/feed view is coming from a database)
I figured that the slashdot effect is a good benchmark. If feedtagger.com can survive the slashdot effect then it should be set up well enough for day-to-day operations. I've found a good, quite interesting article at geek.com that discusses their experiences with the /. effect.

The result? With a modern web server and plenty of bandwidth it should be quite manageable. The large amount of database processing involved for feedtagger.com will have a negative effect, but with a whole machine at my disposal I will have many additional options to index the database for speed.
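
To show what indexing for speed can look like, here is a minimal Python sketch using SQLite as a stand-in database. The table `feed_tags`, the index `idx_feed_tags_tag` and the helper `feeds_for_tag` are assumptions made for the example; the post does not say which database or schema feedtagger.com uses.

```python
import sqlite3

# In-memory database with a hypothetical feed/tag table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE feed_tags (feed_id INTEGER, tag TEXT);
    -- Index the column that every tag view filters on.
    CREATE INDEX idx_feed_tags_tag ON feed_tags (tag);
""")
conn.executemany(
    "INSERT INTO feed_tags VALUES (?, ?)",
    [(1, "cycling"), (2, "blogs"), (3, "cycling")],
)


def feeds_for_tag(conn, tag):
    """Return the ids of all feeds carrying `tag`, in id order."""
    rows = conn.execute(
        "SELECT feed_id FROM feed_tags WHERE tag = ? ORDER BY feed_id", (tag,)
    )
    return [row[0] for row in rows]


# Ask SQLite how it would run the lookup; the last column describes each step.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT feed_id FROM feed_tags WHERE tag = ?", ("cycling",)
).fetchall()
uses_index = any("idx_feed_tags_tag" in row[-1] for row in plan)
```

Checking the query plan like this is a quick way to confirm that a tag lookup searches the index instead of scanning the whole table.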


feedtagger.com & PHP5

After "launching" (aka telling a couple of people) feedtagger.com I have come across an annoying problem. I am using an RSS/Atom PHP Library called Magpie that is quite good. It uses the inbuilt XML parsing capabilities of PHP5, but unfortunately there is a bug in these XML libraries. As a result whenever my automatic cron job processes all the RSS/Atom feeds in the database it will occasionally "hang" on some feeds.
By "hang" I mean my hosting provider emails me to say they've cancelled all my cron jobs because their server load has jumped from 0.5% to 50% (the bug causes an infinite loop).
As the problem lies within PHP5 itself I'm in a really tough position. The bug has been fixed in CVS, but I have to wait for the next "stable" release before it is in a state that my web host will upgrade on their servers.

What am I supposed to do? At the moment I'm relegated to regularly checking if the auto-processing is working or not and flagging offending feeds to "skip".
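
One way the manual flagging could be automated, sketched in Python rather than PHP: run the parser for each feed in its own process with a time limit, and flag any feed that exceeds it. Everything here (`process_feeds`, `FAKE_PARSER`, the 30 second default) is hypothetical and only illustrates the idea; it is not feedtagger.com's actual code.

```python
import subprocess
import sys


def process_feeds(feed_urls, skip, parser_cmd, timeout_s=30):
    """Run parser_cmd once per feed and flag feeds that hang or fail.

    `skip` is a set of already flagged URLs and is updated in place.
    Returns the list of URLs that were parsed successfully.
    """
    processed = []
    for url in feed_urls:
        if url in skip:
            continue  # flagged on an earlier run
        try:
            # subprocess.run kills the child process if the timeout expires,
            # so a parser stuck in an infinite loop cannot tie up the server.
            subprocess.run(parser_cmd + [url], timeout=timeout_s, check=True)
            processed.append(url)
        except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
            skip.add(url)
    return processed


# Stand-in parser for demonstration only: it loops forever on any URL
# containing "bad" and exits immediately otherwise.
FAKE_PARSER = [sys.executable, "-c",
               "import sys\nwhile 'bad' in sys.argv[1]: pass"]
```

Because the parser runs in a separate process, the time limit applies even when the hang is inside a native library, which is where the bug described above lives.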

In the meantime I guess I'll work on improving the UI and adding other cool features. (Maybe even clean up and organise the code too)

.... serves me right for wanting to use PHP5 as soon as it came out (or for not doing the whole thing in Python as I briefly considered)


feedtagger.com launched!

I have just launched feedtagger.com - possibly the fastest concept to creation ever (was it less than a week?)

What's it do?
  • Aggregates news from unlimited sources
  • You can apply multiple tags to each source - helping you find what you're interested in
  • Can browse by feed
  • Server based, so feeds are updated automatically and you never miss anything!
  • Unread items are highlighted and displayed first
  • Very quick and dynamic
As it was quickly churned out I still want to do the following ASAP:
  • Add support for importing all your existing feeds
  • Allow use of templates for modifying feed display
  • Actually provide information on the front page explaining what feedtagger.com is and how it works
Not really sure what this is all about? Sign up for an account and then google for something that interests you. If I was into cycling I would search for:
filetype:rss cycling
filetype:atom cycling

Grab one of the links (ending in .rss or .atom) and place it into your account (when logged in click "feeds" under "manage"). Add any descriptive tags (perhaps "cycling,blogs") and you're away!
Just refresh your feed and you should start to understand what this is all about.

Go and sign up now!


Web Application Concept

I've had an idea for a while now (4-6 months) of building a web application that would really extend the boundaries - much more so than gmail, google maps, flickr etc. The idea involves combining the power of web databases with rich web interfaces to perform a number of important tasks.

Just for a moment imagine yourself as the manager of a small business that has several IT requirements:
  • Keeping IT infrastructure costs low
  • Managing inventory of products
  • Communicating with customers through email
  • An online store to sell inventory
In addition your staff does not have an IT background, but is still adept with typical office and internet operations.

Current Solution
As the manager of this business you have two main options.
  1. Employ some additional staff to build from scratch or modify off the shelf software to achieve the various requirements
  2. Locate an external contractor to come in and build said requirements, charging two arms and two legs
These two options are very expensive for any small business, even though the result could significantly enhance the business, especially by reducing costs through an online store and improving communication with customers via email lists etc.

Alternative Solution
Using the power of existing web technologies the manager of the business mentioned above could use an online business management service. He/She could log in to www.mybusiness.com and use an online application to assist in building the requirements already outlined.

The application would be similar to Dreamweaver / MS Access being run through a web browser. I could select "New" from a file menu and select from a number of pre-built templates:
  • Online store
  • Inventory manager
  • Mailing list centre
Each of these templates would provide a stock database structure with well established relationships and a corresponding templated web interface. Using a MS Access like interface the user could manipulate the database to add fields specific to their industry, ideally including some business logic. Perhaps they could employ an external contractor for this (cost wouldn't be prohibitive). Next they move to an "Interface" tab where they use a Dreamweaver like interface to restructure the look and feel of their online store - in addition adding support for the new elements now in the database.

Almost overnight this business is now able to do business over the internet, track customers' purchasing habits and send emails to regular customers with the latest specials. Additionally the organisation could use very cheap computers (1-2GHz era) on any Operating System (no MS licensing fees) to run their whole business through a web interface.

I have only very briefly outlined this application without going into too much detail. The complexities in building a desktop application such as Dreamweaver are quite large, let alone trying to do it within a web browser environment.

A lot of the work I've been doing has revolved around DHTML with XMLHttpRequest communication, but I doubt that could really withstand such a complex application. Some sort of Flash/XUL basis would probably be much better suited to the task.