PHP Classes

Searchonomy, folksonomy based on search keyword tags


The PHPClasses site is introducing a new feature that will help the users find what they are looking for.

The site is now able to automatically tag its pages with keywords entered by the users in the site searches.

The top rated class packages are listed in the pages associated with each keyword tag, so the users can easily find the most appreciated packages for the topics they are interested in.




Contents

* I still haven't found what I'm looking for
* Automatic tag assignment to site pages
* Taxonomy versus folksonomy
* Searchonomy
* Improving the results


* I still haven't found what I'm looking for

Once in a while the PHPClasses site receives messages from users complaining that they are not able to find what they are looking for. Although that is a symptom that is easy to understand, the solution for the problem is not so evident.

Using the site search, the users are led to pages that contain the words they searched for. Once in those pages, the users sometimes realize that they have not found what they are looking for. Obviously, what they are searching for may be in other pages. But where are those other pages?


* Automatic tag assignment to site pages

Some time ago I wrote an article for Zend DevZone about automatically creating tag clouds from the content of the pages of a site.

devzone.zend.com/node/view/id/1530

The idea consists of using the Automatic Keyword Generator class by Ver Pangolio to automatically assign tags to a text document, which can be the content of a page.

phpclasses.org/autokeyword

Once you assign tags to all pages of your site, you can list the most popular tags in a tag cloud. Each tag in the cloud links to a page that lists all the pages sharing that tag.

You may also display tag links in each page, pointing to the pages that list all site pages with the same tag. This way, the users may find other pages about the same topics, where they may eventually find what they are looking for.
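The tag cloud and the per-page tag links both come down to inverting a page-to-tags mapping. The following is a minimal Python sketch of that idea, not the code used by the site; the page paths, tag names and function names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical data: page path -> tags already assigned to that page.
page_tags = {
    "/package/a": ["email", "smtp"],
    "/package/b": ["email", "mime"],
    "/package/c": ["database"],
}

def build_tag_index(page_tags):
    """Invert page -> tags into tag -> sorted list of pages."""
    index = defaultdict(list)
    for page, tags in page_tags.items():
        for tag in set(tags):  # ignore duplicate tags on one page
            index[tag].append(page)
    return {tag: sorted(pages) for tag, pages in index.items()}

def tag_cloud(index, limit=10):
    """Return (tag, page_count) pairs, most used tags first."""
    counts = [(tag, len(pages)) for tag, pages in index.items()]
    return sorted(counts, key=lambda item: (-item[1], item[0]))[:limit]

def related_pages(index, page_tags, page):
    """Pages sharing at least one tag with `page`, excluding the page itself."""
    related = set()
    for tag in page_tags.get(page, []):
        related.update(index.get(tag, []))
    related.discard(page)
    return sorted(related)
```

In a tag cloud, the page count of each tag would typically drive the font size of its link, while `related_pages` corresponds to what a user reaches by following the tag links shown in a page.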


* Taxonomy versus folksonomy

But which are the right tags to assign to each page?

The Automatic Keyword Generator uses a criterion based on the frequency of certain words in the text. That may work well in many cases, but since the resulting tag keywords were not suggested by real site users, some of the tags may not be very relevant to the users.
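To make the frequency criterion concrete, here is a rough Python sketch of word-frequency tagging. It is only an illustration of the general technique and does not reproduce the actual algorithm of the Automatic Keyword Generator class; the stop word list, the minimum word length and the function name are assumptions.

```python
import re
from collections import Counter

# Assumed, deliberately tiny stop word list; a real one would be much longer.
STOP_WORDS = {"the", "and", "for", "with", "that", "this", "are", "can"}

def frequent_keywords(text, limit=4, min_length=3):
    """Return up to `limit` of the most frequent non-trivial words in `text`."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    candidates = [
        word for word in words
        if len(word) >= min_length and word not in STOP_WORDS
    ]
    return [word for word, _count in Counter(candidates).most_common(limit)]
```

Nothing in this approach knows which words the users actually care about, which is the limitation described above: a word can be frequent in a page without being something anybody searches for.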

Taxonomy is the practice of classifying objects into categories. For instance, in the PHPClasses site, each submitted class is included in a group according to its purpose: databases, e-mail, Web services, e-commerce, etc. In this case the classification is defined by the site moderator.

The Automatic Keyword Generator class also provides means to classify Web pages. In this case the classification is defined automatically based on the text content.

Some of the so-called Web 2.0 sites, like del.icio.us for instance, present groups of tags in tag clouds. These tag clouds are based on tags assigned by the users. This is also a form of taxonomy, but it has a special characteristic: it is based on the user input.

Tim O'Reilly popularized a word for a taxonomy based on the user input: folksonomy - a taxonomy defined by the "folks" (the users). The term itself is usually credited to Thomas Vander Wal. In his Web 2.0 definition, Tim lists folksonomy as one of the features that distinguish Web 2.0 from Web 1.0 sites.

oreillynet.com/pub/a/oreilly/tim/ne ...


* Searchonomy

Content tagging based on user input is a great way to achieve a useful classification of site content.

In the past, I thought of providing a content tagging system for the PHPClasses site. However, I was afraid it would take a lot of time to build a useful list of tags for each page of the site.

I may still come back to the content tagging system later. Meanwhile I had a different idea to tag the site pages automatically. This idea is also based on the user input. It only took me a few days to implement.

The idea that I had is simple. When the users go to the search pages, they enter a few keywords associated with the content they are looking for.

If I associate the pages that appear in the search results to the search keywords, that is also a way of classifying content - taxonomy.

Since it is based on the user input - the search keywords are entered by the users - I guess it is fair to call it a form of folksonomy.
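The association between search keywords and result pages can be sketched in a few lines of Python. This is an illustration under assumptions, not the implementation used by the site: the class name, the method names, the page paths and the in-memory counters are all hypothetical, and a real site would store the counts in a database.

```python
from collections import Counter, defaultdict

class SearchTagger:
    """Collects candidate tags for pages from the searches that return them."""

    def __init__(self):
        # page -> Counter(keyword -> number of searches that returned the page)
        self.page_keywords = defaultdict(Counter)
        # keyword -> number of searches that used the keyword
        self.keyword_searches = Counter()

    def record_search(self, query, result_pages):
        """Associate every keyword of `query` with every page in the results."""
        for keyword in set(query.lower().split()):
            self.keyword_searches[keyword] += 1
            for page in result_pages:
                self.page_keywords[page][keyword] += 1

    def tags_for(self, page):
        """All keywords ever associated with `page`, in alphabetical order."""
        return sorted(self.page_keywords.get(page, {}))


tagger = SearchTagger()
tagger.record_search("PDF generation", ["/package/1", "/package/2"])
tagger.record_search("pdf", ["/package/1"])
```

After these two searches, `/package/1` carries the candidate tags `generation` and `pdf`, and the counters record that `pdf` was searched twice.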

I do not like to invent words to describe things that already have a name. However, just like folksonomy was invented to describe a better way to classify content, I thought I should give my approach a new name too.

Combining the words "search" and "folksonomy", I arrived at the word "searchonomy". It is not quite a beautiful sounding word, so I am open to better name suggestions.

You can already see several pages in the PHPClasses site tagged with some keywords. Currently only the class package pages are tagged. In the future other types of pages will also be tagged.

For now, the page below is an example of a class page with some tags. The tags appear in the first section with the label "Search tags".

phpclasses.org/browse/package/1.htm ...


* Improving the results

The first experiments with this new content classification approach were not very satisfactory. Some pages got too many tags, so I had to limit the number of tag keywords to 4. But which would be the best 4 tags to classify each page?

For now I am considering only the most searched keywords. The results started looking more interesting.
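Picking the 4 tags of a page by search popularity could look like the Python sketch below. The function name, the sample keywords and the search counts are invented for illustration, and breaking ties alphabetically is an assumption of this sketch rather than something the site is known to do.

```python
def top_tags(page_keywords, keyword_searches, limit=4):
    """Keep the `limit` keywords of a page that are searched most often site-wide.

    page_keywords: iterable of keywords associated with one page.
    keyword_searches: dict mapping keyword -> number of searches using it.
    """
    ranked = sorted(
        set(page_keywords),
        key=lambda keyword: (-keyword_searches.get(keyword, 0), keyword),
    )
    return ranked[:limit]


# Hypothetical site-wide search counts and keywords of a single page.
keyword_searches = {"email": 300, "pdf": 120, "smtp": 80, "mime": 40, "class": 5}
page_keywords = ["pdf", "smtp", "mime", "class", "email"]
```

With this sample data, `top_tags(page_keywords, keyword_searches)` keeps `email`, `pdf`, `smtp` and `mime`, and drops the rarely searched `class`.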

Another problem is that some not so relevant keywords were being assigned to certain pages. So I had to restrict the sections of the pages that would be considered to index the site. The results got better but there are still some open issues.

For instance, should the names of the authors be considered to tag a class package page? Maybe. The truth is that some users search the site using the names of authors that are known to publish in the site. This is a tough question to decide.

Another aspect is what appears in the pages to which the tags are linked. Initially I thought of listing all the site pages with the same tag. However, that would result in a huge list of packages. I do not think that would help the users find what they are looking for.

An alternative solution that I tried is to list only the top user rated class packages. Now the package lists are small and useful for the users looking for something well appreciated.

On the other hand, many potentially interesting packages are not being listed, even though they have the same tags.

Well, as you may see, this is still an open subject. I am sure there is plenty of room for improvement. If you have more ideas to make this system better, feel free to follow up by posting a comment to this article.
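The trade-off between short lists and missing packages shows up clearly in a small Python sketch of the tag page listing. The record layout (`name`, `tags`, `rating`), the sample packages and the default limit are assumptions made for this example only.

```python
def top_rated_for_tag(tag, packages, limit=10):
    """Names of the best rated packages carrying `tag`, best rated first.

    Packages without a rating (rating is None) are left out.
    """
    tagged = [
        package for package in packages
        if tag in package["tags"] and package["rating"] is not None
    ]
    tagged.sort(key=lambda package: (-package["rating"], package["name"]))
    return [package["name"] for package in tagged[:limit]]


# Hypothetical package records.
packages = [
    {"name": "PDF Writer", "tags": {"pdf"}, "rating": 4.5},
    {"name": "Mailer", "tags": {"email"}, "rating": 4.8},
    {"name": "PDF Forms", "tags": {"pdf", "forms"}, "rating": 4.9},
    {"name": "PDF Draft", "tags": {"pdf"}, "rating": None},
]
```

Here "PDF Draft" never appears in the list for the `pdf` tag because nobody rated it yet, which is exactly the kind of package that a ratings-only listing leaves out.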






Comments:

1. Searchonomy - Ulrich Babiak (2007-03-28 23:03)
This solution only turns keywords into tags... - 2 replies


