Query string-driven URL schemes considered annoying. (Why you should use path info instead.)

Posted on: May 17th, 2005 4:52 AM GMT

By: Greg Reimer (Code Monkey Extraordinaire)

Topic: tech, web development, programming

Perhaps I should have labeled this one “Query String-Driven URL Schemes Considered Harmful”, but you're not shooting yourself in the head by using query strings, just the foot. Let me start by giving you the high-level, nerd-oriented version of the point, and if you're quick on the uptake or I'm just preaching to the choir, then maybe you can skip the explanation:

Query string-dependent URL schemes for content-driven websites reduce the site's signal-to-noise ratio by burying the semantic value of a URL (the pointer) inside the query string. Specifically, such URLs are less usable to the infrastructure of tools that make the web open and transparent.

Explanation:

A query string dependent URL scheme:

http://example.com/showpage?pageID=bad_hair_day

Which contrasts with a path-based URL scheme:

http://example.com/pages/bad_hair_day.html

It's tempting to use the former approach because it appeals to our programmer sensibilities. URLs become a parameterized function call. Plus, the more traditional path-based URL schemes smack of "static" websites. It's therefore tempting to use query strings to flaunt the fact that the website is dynamic and not static (God forbid).

Whatever the case, the temptation should be resisted! A URL that references a piece of content, be it a discussion thread, a blog posting, an article, an image, whatever, should look like a regular path-based URL.

Pollution in the ecosystem

Granted, a user sitting in front of a browser doesn't care about URL schemes. But people surfing web pages through browsers is only the end result of a much larger process. On the web, a whole ecosystem exists of spiders, analyzers, mirrors, caches, fetchers, savers, archivers and other tools. Directly or indirectly, these tools extend the broadcast range of the web to a much greater audience. In a nutshell, this ecosystem of tools puts your content in front of users. Google is a prime example.

Query string-based URL schemes are like pollution in the ecosystem. Tools don't know how to deal with them. For example, how does a program know whether these two URLs point to the same content or to different content?

http://example.com/script?foo=bar&spam=eggs
http://example.com/script?spam=eggs&foo=bar

How can a search indexer be sure that if it spiders every query string link on your site, it won't generate an infinite number of variants and use up all its memory? It can't. Query string-based URLs are a huge field of uncertainty and are often ignored or truncated, which translates directly into problems for website owners.
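To make the ambiguity concrete, here's a minimal Java sketch. The class name QueryOrderDemo and the helper canonicalQuery are made up for illustration. It shows that the two URLs above are different strings, and that a tool can only treat them as the same page by assuming parameter order doesn't matter, which is a guess it can't verify from the outside.

```java
import java.net.URI;
import java.util.Arrays;

public class QueryOrderDemo {
    // Hypothetical helper: sort the "name=value" pairs of a query string.
    // This ASSUMES parameter order carries no meaning for the server,
    // which a crawler cannot actually know about an arbitrary site.
    static String canonicalQuery(String url) {
        String query = URI.create(url).getQuery();
        if (query == null) {
            return "";
        }
        String[] pairs = query.split("&");
        Arrays.sort(pairs);
        return String.join("&", pairs);
    }

    public static void main(String[] args) {
        String a = "http://example.com/script?foo=bar&spam=eggs";
        String b = "http://example.com/script?spam=eggs&foo=bar";

        // Compared as plain strings, the URLs are different.
        System.out.println(a.equals(b)); // false

        // After sorting the parameters they match, but only under the
        // assumption stated above.
        System.out.println(canonicalQuery(a).equals(canonicalQuery(b))); // true
    }
}
```

A path-based URL doesn't raise the question in the first place, because its segments have one fixed order.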

The alternative

If you're writing a content management system, be it a blog, discussion forum, wiki or whatever, consider using pathinfo to map URLs to content. If you have a servlet mapped to /blogs, and a request is made to /blogs/phil/posts/5162005.jsp, then the string "/phil/posts/5162005.jsp" is passed to the /blogs servlet as pathinfo. The code to integrate this into an MVC framework almost writes itself. If I were writing a Java-based content management system from scratch, I'd use path based URLs to link to content (e.g. /blogs/phil/posts/5162005.jsp), and I'd use a blended approach for user actions (e.g. /blogs/phil/console.jsp?command=logout). PHP has the same capability. Visit my archive page and click on one or two items to see it in action.
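As a rough idea of what "almost writes itself" could look like, here's a small Java sketch that breaks a pathinfo string into segments. The class PathInfoDemo and the helper segments are hypothetical names, and the blog/section/post interpretation is just one way to read the example URL. In a real servlet the string would come from the request rather than being hard-coded.

```java
import java.util.ArrayList;
import java.util.List;

public class PathInfoDemo {
    // Hypothetical helper: split a pathinfo string into its non-empty segments.
    // A null pathinfo (no extra path after the servlet mapping) yields an empty list.
    static List<String> segments(String pathInfo) {
        List<String> result = new ArrayList<>();
        if (pathInfo == null) {
            return result;
        }
        for (String part : pathInfo.split("/")) {
            if (!part.isEmpty()) {
                result.add(part);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for the pathinfo passed to a servlet mapped to /blogs.
        List<String> parts = segments("/phil/posts/5162005.jsp");

        String blog = parts.get(0);    // "phil"
        String section = parts.get(1); // "posts"
        String post = parts.get(2);    // "5162005.jsp"

        System.out.println(blog + " / " + section + " / " + post);
    }
}
```

From there, a controller only has to look at the segments to decide which content to load and which view to render.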

Finally, here are code samples showing how to access pathinfo in both JSP/Servlet and PHP:

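<%
// JSP
// getPathInfo() returns the extra path after the servlet mapping,
// or null if the request URL has no extra path.
String pinfo = request.getPathInfo();
%>

<?php
// PHP
// PATH_INFO may be unset when the request has no extra path,
// so fall back to an empty string.
$pinfo = isset($_SERVER['PATH_INFO']) ? $_SERVER['PATH_INFO'] : '';
?>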
