Senator Hiram Johnson famously quipped that
"the first casualty when war comes is the truth." As the war in Iraq
continues, is the White House intentionally preventing search engines
from preserving a record of its statements on the conflict? Or did
its staff simply make a technical mistake?
When search engines "spider" the web in search of documents for their
indices, web site owners sometimes place a file called robots.txt at
the top level of their sites, which instructs the "spiders" not to
index certain files. This can be for policy reasons, if an author does
not want his or her pages to appear in search listings, or for
technical reasons, for example if a web site is dynamically generated
and cannot or should not be downloaded in its entirety.
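The mechanism can be sketched with Python's standard library, which
includes a parser for the robots.txt format. The file contents and
URLs below are hypothetical illustrations of the format, not the
actual White House entries:

```python
import urllib.robotparser

# A hypothetical robots.txt illustrating the format: rules apply to
# all spiders ("User-agent: *"), and each Disallow line names a path
# prefix that spiders are asked not to fetch or index.
robots_txt = """\
User-agent: *
Disallow: /iraq/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

# A well-behaved spider consults these rules before fetching a page.
print(rp.can_fetch("*", "http://example.gov/iraq/archive.html"))  # False
print(rp.can_fetch("*", "http://example.gov/news/press.html"))    # True
```

Note that the file is purely advisory: nothing stops a spider from
ignoring it, but the major search engines and archives honor it.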
According to reports,
though, the White House had been
requesting that search engines not index certain pages
related to Iraq. In addition to stopping searches, this prevents
archives like Google's cache and
the Internet Archive from
storing copies of pages that may later change. 2600
called the White House to investigate the matter.
According to White House spokesman Jimmy Orr, the blocking of search
engines is not an attempt to ensure future revisions
will remain undetected. Rather, he explained, they "have an Iraq
section [of the website] with a different template than the main
site." Thus, for example, a press release on a meeting between
President Bush and "Special Envoy" Bremer is available in the
Iraq template (blocked from being indexed by search engines) or the
normal White House template (available for indexing by search
engines). The intent, Mr. Orr said, was that people searching
should not get multiple copies of the same information. Most of
the "suspicious" entries in the robots.txt file do, indeed, appear to
have only this effect.
According to the robots.txt of October
24, though, the In Focus: Iraq
section of the site was blocked from search engines. Some of the information
there does not appear
to be available anywhere else on the White House site. However, it
seems that, in response to inquiries from 2600 and other
sources, the White House web team has recently changed their robots.txt so that
these files are no longer blocked. (The current Last-Modified
date on the robots.txt is 23:22 GMT,
October 27th, after work on this article had already begun.)
Whether the original blocking of the content in question was
malicious or an honest mistake is, of course, open to speculation.
Certainly anyone who maintains a large website has made some
sort of technical mistake at least once, and the promptness with which
the error was fixed after it was pointed out suggests that the White
House had no interest in keeping it in place. The White House, as an
entity responsible to the citizenry and one that has drawn
considerable criticism over its handling of the situation in Iraq,
ought to take special care to avoid similar mistakes in the
future. Nonetheless, we are pleased to learn that, at least this time,
the issue seems to have been resolved promptly.