Archive of the Technology Category

Microsoft and OOXML

Wednesday, April 18, 2007

I've had two recent run-ins with the new docx files being produced by Word 2007. In both cases, an unsuspecting user created a document and emailed it out, thinking that since it was a Word document everyone would be able to read it. Of course, no one could.

At the core of docx is Microsoft's new "open standard" Office Open XML. Since the format is XML, I was able to just strip out the text in the first case. With the second document, those involved wanted the formatting preserved. I poked around a bit and discovered that Microsoft had released a patch for older versions of Word. I installed it and opened the file, but all the formatting disappeared. I then tried one of those online conversion pages and received a much better result. Microsoft can't even implement their own standard. That's really not that surprising given the size and complexity of the specification. It seems like they don't want anyone else to implement it either, which defeats the whole purpose of an open standard. There's a good write-up of some of the problems with the standard here. Oh, and it also may become an ISO standard soon.

Posted in April 2007 | Comments (2)

Blog as Discourse Medium

Sunday, August 27, 2006

In his book Amusing Ourselves to Death, Neil Postman presents the thesis that communication mediums have natural biases with regard to the discourses that they enable and encourage. For example, text messaging is not conducive to creating significant literature. Also, philosophy only came about when man started to put his thoughts down in writing and thereby allowed them to be analyzed. Postman spends the majority of the book arguing that television as a communication medium does not encourage serious discourse. His best analysis deals with television as mass media. While it is possible to videotape great lectures in biology and history and so on, television has to appeal to the lowest (or at least a much lower) common denominator of the public since it is driven by advertising. According to Postman, this results in television programming being so focused on entertainment that even serious subjects like politics and religion are trivialized by theme songs and an emphasis on visual appearance. Regardless of whether you agree with his conclusions, this is a useful framework for evaluating other discourse mediums such as the blog.

Postman believed that print is the best communication medium because it encourages critical thinking. This does not necessarily mean that all print mediums raise the level of discourse. Blogs are certainly a print medium, and some have offered up the opinion that they are positively affecting public discourse. I want to challenge this claim. I am primarily interested in analyzing blogs as a popular medium (those with thousands of hits a day).

Too many participants
Popular blogs have thousands of readers and hundreds of commenters. The individual or team that authors one of these blogs will typically post at least one new essay a day. The readers of the blog receive rapid notification of the new post through mechanisms like RSS feeds. This often causes a feeding frenzy in which many, many comments are made in a short period of time. The large number of participants makes a serious, in-depth discussion difficult. It is impossible for the author(s) to respond to each commenter due to their number. Many authors choose to respond rarely or not at all. The large number of commenters results in many diverse side topics being introduced. The number of participants also discourages long responses and instead encourages short, witty comments that attract the attention of fellow readers. I imagine that it sometimes resembles salon discussions when the number of participants grows too large. There end up being a lot of humorous remarks and laughter, but not much serious discourse.

Rapidity of discourse
With at least one new essay a day being posted on the blog, any single topic has a short lifetime. Even if the same topic is continued by the authors, it rarely extends beyond a few days. This encourages quick responses (as does the previous item), which are antithetical to serious discourse. A serious discourse requires thought and reflection. Facts need to be gathered, arguments sharpened, and assumptions challenged. A short timeline makes this nearly impossible. Instead, a popular blog community can appear like a group of young children who become bored with a game after an hour and move on to the next one and then the next one and so on.

Lack of authority
While the authors of a popular blog often (sometimes?) have some claim to authority in the field of interest, commenters usually don't. Nothing is required to comment on most blogs beyond Internet access and an email address. There is no filter process to prevent the uninformed from taking an active role in the conversation. This could be a sort of benefit if everyone checked the veracity of presented claims, but that is quite unlikely. The commenters also participate on a level playing field, which has its advantages and disadvantages. The disadvantage is that an authority is not guaranteed any more notice in the discussion than a junior high student. While the current system for publishing books and journals is far from perfect, it does perform some needed filtering of the noise of the masses.

There are positive things to say about blogs as a discourse medium. They present a vast opportunity for exposure to new ideas and information. These ideas and perspectives can force you to think. I should also mention the low cost for access. For all their good, I don't think we are warranted in claiming any great increase in serious discourse resulting from blogs.

Posted in August 2006 | Comments (5)

Front Door Frustration

Tuesday, January 3, 2006

I recently purchased my first new car. This has opened up a new world of technology to me. The car tells me when to change the oil. It automatically turns on my headlights when it gets dark. The car also came with one of those keychain fobs that locks/unlocks the doors. Unfortunately, I am discovering that the front door of our house does not respond to the keyless remote no matter how many times I try.

Posted in January 2006 | Comments (2)

RSS Bible Verses

Sunday, September 18, 2005

I receive a fair number of visits from people looking for RSS Bible verse services. These are often provided by the publisher of a particular Bible version and send out a new verse each day through RSS. Here are two sites that I have tried or know about:

There are also many that send out a short devotional with one or more verses like the IBS.

Posted in September 2005 | Comments (0)

Honeypots for Spammers

Tuesday, August 30, 2005

I saw this site recently. It is a collaborative effort to identify servers hosting email address harvesters for spammers. They do this by dynamically creating email addresses that encode the time and visitor IP address. They have some interesting statistics on their site along with advice for avoiding spambots. I have seen some nice research efforts in the past using honey pots to analyze hacker techniques and monitor worm activity.
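To make the idea of an address that "encodes the time and visitor IP" concrete, here is a small sketch of my own. I do not know the scheme the project actually uses, so everything here is an assumption: the function names, the example.org domain, and the plain hexadecimal layout (8 hex digits of Unix time followed by 8 hex digits of IPv4 address). When spam later arrives at such an address, decoding it tells you which visitor harvested it and when.

```python
import ipaddress
from datetime import datetime, timezone

HONEYPOT_DOMAIN = "example.org"  # hypothetical domain that accepts the trap mail


def make_trap_address(visitor_ip, when):
    """Build a unique address that records who was shown it and when.

    `visitor_ip` is an IPv4 string; `when` is a timezone-aware datetime.
    """
    ip_int = int(ipaddress.IPv4Address(visitor_ip))
    seconds = int(when.timestamp())
    # 8 hex digits of Unix time followed by 8 hex digits of IP address.
    return f"{seconds:08x}{ip_int:08x}@{HONEYPOT_DOMAIN}"


def decode_trap_address(address):
    """Recover (time, visitor IP) from an address made by make_trap_address."""
    local_part = address.split("@", 1)[0]
    seconds = int(local_part[:8], 16)
    ip_int = int(local_part[8:16], 16)
    when = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return when, str(ipaddress.IPv4Address(ip_int))


if __name__ == "__main__":
    shown_at = datetime(2005, 8, 30, 12, 0, tzinfo=timezone.utc)
    trap = make_trap_address("192.0.2.7", shown_at)
    print(trap)
    print(decode_trap_address(trap))
```

A real deployment would presumably obscure or sign the encoded value so that a harvester could not recognize or forge trap addresses. The plain hex above is only meant to show the bookkeeping.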

Posted in August 2005 | Comments (0)

Clustering for Web Searching

Saturday, June 11, 2005

In the early days of the web, yellow page-like listings (e.g. Yahoo) were the primary method of locating information. Then came spiders collecting web pages to create repositories that could be searched using keywords (Lycos, AltaVista). These search engines did not take advantage of the connected nature of the web. Google changed that by ranking a page based on the quantity and quality of the pages linking to it.
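Ranking a page by the quantity and quality of the pages linking to it can be sketched in a few lines. What follows is a simplified version of the published PageRank idea, computed by repeated passes over a toy link graph. It is not Google's production ranking, and the function name, the 0.85 damping factor, the iteration count, and the four-page graph are choices made for this illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate link-based scores for a small graph.

    `links` maps each page to the list of pages it links to.
    Returns a dict of page -> score; the scores sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current score among the pages it links to,
                # so a link from a high-scoring page is worth more.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    toy_web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    for page, score in sorted(pagerank(toy_web).items(), key=lambda kv: -kv[1]):
        print(page, round(score, 3))
```

In the toy graph, page c comes out on top because three pages link to it (quantity), and page a outranks b and d even though it has only one incoming link, because that link comes from the highly ranked c (quality).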

What is next is still an open question. The most common answer is clustering. Teoma (bought by Ask Jeeves) was one of the earliest search engines to use clustering. Vivisimo and its sister search engine, Clusty, are two more examples. Now Yahoo is jumping on the bandwagon with Mindset.

The advantage of clustering is the possibility of teasing out semantic information by grouping pages based on their content and how they link to each other. If someone were to search for "Lincoln", does she want a biography of Abraham Lincoln, information about Lincoln, Nebraska, or maybe tickets for Lincoln Center? Clustering allows the search engine to present some basic categories so that the user can easily refine her search. In the Lincoln example, the user could limit her search to only pages in the Lincoln Center cluster so she can get directions and concert times. Yahoo's Mindset is simpler in that it restricts the clusters to shopping or informational groupings. Right now, clustering search engines do not look like a threat to Google's dominance.
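Here is a toy sketch of the Lincoln example. It groups result snippets purely by shared words (Jaccard overlap of their word sets), using a simple greedy pass. I have no inside knowledge of how Teoma, Vivisimo, or Mindset work, and real engines also use link structure and far better text analysis. The function names, the 0.2 threshold, and the snippets are all made up for illustration.

```python
def jaccard(a, b):
    """Overlap between two word sets: shared words / all words."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_snippets(snippets, query, threshold=0.2):
    """Greedily group snippets whose vocabulary overlaps enough.

    The query word itself is ignored, since every result contains it.
    Returns a list of clusters, each a list of snippets.
    """
    clusters = []  # each entry: {"words": set of words, "members": [snippets]}
    for snippet in snippets:
        words = {w.strip(".,").lower() for w in snippet.split()}
        words.discard(query.lower())
        for cluster in clusters:
            if jaccard(words, cluster["words"]) >= threshold:
                cluster["members"].append(snippet)
                cluster["words"] |= words
                break
        else:
            # No existing cluster is similar enough, so start a new one.
            clusters.append({"words": set(words), "members": [snippet]})
    return [cluster["members"] for cluster in clusters]


if __name__ == "__main__":
    results = [
        "Lincoln Center concert tickets",
        "Abraham Lincoln biography president",
        "Lincoln Center concert directions",
        "President Abraham Lincoln speeches",
        "Lincoln Nebraska city guide",
    ]
    for group in cluster_snippets(results, "Lincoln"):
        print(group)
```

With these five snippets the pass produces three groups: the two Lincoln Center results, the two Abraham Lincoln results, and the Nebraska result on its own. A user could then restrict her search to whichever group matches what she meant.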

Posted in June 2005 | Comments (1)

GoogSpy

Tuesday, June 7, 2005

To demonstrate their web data extraction capability, a company has created a search engine called GoogSpy that allows a user to see who is paying for what keyword in Google's AdWords. I don't know how "smart" their spider is, but grabbing this data through brute force is not that difficult given enough computing power and storage space. The results are interesting. For example, Microsoft pays for keywords ranging from apple software updates to suse linux to spanish culture.

Posted in June 2005 | Comments (0)

Graph Visualization for the Web

Monday, May 9, 2005

Warning: this is addictive.

A colleague of mine showed me this. It uses the same type of visualization technology as the previous technology entry. This one lets you see the connections between web sites. It uses Google's clustering algorithm to create the graph. Java is required.

Posted in May 2005 | Comments (0)

Visual Thesaurus

Tuesday, May 3, 2005

This is not an example of cutting-edge technology, but rather a case of what happens when two interesting technologies are combined. It is an interactive, visual thesaurus. The two technologies that it uses are graph visualization and WordNet, which is a database of synonyms that captures word semantics. I don't think it is something I would use as a serious resource, but it's fun to play with. To try it out, go here and click on the "try it now" image. Caveat: It requires Java, but most people already have this.

Posted in May 2005 | Comments (1)