Jiglu:博客标签好管家

      我的日志 2007-10-16 10:46

Software That Plays Tag

如果你是位作家,想必会希望自己的作品被刻在石碑上流芳千古。如果你是位博客作者,那该乐于见到自己的文章放到网上没几天就被人注意到。我们说博客是一种短命的媒体,其原因一方面在于博客上的文章通常主要由对时事的评论构成,而另一方面也与博客页面的排版有一定关系(最新的文章放在首页,而过去所写的文章就容易被忽视)。鲜有博客被视为永久媒体,文章一旦被放上去,就会埋没在时光中。

就拿我本人做例子,从2002年开始我就有了自己的博客。几乎每天我都会早早起床浏览网页或是报纸,直到产生想写点什么的冲动。于是我坐下来写些东西发到网上,然后回到床上。这样做已经成了一种习惯:在写博客的这段日子里,我已经发了2,006篇文章。

但请不要吃惊,我连昨天写了什么都不太记得,更不用说一年前写的。面对博客文章以150万篇的速度增殖(据博客跟踪和研究服务Technorati统计,全球每天新增150万篇博客文章),你就知道其中有多少文章在放置后不久就被遗忘,甚至连作者本人都不记得。但并非每一个博客都是记录作者短暂的心路历程,像刊物一样的风格意味着它们会一直这般保持下去。

当然,博客文章并不是真的消失。你总是可以用谷歌(Google)之类的搜索引擎找到想要的。而用贴标签的办法能便于所有人进行查询和分门别类,比如有人给照片、博客文章或喜欢的音乐添加或长或短的标签。此类标签既可以专门为某个博客而设,也可以是用Technorati之类的搜索引擎将其他博客上的标签汇集在一起。

但这种办法只能在一定程度上起作用。问题出在我的标签可能与你的不一样。而我在给自己的文章添加标签上可能有些懒(最近几个月正是如此)。坦白地说,我的标签是一团糟,这也不是我愿意花太多精力去想的事。其结果是我的博客读者不大可能发现这些标签的用处,也就无法让他们像我想的那样去阅读以前的文章。我的博客若想靠标签来吸引读者,可能性非常渺茫。而光靠博客本身,一般只有很短的“保鲜期”。

英国人奈杰尔•坎宁斯(Nigel Cannings)认为他已经找到解决此问题的办法,那就是自动上标签。他看到了我博客上所有旧文章中蕴藏的价值(可能他是唯一的一个人),并将所有这些旧纸堆视为智慧的宝库,所要做的只是对它进行更好的整理和归类。他认为,由我们自己来贴标签是不够的,因为我们有时意识不到自己的文章有着什么样的更广泛联系。坎宁斯称,手动上标签是更好的整理和保存博客和其他网上内容的第一步。但这仍取决于人们对自己的了解,即他们到底写了些什么,以及他们写的内容多大程度上符合他人所写的。

他的想法是,挖掘博客和其他网上内容中的文字,从中提取出标题、类别和名称用来创造一套替代的标签,即一套类别索引,这样可以便于读者在同一个博客或网上其他地方查找相关的文章。办法是用一种软件来解析文章,从中找出关键词和主题,进而制作成一份博客的目录,让读者更易于浏览。坎宁斯表示,此举目的在于利用作者平时使用的词句和写作习惯来制作标签,随时间推移为更广的读者群打开信息之门。

为此坎宁斯开办了名为Jiglu的网站(www.jiglu.com),并以“会思考的标签”为口号。通过给你的网站、维客(维基之类的共创性网站)或博客添加几行代码,Jiglu便能对网上内容进行筛选并将它认为重要的主题、人名和链接挑选出来列在专门的网页上。这些标题将显示在网页上的一个版面中。点击任何你感兴趣的主题,列有该主题相关文章(博客文章或其他内容)网址的索引就会显示在另外弹出的一个窗口中。你也可以用一个树状分布图(一种由带标注的小方块构成的图表,每个方块的大小取决于该主题在网站内容中出现的频率)来显示指定网站所有的主题。在索引页面,相关标题会带有蓝色下划线:点击其中一项,就会弹出一个窗口列出这篇与分类或关键词相符的文章首段内容以及链接。

所有这些都运行得颇为流畅。用户可以轻松便利地向归属TypePad或Blogger等主要博客社区的网页安装代码(据Jiglu称,其他社区的博客网页也将陆续享受到此服务)。这项服务要花上好几个小时来筛选你的网页,不过一旦工作完成,主题就能显示在一个小版面内,而且用户能将这个版面放置在网页的任何一个地方。此外,窗口弹出程序也运行得很好。

不过关键还在于,Jiglu的自动贴标签功能是否理想。我的回答是:没有我想的那么好。我本来是指望Jiglu能代我做一些自己实在懒得去做的事:给文章添加“上标签”、“Web 2.0”、“分类学”、“Jiglu”、“启动”、“类型化”、“术语提取”、“搜索”等等标签,这样我就不必为此劳神了。但事实却并非如此。它在给我的博客上标签时,反而跑出来一大堆公司名、产品名,以及“坏主意”(不清楚这是不是我曾写过的内容)、“临界点”(这个词我常用到)这样的主题,甚至因为某些原因还出现了“宇宙”主题。在“人物”这个类别中,人名的提取做得还不错,但昂山素姬(Aung San Suu Kyi)被漏掉了,并被作为一个主题而不是“人”列了出来。

坎宁斯的努力看起来是失败了,但我并不这样认为。其简便的操作意味着可以将其作为贴标签时的帮手。上标签是件带有主观性的工作,这道“热饭”最好由作者去吃,而像Jiglu提供的索引则是最好由别人或别的工具去吃的“冷菜”。Jiglu的标签引擎的确需要改进才能让我真正提起胃口,但我认为,开发出某种工具来帮助我们将现在所写文章与我们(或者其他人)的旧作联系起来,这种工具是有发展潜力的。

虽然博客和网站可能的确讲究时效性,但这并不意味着我们要将过去文章中的智慧火花抛于脑后。尽管博客也许仍被认为是昙花一现,但足够多的事情已让我明白,智慧是永恒的。

Jeremy Wagstaff

(编者按:本文作者Jeremy Wagstaff是《华尔街日报》科技专栏“Loose Wire”的专栏作家,栏目内容涉及科技产品、电脑、软件等相关领域。)  
If you're a writer, you hope your words will be etched in stone for eternity. If you're a blogger, you're happy if someone stumbles on your writings a few days after you posted them. Blogs, partly because they often consist mainly of commentary on things that have just happened, and partly because of the way they are structured (most recent postings first, making it easy to ignore everything you wrote before), are a transient medium. Rarely is a blog post treated as permanent. We write, then we forget.

Take me, for instance. I've been writing a blog since 2002. Every day, more or less, I get up early and read something online or in a newspaper until my blood boils. Then I sit down and write until it's out of my system. Then I usually go back to bed. This is something of a ritual: At the time of writing, I've composed 2,006 posts.

Not surprising, then, that I can barely remember what I wrote yesterday, let alone a year ago. Multiply this by 1.5 million (the volume of global blog posts a day, according to Technorati, a blog tagging and search service) and you get some idea how much is being written and promptly forgotten about, even by its authors. While not every blog is a stream of consciousness, the journal-like approach means they can look that way over time.

Of course, a blog post isn't lost. You can always find it with a search engine such as Google. And tagging -- where users label their photos, blog posts or favorite music with single- or multiple-word tags -- has made it easier for everyone to find and group stuff together. These tags can be specific to the blog or they can be lumped together with tags from other blogs, using special search engines such as Technorati.

But this works only up to a point. My tags may be different to your tags. And I may get (as I have in recent months) a bit lazy about the tags I add to what I write. Frankly, my tags are a mess and not something I like to think about too much. Result: Readers are unlikely to find them useful and therefore don't flit from new articles to old ones as much as I'd like. And the chances of someone stumbling upon my blog because of tags remain remote. Blog posts, left to themselves, tend to have a short shelf life.

Briton Nigel Cannings thinks he has the solution to this: automatic tagging. He sees value in all those old blog posts of mine (he may be the only one) and reckons all that old content out there is a repository of wisdom that just needs to be sorted out better. Tagging it ourselves, he thinks, just isn't enough because we don't always see what we've written in a broader context. 'Manual tagging is the first step' to sorting and storing blogs and other online content better, he says, 'but it still relies upon people understanding themselves -- whatever they've already written about, and how their content fits in with other people's content.'

His idea is to mine the words on blogs and other unstructured online content, to extract from them headings, categories and names that could then create an alternative set of tags -- an index of sorts -- that could help readers find related articles both from the same blog and elsewhere. Using software to delve down into postings to grab the important words and the topics they refer to, a sort of table of contents of blogs is created, making it easier for readers to browse. 'What we want to do,' Mr. Cannings says, 'is use people's ordinary words and the way they write to create tags and therefore open up information to a wider readership over time.'
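
For readers who want to see the general idea in miniature, here is a rough Python sketch of automatic tagging by word frequency. It is not Jiglu's method, which this column does not describe in detail; the tiny stopword list and the names `suggest_tags` and `build_index` are invented for illustration, and a real service would need far smarter term extraction.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
    "that", "this", "for", "on", "with", "as", "my", "are", "was", "be",
}

def suggest_tags(text, limit=5):
    """Return the most frequent non-stopword terms as candidate tags."""
    # Lowercase the text and keep words of two or more characters.
    words = re.findall(r"[a-z][a-z'-]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [term for term, _ in counts.most_common(limit)]

def build_index(posts, limit=5):
    """Map each suggested tag to the titles of the posts it came from."""
    index = {}
    for title, body in posts.items():
        for tag in suggest_tags(body, limit):
            index.setdefault(tag, []).append(title)
    return index

# Usage: two made-up posts, indexed by their two most frequent terms.
posts = {
    "A": "tagging tagging tagging blogs blogs",
    "B": "blogs blogs blogs search search",
}
index = build_index(posts, limit=2)
```

Counting words like this is crude, and it hints at why an automatic index can surface odd topics: frequency alone says nothing about what the author thinks a post is about.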

So he's come up with something called Jiglu (www.jiglu.com), with 'tags that think' as its, ahem, tagline. By adding a few lines of code to your Web site, wiki (collaborative Web site, such as Wikipedia) or blog, Jiglu will sift through your content and dig out what it thinks are key topics, people and links on a particular page. These headings will appear in a box on the page. Click on any topic and a list of the articles (blog posts or whatever) that address that particular topic pops up in a separate window. You can also view an overall treemap (a sort of mosaic of labeled squares, the size of each square determined by the frequency that the topic it refers to appears in the Web site's content) of all the main topics on that particular site. In the text itself, dotted blue lines appear under words that are included in the index: Click on one and a window pops up with links to and the first paragraphs of those posts that match that category or word.

All of this works pretty smoothly. Installing the code on a blog hosted by one of the main blog services such as TypePad or Blogger is pretty easy (other services will follow, Jiglu says). The service takes a few hours to get round to trawling your site, but once it has, the topics appear in a little box that you can place anywhere on the page. The pop-up window works pretty well too.

But the key issue, of course, is how good a job Jiglu makes of automating the process of assigning tags to your content. The answer is: It's not what I'd expected. I guess I'd been looking for Jiglu to do the work I'm too lazy to do: to assign, for example, the tags 'tagging,' 'Web 2.0', 'taxonomy,' 'Jiglu,' 'start-up,' 'categorization,' 'term extraction,' 'search' and so on to an article like this, so I don't have to. But it doesn't. Instead, in tagging my blog, it came up with stuff like company names, product names, and topics like a 'bad idea' (it's not clear whether this is what I've been writing about or having), 'tipping point' (a term I use way too much) and, for some reason, 'universe.' In the 'People' category, it did a good job of extracting names, but missed Aung San Suu Kyi, tagging her as a topic rather than a person, perhaps because she insists on having four words in her name, and misinterpreting the old Bangkok airport Don Muang as a person.

This may sound like a failure, but I don't think it is. The simplicity of setting it up means that it can complement existing tags. Tags are subjective, best served hot by the author; a sort of table of contents a la Jiglu is probably best done by someone or something else, and served cold. True, the Jiglu tagging engine needs to get smarter before I would get really excited about it, but I think there's potential in anything that helps link what we write today to what we (and others) might have written yesterday.

Blogging, and the Web, may be time-critical, but that doesn't mean we should forget the wisdom of what we wrote in the past. Blogs may still be regarded as ephemeral but I've learned enough to know that some insights are timeless (but not necessarily mine).
