Interface advice: Categorize in many many categories

by Lars Pind on August 2, 2006

I have a client that categorizes their documents.

Great, you think. Well, they categorize their documents in one or more of almost 3,000 categories. That’s something that’s hard to build a nice, simple interface for.

Here are some possible approaches that I and the few people I’ve asked have come up with:

1. Present a single page with all 3,000 categories, displayed hierarchically, each with a checkbox next to it. Yeah, that’s what we have today. Let me just say that it doesn’t work great, and that we didn’t start out with 3,000 categories.

2. Let people navigate to the category first (think the Yahoo! directory) and add the document there. We have that, too, but that only helps you choose the first category, not the additional ones.

3. Present a series of drop-downs. First we show you one with the top-level. When you choose there, we show you another one with the subcategories of your first choice. We keep doing that until there are no children or you choose “Add category”. Yes, documents can be added to any category, including those that have subcategories.

4. A variant of this is to use multi-select boxes instead of drop-downs, mimicking the OS X Finder interface.

5. Use a dynamic Windows Explorer-style tree, like XLoadTree.

6. Live substring-based search. Good if you know what you’re looking for, not good for browsing. And it short-circuits the structure, searching just the categories and not their relationship. This seems useful, but it’s an add-on to another solution, not a solution in itself.
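Approaches 3 and 6 rest on the same data: a category tree keyed by parent. Here is a minimal Python sketch of the lookup that would feed each successive drop-down, plus a substring search that returns the full path so the hierarchy isn't lost. The in-memory table, the category names, and the function names are all made up for illustration; none of this is the client's real schema.

```python
from collections import defaultdict

# Hypothetical sample rows: (category_id, parent_id, name). Not real data.
CATEGORIES = [
    (1, None, "Campaigns"),
    (2, None, "Offices"),
    (3, 1, "Oceans"),
    (4, 1, "Forests"),
    (5, 3, "Whaling"),
    (6, 2, "Denmark"),
    (7, 6, "Office manual"),
]

NAMES = {cid: name for cid, _, name in CATEGORIES}
PARENT = {cid: parent for cid, parent, _ in CATEGORIES}
CHILDREN = defaultdict(list)
for cid, parent, _ in CATEGORIES:
    CHILDREN[parent].append(cid)


def dropdown_options(selected_id=None):
    """Options for the next drop-down: children of the current selection
    (top-level categories when nothing is selected yet)."""
    return [(cid, NAMES[cid]) for cid in CHILDREN.get(selected_id, [])]


def path(cid):
    """Full path from the root down to the category, e.g. 'Campaigns > Oceans'."""
    parts = []
    while cid is not None:
        parts.append(NAMES[cid])
        cid = PARENT[cid]
    return " > ".join(reversed(parts))


def search(term):
    """Case-insensitive substring match on names, returning full paths."""
    term = term.lower()
    return [path(cid) for cid, name in NAMES.items() if term in name.lower()]
```

An empty list from dropdown_options means the chosen category has no children, which is the point where the series of drop-downs would stop.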

I’d like to ask your advice. Please send me your suggestions. Screenshots or links to interfaces that solve this well, whether web or not, would be fantastic. Post in the comments or email me, and I’ll put it up and share.

{ 21 comments }

John Sequeira August 2, 2006 at 2:59 pm

Here’s an implementation of faceted navigation that might inspire a solution to your many-many-category problem:

http://orange.sims.berkeley.edu/cgi-bin/flamenco.cgi/nobel/Flamenco

It’s open source: http://flamenco.berkeley.edu/download.html

I’m a big believer in faceted navigation:

(see http://www.jsequeira.com/cgi-bin/virtualization)

Kai August 2, 2006 at 2:59 pm

If you are using a tree structure for your categories and the user needs to select related categories for a content item, you could first present the user with a list of sibling categories (i.e. categories with the same parent as the primary category of the item). Presumably the list of siblings would be relatively short, and relevant too.
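A rough Python sketch of that sibling lookup, assuming a hypothetical child-to-parent mapping with made-up category names:

```python
# Hypothetical sample tree: child -> parent (None marks a top-level category).
PARENT = {
    "Campaigns": None,
    "Oceans": "Campaigns",
    "Forests": "Campaigns",
    "Climate": "Campaigns",
    "Whaling": "Oceans",
}


def sibling_suggestions(primary):
    """Categories sharing the primary category's parent, excluding the primary itself."""
    parent = PARENT[primary]
    return sorted(c for c, p in PARENT.items() if p == parent and c != primary)
```

With this sample data, a document whose primary category is "Oceans" would be offered "Climate" and "Forests" as related choices.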

Michael Yoon August 2, 2006 at 2:59 pm

Option #4 sounds like the best to me, of the ones you mention, perhaps something like http://johnvey.com/features/deliciousdirector/

Dave Bauer August 2, 2006 at 2:59 pm

How about using the existing category information to suggest categories, with a dynamically loaded tree as the fallback if that doesn’t help?

That is, once you have chosen the first category, find out which other categories the documents in that category are also filed under, and suggest those.
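One way that suggestion could be sketched in Python, assuming a hypothetical mapping from document id to its assigned categories (sample data only), is a simple co-occurrence count:

```python
from collections import Counter

# Hypothetical sample data: document id -> set of categories already assigned.
DOC_CATEGORIES = {
    "doc1": {"Oceans", "Whaling", "Press releases"},
    "doc2": {"Oceans", "Whaling"},
    "doc3": {"Oceans", "Press releases"},
    "doc4": {"Forests", "Press releases"},
}


def suggest(first_category, limit=5):
    """Rank other categories by how often they co-occur with first_category."""
    counts = Counter()
    for cats in DOC_CATEGORIES.values():
        if first_category in cats:
            counts.update(cats - {first_category})
    # Sort by descending count, then name, so ties come out in a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:limit]]
```

An empty result is the cue to fall back to the tree.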

Hartvig August 2, 2006 at 2:59 pm

How about less? If you have 3k categories, in what scenario are they used, and why is there a need for 3,000? Has the need been tested, possibly by simulating a reduction to 100 categories?

That would be the first place to look – with 3,000 categories it would probably take longer to pick categories than to write the original craft.

Thijs van der Vossen August 2, 2006 at 2:59 pm

Why does Greenpeace (http://weblog.greenpeace.org/it/2006/08/advice_sought.html) think they need 3000 categories for organizing their documents?

Tanya August 2, 2006 at 2:59 pm

@1 What John said. I would see if the categories (or perhaps they’re really descriptors) can be sorted to allow parametric browsing.

Lars Pind August 2, 2006 at 2:59 pm

That’s a perfectly valid question.

I think that if the taxonomy is clear enough, that can be completely reasonable. I think the ICD (http://en.wikipedia.org/wiki/ICD) has thousands and thousands of categories. If people understand the taxonomy and know how to navigate it, it can be the right thing.

I don’t know if it’s right for Greenpeace, and frankly, I don’t think it matters. Whether there are 500 or 3000, we still need a good interface, and the one we have wasn’t good even when the number of categories was in the low hundreds (sorry, Yon!).

But the current set of categories has evolved from a much smaller set, and it hasn’t been completely mindless: There’s been card sorting and stuff :)

Lars Pind August 2, 2006 at 2:59 pm

@Michael Yoon: Thanks a bunch for the link. Seeing something in action is great, and this is a very good way to solve it.

James Melzer August 2, 2006 at 2:59 pm

More use case info would be helpful. Are there catalogers that catalog everything, or are these random business users cataloging for themselves?

If the former, then known-item searching is probably the fastest and best interface. They’ll know the taxonomy or have a paper copy to refer to taped to the wall of their cube.

On the other hand, if this is for lots of ‘amateur’ end users cataloging their own materials, I’d go the opposite direction. On their first few visits, show them the whole taxonomy (which sucks, as you said) but remember what categories they used. After a few visits, show their favorite categories first, with the option to see or search the entire list. Chances are, they’ll be using the same small set of categories over and over, so this will speed up their work a lot (and make the interface simpler and faster).
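As a rough illustration of the favorites-first idea, here is a Python sketch that counts each user's category usage and lists the most-used ones ahead of the rest. The class, its method names, and the in-memory storage are assumptions made for the example; a real system would persist the counts.

```python
from collections import Counter


class CategoryPicker:
    """Tracks per-user category usage and lists favorites first."""

    def __init__(self, all_categories):
        self.all_categories = list(all_categories)
        self.usage = {}  # user -> Counter of category -> times used

    def record(self, user, category):
        """Remember that the user filed a document under this category."""
        self.usage.setdefault(user, Counter())[category] += 1

    def options(self, user, top_n=3):
        """Return (favorites, rest): the user's most-used categories, then the others."""
        counts = self.usage.get(user, Counter())
        favorites = [c for c, _ in counts.most_common(top_n)]
        rest = [c for c in self.all_categories if c not in favorites]
        return favorites, rest
```

A first-time user gets an empty favorites list and the whole taxonomy, which matches the "show everything on the first few visits" behavior described above.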

Hamilton August 2, 2006 at 2:59 pm

http://developer.yahoo.com/yui/examples/treeview/

If you go with the dynamic tree option, you might find the above useful.

Eric Reiss August 2, 2006 at 2:59 pm

You don’t say how many individual documents are represented by these 3000 categories. However, the sheer number of categories at the top-most level is clearly unwieldy. Most of your proposed solutions build on display mechanisms (Yahoo directory, XLoadTree, dropdowns, etc.). But for serendipitous navigation, you need to start by rethinking the basic categories, establishing broader categories at the top – but you already know this.

Depending on the number of individual documents, you might emulate Amazon’s collaborative filtering method. This allows people to search for a term and then surf through related items. It’s not a true faceted classification system, nor is it hierarchical, but it does combine the best of a couple of different information-seeking worlds.

Lars Pind August 2, 2006 at 2:59 pm

There are 5 categories at the top level.

At the second level, there are at most 30 subcategories, and they represent geography.

Also, I should mention that each local office gets its own corner of the taxonomy that it controls, and offices generally don’t have to deal with the corners controlled by other offices, which cuts down the number of categories a person has to deal with quite a bit.

Note that I’m not talking about the interface for browsing the documents in these categories.

What I’m looking for is the interface for choosing a category for a document that you already have on the screen.

Thijs van der Vossen August 2, 2006 at 2:59 pm

It would be great if you could post a screenshot of the current interface and/or show us the current list or tree of categories.

Lars Pind August 2, 2006 at 2:59 pm

@Thijs: Unfortunately, I’m not in a position to grant that, and the person who is isn’t back until Monday.

Thijs van der Vossen August 2, 2006 at 2:59 pm

I know, I’ve been trying to get hold of him myself too… :-)

Koyan August 2, 2006 at 2:59 pm

I would do it like this:
http://developer.yahoo.com/yui/examples/treeview/default.html?mode=dist
With check boxes right before the names.

You can still leave the clients free to do whatever categorisation they want (since I doubt you will get them to make the number of categories smaller), and you can present them with a fast loading page.

Now, if they are willing to pay more, you can add functionality like "remember the last categories I had open", etc.

Thijs van der Vossen August 2, 2006 at 2:59 pm

Ok, so you have a tree of hierarchical categories where you must be able to select multiple nodes?

How about this (QuickTime)? http://stuff.vandervossen.net/external/2006/treeform.mov

Lars Pind August 2, 2006 at 2:59 pm

@Thijs: That’s pretty neat. You built that? Is there code somewhere?

Thijs van der Vossen August 2, 2006 at 2:59 pm

It’s just a nested list with checkboxes inside and some CSS to show the list item you’re hovering over. Very simple really.
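For illustration only (this is not the code behind the demo), the nested-list markup could be generated along these lines in Python. The sample tree, the "category" field name, and the render function are assumptions, and the hover CSS is left out.

```python
from html import escape

# Hypothetical sample tree: (category_id, name, children).
TREE = [
    (1, "Campaigns", [
        (3, "Oceans", [(5, "Whaling", [])]),
        (4, "Forests", []),
    ]),
    (2, "Offices", []),
]


def render(nodes):
    """Render nodes as a nested <ul>, one checkbox per category."""
    items = []
    for cid, name, children in nodes:
        # Only categories with children get a nested list.
        sub = render(children) if children else ""
        items.append(
            f'<li><label><input type="checkbox" name="category" value="{cid}"> '
            f"{escape(name)}</label>{sub}</li>"
        )
    return "<ul>" + "".join(items) + "</ul>"
```

Because every category gets its own checkbox, a document can be filed under a parent category as well as under any of its children.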

Martin August 2, 2006 at 2:59 pm

Ok, OK, I’ll post a screenshot. It’ll be at http://weblog.greenpeace.org/it shortly…

First – thanks for all the suggestions – much appreciated! To answer some of the questions…

Control of the taxonomy is devolved out to the users of the system because otherwise the system ends up with too few users – and legislating for everything folks come up with is close to impossible.

The 3000 categories represent just over 12000 documents – which isn’t a bad ratio, although some tidying up will probably help.

There is also the issue that 27 local offices have access to the system, effectively creating the need for 27 sets of ‘how to find the office’ or ‘what to do when the photocopier breaks’ documents.
