Lars Pind

internet software, coaching, and entrepreneurship

Interface advice: Categorize in many many categories

August 02, 2006 · 21 comments

I have a client that categorizes their documents.

Great, you think. Well, they categorize their documents in one or more of almost 3,000 categories. That’s something that’s hard to build a nice, simple interface for.

Here are some possible approaches that I and the few people I’ve asked have come up with:

1. Present a single page with all 3,000 categories, displayed hierarchically, each with a checkbox next to it. Yeah, that’s what we have today. Let me just say that it doesn’t work great, and that we didn’t start out with 3,000 categories.

2. Let people navigate to the category first (think the Yahoo! directory) and add the document there. We have that, too, but that only helps you choose the first category, not the additional ones.

3. Present a series of drop-downs. First we show you one with the top-level categories. When you choose one there, we show you another with the subcategories of your first choice. We keep doing that until there are no children or you choose “Add category”. Yes, documents can be added to any category, including those that have subcategories.

4. A variant of this is to use multi-select boxes instead of drop-downs, mimicking the OS X Finder interface.

5. Use a dynamic Windows Explorer-style tree, like XLoadTree.

6. Live substring-based search. Good if you know what you’re looking for, not good for browsing. And it short-circuits the structure, searching just the category names and not the relationships between them. This seems useful, but it’s an add-on to another solution, not a solution in itself.
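A minimal sketch of option 6, to make the trade-off concrete. It assumes the categories are held in memory as a simple tree; the `Category` class, the `search_categories` function, and the sample names are all hypothetical, not the client's real system. Returning the full path of each hit is one way to keep some of the structure visible even though the match itself only looks at the category name.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Category:
    """One node in the category tree (hypothetical structure)."""
    name: str
    children: list[Category] = field(default_factory=list)


def search_categories(roots, query, limit=20):
    """Return the full paths of categories whose name contains `query`.

    Matching is a case-insensitive substring test on the category's own
    name only; the path is attached so the hierarchy stays visible.
    """
    needle = query.strip().lower()
    if not needle:
        return []
    matches = []
    # Depth-first walk; each stack entry is (node, tuple of ancestor names).
    stack = [(root, ()) for root in reversed(roots)]
    while stack and len(matches) < limit:
        node, ancestors = stack.pop()
        path = ancestors + (node.name,)
        if needle in node.name.lower():
            matches.append(" > ".join(path))
        # Reversed so children are visited in their original order.
        stack.extend((child, path) for child in reversed(node.children))
    return matches


# Made-up sample data, not the real taxonomy.
tree = [
    Category("Campaigns", [
        Category("Denmark", [Category("Oceans"), Category("Forests")]),
        Category("Sweden", [Category("Oceans")]),
    ]),
    Category("Office", [Category("Denmark", [Category("Photocopier")])]),
]

print(search_categories(tree, "ocean"))
# ['Campaigns > Denmark > Oceans', 'Campaigns > Sweden > Oceans']
```

With only a few thousand categories, a linear scan like this is likely fast enough to run on every keystroke; the `limit` parameter just keeps the result list short.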

I’d like to ask your advice. Please send me your suggestions. Screenshots or links to interfaces that solve this well, whether web or not, would be fantastic. Post in the comments or email me, and I’ll put it up and share.

21 responses so far ↓

  • 1 John Sequeira // Aug 02, 2006 at 05:09 PM

    Here's an implementation of faceted navigation that might inspire your many many category problem: http://orange.sims.berkeley.edu/cgi-bin/flamenco.cgi/nobel/Flamenco. It's open source: http://flamenco.berkeley.edu/download.html. I'm a big believer in faceted navigation (see http://www.jsequeira.com/cgi-bin/virtualization).
  • 2 Kai // Aug 02, 2006 at 05:25 PM

    If you are using a tree structure for your categories and the user needs to select related categories for a content item, you could first present the user with a list of sibling categories (i.e. categories with the same parent as the primary category of the item). Presumably the list of siblings would be relatively short, and relevant too.
  • 3 Michael Yoon // Aug 02, 2006 at 05:47 PM

    Option #4 sounds like the best to me, of the ones you mention, perhaps something like http://johnvey.com/features/deliciousdirector/
  • 4 Dave Bauer // Aug 02, 2006 at 06:40 PM

    How about using the existing category information to suggest categories, with a dynamically loaded tree as the fallback if that doesn't help? That is, once you have chosen the first category, find out what other categories the documents in that category are also filed under, and suggest those.
  • 5 Hartvig // Aug 02, 2006 at 09:09 PM

    How about less? If you have 3k categories in what scenario are they used - why is there a need for 3.000? Has the need been tested - eventually simulated down to 100 categories? That would be the first place to look - with 3.000 categories it would probably take longer time to pick categories than to write the original craft.
  • 6 Thijs van der Vossen // Aug 02, 2006 at 09:48 PM

    Why does Greenpeace think they need 3000 categories for organizing their documents?
  • 7 Tanya // Aug 02, 2006 at 10:42 PM

    @1 What John said. I would see if the categories (or perhaps they're really descriptors) can be sorted to allow parametric browsing.
  • 8 Lars Pind // Aug 02, 2006 at 11:11 PM

    That's a perfectly valid question. I think that if the taxonomy is clear enough, that can be completely reasonable. I think the ICD (http://en.wikipedia.org/wiki/ICD) has thousands and thousands of categories. If people understand the taxonomy and know how to navigate it, it can be the right thing. I don't know if it's right for Greenpeace, and frankly, I don't think it matters. Whether there's 500 or 3000, we still need a good interface, and the one we have wasn't good even when the number of categories was in the low hundreds (sorry, Yon!). But the current set of categories has evolved from a much smaller set, and it hasn't been completely mindless: There's been card sorting and stuff :)
  • 9 Lars Pind // Aug 02, 2006 at 11:19 PM

    @Michael Yoon: Thanks a bunch for the link. Seeing something in action is great, and this is a very good way to solve it.
  • 10 James Melzer // Aug 03, 2006 at 01:51 AM

    More use case info would be helpful. Are there catalogers that catalog everything, or are these random business users cataloging for themselves? If the former, then known-item searching is probably the fastest and best interface. They'll know the taxonomy or have a paper copy to refer to taped to the wall of their cube. On the other hand, if this is for lots of 'amateur' end users cataloging their own materials, I'd go the opposite direction. Their first few visits, show them the whole taxonomy (which sucks, as you said) but remember what categories they used. After a few visits, show their favorite categories first, with the option to see or search the entire list. Chances are, they'll be using the same small set of categories over and over, so this will speed up their work a lot (and make the interface simpler and faster).
  • 11 Hamilton // Aug 03, 2006 at 07:21 AM

    http://developer.yahoo.com/yui/examples/treeview/ If you go with the dynamic tree option, you might find the above useful.
  • 12 Eric Reiss // Aug 03, 2006 at 11:29 AM

    You don’t say how many individual documents are represented by these 3000 categories. However, the sheer number of categories at the top-most level is clearly unwieldy. Most of your proposed solutions build on display mechanisms (Yahoo directory, XLoadTree, dropdowns etc.) But for serendipitous navigation, you need to start by rethinking the basic categories, establishing broader categories at the top – but you already know this. Depending on the number of individual documents, you might emulate Amazon’s collaborative filtering method. This allows people to search for a term and then surf through related items. It’s not a true faceted classification system, nor is it hierarchical, but it does combine the best of a couple of different information-seeking worlds.
  • 13 Lars Pind // Aug 03, 2006 at 11:34 AM

    There's 5 categories at the top level. At the second level, there's at most 30 subcategories, and they represent geography. Also I should mention that each local office gets their own corner of the taxonomy that they control, and they generally don't have to deal with those controlled by other offices, which cuts down the number of categories a person has to deal with quite a bit. Note that I'm not talking about the interface for browsing the documents in these categories. What I'm looking for is the interface for choosing a category for a document that you already have on the screen.
  • 14 Thijs van der Vossen // Aug 03, 2006 at 12:00 PM

    It would be great if you could post a screenshot of the current interface and/or show us the current list or tree of categories.
  • 15 Lars Pind // Aug 03, 2006 at 02:25 PM

    @Thijs: Unfortunately, I'm not in a position to grant that, and the person that is, isn't back until Monday.
  • 16 Thijs van der Vossen // Aug 03, 2006 at 03:28 PM

    I know, I've been trying to get hold of him myself too... :-)
  • 17 Koyan // Aug 03, 2006 at 04:54 PM

    I would do it like that: http://developer.yahoo.com/yui/examples/treeview/default.html?mode=dist With check boxes right before the names. You can still leave the clients free to do whatever categorisation they want (since I doubt you will get them to make the number of categories smaller), and you can present them with a fast loading page. Now, if they are willing to pay more, you can add functionality like "remember the last categories I had open" etc
  • 18 Thijs van der Vossen // Aug 03, 2006 at 09:03 PM

    Ok, so you have a tree of hierarchical categories where you must be able to select multiple nodes? How about this (QuickTime)?
  • 19 Lars Pind // Aug 03, 2006 at 10:51 PM

    @Thijs: That's pretty neat. You built that? Is there code somewhere?
  • 20 Thijs van der Vossen // Aug 04, 2006 at 12:12 AM

    It's just a nested list with checkboxes inside and some CSS to show the list item you're hovering over. Very simple really.
  • 21 Martin // Aug 15, 2006 at 02:07 PM

    Ok, OK, I'll post a screenshot. It'll be at http://weblog.greenpeace.org/it shortly... First - thanks for all the suggestions - much appreciated! To answer some of the questions... Control of the taxonomy is devolved out to the users of the system because otherwise the system ends up with too few users - and legislating for everything folks come up with is close to impossible. The 3000 categories represent just over 12000 documents - which isn't a bad ratio, although some tidying up will probably help. There is also the issue that 27 local offices have access to the system, effectively creating the need for 27 sets of 'how to find the office' or 'what to do when the photocopier breaks' documents.
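
Dave Bauer's idea in comment 4 (suggest categories that tend to appear alongside the ones already chosen) can be sketched as a simple co-occurrence count. This is only an illustration under stated assumptions: each existing document is represented as a set of category names, and the function name `suggest_categories` and the sample data are made up.

```python
from collections import Counter


def suggest_categories(chosen, documents, limit=5):
    """Suggest categories that co-occur with the ones already chosen.

    `documents` is an iterable of category-name collections, one per
    existing document. Each document votes for its other categories,
    weighted by how many of the chosen categories it shares.
    """
    chosen = set(chosen)
    counts = Counter()
    for categories in documents:
        categories = set(categories)
        overlap = len(chosen & categories)
        if overlap:
            for other in categories - chosen:
                counts[other] += overlap
    # Highest count first; ties broken alphabetically for stable output.
    ranked = sorted(counts, key=lambda name: (-counts[name], name))
    return ranked[:limit]


# Made-up sample data, not real documents.
docs = [
    {"Oceans", "Whaling", "Press releases"},
    {"Oceans", "Whaling"},
    {"Oceans", "Overfishing"},
    {"Forests", "Press releases"},
]

print(suggest_categories({"Oceans"}, docs))
# ['Whaling', 'Overfishing', 'Press releases']
```

If nothing co-occurs with the chosen categories the list comes back empty, which is the point at which the dynamically loaded tree would serve as the fallback.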