Lars Pind

internet software, coaching, and entrepreneurship


A (busy) day in the life of ...

September 19, 2006 · 1 comment

In a single day, we’ve managed to fly to New York with our 1-year-old daughter, have an afternoon meeting with Martin Nisenholtz, the head of New York Times Digital, together with my partner Christina Wodtke, and finally have dinner with Meg Hourihan and Liz Danzico. What a day! Needless to say, I’m beat, and that’s nothing compared to Flora. But it’s been quite exciting.

Cell # while I’m here: (+1) 347 840 4008. Make a note, because I plan to keep this number for the next few years. More on my contact info page.


Solutions for the missing Rails-Commit mailing list

September 15, 2006 · 0 comments

The Rails-Commit mailing list has been down since August 19, and though it was a nice relief from the ADHD of the core team for a while, it is really useful to follow what’s being committed, especially since I have 3 production applications running Edge.

I asked Marcel Molina and Jamis Buck, and it turns out the old rails-commit mailing list is simply dead, and the only alternative is the Timeline RSS feed from Trac.

There’s no way to get a feed that includes complete diffs of each change, but Marcel offered a tip: if you install Thunderbird, it’ll actually grab the page that’s being linked to, so you can in fact see the diffs right in the RSS reader.

It’s not perfect. There’s a slight delay while it fetches the page, and I don’t use Thunderbird for anything else, so I’d have to install and keep it open just for this, but it does do the job, and it’s the only known option currently.

It would be great to have an RSS feed with the diffs in the body. There’d be many ways to accomplish this. Maybe someone has written something for Apache that produces an RSS feed from Subversion. Maybe one could make a patch for Trac.

Or, with even less work, we could change the post-commit hook on dev.rubyonrails.org to email a Google group, which people could subscribe to by email, or use something like MailBucket or SocialMail to get it in their RSS reader. I personally think this is a perfect match for RSS.
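
A rough sketch of what such a post-commit hook could look like, assuming a standard Subversion setup where the hook is called with the repository path and the revision number. The list address and the subject prefix are placeholders, and nothing here reflects how dev.rubyonrails.org is actually configured:

```shell
#!/bin/sh
# Sketch of a Subversion post-commit hook (hypothetical; not the actual
# dev.rubyonrails.org setup). Subversion runs it as: post-commit REPOS REV

LIST="rails-commit@example.com"   # placeholder address for the group

# Print author, log message, and full diff for one revision.
commit_message() {
  repos="$1"; rev="$2"
  printf 'Author: %s\n' "$(svnlook author -r "$rev" "$repos")"
  printf 'Revision: %s\n\n' "$rev"
  svnlook log -r "$rev" "$repos"
  printf '\n'
  svnlook diff -r "$rev" "$repos"
}

# Only send mail when called with both hook arguments.
if [ "$#" -ge 2 ]; then
  commit_message "$1" "$2" | mail -s "[rails-commit] r$2" "$LIST"
fi
```

Because the diff goes into the message body, any mail-to-RSS gateway subscribed to the group would carry the diff along with it.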


The Lulu Quiz

September 13, 2006 · 0 comments

I’m a big fan of companies that tackle hiring differently, like the print-on-demand company Lulu with The Lulu Quiz.

Note in particular the Renaissance skill set and the living abroad/multilingual part. Nice touch. (Via Rasmus)


Troubleshooting an out-of-memory problem

September 10, 2006 · 2 comments

I’ve had a problem for a while with the server that runs both this site and Boxes and Arrows where it would all of a sudden just race into an out-of-memory situation, and the so-called oom-killer would start shooting processes at random, which would quickly take the server down. Meanwhile the load would be so high that logging in to try and figure out what was going on was impossible. The only recourse was to reboot the server.

The problem was amplified by its irregularity. It would happen sometimes twice in a row, sometimes 2-3 weeks would pass between occurrences. The first time it ever happened was when I was vacationing on Italy’s Amalfi coast with no reception on my cell phone. That took all my sites down for 24 hours. Ouch.

It took me quite some time and effort to figure out the root of the problem, so I’d like to share how I did it and hopefully save someone else some cycles.

My first confusion was with the Linux cache. I noticed that when updatedb, the command that updates the locate database, was running, memory consumption would shoot up. A kind soul on #debian explained to me how to use the free command:

$ free -m
             total       used       free     shared    buffers     cached
Mem:          2027       1810        217          0        132       1157
-/+ buffers/cache:        520       1506
Swap:         1027         55        972

What this means is that the box has 2 GB of RAM, of which 1.8 GB is used. But if you subtract the amount of RAM used for disposable buffers and caching, it turns out only 520 MB is used. That’s pretty healthy. So we’re not starved for RAM per se.
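
The second line of the output is just arithmetic on the first. Here is a small sketch that redoes it with the numbers above; the function names are made up for illustration:

```shell
# used_without_cache USED BUFFERS CACHED
# Memory actually held by processes, in the same unit as the inputs.
used_without_cache() {
  echo $(( $1 - $2 - $3 ))
}

# free_with_cache FREE BUFFERS CACHED
# Memory that is available once the disposable cache is given back.
free_with_cache() {
  echo $(( $1 + $2 + $3 ))
}

used_without_cache 1810 132 1157   # prints 521
free_with_cache 217 132 1157       # prints 1506
```

The first result is 521 where free reports 520. That is presumably a rounding effect, since free appears to do the subtraction in kilobytes before converting to megabytes.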

Next, he suggested I set up a cron job to monitor RAM consumption, so I could go back in afterwards and find the bad process. At first, however, I set it up to log only at 5-minute intervals. That turned out to be too coarse: it completely missed the incident (though I couldn’t know that for a fact). I then changed it to log every minute.

Here’s the crontab entry:

*/1 * * * *     root /usr/local/bin/memstat

And here’s the script:

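#!/bin/sh
# Timestamp first, so each snapshot can be matched against other logs.
date >> /var/lib/memstat/memstat.log
vmstat >> /var/lib/memstat/memstat.log
# Top of the process list, sorted by memory size (the O-s part).
ps ax --cols 200 -o size,pid,start,user,rss,resident,share,vsize,command O-s |head >> /var/lib/memstat/memstat.log
free -m >> /var/lib/memstat/memstat.log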

I also logged a couple other things, taking a shotgun approach, but the above turned out to be the essential part.

Now the next time it happened, I could actually see the process swelling. One minute it was normal, the next it was huge, and the next again, it was gone, killed by the oom-killer. That’s how fast it went. Amazing.

It turned out to be one of my application processes. Good, that meant it was likely to be something I had some control over. Next question, of course, was finding out what caused it. My theory was that it was probably caused by some request, given that there were no signs of leaks, and that it happened so quickly and so irregularly.

What I did in the end was correlate with the server logs. That’s where the date command in my memstat script came in handy. Based on the timestamp I could find that same 2-minute window in my application log file and see what was going on, most notably which requests were coming in.

My first mistake was to look at completed requests. That was dumb, because the offending request was precisely the kind that would likely never finish. Second time around, I looked at requests that never finished. Bingo!
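
One way to pull out requests that never finished is sketched below. It assumes a Rails-style log where each request starts with a line beginning with "Processing" and ends with a line beginning with "Completed", and where requests from different processes are not interleaved. Both are assumptions about the log format, and the function name is made up:

```shell
# unfinished_requests: read a log on stdin and print every "Processing"
# line that is not followed by a "Completed" line before the next request.
unfinished_requests() {
  awk '
    /^Processing / {
      # A new request started while the previous one was still pending.
      if (pending != "") print pending
      pending = $0
      next
    }
    /^Completed / { pending = "" }
    # A request still pending at the end of the log is reported too.
    END { if (pending != "") print pending }
  '
}
```

Feeding it only the 2-minute window from the memory log keeps the output short. Note that a request still in flight at the very end of the input shows up as well, so the last line of output may be a false alarm.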

It turned out to be when people did a search on Boxes and Arrows with a query string that included an ampersand (&) character. For some reason, that caused Ferret, our Lucene-like search engine, to start spinning indefinitely, gobbling up all memory in the process. Just to be sure, I tried running the query again, with my hand on the “kill -9” trigger, and indeed, the process started swelling quickly.

I’m so glad to have solved this, and I feel much more confident going into my next problem. For a long time I wasn’t sure if it might actually be a hardware or a kernel problem, but it feels good in the end to have pinpointed the exact problem, and to know that it’s fixed. I can now sleep more relaxed at night. I still keep my cell phone on and not on silent in case I get an SMS notification in the middle of the night, but they just got a lot less likely.


Why won't this GIF show in IE?

September 09, 2006 · 8 comments

I’m at a loss here. The image on the right shows fine on my Mac, and it shows fine in Firefox on my Windows XP PC, but in IE6, it shows as a broken image.

Here’s what identify has to say about it:

UI11-badge.gif GIF 90x57 90x56+0+0 PseudoClass 256c 2kb 0.000u 0:01

Does anything look wrong here? What to do, where to look?
