Lars Pind

internet software, coaching, and entrepreneurship


Converting HTML to PDF

February 15, 2006

When I’m looking at a web page with Safari, all I have to do is hit Command-P and save as PDF, and I have a nice PDF of the page.

Is there a way to automate this on the server side on a standard Linux system? I need a web app to generate these PDFs dynamically, on the fly. And they need to look good, with different designs, etc., so it’s more than just getting text to show on a page.

I’ve come across HTMLDOC, and it does the job in theory, but only for crummy HTML: it supports most of HTML 3.2, some of HTML 4.0, and no CSS. That’s not great if I also want to produce reasonable HTML output.

Are there any other options out there that you’re aware of? Is there code in WebKit for this, and would it be possible to get some of that to run on a Debian box? Hm, doesn’t sound likely.


Comments ↓

  • 1 Mark Aufflick // Feb 15, 2006 at 03:48 AM

    First render the HTML to PostScript. There are a few options for this. One I just found (but haven't tried) is http://user.it.uu.se/~jan/html2ps.html. The way I always used to do it was to run Netscape in batch print mode and direct the output to a file. I assume you can do the same with Firefox. The beauty of this approach is that you know you will get the same rendering as Firefox; the downside is that you can't install Firefox on a server without X libraries, and it's a fairly heavy way of doing it. Once you have the PostScript, ps2pdf will turn it into a PDF for you. It's available as a default package in every Linux distro I've ever used.
  • 2 Mark Aufflick // Feb 15, 2006 at 03:50 AM

    Looks like Typo just typo'd the tilde in the URL. It should read: http://user.it.uu.se/~jan/html2ps.html
  • 3 Andreas Haugstrup // Feb 15, 2006 at 12:00 PM

    Prince, which I haven't tried, has a command line interface. Don't know much about it. http://www.princexml.com/
  • 4 Lars Pind // Feb 15, 2006 at 12:30 PM

    Mark, thanks for the tip. It's amazing how helpful you always are, and it's deeply appreciated. Thank you! Andreas, Prince looks like a perfect fit, except, of course, for the $3800 server license, which is a little steep. But at first glance, it looks clean.
  • 5 Andreas Haugstrup // Feb 15, 2006 at 12:49 PM

    Heh, that's what I get for not looking at the price before commenting. :o)
  • 6 Malte Sussdorff // Feb 15, 2006 at 03:20 PM

    You could install OpenOffice, run a vncserver so OpenOffice has a display to connect to in the background, and use a macro to convert HTML to PDF with OpenOffice (which would allow you to convert anything OpenOffice can read to PDF).
  • 7 Lars Pind // Feb 18, 2006 at 03:02 AM

    Hi Malte, thanks for the tip. Following your suggestion, I found this article, which has some example macros and spells out in more detail how this could be done: http://www.xml.com/pub/a/2006/01/11/from-microsoft-to-openoffice.html /Lars
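
For reference, the two-step route from comment 1 (HTML to PostScript, then PostScript to PDF) might be scripted roughly as below. This is only a minimal sketch: it assumes the html2ps and ps2pdf commands are installed and on the PATH, and the function names build_commands and html_to_pdf are made up for illustration. Nobody in this thread has tested it, and html2ps will not necessarily render CSS-heavy pages any better than HTMLDOC does.

```python
import subprocess
from pathlib import Path


def build_commands(html_path, pdf_path):
    """Return the two command lines for the HTML -> PostScript -> PDF pipeline.

    Hypothetical helper: the names and file layout are assumptions.
    """
    html_path = Path(html_path)
    pdf_path = Path(pdf_path)
    # Intermediate PostScript file, written next to the final PDF.
    ps_path = pdf_path.with_suffix(".ps")
    return [
        # Step 1: html2ps renders the HTML file to PostScript (-o names the output).
        ["html2ps", "-o", str(ps_path), str(html_path)],
        # Step 2: ps2pdf (part of Ghostscript) converts the PostScript to PDF.
        ["ps2pdf", str(ps_path), str(pdf_path)],
    ]


def html_to_pdf(html_path, pdf_path):
    """Run both steps in order; raises CalledProcessError if either tool fails."""
    for cmd in build_commands(html_path, pdf_path):
        subprocess.run(cmd, check=True)
    return Path(pdf_path)


if __name__ == "__main__":
    # Assumes a file named page.html exists in the current directory.
    print(html_to_pdf("page.html", "page.pdf"))
```

Keeping the command construction separate from the execution makes it easy to swap in a different renderer for step 1 (for example, a browser in batch print mode) without touching the rest of the web app.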