Wednesday, March 4, 2009

If you're going to use pdf...

I have a professor who uses PDF to distribute scanned text for the class to read and review. Each file is around 10 MB or larger, and after a bit of modification the files can be reduced (with no apparent quality loss) to around 1 MB, saving cache space, download time, etc.

Anyway, this may not be the most interesting topic, but there are best practices for scanning and distributing PDFs, so I thought I would pass on a solution:

1. Straighten the text by using OCR Text Recognition.

After the document is scanned, use the batch tool OCR Text Recognition (Document -> OCR Text Recognition -> Recognize Text Using OCR). This converts all text in the document from a raster (image) format to a searchable, highlightable format. It also rotates each page so the text is easier to read.

2. Use Reduce File Size.


Perhaps the most effective way to shrink a PDF is the Reduce File Size option in Acrobat (Document -> Reduce File Size...). Do this after step one, because reducing the file first will sometimes cause the OCR in step one not to work, which is especially irritating after you've saved over the older, bloated file. I recommend saving the new file with a "_reduced" ending on the name (oldfile_reduced.pdf) until all PDF processing is complete.
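If you have a pile of files to rename, the "_reduced" naming convention is easy to script. This is only a minimal sketch in Python: the function names reduced_name and percent_saved are my own, not anything from Acrobat, and the script does not touch the PDF itself. It just builds the new filename and works out how much smaller the file got.

```python
from pathlib import Path


def reduced_name(original: str) -> Path:
    """Return the same path with "_reduced" inserted before the extension."""
    p = Path(original)
    return p.with_name(f"{p.stem}_reduced{p.suffix}")


def percent_saved(old_bytes: int, new_bytes: int) -> float:
    """Return the percentage by which the file shrank."""
    if old_bytes <= 0:
        raise ValueError("old_bytes must be positive")
    return 100.0 * (old_bytes - new_bytes) / old_bytes


if __name__ == "__main__":
    # Hypothetical example values: a 10 MB scan reduced to 1 MB.
    print(reduced_name("oldfile.pdf"))            # oldfile_reduced.pdf
    print(percent_saved(10_000_000, 1_000_000))   # 90.0
```

Keeping the original file untouched until every step has run means you can always go back and redo the OCR if something goes wrong.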

3. Allow all users to highlight the now-searchable text with Enable for Commenting and Analysis in Adobe Reader...

Since the document will be used in the future by all students, not just those with Acrobat, why not enable the highlighting, commenting, strikethrough, etc. features for them as well? To do this, on the machine with Acrobat, go to Comments -> Enable for Commenting and Analysis in Adobe Reader.... Once that completes, save the file; this should be the final save. To show the commenting tools in Reader, right-click the toolbar at the top and check the commenting tools.

Now all readers can annotate and search the straightened document at a reduced file size. Cool.

If you have any suggestions, leave them in the comments.