SharePoint 2007

How do you get SharePoint 2007 to read and index content inside a PDF file?

This is an easy one, but it takes a little work to set up correctly. SharePoint’s search service indexes documents, but it doesn’t search within PDFs by default. Searching inside PDF documents requires an iFilter from Adobe, which they designed so that third-party systems can read the PDF file format. Adobe includes this filter with Adobe Reader, or you can download the iFilter separately from Adobe’s site if you don’t want Reader installed on your SharePoint servers.

http://www.adobe.com/products/reader – Latest version of Adobe Reader

or

http://www.adobe.com/support/downloads/detail.jsp?ftpID=2611 – x86 iFilter
http://www.adobe.com/support/downloads/detail.jsp?ftpID=4025 – x64 iFilter

 
CENTRAL ADMINISTRATION
Now in SharePoint itself, you need to configure the search service to index files with the .pdf extension:

1. Go to Central Administration (CA) and open the Shared Service under Shared Services Administration.
2. Click Search Administration under the Search section.
3. Click File Types in the left nav bar and then click New File Type.
4. Enter “pdf” and click OK.

ICONS
You will also want to display the PDF icon next to PDF documents in SharePoint. You can download the icon here:

http://www.adobe.com/images/pdficon_small.gif

and copy it into the 12 hive folder here:

C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\12\TEMPLATE\IMAGES

Then open up this XML template file:

C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\12\TEMPLATE\XML\DOCICON.XML

and add this line in the <ByExtension> section of <DocIcons> if it isn’t there already:

<Mapping Key="pdf" Value="pdficon_small.gif" />
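
If you would rather script this edit than do it by hand, here is a minimal Python sketch. It assumes the usual DOCICON.XML layout, where entries take the form <Mapping Key="ext" Value="icon.gif" /> inside <ByExtension>. The function name add_pdf_mapping and the SAMPLE text are made up for illustration. Note that ElementTree drops XML comments and the XML declaration when it writes the document back out, so keep a backup of the original file.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for DOCICON.XML (illustrative only, not the real file).
SAMPLE = (
    "<DocIcons><ByExtension>"
    '<Mapping Key="doc" Value="icdoc.gif" />'
    "</ByExtension></DocIcons>"
)


def add_pdf_mapping(docicon_xml, icon="pdficon_small.gif"):
    """Return the XML text with a pdf mapping added under <ByExtension>."""
    root = ET.fromstring(docicon_xml)
    by_ext = root.find("ByExtension")
    if by_ext is None:
        raise ValueError("no <ByExtension> section found")
    for mapping in by_ext.findall("Mapping"):
        if mapping.get("Key", "").lower() == "pdf":
            return docicon_xml  # already mapped, leave the text untouched
    ET.SubElement(by_ext, "Mapping", {"Key": "pdf", "Value": icon})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(add_pdf_mapping(SAMPLE))
```

Running it a second time on its own output returns the text unchanged, so it is safe to re-run.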

REGISTRY
Now on to the registry changes you need to make on each index server. Make sure to back up your registry before making any changes. These two changes will register the Adobe PDF iFilter with the Office Search service. The values that need to be changed are:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf

Both values should be changed to:

{E8978DA6-047F-4E3D-9C78-CDBE46041603}
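
If you have several index servers, it may be easier to generate a .reg file once and import it on each one. The sketch below is one way to do that in Python. It assumes the value being changed is the (Default) value of each .pdf key, which a .reg file writes as @; check that against your own registry before importing anything. The function name build_reg_file is made up for illustration.

```python
PDF_IFILTER_CLSID = "{E8978DA6-047F-4E3D-9C78-CDBE46041603}"

# The two keys listed above.
FILTER_KEYS = [
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search"
    r"\Setup\ContentIndexCommon\Filters\Extension\.pdf",
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions"
    r"\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf",
]


def build_reg_file(keys=FILTER_KEYS, clsid=PDF_IFILTER_CLSID):
    """Build .reg file text that sets each key's default value to clsid."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for key in keys:
        lines.append("[" + key + "]")
        # Assumption: the setting lives in the key's (Default) value.
        lines.append('@="' + clsid + '"')
        lines.append("")
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(build_reg_file())
```

Save the output as something like pdf-ifilter.reg, review it, then double click it on each index server.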

Then go to:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Applications\{Random GUID}\Gather\Search\Extensions\ExtensionList

and add “pdf” to this list. You will have to create a new String Value for this. Name it with the next number in the list, which should be 38 on most SharePoint installs.
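
Since the next number can differ between installs, here is a small Python sketch of the "next number in the list" logic. It assumes you have already read ExtensionList into a dict of value name to extension (by hand or with a script), and that the value names are plain decimal numbers. The function name next_extension_entry and the sample data are made up for illustration.

```python
from typing import Dict, Optional, Tuple


def next_extension_entry(existing: Dict[str, str],
                         ext: str = "pdf") -> Optional[Tuple[str, str]]:
    """Return the (name, value) pair to add, or None if ext is already listed."""
    if any(value.strip().lower() == ext for value in existing.values()):
        return None
    # Ignore any value names that are not plain numbers, such as (Default).
    numbers = [int(name) for name in existing if name.isdigit()]
    return str(max(numbers, default=0) + 1), ext


if __name__ == "__main__":
    # Shortened, made-up stand-in for the real ExtensionList contents.
    sample = {"1": "ascx", "2": "asp", "37": "xml"}
    print(next_extension_entry(sample))
```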

SYSTEM PATH
Now you need to add the Adobe install directory to the system Path environment variable so that the search service can find the DLL that provides the iFilter:

1. Right click My Computer
2. Click Properties
3. Click Advanced
4. Click Environment Variables
5. In the bottom half of the window, find the Path variable and double click it.
6. At the end of the value, add:

;C:\Program Files\Adobe\Reader 9.0\Reader
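
If you script your server builds, the Path edit comes down to "append the directory unless it is already there". A minimal Python sketch of that check follows. It only works on the string and does not write to the system environment itself. The function name append_to_path is made up for illustration.

```python
def append_to_path(path_value, new_dir):
    """Return path_value with new_dir appended, unless it is already listed."""
    entries = [entry for entry in path_value.split(";") if entry.strip()]

    def normalize(entry):
        # Windows paths are case-insensitive; ignore a trailing backslash too.
        return entry.strip().rstrip("\\").lower()

    if normalize(new_dir) in {normalize(entry) for entry in entries}:
        return path_value  # already present, nothing to change
    return ";".join(entries + [new_dir])


if __name__ == "__main__":
    current = r"C:\Windows\system32;C:\Windows"
    print(append_to_path(current, r"C:\Program Files\Adobe\Reader 9.0\Reader"))
```

Stray empty entries (for example from a trailing semicolon) are dropped when the directory is appended.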

RESTART SEARCH SERVICES
Now you need to restart the Office Search service so that all changes take effect. Open a command prompt and type:

sc stop osearch [press enter]
sc start osearch [press enter]

Or just restart it via the Services MMC.
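
If you want the restart in a script, here is a minimal Python sketch that runs the same two sc commands. One caveat: sc stop returns as soon as the stop request is sent, so on a busy server the start may fail because the service is still stopping. The sketch does not wait or retry; it just raises an error if either command fails. The function names and the run parameter are made up for illustration.

```python
import subprocess


def restart_commands(service="osearch"):
    """Return the two sc commands from the steps above as argument lists."""
    return [["sc", "stop", service], ["sc", "start", service]]


def restart_service(service="osearch", run=subprocess.run):
    """Run stop, then start; a non-zero exit code raises CalledProcessError."""
    for command in restart_commands(service):
        run(command, check=True)


if __name__ == "__main__":
    restart_service()  # run from an elevated prompt on the index server
```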

If you already have PDF documents in SharePoint that you want to search inside, you have to “Reset all crawled content” in Search Settings and then begin a new “Full Crawl” under Content Sources.

UPDATE 9/20/2010: Installing SP2 or cumulative updates on your SharePoint farm may sometimes reset your registry changes. Specifically, the {E8978DA6-047F-4E3D-9C78-CDBE46041603} registry value will be reset to the old {4C904448-74A9-11D0-AF6E-00C04FD8DC02} value, or it might include both GUIDs. This will cause your PDF indexing to stop. Just edit the registry values above to put the correct value back in, restart the search services and IIS, then run a full crawl. Your PDFs will begin indexing correctly again.
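
A quick way to tell which state a server is in after patching is to compare the current registry value against the two GUIDs. Here is a minimal Python sketch of that check. It takes the value as a string, however you read it, and the function name filter_value_status and its return labels are made up for illustration.

```python
ADOBE_CLSID = "{E8978DA6-047F-4E3D-9C78-CDBE46041603}"
OLD_CLSID = "{4C904448-74A9-11D0-AF6E-00C04FD8DC02}"


def filter_value_status(value):
    """Classify a .pdf filter registry value after an update."""
    upper = value.upper()
    has_adobe = ADOBE_CLSID in upper
    has_old = OLD_CLSID in upper
    if has_adobe and has_old:
        return "both"     # needs fixing: remove the old GUID
    if has_adobe:
        return "ok"       # Adobe iFilter still registered
    if has_old:
        return "reset"    # update put the old value back
    return "unknown"      # something else entirely, check by hand


if __name__ == "__main__":
    print(filter_value_status("{4C904448-74A9-11D0-AF6E-00C04FD8DC02}"))
```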

About Jason Samuel

Jason Samuel is a Solutions Architect and Security Practice Lead working at Alchemy Tech Group in Houston, TX with a primary focus on mobility, virtualization, and cloud technologies from Citrix, Microsoft, & VMware. He also has an extensive background in web architecture and information security. He is certified in several technologies and is 1 of 50 people globally that is a recipient of the prestigious Citrix Technology Professional (CTP) award. He is 1 of 28 people in the world that is an Atlantis Community Expert (ACE). He is a featured author on DABCC which provides the latest IT Community News on Cloud, Data Center, Desktop, Mobility, Security, Storage, & Virtualization. In his spare time Jason enjoys writing how-to articles and evangelizing the technologies he works with.
