SharePoint 2007

How do you get Microsoft SharePoint 2007 to read and index content inside a PDF file?

This is an easy one, but it takes a little work to get right. SharePoint uses a feature called Index Server to search documents, but it doesn’t search inside PDFs by default. Searching inside PDF documents requires an iFilter from Adobe, which Adobe designed so that third-party systems can read the PDF file format. Adobe includes this filter with Adobe Reader, or you can download the iFilter separately from Adobe’s site if you don’t want Reader installed on your SharePoint servers.

http://www.adobe.com/products/reader – Latest version of Adobe Reader

or

http://www.adobe.com/support/downloads/detail.jsp?ftpID=2611 – x86 iFilter
http://www.adobe.com/support/downloads/detail.jsp?ftpID=4025 – x64 iFilter

 
CENTRAL ADMINISTRATION
Now in SharePoint itself, you need to configure the search service to index files with the .pdf extension:

1. Go to Central Administration (CA) and open the Shared Service under Shared Services Administration.
2. Click Search Administration under the Search section.
3. Click File Types in the left nav bar and then click New File Type.
4. Enter “pdf” and click OK.

ICONS
You will also want to display the PDF icon next to PDF documents in SharePoint. You can download the icon here:

http://www.adobe.com/images/pdficon_small.gif

and copy it into the 12 hive folder here:

C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\12\TEMPLATE\IMAGES

Then open up this XML template file:

C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\12\TEMPLATE\XML\DOCICON.XML

and add this line in the <DocIcons.ByExtension> section if it isn’t there already:

<Mapping Key="pdf" Value="pdficon_small.gif"/>
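If you would rather script the DOCICON.XML edit than do it by hand, here is a minimal Python sketch of the idea. It assumes the file has a <ByExtension> element directly under the <DocIcons> root and that the mapping takes the usual Key/Value form; the function name and the sample XML are made up for illustration. Note that re-serializing with ElementTree drops comments and the XML declaration, so treat this as a sketch and keep a backup of the real file.

```python
import xml.etree.ElementTree as ET


def add_pdf_icon_mapping(xml_text, extension="pdf", icon="pdficon_small.gif"):
    """Return DOCICON.XML text with a <Mapping> for `extension` under <ByExtension>.

    Hypothetical helper: assumes <ByExtension> is a direct child of the root.
    """
    root = ET.fromstring(xml_text)
    by_extension = root.find("ByExtension")
    if by_extension is None:
        raise ValueError("no <ByExtension> section found")

    # Leave the text untouched if the extension is already mapped.
    for mapping in by_extension.findall("Mapping"):
        if mapping.get("Key", "").lower() == extension.lower():
            return xml_text

    ET.SubElement(by_extension, "Mapping", {"Key": extension, "Value": icon})
    return ET.tostring(root, encoding="unicode")


# Made-up, trimmed-down stand-in for the real DOCICON.XML.
sample = """<DocIcons>
  <ByExtension>
    <Mapping Key="doc" Value="icdoc.gif"/>
  </ByExtension>
</DocIcons>"""

updated = add_pdf_icon_mapping(sample)
print(updated)
```

Running the function a second time on its own output returns the text unchanged, which matches the "if it isn’t there already" condition above.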

REGISTRY
Now on to the registry changes you need to make on each index server. Make sure to back up your registry before making any changes. These two changes register the Adobe PDF iFilter with the Office Search service. The values that need to be changed are:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf

Both values should be changed to:

{E8978DA6-047F-4E3D-9C78-CDBE46041603}
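If you have several index servers, it can be handy to put both changes in a .reg file and import it on each one. Below is a rough Python sketch that only builds the text of such a file. It assumes the value to change is each key's (Default) value, which a .reg file writes as @; the function and constant names are my own for illustration. Check the output against your own registry before importing anything.

```python
PDF_IFILTER_CLSID = "{E8978DA6-047F-4E3D-9C78-CDBE46041603}"

# The two keys listed above.
FILTER_KEYS = [
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Setup"
    r"\ContentIndexCommon\Filters\Extension\.pdf",
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions"
    r"\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf",
]


def build_reg_file(keys=FILTER_KEYS, clsid=PDF_IFILTER_CLSID):
    """Build .reg file text that sets each key's default value to `clsid`.

    Assumption: the iFilter GUID lives in the (Default) value, written as @.
    """
    lines = ["Windows Registry Editor Version 5.00", ""]
    for key in keys:
        lines.append(f"[{key}]")
        lines.append(f'@="{clsid}"')
        lines.append("")
    return "\r\n".join(lines)


reg_text = build_reg_file()
print(reg_text)
```

The sketch stops at printing the text; saving it to a file and importing it with regedit is left as a manual step so you can review it first.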

Then go to:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Applications\{Random GUID}\Gather\Search\Extensions\ExtensionList

and add “pdf” to this list. You will have to create a new String Value for it. Name it with the next number in the list, which should be 38 on most SharePoint installs.
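The "next number in the list" rule is simple enough to sketch in Python. This is only an illustration: it assumes the existing value names in ExtensionList are plain numbers, and the function name is hypothetical. It does not touch the registry; you pass in the names you see in regedit.

```python
def next_extension_entry(existing_names, extension="pdf"):
    """Return the (name, data) pair for a new ExtensionList string value.

    Assumes existing value names are plain integers; anything else is ignored.
    """
    numbers = [int(name) for name in existing_names if name.isdigit()]
    next_number = max(numbers, default=0) + 1
    return str(next_number), extension


# A list that currently runs from 1 to 37, as on most installs.
existing = [str(n) for n in range(1, 38)]
print(next_extension_entry(existing))  # ('38', 'pdf')
```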

SYSTEM PATH
Now you need to add the Adobe install directory to the system Path environment variable so that the search service can find the DLL that provides the iFilter:

1. Right-click My Computer
2. Click Properties
3. Click Advanced
4. Click Environment Variables
5. In the bottom half of the window, find the Path variable and double-click it.
6. At the end of the value, add:

;C:\Program Files\Adobe\Reader 9.0\Reader
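To show what the edited value should look like, here is a small Python sketch that appends a directory to a Path string only if it is not already there. It works purely on strings and does not change any environment variable; the function name is made up, and the comparison is case-insensitive because Windows paths are.

```python
def append_to_path(path_value, directory):
    """Return `path_value` with `directory` appended, unless already present."""
    entries = [entry for entry in path_value.split(";") if entry.strip()]
    # Compare without trailing backslashes and without regard to case.
    known = {entry.strip().rstrip("\\").lower() for entry in entries}
    if directory.rstrip("\\").lower() in known:
        return path_value
    return ";".join(entries + [directory])


adobe_dir = r"C:\Program Files\Adobe\Reader 9.0\Reader"
current = r"C:\Windows\system32;C:\Windows"  # example value, yours will differ
new_path = append_to_path(current, adobe_dir)
print(new_path)
```

Adjust the Adobe directory to match the Reader or iFilter version you actually installed.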

RESTART SEARCH SERVICES
Now you need to restart the Office Search service so that all changes take effect. Open a command prompt and type:

sc stop osearch [press enter]
sc start osearch [press enter]

Or just restart it via the Services MMC.
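If you want to run the same two commands from a script, a minimal Python sketch could look like the one below. The helper names are hypothetical, and the `run` parameter exists only so the function can be tried without touching a real service. One caveat I am assuming rather than handling: sc stop can return before the service has fully stopped, so a real script may need to wait before starting it again.

```python
import subprocess


def restart_commands(service="osearch"):
    """Return the two sc commands from the post as argument lists."""
    return [["sc", "stop", service], ["sc", "start", service]]


def restart_service(service="osearch", run=subprocess.run):
    """Run the stop and start commands in order.

    check=True makes subprocess.run raise if a command exits with an error,
    for example when the service was not running to begin with.
    """
    for command in restart_commands(service):
        run(command, check=True)


# On the index server, from an elevated prompt, you would call:
# restart_service()
```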

If you already have PDF documents in SharePoint that you want to search inside, you have to “Reset all crawled content” in Search Settings and then begin a new “Full Crawl” under Content Sources.

UPDATE 9/20/2010: Installing SP2 or cumulative updates on your SharePoint farm may sometimes reset your registry changes. Specifically, the {E8978DA6-047F-4E3D-9C78-CDBE46041603} registry value will be reset to the old {4C904448-74A9-11D0-AF6E-00C04FD8DC02} value, or it might contain both. This will cause your PDF indexing to stop. Just edit the registry values above to put the correct value back, restart the search services and IIS, then run a full crawl. Your PDFs will begin indexing correctly again.
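After patching, a quick check of what the .pdf filter value currently holds can save some head scratching. The Python sketch below classifies a value you have copied out of regedit. The function name and the status labels are my own, and since I am not sure exactly how "both" shows up in the value, it simply looks for each GUID as a substring.

```python
ADOBE_IFILTER = "{E8978DA6-047F-4E3D-9C78-CDBE46041603}"
OLD_FILTER = "{4C904448-74A9-11D0-AF6E-00C04FD8DC02}"


def check_filter_value(value):
    """Classify the current .pdf filter value as ok, both, reset, or unknown."""
    text = value.strip().upper()
    if text == ADOBE_IFILTER:
        return "ok"
    if ADOBE_IFILTER in text and OLD_FILTER in text:
        return "both"  # an update left both GUIDs in the value
    if OLD_FILTER in text:
        return "reset"  # an update put the old value back
    return "unknown"


print(check_filter_value("{4C904448-74A9-11D0-AF6E-00C04FD8DC02}"))  # reset
```

Anything other than "ok" means the value needs to be set back as described above.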

JasonSamuel.com began in 2008 as a way for me to give back to the IT community. This website features the latest news and how-to's on enterprise mobility, security, virtualization, cloud architecture, and other technologies I work with. This website has evolved over time to become a go-to reference hub for these technologies. It receives hundreds of thousands of unique visitors from all over the world each month. More details on the About Me page.
Copyright © 2008-2023 JasonSamuel.com