How do you get SharePoint 2007 to read and index content inside a PDF file?
This is an easy one, but it takes a little work to get right. SharePoint’s search service indexes document contents, but it doesn’t search inside PDFs by default. Searching inside PDF documents requires an iFilter from Adobe, which Adobe designed so that third-party systems can read the PDF file format. Adobe includes this filter with Adobe Reader, or you can download the iFilter separately from Adobe’s site if you don’t want Reader installed on your SharePoint servers.
http://www.adobe.com/products/reader – Latest version of Adobe Reader
or
http://www.adobe.com/support/downloads/detail.jsp?ftpID=2611 – x86 iFilter
http://www.adobe.com/support/downloads/detail.jsp?ftpID=4025 – x64 iFilter
CENTRAL ADMINISTRATION
Now in SharePoint itself, you need to configure the search service to index files with the .pdf extension:
1. Go to Central Administration (CA) and open the Shared Service under Shared Services Administration.
2. Click Search Administration under the Search section.
3. Click File Types in the left nav bar and then click New File Type.
4. Enter “pdf” and click OK.
ICONS
You will also want to display the PDF icon next to PDF Documents in SharePoint. You can download the icon from here:
http://www.adobe.com/images/pdficon_small.gif
and copy it into the 12 hive folder here:
C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\12\TEMPLATE\IMAGES
Then open up this XML template file:
C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\12\TEMPLATE\XML\DOCICON.XML
and add this line to the <ByExtension> section (inside <DocIcons>) if it isn’t there already:
<Mapping Key="pdf" Value="pdficon_small.gif"/>
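If you would rather script the DOCICON.XML change than edit the file by hand, here is a minimal Python sketch. It assumes the file's root element is <DocIcons> with a <ByExtension> child, and the function name add_pdf_mapping and the SAMPLE text are made up for illustration. It works on the XML as a string, so you would still read and write the real file yourself (after backing it up). Note that ElementTree drops any XML comments when it re-serializes the file.

```python
import xml.etree.ElementTree as ET


def add_pdf_mapping(xml_text, extension="pdf", icon="pdficon_small.gif"):
    """Return DOCICON.XML text with a Mapping for `extension`, adding it if absent."""
    root = ET.fromstring(xml_text)
    # Assumption: <ByExtension> is a direct child of the root <DocIcons> element.
    by_ext = root.find("ByExtension")
    if by_ext is None:
        raise ValueError("no <ByExtension> section found")
    for mapping in by_ext.findall("Mapping"):
        if mapping.get("Key", "").lower() == extension.lower():
            return xml_text  # already mapped, so leave the text untouched
    ET.SubElement(by_ext, "Mapping", {"Key": extension, "Value": icon})
    return ET.tostring(root, encoding="unicode")


# Tiny stand-in for the real file, just to show the call.
SAMPLE = """<DocIcons>
  <ByExtension>
    <Mapping Key="doc" Value="icdoc.gif"/>
  </ByExtension>
</DocIcons>"""

updated = add_pdf_mapping(SAMPLE)
```

Running the function a second time on its own output changes nothing, so it is safe to re-run on a server that has already been configured.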
REGISTRY
Now on to the registry changes you need to make on each index server. Make sure to back up your registry before making any changes. These two changes will register the Adobe PDF iFilter with the Office Search service. The values that need to be changed are:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf
Both values should be changed to:
{E8978DA6-047F-4E3D-9C78-CDBE46041603}
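If you have several index servers, it can be handy to keep the change in a .reg file instead of typing it each time. The Python sketch below only builds the text of such a file. It assumes the key paths use the usual backslash separators and that the GUID belongs in each key's default value (written as @ in a .reg file). The names FILTER_KEYS and build_reg_file are made up for illustration. Review the output before importing it with regedit.

```python
PDF_FILTER_CLSID = "{E8978DA6-047F-4E3D-9C78-CDBE46041603}"

# The two keys named in this post, written with backslash separators.
FILTER_KEYS = [
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf",
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf",
]


def build_reg_file(keys=FILTER_KEYS, clsid=PDF_FILTER_CLSID):
    """Return .reg file text that sets the default value of each key to `clsid`."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for key in keys:
        lines.append(f"[{key}]")
        # Assumption: the iFilter GUID is stored in the key's default value.
        lines.append(f'@="{clsid}"')
        lines.append("")
    return "\r\n".join(lines)


reg_text = build_reg_file()
```

Printing reg_text shows exactly what would be imported, which makes it easy to compare against the keys listed above.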
SYSTEM PATH
Now you need to add the Adobe install directory to the system Path environment variable so that the search service can find the DLL that provides the iFilter:
1. Right click My Computer
2. Click Properties
3. Click Advanced
4. Click Environment Variables
5. In the bottom half of the window, find the Path variable and double click it.
6. At the end of the value, add:
;C:\Program Files\Adobe\Reader 9.0\Reader
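Before editing the Path, it is worth checking that the Reader directory isn't already on it. Here is a small Python sketch of that check. It assumes Reader 9 is installed in its default location (C:\Program Files\Adobe\Reader 9.0\Reader), and the helper names path_contains and append_to_path are made up for illustration. It only works on the Path string you give it and does not change any system setting.

```python
import ntpath

# Assumption: default Reader 9 install directory; adjust for your server.
READER_DIR = r"C:\Program Files\Adobe\Reader 9.0\Reader"


def _norm(entry):
    """Normalize a Windows path for comparison (case, slashes, trailing slash)."""
    return ntpath.normcase(ntpath.normpath(entry.strip().strip('"')))


def path_contains(path_value, directory):
    """True if `directory` is already one of the ';'-separated Path entries."""
    wanted = _norm(directory)
    return any(_norm(e) == wanted for e in path_value.split(";") if e.strip())


def append_to_path(path_value, directory=READER_DIR):
    """Return `path_value` with `directory` appended, unless it is already there."""
    if path_contains(path_value, directory):
        return path_value
    if not path_value.strip(";"):
        return directory
    return path_value.rstrip(";") + ";" + directory


example = append_to_path(r"C:\Windows\system32;C:\Windows")
```

The comparison ignores case and trailing backslashes, since Windows treats those entries as the same directory.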
RESTART SEARCH SERVICES
Now you need to restart the Office Search service so that all of the changes take effect. Open a command prompt and type:
sc stop osearch [press enter]
sc start osearch [press enter]
Or just restart it via the Services MMC.
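One thing to watch for if you script the restart: sc stop returns right away while the service is still shutting down, so a start issued immediately afterwards can fail. Here is a rough Python sketch that waits for the service to report STOPPED before starting it again. It assumes the English-language layout of sc query output (a line like "STATE : 1 STOPPED"), and the function names and the timeout value are made up for illustration.

```python
import re
import subprocess
import time


def parse_state(sc_output):
    """Extract the state name (e.g. 'RUNNING') from `sc query` output, or None."""
    match = re.search(r"STATE\s*:\s*\d+\s+(\w+)", sc_output)
    return match.group(1) if match else None


def query_state(service):
    """Run `sc query <service>` and return its current state name."""
    result = subprocess.run(["sc", "query", service],
                            capture_output=True, text=True)
    return parse_state(result.stdout)


def restart_service(service="osearch", timeout=120):
    """Stop the service, wait until it reports STOPPED, then start it again."""
    subprocess.run(["sc", "stop", service], check=False)
    deadline = time.time() + timeout
    while query_state(service) != "STOPPED":
        if time.time() > deadline:
            raise TimeoutError(f"{service} did not stop within {timeout} seconds")
        time.sleep(2)
    subprocess.run(["sc", "start", service], check=True)
```

Calling restart_service() from an elevated prompt on the index server would do the same job as the two sc commands above, just with the wait in between.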
If you already have PDF documents in SharePoint that you want to search inside, you have to “Reset all crawled content” in Search Settings and then begin a new “Full Crawl” under Content Sources.
