Overview
When searching OCRed PDFs, an exact-word search may also return longer words that contain the search term.
For instance:
Exact search term used: "biolog"
Hits highlighted in OCRed PDFs: "biolog", "biology", "biologist", etc.
This behavior is *not* seen with TXT or other directly generated files. Searching TXT files for the exact word "biolog" returns only the places where the word "biolog" itself appears.
Solution
This is an unfortunate limitation of current OCR technology. OCR reads each character individually and does not distinguish between individual characters and whole words. For OCRed PDFs, as in the example above, the search therefore matches every word that contains the character sequence "biolog" and returns all of them.
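As a rough illustration of the difference described above, the Python sketch below contrasts matching a character sequence anywhere inside a word with matching on word boundaries. It is not the product's actual search code; the function names and the sample text are hypothetical and chosen only for this example.

```python
import re


def substring_hits(term, text):
    """Return every word that contains `term` as a character sequence.

    This mimics the behavior described for OCRed PDFs, where word
    boundaries are not taken into account (illustrative assumption).
    """
    words = re.findall(r"\w+", text)
    return [word for word in words if term in word]


def whole_word_hits(term, text):
    """Return only standalone occurrences of `term`.

    This mimics the behavior described for TXT files, where the exact
    word is matched on its boundaries (illustrative assumption).
    """
    pattern = rf"\b{re.escape(term)}\b"
    return re.findall(pattern, text)


# Hypothetical sample text, not taken from a real document.
sample = "biolog biology biologist geology"

print(substring_hits("biolog", sample))   # ['biolog', 'biology', 'biologist']
print(whole_word_hits("biolog", sample))  # ['biolog']
```

The first function returns all three words because each one contains "biolog"; the second returns only the standalone word.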
An improvement request with ID 168386 is open in our systems to find better behavior for exact-word searches against OCRed PDFs. At this time, there does not appear to be any way to change this behavior.
