dtSearch

OCR and Imaging

dtSearch supports the PDF "image with hidden text" format and, for documents in this format, can highlight hits directly on the scanned image.
dtSearch also supports combined text and image displays in HTML.
dtSearch Desktop and Network include a built-in image viewer.
dtSearch recommends fuzzy searching to find words that OCR may have misrecognized.
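
This page does not describe how dtSearch's fuzzy search works internally. As a generic illustration only, the Python sketch below uses Levenshtein edit distance (a standard measure, not necessarily the one dtSearch uses) to show why a fuzzy match tolerates common OCR misreads such as "l" for "i", "0" for "o", or "rn" for "m". The functions `edit_distance` and `fuzzy_hits` and the `max_edits` threshold are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def fuzzy_hits(query: str, words: list[str], max_edits: int = 2) -> list[str]:
    """Return OCR'ed words within max_edits of the query (case-insensitive).
    Hypothetical helper for illustration, not a dtSearch API."""
    q = query.lower()
    return [w for w in words if edit_distance(q, w.lower()) <= max_edits]


ocr_words = ["Invoice", "lnvoice", "inv0ice", "modern", "rnodern", "payment"]
print(fuzzy_hits("invoice", ocr_words))  # → ['Invoice', 'lnvoice', 'inv0ice']
print(fuzzy_hits("modern", ocr_words))   # → ['modern', 'rnodern']
```

A larger `max_edits` catches more misreads but also admits more unrelated words.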

OCR and PDF

The Adobe PDF file format provides two ways to combine, in a single file, scanned images and OCR’ed text, that is, text obtained from those images through Optical Character Recognition (OCR) software.

(1) The "image with hidden text" format stores the complete original image of a scanned document, along with the text obtained through OCR. The text is "hidden" in the sense that opening the PDF file displays only the scanned image, not the underlying OCR'ed text. Because the OCR'ed text is still present in the file, however, dtSearch can index and search it.
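
This page does not say how the hidden text is stored. One common mechanism, assumed here, is PDF text rendering mode 3 (the `3 Tr` operator), which places text on the page without filling or stroking it, so it is invisible but still extractable. The hypothetical Python function below builds only a page content stream, not a complete PDF file: it paints a scanned-page image XObject and then overlays the OCR'ed words as invisible text. The resource names `Im0` and `F1` and the word coordinates are illustrative assumptions.

```python
def escape_pdf_string(text: str) -> str:
    """Escape characters that are special inside a PDF literal string (...)."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def hidden_text_page_stream(image_name: str, page_w: float, page_h: float,
                            words: list[tuple[str, float, float]],
                            font_name: str = "F1", font_size: float = 10) -> str:
    """Hypothetical sketch: paint a scanned image over the whole page, then
    draw each OCR'ed word invisibly (text rendering mode 3)."""
    ops = [
        "q",
        f"{page_w} 0 0 {page_h} 0 0 cm",  # scale the unit-square image to the page
        f"/{image_name} Do",              # paint the scanned page image
        "Q",
        "BT",
        "3 Tr",                           # rendering mode 3: neither fill nor stroke
        f"/{font_name} {font_size} Tf",
    ]
    for word, x, y in words:
        ops.append(f"1 0 0 1 {x} {y} Tm")  # absolute position of this word
        ops.append(f"({escape_pdf_string(word)}) Tj")
    ops.append("ET")
    return "\n".join(ops)


stream = hidden_text_page_stream("Im0", 612, 792, [("Invoice", 72, 700), ("(draft)", 150, 700)])
print(stream)
```

In a real file the font size and horizontal scaling would normally be chosen so each invisible word covers the same box as the word in the image; this sketch positions words but does not size them.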

After a search, when a user clicks on an "image with hidden text" PDF document, the dtSearch product displays the scanned image. Because the hidden OCR’ed text sits at the positions of the corresponding words in the image, highlighting that text makes the hits appear highlighted directly on the image. A dtSearch Web demo shows hidden text highlighting.
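
To see how a text hit becomes a mark on the image, assume the OCR step recorded a bounding box for every word. The hypothetical sketch below (the `OcrWord` type, its fields and `hit_rectangles` are illustrative, not a dtSearch structure) returns the boxes of matching words, which a viewer could then paint over the page image.

```python
from dataclasses import dataclass


@dataclass
class OcrWord:
    text: str
    # Bounding box in page coordinates: left, bottom, right, top
    box: tuple[float, float, float, float]


def hit_rectangles(words: list[OcrWord], terms: set[str]) -> list[tuple[float, float, float, float]]:
    """Return boxes of words matching any search term, ignoring case and
    surrounding punctuation; a viewer would paint these over the image."""
    wanted = {t.lower() for t in terms}
    return [w.box for w in words if w.text.strip(".,;:!?\"'()").lower() in wanted]


page = [
    OcrWord("Total", (72, 700, 110, 712)),
    OcrWord("invoice", (115, 700, 160, 712)),
    OcrWord("amount:", (165, 700, 215, 712)),
]
print(hit_rectangles(page, {"Invoice", "amount"}))
# → [(115, 700, 160, 712), (165, 700, 215, 712)]
```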

(2) Another option for combining scanned images and OCR’ed text in a single PDF file uses "small images" for the parts of each scanned page that do not appear to be text. For example, a picture or a signature is stored as a small image embedded in the page, while the rest of the page is stored only as OCR’ed text.

While the "small images" alternative does not preserve the "true" image of the original document, it does produce much more compact files than the "image with hidden text" option. The "small images" PDF file usually stores only a few images for each page, instead of a complete image of the whole document. The text detected through OCR in the "small images" format can also be more readable because the resulting PDF file stores it as text with font information rather than as an image.
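
A rough calculation shows why the "small images" format is so much smaller. The sketch below computes uncompressed raster sizes only; real PDF files compress images (for example with CCITT or JBIG2 compression for black-and-white scans) and also carry fonts and other overhead, which are ignored here. The page size, signature size and character count are hypothetical.

```python
def raster_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Uncompressed size of a raster image; each row is padded to a whole byte,
    as PDF image data requires."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    bytes_per_row = (width_px * bits_per_pixel + 7) // 8
    return bytes_per_row * height_px


full_page = raster_bytes(8.5, 11, 300, 8)   # whole letter-size page, 8-bit gray
signature = raster_bytes(2, 0.5, 300, 8)    # one small embedded image
ocr_text = 3000                             # ~3,000 single-byte characters of text
print(full_page)                            # → 8415000
print(signature + ocr_text)                 # → 93000
print(round(full_page / (signature + ocr_text)))  # → 90
```

Under these assumptions the full-page image is roughly 90 times larger than the small image plus text; compression changes the absolute numbers but not the direction of the comparison.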

See also the article "The New Paper Trail: Alternative Information Distribution."

More information on both PDF/OCR options, including a list of some additional third-party products that OCR into the PDF format, is available on a separate page.

Fast, Precision Searching
Provides over two dozen indexed and unindexed text search options for all popular file types.
Supports full-text as well as field searching in all supported file types. Has multiple relevancy-ranking and other search sorting options.
 
Browser-Optimized Display
The dtSearch product line displays retrieved files in a browser with highlighted hits and convenient hit and file navigation options — next hit, previous hit, next document, etc.
For HTML, XML and PDF, the products highlight hits while keeping embedded formatting, links and images intact.
For all other supported file types ("Office," Unicode, ZIP, etc.), the product line has built-in HTML file converters for displaying these files in a browser with highlighted hits.
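
The converters themselves are internal to dtSearch. As a rough illustration of the idea only, the hypothetical Python function below escapes plain text for HTML and wraps each hit in a `<span>` whose `id` (`hit1`, `hit2`, ...) a page could target for next-hit and previous-hit navigation. The function name, the `hit` class and the id scheme are assumptions.

```python
import html
import re


def highlight_html(text: str, term: str) -> str:
    """Escape text for HTML and wrap whole-word, case-insensitive hits on
    term in numbered <span> elements (hypothetical sketch)."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    parts = []
    last = 0
    for n, m in enumerate(pattern.finditer(text), start=1):
        parts.append(html.escape(text[last:m.start()]))  # text before the hit
        parts.append(f'<span class="hit" id="hit{n}">{html.escape(m.group())}</span>')
        last = m.end()
    parts.append(html.escape(text[last:]))  # text after the last hit
    return "<pre>" + "".join(parts) + "</pre>"


print(highlight_html("Scan & OCR the page; ocr text is hidden.", "ocr"))
```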
 
Organization-Wide Reach
Supports
"Office" (word processor, database, spreadsheet, presentation)
email
HTML
PDF
XML
ZIP
CSV
RTF
ANSI
Unicode files and more.
dtSearch is a registered trademark of dtSearch Corp. All rights reserved.