OCR and Imaging

dtSearch supports the PDF "image with hidden text" format and can highlight hits directly on the scanned image in this format.

dtSearch also supports combined text and image displays in HTML.

dtSearch Desktop and Network include a built-in image viewer.

dtSearch recommends using fuzzy searching to sift through possible OCR errors.

OCR and PDF

The Adobe PDF file format provides two ways to combine scanned images and OCR'ed text in a single file. OCR'ed text is text that Optical Character Recognition (OCR) software has extracted from an image.

(1) The "image with hidden text" format stores the complete original image of a scanned document, along with the text obtained through OCR. The text is "hidden" in the sense that opening the PDF file displays only the scanned image, not the underlying OCR'ed text. Because the OCR'ed text is still stored in the file, however, dtSearch can index and search it.

After a search, when a user clicks on an "image with hidden text" PDF document, the dtSearch product displays the scanned image. Because the OCR'ed text lies hidden behind that image, highlighting the text makes hits appear directly on the image. Click here for a dtSearch Web demo showing hidden text highlighting.

(2) Another option for combining scanned images and OCR'ed text in a single PDF file uses "small images" for the parts of each scanned page that do not appear to be text. For example, this format stores a picture or a signature as a small image embedded in the page, and stores the rest of the page only as OCR'ed text.

While the "small images" alternative does not preserve the "true" image of the original document, it produces much more compact files than the "image with hidden text" option, because it usually stores only a few small images for each page instead of a complete image of every page. The OCR'ed text in the "small images" format can also be more readable, because the PDF file stores it as text with font information rather than as an image.

• For the article "The New Paper Trail: Alternative Information Distribution," click here.

• For more information on both PDF/OCR options, including a list of some additional third-party products that OCR into the PDF format, click here.

Fast, Precision Searching

Provides over two dozen indexed and unindexed text search options for all popular file types.

Supports full-text as well as field searching in all supported file types, with multiple relevancy-ranking and other search sorting options.

Browser-Optimized Display

The dtSearch product line displays retrieved files in a browser with highlighted hits and convenient hit and file navigation options: next hit, previous hit, next document, and so on.

For HTML, XML and PDF, the products highlight hits while keeping embedded formatting, links and images intact.

For all other supported file types ("Office," Unicode, ZIP, etc.), the product line includes built-in HTML converters that display these files in a browser with highlighted hits.

Organization-Wide Reach

Supports:
• "Office" files (word processor, database, spreadsheet, presentation)
• email
• HTML
• PDF
• XML
• ZIP
• CSV
• RTF
• ANSI
• Unicode files and more.