Invariant™ Overview:
Quick Summary
- Built from the Ground Up to Support the Unicode Standard
- Foreign Language Extraction and Precision OCR (122 languages)
- Support for Deeply Embedded Documents and Images
- Open Software Architecture Allows Rapid Customization
- Scalable System – Relieves Bottlenecks With Massive Throughput
- Dynamic Searching and Filtering With No Reloads Required
- Recognizes Over 2,500 File Types From Multiple Operating Systems
- Extremely Accurate Rendering Independent of Print Driver Limitations
- Export as TIFF or Searchable PDF (Full Color with Interactive Links)

Technical Overview
System Features
- Fully Distributed Processing
Workloads for projects are shared across multiple multi-threaded worker machines. Discovery, data extraction, TIFF'ing and OCR are all distributed and load balanced.
- Full Unicode Support
Designed for Unicode from the ground up. Original client data that is not in Unicode is up-converted to Unicode. Data can be delivered to third-party applications in Unicode, ANSI, RTF or more than two dozen other text encodings, depending on the application requirements of the end user.
- Internationalized Full-Text Search
Full-text searches on extracted text can currently be performed in 122 languages.
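
The up-conversion described under Full Unicode Support can be illustrated with a minimal Python sketch. It is not Invariant's implementation: the function names `up_convert` and `deliver` are hypothetical, "ANSI" is assumed here to mean Windows-1252, and the source encoding is assumed to be already known rather than detected.

```python
def up_convert(raw: bytes, source_encoding: str) -> str:
    """Decode legacy bytes into Unicode text (the 'up-conversion' step).

    Assumption: the caller already knows the source encoding.
    """
    return raw.decode(source_encoding)


def deliver(text: str, target_encoding: str) -> bytes:
    """Encode Unicode text for a downstream application.

    Characters the target encoding cannot represent are replaced with '?'
    instead of raising an error.
    """
    return text.encode(target_encoding, errors="replace")


# Legacy Windows-1252 ("ANSI") input is held internally as Unicode...
legacy_bytes = "Café".encode("cp1252")
text = up_convert(legacy_bytes, "cp1252")

# ...and can then be delivered in whichever encoding the recipient needs.
as_utf8 = deliver(text, "utf-8")
as_utf16 = deliver(text, "utf-16-le")
```

Holding everything as Unicode internally means the choice of output encoding can be deferred until delivery, which matches the export options listed later in this overview.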
Discovery
- File Identification
Invariant can identify over 2,500 types of files from multiple operating systems and examines file content rather than file extension to determine file type.
- Metadata
Invariant never assumes a predefined set of metadata fields. Instead, our software first walks files in order to preserve volatile metadata while examining and capturing any and all metadata found within native files. Once new metadata fields are discovered, they are given a unique ID for cataloging and added to our progressive metadata library (currently more than 12,000 metadata fields).
- File Formats
Invariant™ currently supports over 2,500 file types including all major email and data files. Below are just a few examples:
• All standard files and file types
• Adobe Acrobat including packages and portfolios
• Bloomberg emails
• Microsoft Snapshot files
• Can extract CAB file archives
• Can extract RAR compressed files
• Full Office 2007 support including customized extraction and rendering
• XPS fixed and flowed documents
• Can extract annotations, comments and attachments from PDF files
• Can typically address rare or exotic file types with custom code in 2 days
- Deep Embedding
Invariant uncovers deeply embedded objects and sub-documents (for example, an email with attachments dragged and dropped into a Word document, or a Word document embedded inside an Excel spreadsheet) by recursing into each item with no fixed depth limit.
- Dates & Times
Dates and times are stored in UTC. Time zone adjustments can be made during the data export phase rather than having to preset the time zone before discovery begins.
- TIFF'ing and OCR
The OCR engine is used only on pages where an image is present. If a page of a document contains a mixture of text and graphics, the text is extracted separately and then the graphics are OCR'd. The resulting text for that page keeps the OCR'd text separate from the extracted text. The OCR engine supports 122 languages, including Chinese Traditional, Chinese Simplified, Korean and Japanese.
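
Two of the Discovery items above, identifying files by content rather than extension and recursing into embedded items, can be sketched together in Python. This is an illustration under stated assumptions, not Invariant's code: the names `MAGIC_SIGNATURES`, `identify` and `walk` are hypothetical, the signature table covers only a handful of well-known formats, and only ZIP containers are opened.

```python
import io
import zipfile

# Hypothetical signature table: leading "magic" bytes of a few common formats.
MAGIC_SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip"),  # also the container for Office 2007 files
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),  # legacy Office compound file
    (b"II*\x00", "tiff"),  # little-endian TIFF
    (b"MM\x00*", "tiff"),  # big-endian TIFF
    (b"Rar!\x1a\x07", "rar"),
    (b"MSCF", "cab"),
]


def identify(data: bytes) -> str:
    """Return a type label based on the content's leading bytes only."""
    for signature, kind in MAGIC_SIGNATURES:
        if data.startswith(signature):
            return kind
    return "unknown"


def walk(name: str, data: bytes, depth: int = 0):
    """Yield (depth, name, kind) for an item and everything nested inside it.

    Containers are opened and each member is walked in turn, so nesting is
    followed to whatever depth the data actually has.
    """
    kind = identify(data)
    yield depth, name, kind
    if kind == "zip":
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            for member in archive.infolist():
                if not member.is_dir():
                    yield from walk(member.filename, archive.read(member), depth + 1)
```

Because `identify` never looks at the file name, a PDF saved as `report.txt` inside a nested archive is still reported as a PDF. A production system would need a far larger signature set and handlers for other container types such as email stores and compound documents.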
Exporting & Load Files
- Highly Customizable
Invariant’s powerful export features allow export tasks to be combined or split into separate work units. Load files can be built rapidly without having to copy the native files or images, allowing for quick work verification. This technology also lets us resume an export without overwriting the existing destination files.
- Rich-Text Supported
Invariant™ supports rich-text so complete hit-highlighting from search requests can be included in the exported text.
- Image Endorsements on TIFF/PDF
Export documents as TIFF images or PDFs with full color and interactive links. Endorsements on PDF exports are fully searchable and can be completely customized.
- Internationalized Output
Export text in RTF, UTF-8, UTF-16, Unicode, ANSI or more than two dozen other text encodings.
- Wide Range of Built-in Export Definitions
An extensible export mechanism means we can provide a wide range of built-in export definitions with Invariant. We deliver a customized export solution to fit any third-party application's requirements including, but not limited to: Concordance®, CT Summation and Ringtail®.
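
The resumable export mentioned under Highly Customizable can be approximated with exclusive-create file writes. The sketch below is one possible approach, not a description of how Invariant does it; `export_documents` and the document names used with it are hypothetical.

```python
from pathlib import Path


def export_documents(documents: dict, destination) -> tuple:
    """Write each document to `destination`, skipping files that already exist.

    Returns (written, skipped) lists of names. Mode "xb" creates the file
    exclusively, so an existing destination file is never overwritten.
    """
    destination = Path(destination)
    destination.mkdir(parents=True, exist_ok=True)
    written, skipped = [], []
    for name, payload in documents.items():
        target = destination / name
        try:
            with open(target, "xb") as handle:
                handle.write(payload)
        except FileExistsError:
            skipped.append(name)  # left over from an earlier run
        else:
            written.append(name)
    return written, skipped
```

Running the same export twice writes only the files that were missing the second time. One limitation of this simple version is that a file left half-written by an interrupted run would also be skipped, so a real exporter would likely write to a temporary name and rename on completion.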