IntelliExtract - Features
Information sources
IntelliExtract can handle any content source
where text is available in a machine-readable format, e.g. the World
Wide Web, SEC filings, proprietary databases and company intranets.
Navigation
IntelliExtract traverses from a home page to a
designated page to extract specific pages and pieces of information
from within a website, as well as information spread across multiple
websites, e.g. information about a product or a company's management
gathered from different websites.
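The traversal described above can be sketched as a breadth-first search over page links. This is only an illustration under stated assumptions, not IntelliExtract's actual crawler: the names find_path, LinkCollector and the in-memory SITE dictionary are hypothetical, and a real fetch function would make HTTP requests instead of a dictionary lookup.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects absolute URLs from the <a href="..."> tags of one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


def find_path(home_url, is_target, fetch, max_pages=50):
    """Breadth-first search from home_url to the first page accepted by is_target.

    fetch(url) returns the page HTML or None (hypothetical callback).
    Returns the list of URLs visited on the way, or None if nothing matched.
    """
    queue = deque([[home_url]])
    seen = {home_url}
    while queue and len(seen) <= max_pages:
        path = queue.popleft()
        url = path[-1]
        html = fetch(url)
        if html is None:
            continue
        if is_target(url, html):
            return path
        collector = LinkCollector(url)
        collector.feed(html)
        for link in collector.links:
            if link not in seen:
                seen.add(link)
                queue.append(path + [link])
    return None


# Hypothetical in-memory site standing in for real HTTP responses.
SITE = {
    "http://example.com/": '<a href="/about">About</a> <a href="/products">Products</a>',
    "http://example.com/about": '<a href="/about/management">Management</a>',
    "http://example.com/products": "<p>Products</p>",
    "http://example.com/about/management": "<h1>Management team</h1>",
}

path = find_path(
    "http://example.com/",
    lambda url, html: "Management team" in html,
    SITE.get,
)
```

Here path lists the pages followed from the home page to the designated management page. Breadth-first order is chosen so that the shortest click path is found first.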
Understands structure
IntelliExtract understands the structure of a webpage and decides whether it is a news page, forum, review, bulletin board or blog.
IntelliExtract also distinguishes advertisements from the text
that needs to be extracted from a page, so the
output of the tool is much cleaner.
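One common way to separate advertisement or navigation blocks from body text is a link-density heuristic: blocks whose text is mostly link text are dropped. The sketch below assumes that heuristic purely for illustration; the source does not say how IntelliExtract does this, and BlockParser, main_text and both thresholds are hypothetical. It also treats only flat p and div blocks and does not handle nesting carefully.

```python
from html.parser import HTMLParser


class BlockParser(HTMLParser):
    """Collects text per <p>/<div> block and counts characters inside <a> tags."""

    BLOCK_TAGS = {"p", "div"}

    def __init__(self):
        super().__init__()
        self.blocks = []  # each entry is [text, link_chars]
        self._in_link = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_TAGS:
            self.blocks.append(["", 0])
        elif tag == "a":
            self._in_link += 1

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self._in_link -= 1

    def handle_data(self, data):
        if not self.blocks:
            return
        self.blocks[-1][0] += data
        if self._in_link:
            self.blocks[-1][1] += len(data.strip())


def main_text(html, max_link_density=0.5, min_chars=20):
    """Keep blocks that are long enough and not dominated by link text."""
    parser = BlockParser()
    parser.feed(html)
    kept = []
    for text, link_chars in parser.blocks:
        text = " ".join(text.split())  # normalize whitespace
        if len(text) < min_chars:
            continue  # too short to be body text (assumed threshold)
        if link_chars / len(text) > max_link_density:
            continue  # mostly links: likely an ad or navigation block
        kept.append(text)
    return kept


SAMPLE = (
    '<div><a href="/ad1">Buy cheap widgets now</a> '
    '<a href="/ad2">Click here for deals</a></div>'
    "<p>The company reported quarterly revenue growth of ten percent.</p>"
    '<p><a href="/x">Home</a></p>'
)
result = main_text(SAMPLE)
```

With this sample, only the paragraph about revenue growth survives; the link-heavy block and the short navigation link are filtered out.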
Understands entities
IntelliExtract understands entities such as person names, roles, organizations, locations, biographical information, email addresses, phone numbers and postal addresses.
IntelliExtract can be configured to recognize user-defined or
domain-specific entities, e.g. names of products, technologies, diseases,
molecules etc.
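As a rough illustration of pattern-based and dictionary-based entity extraction, the sketch below finds email addresses and phone numbers with regular expressions and looks up user-defined terms from a configurable dictionary. The patterns are deliberately simple, and extract_entities and the custom_terms format are assumptions made for this example. Person names, roles and organizations generally need a trained recognizer and are not covered here.

```python
import re

# Simplified patterns for illustration; real-world formats vary widely.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def extract_entities(text, custom_terms=None):
    """Return (entity_type, value) pairs found in text.

    custom_terms maps a user-defined entity type to a list of terms,
    e.g. {"molecule": ["aspirin"]} (hypothetical configuration format).
    """
    entities = []
    for match in EMAIL_RE.finditer(text):
        entities.append(("email", match.group()))
    for match in PHONE_RE.finditer(text):
        entities.append(("phone", match.group()))
    for label, terms in (custom_terms or {}).items():
        for term in terms:
            # Whole-word, case-insensitive lookup of each configured term.
            pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
            for match in pattern.finditer(text):
                entities.append((label, match.group()))
    return entities


text = "Contact Jane Doe at jane.doe@example.com or +1 555-010-9999 about aspirin trials."
found = extract_entities(text, {"molecule": ["aspirin"]})
```

The dictionary argument plays the role of the user-defined entity configuration: adding a new domain only means supplying another list of terms.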
Associates entities
With its analytical capabilities, IntelliExtract validates, disambiguates and associates entities from the extracted text, which allows relationships between them to be built.
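A very simple form of association is sentence-level co-occurrence combined with an alias table for disambiguation. The sketch below assumes that approach for illustration only; the source does not describe IntelliExtract's analytics, and the associate function and the alias dictionary are hypothetical.

```python
import re


def associate(text, people, organizations):
    """Pair each person with every organization named in the same sentence.

    organizations maps an alias to its canonical name, so that
    "Acme Corp" and "Acme Corporation" resolve to one entity.
    """
    pairs = set()
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        lowered = sentence.lower()
        present_people = [p for p in people if p.lower() in lowered]
        present_orgs = [
            canonical
            for alias, canonical in organizations.items()
            if alias.lower() in lowered
        ]
        for person in present_people:
            for org in present_orgs:
                pairs.add((person, org))
    return sorted(pairs)


ALIASES = {
    "Acme Corp": "Acme Corporation",
    "Acme Corporation": "Acme Corporation",
    "Globex": "Globex",
}
TEXT = (
    "Jane Doe is the CEO of Acme Corp. "
    "John Roe joined Globex as CFO. "
    "Acme Corporation later promoted Jane Doe."
)
relations = associate(TEXT, ["Jane Doe", "John Roe"], ALIASES)
```

Because both spellings of the company resolve to the same canonical name, the two mentions of Jane Doe yield a single relationship rather than two.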
User input
User input may be the URL of a home page or designated page,
a company name, or a document corpus.
System output
The output may be entered directly into a database
without further manual intervention.
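Loading extracted entities into a database without manual steps could look like the following sketch, which uses Python's built-in sqlite3 module with an in-memory database. The table layout and the store_entities helper are assumptions for illustration; the source does not specify a schema or a database product.

```python
import sqlite3


def store_entities(conn, source_url, entities):
    """Insert (entity_type, value) pairs for one source page.

    The UNIQUE constraint plus INSERT OR IGNORE makes reruns idempotent.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entity ("
        "source_url TEXT NOT NULL, "
        "entity_type TEXT NOT NULL, "
        "value TEXT NOT NULL, "
        "UNIQUE (source_url, entity_type, value))"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO entity VALUES (?, ?, ?)",
        [(source_url, etype, value) for etype, value in entities],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
extracted = [("person", "Jane Doe"), ("email", "jane.doe@example.com")]
store_entities(conn, "http://example.com/about", extracted)
store_entities(conn, "http://example.com/about", extracted)  # rerun adds nothing

rows = conn.execute(
    "SELECT entity_type, value FROM entity ORDER BY entity_type"
).fetchall()
```

Running the extraction twice on the same page leaves two rows, not four, which is what lets the load run unattended.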