Current features

Supported document formats

  • (X)HTML
  • Arbitrary XML
  • Plain text
  • OpenDocument Text *
  • PDF *
  • MS Word 97/2000/XP *
  • MS Excel 97/2000/XP *
  • MS PowerPoint 97/2000/XP *
  • Email message (RFC 822) *

Supported document locations

  • Local directories (recursive)
  • Websites (recursive) *
  • Syndication feeds (RSS/Atom) *
  • Del.icio.us bookmarks *
  • Local mail folders *

* - pluggable feature

Supported document languages

  • Danish
  • Dutch
  • English
  • French
  • German
  • Italian
  • Norwegian
  • Portuguese
  • Russian
  • Spanish
  • Swedish

Indexing

  • Automatic identification of the document format by file name pattern
  • Monitoring of indexed locations to track new, modified and deleted documents and keep the collection up to date
  • Automatic identification of the document language
  • Content analysis according to language-specific rules for stem extraction and stopword filtering
  • Caching of parsed documents
  • Parsing of document metadata properties, if available
  • Guessing basic metadata properties (title, description) from the document content
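
As an illustration of two of the indexing steps above (format identification by file name pattern, and tracking new, modified and deleted documents), here is a minimal Python sketch. It is not the application's code: the pattern table, the format labels and the function names identify_format and diff_snapshots are all assumptions made for this example, and a snapshot is assumed to be a plain mapping from path to modification time.

```python
from fnmatch import fnmatchcase

# Hypothetical pattern table (an assumption for this sketch); the patterns
# actually used by the application are not documented in this list.
FORMAT_PATTERNS = [
    ("*.html", "html"), ("*.htm", "html"), ("*.xhtml", "html"),
    ("*.xml", "xml"),
    ("*.txt", "text"),
    ("*.odt", "opendocument-text"),
    ("*.pdf", "pdf"),
    ("*.doc", "ms-word"),
    ("*.xls", "ms-excel"),
    ("*.ppt", "ms-powerpoint"),
    ("*.eml", "email"),
]


def identify_format(file_name):
    """Return the first format whose pattern matches the name, or None."""
    name = file_name.lower()  # match case-insensitively
    for pattern, fmt in FORMAT_PATTERNS:
        if fnmatchcase(name, pattern):
            return fmt
    return None


def diff_snapshots(old, new):
    """Compare two {path: modification_time} snapshots of a location.

    Returns three sorted lists: added, modified and deleted paths.
    """
    added = sorted(p for p in new if p not in old)
    modified = sorted(p for p in new if p in old and new[p] != old[p])
    deleted = sorted(p for p in old if p not in new)
    return added, modified, deleted
```

A monitor built this way would take a fresh snapshot on each pass, re-index the added and modified paths, and drop the deleted ones from the collection.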

Search

  • Built-in search based on the Apache Lucene search engine
  • Quick search over document text and metadata properties (with a special query syntax available to experienced users)
  • Advanced search using a visual query constructor
  • Saving advanced search queries for repeated use
  • Search for documents similar to a specified one (“pattern search”)
  • Automatic suggestion of terms relevant to the previous search query (“See more” or “associative search”)
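
The idea behind "pattern search" (finding documents similar to a given one) can be sketched as follows. This is a stand-in technique, cosine similarity over raw term counts, chosen for brevity. It is not the application's implementation and does not use Lucene. The function names and the dictionary of documents are assumptions for this example.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Count lowercase alphanumeric terms in a text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(pattern_text, documents, limit=3):
    """Rank documents ({name: text}) by similarity to the pattern text.

    Documents that share no term with the pattern are left out.
    """
    pattern = term_counts(pattern_text)
    scored = [
        (cosine_similarity(pattern, term_counts(text)), name)
        for name, text in documents.items()
    ]
    # Highest score first; ties are broken by name for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored if score > 0][:limit]
```

A production index would normally weight terms (for example by how rare they are across the collection) instead of using raw counts, so common words do not dominate the score.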

Tagging and text analysis

  • Manual assignment and editing of tags
  • Text analysis functions for extracting and suggesting relevant tags
  • Automated document tagging based on text analysis
  • Optional transparent auto-tagging of new or modified documents (at the indexing stage)
  • Automated assignment of a tag to the relevant documents (“tag auto-population”)
  • Adjustable analysis and auto-tagging parameters
  • Navigating the document collection with the “tag cloud”
  • Finding groups of related tags and highlighting them dynamically
  • Removing tags
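
Tag suggestion by text analysis can be pictured with a small frequency-based sketch. It assumes a tiny hypothetical English stopword list and skips stemming entirely, so it only approximates the language-aware analysis described in this list. The name suggest_tags and its parameters (limit, min_length) are illustrative stand-ins for the adjustable analysis parameters, not the application's actual settings.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption); a real analyzer would use
# a full list per supported language and reduce words to their stems.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on", "with"}


def suggest_tags(text, limit=3, min_length=3):
    """Suggest up to `limit` tags: the most frequent non-stopword terms."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(
        w for w in words if w not in STOPWORDS and len(w) >= min_length
    )
    # Most frequent first; ties are broken alphabetically for a stable result.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:limit]]
```

Running the same function on each new or modified document at indexing time would correspond to the transparent auto-tagging option.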

User interface

  • Viewing the document list in two modes: List view (brief) and Table view (detailed)
  • Customizable set of properties to display in Table view
  • Sorting the document list by a selected document property
  • Filtering the document list by a specified filter query
  • Browsing the document collection by four metadata facets (path, author, date and language)
  • In-place editing of document properties (in Table view)
  • Editing the properties and tags of multiple documents in batch mode
  • Annotating documents with the Notes dialog
  • Opening documents with an external application defined for each document type
  • Visual query editor for Advanced search
  • Calendar plugin to browse documents by creation date
  • TagClusters plugin to visualize tags as a graphical cluster map
  • Plugin management framework for remote installation and upgrading of plugins