“… the abundance of information will be such that either you have reached such a level of maturity that you are able to be your own filter, or you will desperately need a filter… some professional filter.”

Umberto Eco, A Conversation on Information

SCAN (Smart Content Aggregation and Navigation) is a personal semantic content manager for desktop users. It combines search, text analysis, tagging and metadata functions to provide a new user experience of desktop navigation and personal document management.

SCAN addresses the problems of organizing and finding personal content in the age of information overload. To that end, SCAN provides an integrated set of tools and techniques:

Aggregation • SCAN erases the boundaries imposed on information by different storage systems. Information flows from different sources are aggregated into a single searchable and explorable semantic space, where files, web pages, emails and other content items are treated equally as documents, organized by their natural semantic properties rather than by their physical locations.

Metadata • A unified metadata framework is provided to describe, classify and annotate documents.

Tagging • The simplest and most intuitive way to organize your content. Label any document with tags and navigate the collection with the tag cloud.

Text analysis • Unlike other tools, SCAN is aware of what your documents are about. Really.

Search • Powerful full-text and metadata search over your document collection.

SCAN is open source software, available as a free download.

SCAN is cross-platform software, independent of any specific operating system or computer hardware.

SCAN is designed as a flexible component framework that is easily configured for specific user needs. It can be extended with plugins for new document locations and formats, as well as with user interface add-ons.

Scan everywhere

The SCAN repository aggregates content from different sources: local folders, web syndication feeds, mailboxes, del.icio.us bookmarks and possibly other locations if plugins are available. A user only needs to point SCAN to a location, and the application will find and add every document from there. Added document locations are monitored for changes (new, modified or deleted documents) to keep the repository up to date.
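How SCAN monitors a location internally is not described here. As a minimal sketch of the idea, assuming a location can be summarized as a mapping from document path to modification time, the three kinds of change can be found by comparing two snapshots. The names `Changes` and `diff_snapshots` are hypothetical and are not part of SCAN.

```python
from typing import Dict, NamedTuple, Set


class Changes(NamedTuple):
    """Result of comparing two snapshots of a monitored location."""
    added: Set[str]
    modified: Set[str]
    deleted: Set[str]


def diff_snapshots(old: Dict[str, float], new: Dict[str, float]) -> Changes:
    """Compare two {path: modification_time} snapshots (hypothetical helper)."""
    added = set(new) - set(old)
    deleted = set(old) - set(new)
    # A document present in both snapshots counts as modified
    # when its modification time differs.
    modified = {path for path in set(old) & set(new) if old[path] != new[path]}
    return Changes(added, modified, deleted)


# Illustrative snapshots, not real SCAN data.
before = {"notes.txt": 100.0, "report.pdf": 200.0}
after = {"notes.txt": 150.0, "paper.html": 300.0}
changes = diff_snapshots(before, after)
```

A real monitor would take such snapshots periodically, or rely on file system notifications, and then index, re-index or remove the affected documents.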

The document repository can hold records for thousands of documents, regardless of their original formats. A number of popular document formats are supported either natively or via plugins, including HTML, PDF, OpenOffice, MS Office and email messages.

Annotations and metadata

SCAN provides a rich set of metadata properties associated with documents, including title, description/annotation, author, creation date and others. The properties are set automatically when a document is added and can be quickly edited later.

Metadata properties can be used in search queries to find the documents matching specified criteria. In addition, some properties (author, path, date and language) serve as navigation facets to browse the documents by their values.
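To illustrate what a navigation facet amounts to, here is a small sketch assuming each document is represented as a plain dictionary of metadata properties. The sample records and the helpers `facet_counts` and `filter_by` are hypothetical; SCAN's own data model may differ.

```python
from collections import Counter
from typing import Dict, List

# Illustrative metadata records, not real SCAN data.
documents: List[Dict[str, str]] = [
    {"title": "Budget 2007", "author": "Ann", "language": "en"},
    {"title": "Reisebericht", "author": "Ben", "language": "de"},
    {"title": "Meeting notes", "author": "Ann", "language": "en"},
]


def facet_counts(docs: List[Dict[str, str]], prop: str) -> Counter:
    """Count how many documents carry each value of one metadata property."""
    return Counter(doc[prop] for doc in docs if prop in doc)


def filter_by(docs: List[Dict[str, str]], **criteria: str) -> List[Dict[str, str]]:
    """Keep only the documents whose properties match every given criterion."""
    return [
        doc for doc in docs
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


author_facet = facet_counts(documents, "author")
ann_english = filter_by(documents, author="Ann", language="en")
```

The facet counts give the values a user can browse by, and selecting a value corresponds to filtering the collection on that property.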

Tag everything

The document collection is structured with a system of tags, similar to services such as del.icio.us or Flickr. Tags are keywords or labels attached to items to identify them for quick navigation and retrieval. Together, the tags form a taxonomy that represents the semantics of the document collection. The taxonomy can be viewed as a “tag cloud” for navigating through the document repository.
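A tag cloud is commonly built by counting how many documents carry each tag and scaling the display size accordingly. The sketch below shows that general technique under assumed inputs; the function `tag_cloud`, the size range and the sample tags are hypothetical and do not describe SCAN's rendering.

```python
from collections import Counter
from typing import Dict, List


def tag_cloud(doc_tags: Dict[str, List[str]],
              min_size: int = 10, max_size: int = 30) -> Dict[str, int]:
    """Map each tag to a font size proportional to its document count."""
    # Count each tag once per document, even if it was attached twice.
    counts = Counter(tag for tags in doc_tags.values() for tag in set(tags))
    if not counts:
        return {}
    lowest, highest = min(counts.values()), max(counts.values())
    span = highest - lowest
    cloud = {}
    for tag, n in counts.items():
        # With a single frequency level, place every tag mid-range.
        scale = (n - lowest) / span if span else 0.5
        cloud[tag] = round(min_size + scale * (max_size - min_size))
    return cloud


# Illustrative tagging, not real SCAN data.
cloud = tag_cloud({
    "a.pdf": ["semantics", "search"],
    "b.html": ["search"],
    "c.eml": ["search", "email"],
})
```

Here "search" is attached to all three documents and gets the largest size, while the tags used once get the smallest.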

Text analysis

SCAN brings the power of text mining and analysis to discover document semantics and extract basic concepts from a document's content.

Text analysis greatly simplifies the process of tagging. It helps a user pick the most relevant terms identifying a document and assign them as the document's tags, which makes manual tagging as simple as selecting tags from the suggested candidates. A user can also entrust tagging entirely to the system, so that documents are tagged automatically with the relevant terms.
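The text does not say which text analysis method SCAN uses to propose tags. One common way to rank candidate terms is TF-IDF weighting, sketched below as an assumption rather than as SCAN's algorithm. The stopword list, the sample corpus and the names `tokenize` and `suggest_tags` are all hypothetical.

```python
import math
import re
from collections import Counter
from typing import List

# A tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "of", "and", "to", "in", "is", "for"}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep alphabetic words that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def suggest_tags(doc: str, corpus: List[str], k: int = 3) -> List[str]:
    """Rank the terms of `doc` by TF-IDF against `corpus`; return the top k."""
    term_freq = Counter(tokenize(doc))
    doc_sets = [set(tokenize(text)) for text in corpus]
    n = len(corpus)
    scores = {}
    for term, freq in term_freq.items():
        doc_freq = sum(1 for terms in doc_sets if term in terms)
        # Smoothed inverse document frequency, so unseen terms are safe.
        idf = math.log((1 + n) / (1 + doc_freq)) + 1.0
        scores[term] = freq * idf
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Illustrative texts, not real SCAN data.
corpus = [
    "Invoice for the server hardware",
    "Minutes of the project meeting",
    "Tagging and metadata for semantic search",
]
candidates = suggest_tags("Semantic tagging of documents: tagging and metadata",
                          corpus, k=2)
```

The suggested candidates could be shown to the user for manual selection, or applied directly when tagging is left to the system.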

Other features based on text analysis include finding documents that are conceptually similar to a given one and suggesting terms related to the last search query (“associative search”).

Find everything fast

When new documents are added to the repository, their content is indexed for full-text search. Searches are performed either with simple text queries or with special forms for advanced search over both text and metadata properties. Advanced search queries can be saved for reuse.
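Full-text indexing is typically based on an inverted index, which maps each term to the set of documents containing it. The sketch below shows that general structure; it is an assumption about the technique, not a description of SCAN's index, and the class `InvertedIndex` is hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Optional, Set


class InvertedIndex:
    """Minimal in-memory inverted index: term -> ids of documents containing it."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        """Index every word of the document text under its document id."""
        for term in re.findall(r"\w+", text.lower()):
            self._postings[term].add(doc_id)

    def search(self, query: str) -> Set[str]:
        """Return the ids of documents that contain all query terms."""
        terms = re.findall(r"\w+", query.lower())
        result: Optional[Set[str]] = None
        for term in terms:
            ids = self._postings.get(term, set())
            result = set(ids) if result is None else result & ids
        return result if result is not None else set()


# Illustrative documents, not real SCAN data.
index = InvertedIndex()
index.add("a", "Quarterly budget report")
index.add("b", "Budget meeting notes")
hits = index.search("budget report")
```

Because the index is built when documents are added, a query only has to intersect a few small sets instead of reading every document.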

After a search is performed, SCAN analyses the results to build a list of “see also” terms. A user can select terms from the suggested list to refine the query and find more documents on a subject of interest (“associative search”).
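How SCAN derives the “see also” list is not specified. A simple approach, shown here purely as an assumption, is to count the terms that occur most often across the result documents while leaving out the query terms themselves. The helpers `tokenize` and `see_also` and the sample results are hypothetical.

```python
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Split text into lowercase alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def see_also(query: str, results: List[str], k: int = 3) -> List[str]:
    """Suggest the k terms found in the most result documents, minus query terms."""
    query_terms = set(tokenize(query))
    counts: Counter = Counter()
    for text in results:
        # Count each term once per document, so long documents do not dominate.
        counts.update(set(tokenize(text)) - query_terms)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Illustrative result texts, not real SCAN data.
results = [
    "python search index",
    "search engine index tuning",
    "full text search engine",
]
suggestions = see_also("search", results, k=2)
```

Adding one of the suggested terms to the original query narrows the search toward the related subject.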

Another type of search finds documents by similarity. The search algorithm takes a specific document as a pattern and finds every other document similar to it, which makes it fast to find everything on the topic of a given document.
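The similarity measure SCAN uses is not stated. A widely used choice is the cosine similarity between term-count vectors, sketched below as an assumption. The functions `cosine` and `most_similar` and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def most_similar(doc_id: str, docs: Dict[str, str], k: int = 2) -> List[str]:
    """Return up to k ids of other documents, most similar to `doc_id` first."""
    vectors = {d: Counter(re.findall(r"\w+", text.lower()))
               for d, text in docs.items()}
    target = vectors[doc_id]
    scored = [(cosine(target, vec), d) for d, vec in vectors.items() if d != doc_id]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    # Documents sharing no terms with the pattern are left out.
    return [d for score, d in scored[:k] if score > 0]


# Illustrative documents, not real SCAN data.
docs = {
    "a": "tag cloud navigation",
    "b": "tag cloud taxonomy",
    "c": "email inbox rules",
}
similar_to_a = most_similar("a", docs)
```

In this example only document "b" shares terms with the pattern document "a", so it is the only one returned.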

See also: