Sanchay ⇔ संचय

Sanchay (संचय) is an open-source platform for working on languages, especially South Asian languages, using computers. It can also be useful to those developing Natural Language Processing (NLP) or other text-processing applications, and it consists of various tools and APIs for this purpose. It is still under development and its design has not yet stabilized. Components such as a text editor with customizable support for languages and encodings and annotation interfaces were first released as part of an experimental version (0.1) on Sourceforge.net. The next version (0.2) was made available on the Internet and then released on Sourceforge.net, as was the latest version (0.3). Sanchay is meant to complement existing NLP tools and libraries.

Some of the components in the released version are: a syntactic annotation interface, generalised table and tree components, an SSF (Shakti Standard Format) API, a feature structure API, a parallel corpus markup interface, customizable language and encoding support, the Sanchay text editor, language and encoding identification, a file splitter and format converter, a task setup generator (only for syntactic annotation), a simple but powerful data structure called Properties Manager along with a GUI for purposes like customization of applications, a find/replace/extract tool, a CRF-based automatic annotation tool, a tree visualizer for phrase structure and dependency relations, and an XML-based frameset editor that is linked to the syntactic annotation interface to allow PropBank-style annotation. User documentation has been provided for some of these components, and more will be added soon. Some API documentation for programmers will also be provided later.

Many other components are in the pipeline. Hopefully other people will get involved in the development so that Sanchay can provide much-needed support for South Asian languages for as many purposes as possible. It is meant to be useful for other languages as well, and any help toward that is also welcome.

Sanchay has an object-oriented architecture whose design emphasizes modularity, reusability, extensibility and maintainability. The implementation is purely in Java, which means it is platform independent and can be used on Windows as well as Linux without any extra setup beyond installing a JDK or JRE.

The last formal release of Sanchay (version 0.3.0) is available for download here. A much more recent (informal) version is available here. For the latest version, please contact me.

Alternatively: Download the latest version.

More information is available at the Sanchay Home

Updates about Sanchay are available on the Sanchay News Blog
