  • Quality estimation for machine translation
  • Modelling and application of linguistic similarity
  • Computational modeling and processing of tense, aspect, modality and temporal information in natural languages
  • Aligning parallel corpora
  • Language and encoding identification
  • A computational phonetic model for Indian language scripts, based on the highly phonetic and well-organized nature of Brahmi-based scripts. It is being used to build applications such as a spell checker, a cognate identifier and a transliteration tool for Indian languages.
  • Phonetic processing of text: transliteration, letter-to-phoneme conversion, model of phonetic space
  • Crosslingual information retrieval
  • Building GUI-based interfaces for corpus annotation in Java
  • Building APIs for language resources like dictionaries and corpora
  • A multi-purpose editor specialized for NLP and Indian languages
  • Etc.

Note: Most of the above, and several others such as APIs for n-gram modelling, corpus compilation, find/replace/extract tools for corpora, a file splitter and a tree viewer, have been integrated into Sanchay, a small open-source Java-based platform for NLP that focuses especially on Indian languages. Parts of Sanchay are already used by many people working with South Asian languages, and some of them now also contribute to the development of Sanchay modules.

The last formal release of Sanchay (version 0.4.1) is available for download here. The latest builds are usually put here. You can also contact me.

So They Say

"Children, don't speak so coarsely," said Mr. Webster, who had a vague notion that some supervision should be exercised over his daughters' speech, and that a line should be drawn, but never knew quite when to draw it. He had allowed his daughters to use his library without restraint, and nothing is more fatal to maidenly delicacy of speech than the run of a good library.

— Robertson Davies

Tempest Tost