Text-Garden -- Text-Mining Software Tools
Department of Knowledge Technologies
Jozef Stefan Institute
Text-Garden is a software library and collection of software tools for solving large-scale tasks
involving structured, semi-structured and unstructured data, with an emphasis on text.
It can be used in a variety of research and applied scenarios.
Text-Garden is used by several institutions, including British Telecom, Carnegie Mellon University, Microsoft Research and Cycorp.
The development of Text-Garden started in 1996 as a set of C++ classes for dealing with text
in order to perform text-learning tasks.
Until 2002 two people worked on it, and it grew slowly, driven by the academic tasks on our agenda.
Since 2003 Text-Garden has been the central software platform of our
research group at the J. Stefan Institute.
Text-Garden is used in a number of research and applied projects (~10 people contributing).
Text-Garden is almost entirely written in portable C++:
- it compiles under Windows (Microsoft Visual C++, Borland C++) and Unix/Linux (GNU C)
- it runs on 32-bit and 64-bit platforms
- it consists of ~200,000 relatively compact lines of code
Using Text-Garden Functionality
Text-Garden functionality can be accessed in a number of ways:
- As plain C++ classes giving access to the complete functionality.
- As a DLL of ~250 functions exposing a simplified subset of the major functionality.
- As ~60 command-line utilities that can be connected in a pipeline. Basic utilities covering document classification, clustering and visualization can be downloaded under the LGPL license.
- Through GUI tools developed on top of Text-Garden, including Document Atlas and OntoGen.
- Through interfaces to several platforms that share the same API of ~40 classes and ~250 functions. The interfaces to all of the platforms listed here are generated automatically from the master Text-Garden header file:
- C/C++ - through simplified DLL & native C++
- Java – through JNI
- .NET – e.g. accessible through C#, VB, …
- Matlab – through standard Matlab interface
- Python – through standard Python interface
- Mathematica, Prolog, R – in preparation
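
The relationship between the native C++ classes and the simplified DLL functions listed above can be illustrated with a small sketch. It shows one common way to expose a C++ class through flat, C-style functions that take an opaque handle, which is a form that JNI, .NET, Python and similar bridges can call easily. All names here (WordCounter, WcHandle, wc_create, wc_add_text, wc_count, wc_destroy, demo_count) are hypothetical and chosen for illustration only. They are not part of the Text-Garden API, and the actual Text-Garden DLL may be organized differently.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical native C++ class (NOT a Text-Garden class):
// counts occurrences of whitespace-separated words.
class WordCounter {
public:
    void AddText(const std::string& text) {
        std::istringstream in(text);
        std::string word;
        while (in >> word) { ++counts_[word]; }
    }
    int Count(const std::string& word) const {
        auto it = counts_.find(word);
        return it == counts_.end() ? 0 : it->second;
    }
private:
    std::map<std::string, int> counts_;
};

// Hypothetical flat layer: plain functions with C linkage and an opaque
// handle, so that other platforms can call them without knowing C++ types.
extern "C" {
typedef void* WcHandle;

WcHandle wc_create() { return new WordCounter(); }

void wc_add_text(WcHandle h, const char* text) {
    if (h == nullptr || text == nullptr) { return; }  // ignore invalid input
    static_cast<WordCounter*>(h)->AddText(text);
}

int wc_count(WcHandle h, const char* word) {
    if (h == nullptr || word == nullptr) { return 0; }
    return static_cast<WordCounter*>(h)->Count(word);
}

void wc_destroy(WcHandle h) { delete static_cast<WordCounter*>(h); }
}

// Example caller that uses only the flat functions, as a foreign-language
// binding would: create, fill, query, destroy.
int demo_count(const char* text, const char* word) {
    WcHandle h = wc_create();
    wc_add_text(h, text);
    int n = wc_count(h, word);
    wc_destroy(h);
    return n;
}
```

In this sketch the flat layer is deliberately narrower than the class it wraps. This mirrors the idea of a simplified extract of the functionality, with the full feature set remaining available to native C++ callers.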