Marko Grobelnik, Dunja Mladenic
Text-Garden is a software library and a collection of software tools for solving large-scale tasks involving structured, semi-structured and unstructured data, with the emphasis on text. It can be used in a variety of research and application scenarios. Text-Garden is used by several institutions, including British Telecom, Carnegie Mellon University, Microsoft Research and Cycorp.
Some history
Development of Text-Garden started in 1996 as a set of C++ classes for handling text in text-learning tasks. Until 2002 two people worked on it, and it grew slowly, driven by the academic tasks on our agenda. From 2003 on, Text-Garden became the central software platform of our research group at the J. Stefan Institute. It is now used in a number of research and application projects, with about 10 people contributing.
Technical Aspects
Text-Garden is written almost entirely in portable C++.
- It compiles under Windows (Microsoft Visual C++, Borland C++) and Unix/Linux (GNU C)
- It runs on 32-bit and 64-bit platforms
- It consists of ~200,000 relatively compact lines of code
Using Text-Garden Functionality
Text-Garden functionality can be accessed in a number of ways:
- As plain C++ classes, exposing the complete functionality.
- As a DLL library of ~250 functions, providing a simplified subset of the major functionality.
- As ~60 command-line utilities that can be connected into pipelines. Basic utilities covering document classification, clustering and visualization can be downloaded under the LGPL license.
- Through GUI tools built on top of Text-Garden, including Document Atlas and OntoGen.
- Through interfaces to several platforms with the same API:
  - C/C++ – through the simplified DLL and native C++
  - Java – through JNI
  - .NET – accessible from C#, VB and other .NET languages
  - Matlab – through the standard Matlab interface
  - Python – through the standard Python interface
  - Mathematica, Prolog, R – in preparation
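A simplified DLL is what makes a multi-platform list like this practical: a flat C interface over C++ classes is something most foreign-function mechanisms can bind to (.NET P/Invoke, Python's `ctypes` and Matlab's `loadlibrary` call such functions directly, while JNI needs a thin layer of native-method stubs). The sketch below illustrates that general opaque-handle pattern under that assumption; it is not Text-Garden's actual API, and the `WordCounter` class and the `wc_*` functions are hypothetical names invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical C++ class standing in for the "plain C++ classes" layer.
class WordCounter {
public:
    // Split text on whitespace and count each token.
    void AddText(const std::string& text) {
        std::istringstream in(text);
        std::string word;
        while (in >> word) ++counts_[word];
    }
    // How often `word` has been seen so far (0 if never).
    std::size_t Count(const std::string& word) const {
        const auto it = counts_.find(word);
        return it == counts_.end() ? 0 : it->second;
    }
private:
    std::unordered_map<std::string, std::size_t> counts_;
};

// Flat, C-compatible interface (hypothetical names): only an opaque handle
// and plain C types cross the boundary. A production wrapper would also
// catch C++ exceptions here instead of letting them escape.
extern "C" {
typedef void* WcHandle;

// Returns nullptr if allocation fails.
WcHandle wc_new(void) { return new (std::nothrow) WordCounter(); }

void wc_add_text(WcHandle h, const char* text) {
    assert(h != nullptr && text != nullptr);
    static_cast<WordCounter*>(h)->AddText(text);
}

std::size_t wc_count(WcHandle h, const char* word) {
    assert(h != nullptr && word != nullptr);
    return static_cast<const WordCounter*>(h)->Count(word);
}

void wc_free(WcHandle h) { delete static_cast<WordCounter*>(h); }
}
```

A caller creates a handle with `wc_new`, feeds text with `wc_add_text`, queries with `wc_count` and releases the handle with `wc_free`. Keeping C++ types out of the exported signatures is what allows non-C++ hosts to load the same library.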
The API has ~40 classes and ~250 functions. Interfaces to all of the platforms above are generated automatically from the master Text-Garden header file.
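The text states only that the platform interfaces are generated from a master header file, not how. As a hedged illustration of the general idea, the sketch below scans simple one-line C declarations and emits Python `ctypes` declarations (`argtypes` and `restype`) for each function. The function `GenerateCtypesBindings`, the type table and the sample declarations are hypothetical, and the regular expression handles only plain declarations with named parameters; it is not a full C parser and not Text-Garden's own generator.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <regex>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Strip leading and trailing spaces/tabs.
static std::string Trim(const std::string& s) {
    const auto b = s.find_first_not_of(" \t");
    if (b == std::string::npos) return "";
    const auto e = s.find_last_not_of(" \t");
    return s.substr(b, e - b + 1);
}

// Canonicalize pointer spelling: "void *" and "void * " become "void*".
static std::string NormalizeType(const std::string& t) {
    static const std::regex spaceBeforeStar(R"(\s*\*)");
    return Trim(std::regex_replace(t, spaceBeforeStar, "*"));
}

// Hypothetical mapping from C types to Python ctypes names.
static std::string ToCtypes(const std::string& cType) {
    static const std::map<std::string, std::string> table = {
        {"void", "None"},
        {"void*", "ctypes.c_void_p"},
        {"const char*", "ctypes.c_char_p"},
        {"int", "ctypes.c_int"},
        {"size_t", "ctypes.c_size_t"},
    };
    const auto it = table.find(cType);
    if (it == table.end()) throw std::runtime_error("unsupported type: " + cType);
    return it->second;
}

// Emit argtypes/restype lines for every one-line "ret name(params);"
// declaration found in `header`; all other lines are ignored.
std::string GenerateCtypesBindings(const std::string& header) {
    static const std::regex decl(
        R"(^\s*([A-Za-z_][\w\s\*]*?)\s*\b([A-Za-z_]\w*)\s*\(([^)]*)\)\s*;)");
    std::ostringstream out;
    std::istringstream lines(header);
    std::string line;
    while (std::getline(lines, line)) {
        std::smatch m;
        if (!std::regex_search(line, m, decl)) continue;
        const std::string name = m[2].str();
        std::vector<std::string> args;
        std::istringstream params(m[3].str());
        std::string param;
        while (std::getline(params, param, ',')) {
            param = Trim(param);
            if (param.empty() || param == "void") continue;  // "f(void)": no args
            // Drop the parameter name: it follows the last space or '*'.
            const auto cut = param.find_last_of(" *");
            assert(cut != std::string::npos && "parameters must be named");
            args.push_back(ToCtypes(NormalizeType(param.substr(0, cut + 1))));
        }
        out << "lib." << name << ".argtypes = [";
        for (std::size_t i = 0; i < args.size(); ++i)
            out << (i > 0 ? ", " : "") << args[i];
        out << "]\n";
        out << "lib." << name << ".restype = "
            << ToCtypes(NormalizeType(m[1].str())) << "\n";
    }
    return out.str();
}
```

For example, the declaration `size_t wc_count(void* h, const char* word);` yields `lib.wc_count.argtypes = [ctypes.c_void_p, ctypes.c_char_p]` and `lib.wc_count.restype = ctypes.c_size_t`. Treating one header as the single source of truth means that a function added on the C++ side reaches every generated binding on the next generator run.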