The tools use three main file formats to represent documents. Each format reflects a different way of handling text documents:

  1. Compact-Documents format with the file extension “.Cpd”
  2. Text-Base format with the file extension “.TBs”
  3. Bag-Of-Words format with the file extension “.Bow”
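
As a rough illustration, a script that works alongside the tools could tell the three formats apart by file extension. The sketch below is not part of the tools themselves: the function name `document_format` and the lookup table are hypothetical, and matching the extension case-insensitively is an assumption rather than documented behavior.

```python
from pathlib import Path
from typing import Optional

# Extensions as listed above, stored in lowercase.
# Case-insensitive matching is an assumption, not documented behavior.
FORMAT_BY_EXTENSION = {
    ".cpd": "Compact-Documents",
    ".tbs": "Text-Base",
    ".bow": "Bag-Of-Words",
}


def document_format(filename: str) -> Optional[str]:
    """Return the format name for a file, or None if the extension is unknown."""
    # Path.suffix yields the final extension including the dot, e.g. ".Cpd".
    return FORMAT_BY_EXTENSION.get(Path(filename).suffix.lower())
```

Calling `document_format("news.Cpd")` would yield the Compact-Documents name, while a file with any other extension yields `None`, so callers can decide how to handle unrecognized input.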