---------------------------------------- ----------------------------------------

ICML-99 Workshop: Machine Learning in Text Data Analysis

---------------------------------------- ----------------------------------------

We hope you will find useful information on this home page of the ICML-99 Workshop
on Machine Learning in Text Data Analysis, co-organized by Dunja Mladenic and Marko Grobelnik.

Topics of interest include, but are not limited to: text representation, feature subset selection, the influence of domain characteristics and domain-tailored text learning, scalability of the developed approaches, text mining and text classification methods, natural language processing for automated text analysis, extension of the developed methods to different natural languages, result evaluation measures, text learning on the Web, and innovative applications of text learning and analysis.

Please check the Call for Papers for more details about the workshop. This workshop is one of the ICML-99 workshops.

----------------------------------------

Invited talk: Supervised Text Learning from Unlabeled Data

Tom M. Mitchell, Carnegie Mellon University, Pittsburgh, PA, USA

Most computational models of supervised learning rely only on labeled training examples, and ignore the possible role of unlabeled data. This is true for much research in machine learning, including work on learning over text. This talk will explore the potential role of unlabeled data in supervised learning over text. We present an algorithm and experimental results demonstrating that unlabeled data can significantly improve learning accuracy in problems such as learning to classify web pages. We then identify the abstract problem structure that enables the algorithm to successfully utilize this unlabeled data, and prove that unlabeled data will boost learning accuracy for problems in this class. The problem class we identify includes problems where the features describing the examples are redundantly sufficient for classifying the example, a notion we make precise in the paper. This problem class includes many learning problems involving text, such as learning a semantic lexicon over noun phrases, learning to classify web pages, and learning word sense disambiguation rules. We conclude that current research on text learning should consider more strongly the potential role of unlabeled data.
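
The abstract does not spell out the algorithm, so the following is only a rough illustration of how "redundantly sufficient" features might be exploited. It assumes a co-training-style loop: each page is described by two feature views (here, hypothetically, the words on the page and the words on links pointing to it), a classifier trained on one view labels the unlabeled page it is most confident about, and that page then becomes training data for both views. The names NaiveBayes and co_train, the toy pages, and the confidence measure are all assumptions made for this sketch and are not taken from the talk or the paper.

```python
import math
from collections import Counter


class NaiveBayes:
    """Tiny multinomial naive Bayes over bags of words (Laplace smoothing)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        for words, label in zip(docs, labels):
            self.counts[label].update(words)
        self.vocab = {w for c in self.classes for w in self.counts[c]}
        return self

    def predict(self, words):
        """Return (best_label, margin); margin is the log-score gap to the runner-up."""
        n = sum(self.prior.values())
        scores = {}
        for c in self.classes:
            total = sum(self.counts[c].values()) + len(self.vocab)
            score = math.log(self.prior[c] / n)
            for w in words:
                if w in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.counts[c][w] + 1) / total)
            scores[c] = score
        best = max(scores, key=scores.get)
        others = [s for c, s in scores.items() if c != best]
        margin = scores[best] - max(others) if others else 0.0
        return best, margin


def train_view(labeled, view):
    """Train a classifier on a single feature view of the labeled examples."""
    return NaiveBayes().fit([x[view] for x, _ in labeled], [y for _, y in labeled])


def co_train(labeled, unlabeled, rounds=5):
    """Hypothetical co-training loop over two views.

    labeled:   list of ((view_a, view_b), label)
    unlabeled: list of (view_a, view_b)
    """
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        for view in (0, 1):
            clf = train_view(labeled, view)
            # Move the example this view is most confident about into the labeled set.
            best = max(pool, key=lambda x: clf.predict(x[view])[1])
            labeled.append((best, clf.predict(best[view])[0]))
            pool.remove(best)
            if not pool:
                break
    return train_view(labeled, 0), train_view(labeled, 1)


# Toy data (invented): view A = words on the page, view B = words on incoming links.
labeled = [
    ((["course", "homework", "syllabus"], ["cs101", "course"]), "course"),
    ((["professor", "research", "publications"], ["faculty", "homepage"]), "faculty"),
]
unlabeled = [
    (["homework", "lecture", "exam"], ["course", "page"]),
    (["research", "publications", "lab"], ["faculty", "prof"]),
    (["lecture", "exam", "grading"], ["cs202", "course"]),
]

baseline_a = train_view(labeled, 0)            # labeled data only
cotrained_a, cotrained_b = co_train(labeled, unlabeled)

query = ["lecture", "grading"]                 # words absent from the labeled pages
print("baseline: ", baseline_a.predict(query))
print("cotrained:", cotrained_a.predict(query))
```

In this toy setting the baseline classifier has never seen the query words and so cannot separate the classes, while the co-trained classifier has picked them up from pages that the other view labeled. Real data would need larger pools, several examples added per round, and some care to keep the classes balanced.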

(More can be found in the paper)

----------------------------------------

List of accepted papers

----------------------------------------

List of participants (as registered)

  1. Naoki Abe
  2. Thorsten Joachims
  3. Mathias Kirsten (see also)
  4. Helena Ahonen-Myka
  5. Ian Witten
  6. Peter Jansen
  7. Ervin Dobler
  8. Dunja Mladenic, Personal WebWatcher project page
  9. Marko Grobelnik, Yahoo planet project page
  10. Sam Scott
  11. Nicolas Lachiche
  12. Laurent Miclet
  13. Ryszard Michalski
  14. Tom Mitchell, WebKB project page
  15. Donato Malerba
  16. Werner Dilger
  17. Jason Rennie, Cora project page
  18. Phil Long
  19. Natasa Milic-Frayling
  20. Morin Johanne
  21. Ken Barker
  22. Shivakumar Vaithyanathan
  23. Kazuya Chiba
----------------------------------------

Program Committee

This workshop is supported by the Slovenian Language Technologies Society.


----------------------------------------
Dunja Mladenic
Last modified: Tue Jul 13 16:12:16 METDST