We hope you'll find some interesting information on this home page of
the workshop on Machine Learning in Text Data Analysis, co-organized by Dunja Mladenic and Marko Grobelnik.
Particular topics of interest include, but are not limited to: text representation, feature subset selection, the influence of domain characteristics and domain-tailored text learning, scalability of the developed approaches, text mining and text classification methods, natural language processing for automated text analysis, extensions of the developed methods for handling different natural languages, result evaluation measures, text learning on the Web, and innovative applications of text learning and analysis.
Please check the Call for Papers for more details about the workshop. This workshop is one of the ICML-99 workshops.
Most computational models of supervised learning rely only on labeled training examples, and ignore the possible role of unlabeled data. This is true for much research in machine learning, including work on learning over text. This talk will explore the potential role of unlabeled data in supervised learning over text. We present an algorithm and experimental results demonstrating that unlabeled data can significantly improve learning accuracy in problems such as learning to classify web pages. We then identify the abstract problem structure that enables the algorithm to successfully utilize this unlabeled data, and prove that unlabeled data will boost learning accuracy for problems in this class. The problem class we identify includes problems where the features describing the examples are redundantly sufficient for classifying the example, a notion we make precise in the paper. This problem class includes many learning problems involving text, such as learning a semantic lexicon over noun phrases, learning to classify web pages, and learning word sense disambiguation rules. We conclude that current research on text learning should consider more strongly the potential role of unlabeled data.
(More can be found in the paper)
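The abstract does not spell out the algorithm, so the following is only an illustrative Python sketch of one way two redundantly sufficient feature sets can let unlabeled data help, in the spirit of what is usually called co-training. Everything here is an assumption rather than the method from the talk: the names WordVote, train_views and co_train, the toy data (page words as one view, words on incoming links as the other), and the word-count voting classifier, which stands in for a real learner. As a further simplification, every unlabeled example on which either view is confident is added in each round.

```python
from collections import Counter


class WordVote:
    """Toy classifier (hypothetical): each word votes for the classes it was seen with."""

    def __init__(self, docs, labels):
        self.counts = {}
        for words, label in zip(docs, labels):
            self.counts.setdefault(label, Counter()).update(words)

    def predict(self, words):
        """Return (label, margin); a margin of 0 means the classifier cannot decide."""
        scores = {c: sum(cnt[w] for w in words) for c, cnt in self.counts.items()}
        ranked = sorted(scores, key=scores.get, reverse=True)
        best = ranked[0]
        runner_up = scores[ranked[1]] if len(ranked) > 1 else 0
        return best, scores[best] - runner_up


def train_views(labeled):
    """Train one classifier per view from ((view_a, view_b), label) pairs."""
    labels = [y for _, y in labeled]
    return [WordVote([x[v] for x, _ in labeled], labels) for v in (0, 1)]


def co_train(labeled, unlabeled, rounds=3):
    """Let each view label the unlabeled examples it is confident about,
    add them to the shared training set, and retrain both views."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        models = train_views(labeled)
        newly = {}
        for view, model in enumerate(models):
            for i, example in enumerate(pool):
                label, margin = model.predict(example[view])
                if margin > 0 and i not in newly:
                    newly[i] = label
        if not newly:
            break
        labeled += [(pool[i], y) for i, y in newly.items()]
        pool = [x for i, x in enumerate(pool) if i not in newly]
    return train_views(labeled)


# Toy data (made up): view A = words on the page, view B = words on links to it.
labeled = [
    ((["homework", "lecture"], ["cs101"]), "course"),
    ((["research", "publications"], ["professor"]), "faculty"),
]
unlabeled = [
    (["lecture", "syllabus"], ["course", "page"]),
    (["publications", "students"], ["advisor"]),
    (["exam", "grading"], ["cs101"]),
]

baseline_a = train_views(labeled)[0]
view_a, view_b = co_train(labeled, unlabeled)

print(baseline_a.predict(["exam"])[1])  # 0: labeled data alone cannot decide
print(view_a.predict(["exam"]))         # ('course', 1)
print(view_b.predict(["advisor"]))      # ('faculty', 1)
```

In this toy run the page-word view learns "exam" only because the link-word view recognized "cs101" on an unlabeled page, and the link-word view learns "advisor" only because the page-word view recognized "publications". Each view supplies labels that the other could not have produced from the labeled data alone.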
This workshop is supported by the Slovenian Language Technologies Society.