What It Does
The pipeline performs the following main steps:
- Periodically crawl a list of RSS feeds and a subset of Google News and obtain links to news articles
- Download the articles, taking care not to overload any of the hosting servers
- Parse each article to obtain:
  - potential new RSS sources mentioned in the HTML, which are fed back into the crawling step
  - a cleartext version of the article body
- Process articles with Enrycher (English and Slovene only)
- Expose two streams of news articles (cleartext and Enrycher-processed) to end users
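The list above leaves two mechanisms implicit: how downloads are spread out so that no single host is overloaded, and how new RSS sources are spotted in article HTML. The sketch below illustrates both under stated assumptions. It assumes a fixed minimum delay between requests to the same host, and it assumes feeds are advertised through `<link rel="alternate" type="application/rss+xml">` tags (standard RSS/Atom autodiscovery). It is not the pipeline's actual implementation; the names `PoliteScheduler` and `discover_feeds` and the 2-second default are hypothetical.

```python
import heapq
import time
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical sketch; the real pipeline's politeness policy and parser are not documented here.
# MIME types commonly used to advertise feeds via <link rel="alternate"> (autodiscovery).
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collects feed URLs advertised in <link rel="alternate" type="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # HTMLParser lowercases names; values may be None for bare attributes.
        a = {k: (v or "") for k, v in attrs}
        rels = a.get("rel", "").lower().split()
        if "alternate" in rels and a.get("type", "").lower() in FEED_TYPES and a.get("href"):
            # Resolve relative hrefs against the article's own URL.
            self.feeds.append(urljoin(self.base_url, a["href"]))


def discover_feeds(html, base_url):
    """Return candidate feed URLs found in an article's HTML."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.feeds


class PoliteScheduler:
    """Releases URLs so that each host is contacted at most once per `min_delay` seconds."""

    def __init__(self, min_delay=2.0, clock=time.monotonic):
        self.min_delay = min_delay  # hypothetical per-host delay
        self.clock = clock
        self.next_ok = {}  # host -> earliest time the host may be contacted again
        self.queue = []    # min-heap of (ready_time, seq, url)
        self.seq = 0       # tie-breaker keeps FIFO order among equal ready times

    def add(self, url):
        host = urlparse(url).netloc
        # Reserve the next free slot for this host, then push the following slot back.
        ready = max(self.clock(), self.next_ok.get(host, 0.0))
        self.next_ok[host] = ready + self.min_delay
        heapq.heappush(self.queue, (ready, self.seq, url))
        self.seq += 1

    def pop_ready(self):
        """Return the next URL whose slot has arrived, or None if nothing is due yet."""
        if self.queue and self.queue[0][0] <= self.clock():
            return heapq.heappop(self.queue)[2]
        return None
```

Reserving a time slot per host when a URL is enqueued means different hosts can be fetched back to back, while repeated requests to the same host are spaced at least `min_delay` apart. Any feed URLs returned by `discover_feeds` would then be added to the list of feeds crawled in the first step.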
Visit http://newsfeed.ijs.si/visual_demo/ for a real-time visualization of the news stream.
For more information on stream contents, availability and the API, please visit newsfeed.ijs.si.
The pipeline was developed and is maintained by the Artificial Intelligence Laboratory at the Jozef Stefan Institute, Slovenia. In case of questions, contact Mitja Trampus and/or Blaz Novak.