IJS newsfeed

A clean, continuous, real-time aggregated stream of semantically enriched news articles from RSS-enabled sites across the world.

What it Does

The pipeline performs the following main steps:

  1. Periodically crawl a list of RSS feeds and a subset of Google News to obtain links to news articles.
  2. Download the articles, taking care not to overload any of the hosting servers.
  3. Parse each article to obtain:
    1. Potential new RSS sources mentioned in the HTML, to be used in step (1).
    2. A cleartext version of the article body.
  4. Process the articles with Enrycher (English and Slovene only).
  5. Expose two streams of news articles (cleartext and Enrycher-processed) to end users.
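
As an illustration of steps 1, 2 and 3.1, here is a minimal Python sketch. It is not the pipeline's actual code, which this README does not show. All names (extract_article_links, HostThrottle, FeedLinkFinder) and the 5-second default delay are hypothetical, and the sketch assumes plain RSS 2.0 input and feeds advertised through <link rel="alternate"> tags. Only the Python standard library is used, and no network access is performed.

```python
import time
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


def extract_article_links(rss_xml):
    """Step 1 (sketch): return the <link> of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    links = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if link and link.strip():
            links.append(link.strip())
    return links


class HostThrottle:
    """Step 2 (sketch): keep a minimum delay between requests to one host."""

    def __init__(self, min_delay_s=5.0, clock=time.monotonic):
        self.min_delay_s = min_delay_s  # assumed value, not from the README
        self.clock = clock              # injectable so the class is testable
        self.last_hit = {}              # host -> time of the latest request

    def wait_time(self, url):
        """Seconds to wait before `url` may be fetched (0.0 if allowed now)."""
        last = self.last_hit.get(urlparse(url).netloc)
        if last is None:
            return 0.0
        return max(0.0, self.min_delay_s - (self.clock() - last))

    def record(self, url):
        """Remember that the host of `url` was contacted just now."""
        self.last_hit[urlparse(url).netloc] = self.clock()


class FeedLinkFinder(HTMLParser):
    """Step 3.1 (sketch): collect RSS/Atom feeds advertised in the HTML."""

    FEED_TYPES = {"application/rss+xml", "application/atom+xml"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = {k: (v or "") for k, v in attrs}
        is_alternate = "alternate" in a.get("rel", "").lower().split()
        is_feed = a.get("type", "").lower() in self.FEED_TYPES
        if is_alternate and is_feed and a.get("href"):
            # Resolve relative hrefs against the page URL.
            self.feeds.append(urljoin(self.base_url, a["href"]))
```

A crawler built on this sketch would call wait_time() before each download, sleep for the returned number of seconds, call record() after the request, and add every URL found by FeedLinkFinder to the list of feeds crawled in step 1.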

Demo Visualization

Visit http://newsfeed.ijs.si/visual_demo/ for a real-time visualization of the news stream.

More Info

For more information on stream contents, availability, and the API, please visit newsfeed.ijs.si.

About

The pipeline was developed and is maintained by the Artificial Intelligence Laboratory at the Jozef Stefan Institute, Slovenia. In case of questions, contact Mitja Trampus and/or Blaz Novak.

The development was supported in part by the RENDER, X-Like and MetaNet EU FP7 projects.