Enrycher is a service-oriented system, providing shallow as well as deep text processing functionality at the text document level.
Shallow text processing:
- topic and keyword detection
- named entity extraction: names of people, locations and organizations, dates, percentages and money amounts
Deep text processing:
- named entity resolution with respect to existing Linked datasets: DBpedia, YAGO, OpenCyc
- named entity merging: co-reference and anaphora resolution
- word sense disambiguation into WordNet
- assertion extraction, by identifying subject – predicate – object sentence elements together with their modifiers (adjectives, adverbs) and negations
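To make the shape of an extracted assertion more concrete, the following is a minimal sketch of how a subject – predicate – object assertion with modifiers and a negation flag could be represented. The class and field names (`Element`, `Assertion`, `negated`, `as_text`) are hypothetical and chosen for illustration only; they do not describe the actual output format of Enrycher.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """One sentence element (hypothetical type) with its modifiers."""
    head: str
    modifiers: list[str] = field(default_factory=list)  # adjectives or adverbs


@dataclass
class Assertion:
    """A subject-predicate-object assertion (hypothetical type)."""
    subject: Element
    predicate: Element
    obj: Element
    negated: bool = False  # True when the sentence negates the assertion

    def as_text(self) -> str:
        """Render the assertion as a flat string, modifiers before heads."""
        parts = []
        for element in (self.subject, self.predicate, self.obj):
            parts.extend(element.modifiers + [element.head])
        prefix = "NOT " if self.negated else ""
        return prefix + " ".join(parts)


# Illustrative sentence: "The service does not process large documents."
example = Assertion(
    subject=Element("service"),
    predicate=Element("process"),
    obj=Element("documents", modifiers=["large"]),
    negated=True,
)
print(example.as_text())  # NOT service process large documents
```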
Enrycher Services
The Fact Extraction Service is composed of 9 services, depicted in the figure below. The services are grouped into 3 types: Enrycher Pre-processing Services (marked in green), Enrycher Extraction Services (marked in blue), and Enrycher Transformation Services (marked in red). The figure shows the dependencies between services: some are mandatory (depicted with a solid line), others optional (depicted with a dashed line). All services rely on a text pre-processing step, handled by the Text Pre-Processing Service, which consists of sentence splitting and tokenization.
Example Calls
The Enrycher Web Service API exposes the functionality of each service.
The Enrycher services can be called at the following URLs:
- for XML output: enrycher.ijs.si/run
- for RDF output: enrycher.ijs.si/run-rdf
To execute the service, one should send an HTTP POST request, with the raw text in the body:
curl -d "Enrycher was developed at JSI, a research institute in Ljubljana. Ljubljana is the capital of Slovenia." http://enrycher.ijs.si/run
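The same call can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the function names `build_request` and `enrych`, the `output` parameter, the timeout, and the assumption that both request and response are UTF-8 encoded are illustrative choices that are not stated on this page. Only the two URLs and the use of HTTP POST with the raw text in the body come from the description above.

```python
import urllib.request

# Endpoints listed above; the http scheme follows the curl example.
ENDPOINTS = {
    "xml": "http://enrycher.ijs.si/run",
    "rdf": "http://enrycher.ijs.si/run-rdf",
}


def build_request(text: str, output: str = "xml") -> urllib.request.Request:
    """Build a POST request carrying the raw text in the body.

    No Content-Type header is set here, so urllib falls back to
    application/x-www-form-urlencoded, the same default that `curl -d` uses.
    """
    if output not in ENDPOINTS:
        raise ValueError(f"unknown output format: {output!r}")
    return urllib.request.Request(
        ENDPOINTS[output],
        data=text.encode("utf-8"),  # assumption: the service accepts UTF-8
        method="POST",
    )


def enrych(text: str, output: str = "xml", timeout: float = 30.0) -> str:
    """Send the text to the service and return the response body."""
    request = build_request(text, output)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")  # assumption: UTF-8 response
```

Calling `enrych("Ljubljana is the capital of Slovenia.", output="rdf")` would send the sentence to the RDF endpoint and return the response as a string. Separating `build_request` from `enrych` keeps the request construction inspectable without any network access.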
The Java API for calling the Enrycher services can now be found on GitHub.
Publications
- ŠTAJNER, Tadej, RUSU, Delia, DALI, Lorand, FORTUNA, Blaž, MLADENIĆ, Dunja, GROBELNIK, Marko. A service oriented framework for natural language text enrichment. Informatica (Ljublj.), 2010, vol. 34, no. 3, 307-313.
- ŠTAJNER, Tadej, MLADENIĆ, Dunja. Entity resolution from texts using statistical learning and ontologies. In Proceedings of the 4th Asian Conference on The Semantic Web, 2009, 91-104.
- RUSU, Delia, FORTUNA, Blaž, MLADENIĆ, Dunja, GROBELNIK, Marko, SIPOŠ, Ruben. Document visualization based on semantic graphs. In Proceedings: Information Visualization, IV 2009, 15-17 July 2009, Barcelona, Spain. 292-297.
- RUSU, Delia, FORTUNA, Blaž, GROBELNIK, Marko, MLADENIĆ, Dunja. Semantic graphs derived from triplets with application in document summarization. Informatica (Ljublj.), 2009, vol. 33, no. 3, 357-362.
- ŠTAJNER, Tadej. From unstructured to linked data: entity extraction and disambiguation by collective similarity maximization. In Identity and Reference in web-based Knowledge Representation (IR-KR): Proceedings of the IJCAI-09 workshop, 29-34.
- RUSU, Delia, DALI, Lorand, FORTUNA, Blaž, GROBELNIK, Marko, MLADENIĆ, Dunja. Triplet extraction from sentences. In Proceedings of the 10th International Multiconference Information Society 2007, 218-222.
- GROBELNIK, Marko, MLADENIĆ, Dunja. Simple classification into large topic ontology of web documents. CIT. Journal of Comput. Inf. Technol., 2005, vol. 13, 279-285.