Semantic Reasoning Evaluation Challenge (SemREC)

21st International Semantic Web Conference (ISWC 2022)

Challenge Description

Despite the development of several ontology reasoning optimizations, traditional methods either do not scale well or cover only a subset of OWL 2 language constructs. As an alternative, neuro-symbolic approaches are gaining significant attention. However, existing methods cannot handle very expressive ontology languages. Some SPARQL query engines also support reasoning, but their performance is still limited. To find and address these performance bottlenecks, we ideally need several real-world ontologies that span a broad spectrum of size and expressivity. However, such ontologies are often not available. One potential reason ontology developers do not build ontologies that vary in size and expressivity is the performance bottleneck of the reasoners themselves. SemREC aims to address this chicken-and-egg problem.
The second edition of this challenge includes the following tasks:

  • Task-1 - Ontologies. Submit a real-world ontology that is challenging in terms of the reasoning time or the memory consumed during reasoning. We expect a detailed description of the ontology along with an analysis of the reasoning performance, any workarounds that were used to make the ontology less challenging (for example, dropping a few axioms or redesigning the ontology), and the (potential) applications in which the ontology could be used. We will evaluate the submitted ontologies based on the time consumed for a reasoning task, such as classification, and the memory consumed during reasoning.

  • Task-2 - Systems
    • Ontology/RDFS Reasoners. Submit an ontology/RDFS reasoner that uses neural-symbolic techniques for reasoning and optimization. In terms of the technique used, submissions could fall under any of the following (or related) categories:
      1. Using learning-based techniques for performance optimization of traditional reasoning algorithms [6].
      2. Inductive reasoning techniques based on a subsymbolic representation of entities and relations learned through maximization of an objective function over valid triples [4, 5].
      3. Techniques that can learn the deductive reasoning aspect using the ontology axioms [1, 2, 3].
      4. Neural Multi-hop reasoners to deal with reasoning where multi-hop inference is required [7, 8].
      Based on precision and recall, we will evaluate the submitted systems on the test datasets for scalability (performance evaluation on large and expressive ontologies) and transfer capabilities (ability to reason over ontologies from different domains). We expect a detailed description of the system, including an evaluation of the system on the provided datasets.
    • SPARQL query engines that support entailment regimes such as RDF, RDFS, or OWL 2. We expect a detailed description of the system, including an evaluation of the system on the provided datasets.
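
The precision and recall scoring mentioned for the ontology/RDFS reasoners can be illustrated with a minimal Python sketch. It assumes that a system's output and the reference (gold) inferences are both available as sets of (subject, predicate, object) triples. The function name `precision_recall` and the example triples are hypothetical and are not part of the official evaluation tooling.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]


def precision_recall(predicted: Iterable[Triple],
                     gold: Iterable[Triple]) -> Tuple[float, float]:
    """Return (precision, recall) of predicted inferences against gold inferences."""
    predicted_set, gold_set = set(predicted), set(gold)
    true_positives = len(predicted_set & gold_set)
    # Guard against empty sets so the sketch never divides by zero.
    precision = true_positives / len(predicted_set) if predicted_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    return precision, recall


# Hypothetical gold inferences for a toy ontology.
gold = {
    (":alice", "rdf:type", ":Person"),
    (":bob", "rdf:type", ":Person"),
    (":Student", "rdfs:subClassOf", ":Agent"),
    (":alice", "rdf:type", ":Agent"),
    (":bob", "rdf:type", ":Agent"),
}

# Hypothetical system output: three correct inferences and one wrong one.
predicted = {
    (":alice", "rdf:type", ":Person"),
    (":bob", "rdf:type", ":Person"),
    (":alice", "rdf:type", ":Agent"),
    (":bob", "rdf:type", ":Course"),
}

p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=0.60
```

Treating inferences as sets means duplicate triples in a system's output do not affect the scores; whether the actual evaluation handles duplicates this way is an assumption of this sketch.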

    This challenge will be collocated with the 21st International Semantic Web Conference.

    We have a discussion group for the challenge where we share the latest news with the participants and discuss issues related to the evaluation rounds.

    Dataset Details

    We will use the following datasets for evaluating the reasoners submitted to Task-2.

    1. ORE 2015 dataset.
    2. Ontologies of varying sizes and complexities generated using our benchmark, OWL2Bench. It supports all the OWL 2 profiles.
    3. Datasets submitted to Task-1.

    Submission Details

    We have not yet categorized the tasks based on different profiles, reasoning techniques, or reasoning tasks (entailment, class subsumption, class membership, type prediction, and link prediction). We will decide on this aspect based on the submissions we get.

    • Task-1. To generate the leaderboard, the submitted ontologies will be run on some of the traditional description logic reasoners such as Konclude, ELK, and Openllet. We will use reasoning (classification) time and the memory consumed as the primary metrics. We will use a timeout value of 6 hours, and the limit on the memory will be 96 GB.
    • Task-2.
      • To generate the leaderboard, the neural-symbolic reasoners will be provided with training, validation, and test datasets. The participants will include all the results in the submitted papers and provide their trained models/embeddings. We will further evaluate the provided models on another small test dataset. The evaluation metrics will be reasoning time, memory consumed, precision, and recall.
      • To generate the leaderboard, the SPARQL query engines will be evaluated for the scalability aspect in terms of load time, query response time, and memory consumed. We will provide the datasets, and the participants will discuss their evaluations in the submitted paper. For fair evaluation, we will be re-evaluating the submitted systems on our hardware.
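
The time and memory bookkeeping described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the reasoning task is represented by an in-process Python callable, peak memory is read with `tracemalloc` (which tracks Python allocations only), and the 96 GB limit is interpreted as 96 GiB. The names `measure`, `RunResult`, and `toy_task` are hypothetical. The sketch checks the limits after the run and does not enforce them, and it does not reflect how the organizers actually run external reasoners.

```python
import time
import tracemalloc
from dataclasses import dataclass
from typing import Callable

TIMEOUT_SECONDS = 6 * 60 * 60        # 6-hour timeout stated for Task-1
MEMORY_LIMIT_BYTES = 96 * 1024 ** 3  # 96 GB limit stated for Task-1 (GiB assumed)


@dataclass
class RunResult:
    seconds: float       # wall-clock time of the task
    peak_bytes: int      # peak traced Python memory during the task
    within_limits: bool  # True if both the time and memory limits were met


def measure(task: Callable[[], object]) -> RunResult:
    """Run `task` once and record its wall-clock time and peak traced memory."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        task()
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    within = elapsed <= TIMEOUT_SECONDS and peak <= MEMORY_LIMIT_BYTES
    return RunResult(seconds=elapsed, peak_bytes=peak, within_limits=within)


def toy_task() -> int:
    """Hypothetical stand-in for a classification run: allocates and sums a list."""
    data = list(range(200_000))
    return sum(data)


result = measure(toy_task)
print(f"time={result.seconds:.4f}s peak={result.peak_bytes} bytes "
      f"within_limits={result.within_limits}")
```

For reasoners that run as separate processes, the same bookkeeping would need operating-system-level measurements instead of `tracemalloc`.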



  1. M. Ebrahimi, M.K. Sarker, F. Bianchi, N. Xie, D. Doran, and P. Hitzler, Reasoning over RDF knowledge bases using deep learning, arXiv preprint, arXiv:1811.04132, 2018.
  2. P. Hohenecker and T. Lukasiewicz, Deep learning for ontology reasoning, CoRR, arXiv:1705.10342, 2017.
  3. B. Makni and J. Hendler, Deep learning for noise-tolerant RDFS reasoning, Semantic Web 10(5) (2019), 823–862.
  4. J. Chen, P. Hu, E. Jimenez-Ruiz, O. M. Holter, D. Antonyrajah, and I. Horrocks, OWL2Vec*: embedding of OWL ontologies. Machine Learning, 2021.
  5. S. Mondal, S. Bhatia, and R. Mutharaju. EmEL++: Embeddings for EL++ Description Logic. Spring Symposium on Combining Machine Learning and Knowledge Engineering (AAAI-MAKE), 2021.
  6. R. Mehri, V. Haarslev, and H. R. Chinaei, A machine learning approach for optimizing heuristic decision-making in Web Ontology Language reasoners. Computational Intelligence 37, doi:10.1111/coin.12404, 2020.
  7. B. Peng, Z. Lu, H. Li, and K.-F. Wong, Towards neural network-based reasoning. arXiv preprint arXiv:1508.05508, 2015.
  8. X. V. Lin, R. Socher, and C. Xiong, Multi-Hop Knowledge Graph Reasoning with Reward Shaping. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3243–3253, 2018.