The following test data sets are based on the Mooney Natural Language Learning Data provided to us by Ray Mooney and his group from the University of Texas at Austin (http://www.cs.utexas.edu/users/ml/nldata.html).
The data comprises three data sets, each supplying an OWL knowledge base and a set of English questions. The three knowledge bases cover three different domains: geographical data, job data, and restaurant data.
We translated the original Prolog knowledge bases to OWL knowledge bases that can be found here:
- Geography OWL Data:
- Job OWL Data:
- Restaurant OWL Data:
Each data set provides English questions appropriate to its domain. The questions were composed by undergraduate students of the computer science department of the University of Texas at Austin and gathered from “real” people using a Web interface provided by Mooney’s research group [Tang, L. R. and Mooney, R. J. (2001). Using Multiple Clause Constructors in Inductive Logic Programming for Semantic Parsing. In 12th European Conference on Machine Learning (ECML-2001), Freiburg, Germany, pp. 466–477].
- English Geography Questions (877 questions):
- download text file (38 KB)
- English Job Questions (620 questions):
- download text file (31 KB)
- English Restaurant Questions (251 questions):
- download text file (14.6 KB)
For each English question, there is also a corresponding logical representation stated as a Prolog term. The logical representations can be found at: http://www.cs.utexas.edu/users/ml/nldata.html
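The question counts stated for the three files (877, 620, and 251) can serve as a quick sanity check after downloading. The following is a minimal sketch only: it assumes each text file holds one question per non-blank line, which this page does not confirm, and the names `EXPECTED_COUNTS`, `count_questions`, and `check_file` are hypothetical helpers introduced for illustration.

```python
from pathlib import Path

# Question counts as stated in the listing on this page.
EXPECTED_COUNTS = {"geography": 877, "job": 620, "restaurant": 251}


def count_questions(text: str) -> int:
    """Count non-blank lines, ASSUMING one question per line."""
    return sum(1 for line in text.splitlines() if line.strip())


def check_file(path: str, domain: str) -> bool:
    """Return True if the file at `path` holds the expected number of questions.

    `path` is wherever the downloaded text file was saved (hypothetical);
    `domain` is one of the keys of EXPECTED_COUNTS.
    """
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return count_questions(text) == EXPECTED_COUNTS[domain]
```

If a check fails, the likeliest cause is that the one-question-per-line assumption does not hold for that file, so inspect the file layout before concluding that the download is incomplete.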