WP4 – Query generation

This work package will be led by ULEI and will devise approaches for transforming natural-language queries into structured queries (e.g., SPARQL).

QAMEL’s architecture is based on loosely coupled modules with REST interfaces. The query generation work package will therefore develop a module that converts the natural-language input produced in WP3 into a SPARQL query that retrieves an answer set from the store developed in WP5, within the resource restrictions imposed by mobile platforms.
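
The module contract described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the planned QAMEL interface: the JSON field names (`subject`, `predicate`, `limit`, `sparql`) and both function names are hypothetical, and the input is assumed to be already reduced to a single subject and predicate URI. The `LIMIT` clause stands in for the kind of bound on result size that a mobile platform might require.

```python
import json


def build_select_query(subject_uri: str, predicate_uri: str, limit: int = 10) -> str:
    """Build a one-triple SPARQL SELECT query with a bounded result size."""
    return (
        "SELECT DISTINCT ?answer WHERE { "
        f"<{subject_uri}> <{predicate_uri}> ?answer . "
        "} "
        f"LIMIT {limit}"
    )


def handle_request(body: str) -> str:
    """Map a JSON request body to a JSON response body.

    The field names are hypothetical; a real module would receive
    this body through its REST interface.
    """
    request = json.loads(body)
    query = build_select_query(
        request["subject"],
        request["predicate"],
        request.get("limit", 10),  # default bound if the caller gives none
    )
    return json.dumps({"sparql": query})
```

Keeping the request handler a pure function from a string to a string keeps the sketch independent of any particular web framework, so the same logic could sit behind whichever REST layer the project selects.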

Since the advent of the digital information age, knowledge has been spread over the internet in unstructured (textual) and structured (e.g., RDF) knowledge bases. The generated SPARQL queries will therefore be able to make use of the multimodal, hybrid data sources available to QAMEL.

Moreover, query generation will be based on structure induction over predicate-argument structures and dependency parse trees, to overcome the limits of template-based approaches such as Apple’s Siri. In addition, the query generation pipeline will use novel NLP chunking modules to detect noun phrases from single words, e.g., combining “Hollywood’s”, “highest-paid” and “actress” into the noun phrase “Hollywood’s highest-paid actress”, and will retrieve named entities from natural-language input via modern, lightweight and high-quality entity annotation systems.
Overall, the query generation pipeline will combine two groups of components. The first consists of classical NLP tools such as POS taggers, named entity annotators and dependency parsers, which need to be adapted to work within the limited resources of mobile devices. The second consists of Semantic Web technologies, i.e., template-free SPARQL query induction and SPARQL query ranking algorithms. Together they are intended to cover complex natural-language information needs of users in an efficient and effective way on mobile devices.
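
The chunking step can be illustrated with a small sketch. It is not the chunking module planned for this work package; it assumes the input has already been tokenised and tagged with Penn Treebank-style POS tags, and the function name `chunk_noun_phrases` and the tag sets are chosen here for illustration only. The sketch merges maximal runs of determiner, adjective, noun and possessive tokens into one phrase and keeps a run only if it contains a noun.

```python
from typing import List, Tuple

# Penn Treebank-style tags allowed inside a simple noun phrase (illustrative choice).
NP_TAGS = {"DT", "JJ", "JJR", "JJS", "NN", "NNS", "NNP", "NNPS", "POS"}
# Tags that count as a noun; a run without one is discarded.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def join_tokens(run: List[Tuple[str, str]]) -> str:
    """Join tokens with spaces, attaching possessive markers to the previous token."""
    text = ""
    for token, tag in run:
        if tag == "POS" or not text:
            text += token
        else:
            text += " " + token
    return text


def chunk_noun_phrases(tagged: List[Tuple[str, str]]) -> List[str]:
    """Merge maximal runs of NP_TAGS tokens into noun phrases."""
    phrases: List[str] = []
    run: List[Tuple[str, str]] = []
    # A sentinel with an empty tag flushes the final run.
    for token, tag in tagged + [("", "")]:
        if tag in NP_TAGS:
            run.append((token, tag))
            continue
        if any(t in NOUN_TAGS for _, t in run):
            phrases.append(join_tokens(run))
        run = []
    return phrases


tagged_question = [
    ("Who", "WP"), ("is", "VBZ"), ("Hollywood", "NNP"), ("'s", "POS"),
    ("highest-paid", "JJ"), ("actress", "NN"), ("?", "."),
]
print(chunk_noun_phrases(tagged_question))
```

A rule of this kind needs only a single pass over the tokens and no parser, which is one reason simple chunkers are attractive on resource-limited devices; it does not handle nested or coordinated phrases.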

Each part of the pipeline will be continuously evaluated against the state-of-the-art benchmarks provided by the scientific and industry partners.