BA/MA: Web Table Understanding

(BA/MA, Supervisors: Basil Ell, Sherzod Hakimov)

Large amounts of information are available on the Web. However, most of it is not represented in a way that would allow machines to perform semantic search over it or to answer questions using it. Representing data in RDF (Resource Description Framework) would be one step towards this goal. Despite the progress made in the field of Natural Language Understanding, extracting information from textual documents and representing it in RDF remains limited due to the complexity of natural language. Besides natural language text, the Web also contains a plethora of tables, and extracting information from tables might be easier than from text because of their inherent structure (e.g., the rows of a table tend to be similar to each other).

Within our research group we have begun to tackle the problem of Web Table Understanding, published a paper, "Towards a Large Corpus of Richly Annotated Web Tables for Knowledge Base Population", and made the annotated data available.

The goal of this thesis is to build on the basic table interpretation tasks described in that paper and on the dataset produced in that project in order to define and implement higher-level tasks towards Web Table Understanding. This deeper understanding of tables could be reached via some form of Data Mining or Machine Learning.

Having attended the Semantic Web lecture is a plus but not a prerequisite. However, some programming skills are expected. This thesis can be framed as a bachelor thesis or as a master thesis.

Contact Basil Ell or Sherzod Hakimov for more information.