LREC 2010 Workshop on

Web Logs and Question Answering (WLQA2010)

Malta, Saturday 22nd May 2010

 

Senglea (Myriam Thyes / Wikimedia Commons)

 

Motivation & Scope

An Information Retrieval system takes a user query and returns a ranked list of documents. A Question Answering system, on the other hand, provides an exact answer [1]. There has been quite a long period of research in factoid QA driven by annual tracks at CLEF [2], TREC [3] and NTCIR [4]. As a result of this work, it is now possible to construct systems which can answer simple factoid queries with high accuracy. This has led to the belief that QA is a "solved problem" where no more research is required. However, there are three reasons to doubt this. Firstly, the questions are not usually from real users; they are devised by the assessors at CLEF, TREC etc. Secondly, they are restricted to certain well-known simple types which are only a small subset of the real questions which people wish to ask. Thirdly, questions are considered in isolation (or in some tracks as a fixed group) and not in a dialogue context, whereas in our interactions with people all questions are answered in context and with the possibility of clarification. Thus, there is a need to inject new ideas into QA research.

 

Recently there has been much interest in Web query logs, and in particular in methods for analysing them in order to extract information which can be used to improve IR systems [5,6]. Logs are typically extremely large and contain naturally occurring and noisy data. Automatic techniques (using, for example, statistical approaches or machine learning algorithms) are therefore necessary since manual approaches are not generally feasible.

 

The purpose of the workshop, therefore, is to investigate how some of the methods developed for analysing web logs within an implicit IR context can be applied to QA. For example:

 

·         Can the meaning of IR queries in logs be deduced automatically in order to extract the corresponding questions from them?

 

·         Can NLP techniques developed within QA, e.g. Named Entity Recognition, be applied to the analysis of query logs?

 

·         Can logs be used to deduce useful new forms of question (i.e. not simple factoids) which could be looked at next by QA researchers?

 

·         Can questions grouped into sessions be comprehended in such a way as to deduce the underlying implicit natural language dialogue consisting of a coherent sequence of questions where each follows logically from both the previous ones and the system's responses to them?

 

·         Are there logs from real (or experimental) QA systems like lexxe.com which can be obtained, and what can be learned from them from the perspective of designing evaluation tasks? What about logs from sites like answers.com (where queries are answered by human respondents)?

 

·         Are QA query logs different from IR query logs? Do users behave differently in QA systems? Can QA-style questions be identified within an IR log?

 

·         Can click-through data, where the aim of a question can be inferred from the returned documents which are inspected, be used for the development of QA systems, for example for the deduction of important query types and their links to IR queries?

 

·         Are there logs of transcribed speech from telephone QA systems which can be obtained, and what analysis could be carried out on them, using for example techniques developed at related tracks at CLEF such as Cross-Language Speech Retrieval (CL-SR) and Question Answering on Speech Transcription (QAST)?

 

Historically, QA was a combination of NLP and IR. Much web log analysis is a form of IR in which the same problem of retrieval is being approached from a different direction, namely the queries themselves. Thus we are proposing here a new combination, namely QA and log analysis. These fields are complementary and share the goal of building better systems for users.

 

1. Prager, J. (2006). Open-Domain Question Answering. Foundations and

   Trends in Information Retrieval, 1 (2), 1-141.

2. CLEF (2009). http://www.clef-campaign.org. Accessed 2009.

3. TREC (2009). http://trec.nist.gov/. Accessed 2009.

4. NTCIR (2009). http://research.nii.ac.jp/ntcir/. Accessed 2009.

5. Jansen, J., Taksa, I., & Spink, A. (eds.) (2008). Handbook of Web Log

   Analysis. Hershey, PA: IGI Global.

6. QLA Workshop (2009). http://ir.shef.ac.uk/cloughie/qlaw2009.

 

Submissions

Authors are invited to submit original research papers addressing questions along the lines listed above. Papers must be related to QA and must involve the use of a query log (but not necessarily one from a QA system). Submissions will be reviewed by members of the programme committee and judged on technical quality, clarity and relevance to the workshop.

 

Papers should be no longer than 8 pages, formatted in accordance with LREC guidelines using the LaTeX or Word templates, which are available here: http://www.lrec-conf.org/lrec2010/?Author-s-Kit-and-Templates .

 

Papers should be submitted in PDF format via the WLQA2010 START system: https://www.softconf.com/lrec2010/WLQA2010/ . When using START, authors will be asked to provide essential information about resources (in a broad sense, i.e. also technologies, standards, evaluation kits, etc.) that have been used for the work described in the paper or that are a new result of their research. For further information on this new initiative, please refer to http://www.lrec-conf.org/lrec2010/?LREC2010-Map-of-Language-Resources .

 

Proceedings will be produced at the workshop and it is intended that selected papers will be published in a journal special issue after LREC has taken place.

 

Important Dates

First Call for Papers: December 2009

Second Call for Papers: January 2010

Third Call for Papers: February 2010

Fourth Call for Papers: March 2010

Submission deadline: 12th March 2010 (EXTENDED)

Notification of acceptance: 19th March 2010

Final versions of papers: 23rd March 2010

Workshop: At LREC, Saturday 22nd May 2010

 

Organisers

Richard Sutcliffe

University of Limerick

Richard.Sutcliffe at ul dot ie

 

Udo Kruschwitz

University of Essex

udo at essex dot ac dot uk

 

Thomas Mandl

University of Hildesheim

mandl at uni-hildesheim dot de

 

Programme Committee

Bettina Berendt

Katholieke Universiteit Leuven, Belgium

 

Gosse Bouma

Rijksuniversiteit Groningen, The Netherlands

 

Paul Clough

University of Sheffield, UK

 

Giorgio Di Nunzio

University of Padua, Italy

 

Jim Jansen

Pennsylvania State University, USA

 

Johannes Leveling

Dublin City University, Ireland

 

Fabrizio Silvestri

ISTI-CNR, Italy

 

Tomek Strzalkowski

SUNY Albany, USA

 

José Luis Vicedo

University of Alicante, Spain

 

Kieran White

University of Limerick, Ireland

 

 

Street in Valletta (Väsk / Wikimedia Commons)