35021 |
Creator |
56412434212aaec1af1a675d5f017f1e |
35021 |
Creator |
4419806f183d406a234964f040dbb7da |
35021 |
Creator |
ext-1bb8933dae13f43a97468da3dfa1683b |
35021 |
Creator |
ext-6a1a7dcf609cfdfdd46b0897064736c7 |
35021 |
Date |
2009 |
35021 |
Is Part Of |
p03029743 |
35021 |
Is Part Of |
repository |
35021 |
abstract |
Intuitively, any 'bag of words' approach in IR should benefit from taking term dependencies
into account. Unfortunately, for years the results of exploiting such dependencies
have been mixed or inconclusive. To improve the situation, this paper shows how the
natural language properties of the target documents can be used to transform and enrich
the term dependencies into more useful statistics. This is done in three steps. First,
the term co-occurrence statistics of queries and documents are each represented by a Markov
chain. The paper proves that such a chain is ergodic, and therefore its asymptotic
behavior is unique, stationary, and independent of the initial state. Next, the stationary
distribution is taken to model queries and documents, rather than their initial distributions.
Finally, ranking is achieved following the customary language modeling paradigm. The
main contribution of this paper is to argue why the asymptotic behavior of the document
model is a better representation than just the document's initial distribution. A
secondary contribution is to investigate the practical application of this representation
as queries become increasingly verbose. In the experiments (based on Lemur's
search engine substrate) the default query model was replaced by the stable distribution
of the query. Modeling the query this way alone already resulted in significant improvements
over a standard language model baseline. The results were on a par with, or better than,
more sophisticated algorithms that use fine-tuned parameters or extensive training.
Moreover, the more verbose the query, the more effective the approach appears to become. |
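The core computation the abstract describes — row-normalizing term co-occurrence counts into an ergodic Markov chain and taking its stationary distribution as the term model — can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation; the toy co-occurrence counts are hypothetical.

```python
import numpy as np

def stationary_distribution(cooc, tol=1e-10, max_iter=1000):
    """Stationary distribution of the Markov chain whose transition
    matrix is the row-normalized co-occurrence count matrix."""
    cooc = np.asarray(cooc, dtype=float)
    # Row-normalize counts into transition probabilities P[i, j].
    P = cooc / cooc.sum(axis=1, keepdims=True)
    # Power iteration from the uniform distribution; for an ergodic
    # chain the limit is unique and independent of this start.
    pi = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(max_iter):
        nxt = pi @ P
        if np.abs(nxt - pi).sum() < tol:
            break
        pi = nxt
    return pi

# Hypothetical co-occurrence counts for three terms; every state can
# reach every other with positive probability, so the chain is ergodic.
counts = [[2, 1, 1],
          [1, 3, 1],
          [1, 1, 2]]
pi = stationary_distribution(counts)
```

The resulting `pi` would replace the raw (initial) term distribution of the query or document before ranking under the usual language modeling paradigm.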
35021 |
authorList |
authors |
35021 |
status |
peerReviewed |
35021 |
uri |
http://data.open.ac.uk/oro/document/102374 |
35021 |
uri |
http://data.open.ac.uk/oro/document/98599 |
35021 |
uri |
http://data.open.ac.uk/oro/document/98600 |
35021 |
uri |
http://data.open.ac.uk/oro/document/98601 |
35021 |
uri |
http://data.open.ac.uk/oro/document/98602 |
35021 |
uri |
http://data.open.ac.uk/oro/document/98603 |
35021 |
uri |
http://data.open.ac.uk/oro/document/98604 |
35021 |
volume |
5766 |
35021 |
type |
AcademicArticle |
35021 |
type |
Article |
35021 |
label |
Hoenkamp, Eduard; Bruza, Peter; Song, Dawei and Huang, Qiang (2009). An effective
approach to verbose queries using a limited dependencies language model. Lecture
Notes in Computer Science, 5766 pp. 116–127. |
35021 |
Title |
An effective approach to verbose queries using a limited dependencies language model |
35021 |
in dataset |
oro |