subject predicate object context
35021 Creator 56412434212aaec1af1a675d5f017f1e
35021 Creator 4419806f183d406a234964f040dbb7da
35021 Creator ext-1bb8933dae13f43a97468da3dfa1683b
35021 Creator ext-6a1a7dcf609cfdfdd46b0897064736c7
35021 Date 2009
35021 Is Part Of p03029743
35021 Is Part Of repository
35021 abstract Intuitively, any 'bag of words' approach in IR should benefit from taking term dependencies into account. Unfortunately, for years the results of exploiting such dependencies have been mixed or inconclusive. To improve the situation, this paper shows how the natural language properties of the target documents can be used to transform and enrich the term dependencies into more useful statistics. This is done in three steps. First, the term co-occurrence statistics of queries and documents are each represented by a Markov chain. The paper proves that such a chain is ergodic, and therefore its asymptotic behavior is unique, stationary, and independent of the initial state. Next, the stationary distribution is taken to model queries and documents, rather than their initial distributions. Finally, ranking is achieved following the customary language modeling paradigm. The main contribution of this paper is to argue why the asymptotic behavior of the document model is a better representation than just the document's initial distribution. A secondary contribution is to investigate the practical application of this representation as queries become increasingly verbose. In the experiments (based on Lemur's search engine substrate) the default query model was replaced by the stationary distribution of the query. Just modeling the query this way already resulted in significant improvements over a standard language model baseline. The results were on a par with, or better than, more sophisticated algorithms that use fine-tuned parameters or extensive training. Moreover, the more verbose the query, the more effective the approach seems to become.
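The abstract's first two steps (a term co-occurrence Markov chain whose stationary distribution replaces the raw term distribution) can be sketched as follows. This is not the authors' code: the toy corpus, window size, and function names are illustrative assumptions, and the stationary distribution is found by simple power iteration under the ergodicity the paper proves.

```python
# Sketch: term co-occurrence Markov chain and its stationary distribution.
# Toy text, window size, and all names here are illustrative assumptions.
from collections import defaultdict

def cooccurrence_chain(tokens, window=2):
    """Build a row-stochastic transition matrix from windowed co-occurrence counts."""
    counts = defaultdict(lambda: defaultdict(float))
    for i, t in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if i != j:
                counts[t][tokens[j]] += 1.0
    vocab = sorted(set(tokens))
    P = []
    for t in vocab:
        row_total = sum(counts[t].values()) or 1.0  # guard against isolated terms
        P.append([counts[t][u] / row_total for u in vocab])
    return vocab, P

def stationary(P, iters=200):
    """Power iteration pi <- pi P; converges when the chain is ergodic."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

vocab, P = cooccurrence_chain("the cat sat on the mat".split())
pi = stationary(P)  # stationary term distribution, used in place of raw counts
```

The resulting `pi` would then stand in for the query's or document's initial term distribution in the usual language-modeling ranking step.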
35021 authorList authors
35021 status peerReviewed
35021 uri http://data.open.ac.uk/oro/document/102374
35021 uri http://data.open.ac.uk/oro/document/98599
35021 uri http://data.open.ac.uk/oro/document/98600
35021 uri http://data.open.ac.uk/oro/document/98601
35021 uri http://data.open.ac.uk/oro/document/98602
35021 uri http://data.open.ac.uk/oro/document/98603
35021 uri http://data.open.ac.uk/oro/document/98604
35021 volume 5766
35021 type AcademicArticle
35021 type Article
35021 label Hoenkamp, Eduard; Bruza, Peter; Song, Dawei and Huang, Qiang (2009). An effective approach to verbose queries using a limited dependencies language model. Lecture Notes in Computer Science, 5766 pp. 116–127.
35021 Title An effective approach to verbose queries using a limited dependencies language model
35021 in dataset oro