Vol: 54(68) No: 1 / March 2009

Searching for Similar Documents using Keywords and Taxonomies in Mobile Device Environments

Kristof Csorba
Department of Automation and Applied Informatics, Budapest University of Technology and Economics, 1111 Budapest, Goldmann Gy. Tér 3., Hungary, phone: +36 1 463-2870, e-mail: kristof@aut.bme.hu, web: http://www.aut.bme.hu/

Istvan Vajk
Department of Automation and Applied Informatics, Budapest University of Technology and Economics, 1111 Budapest, Goldmann Gy. Tér 3., Hungary, e-mail: vajk@aut.bme.hu

Keywords: document similarity, taxonomy, mobile device, topic representation

Abstract

This paper presents a new extension to a keyword-list-based document similarity comparison system developed for mobile device environments. The system helps users of mobile devices search a peer-to-peer network for documents whose topics are similar to those of the documents on their own device. To meet the constraints of mobile devices such as phones or PDAs, the method is designed for slow processors, limited memory, and low data traffic between devices. The similarity measure is based on the number of common keywords and is now extended with taxonomic support. This allows comparing documents that have similar topics but are far enough apart to share no keywords. The paper explains the taxonomic extension and the method for creating keyword taxonomies specific to a topic hierarchy.
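
The idea stated in the abstract, counting common keywords and then extending the count with taxonomic support, can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details are not given here. It assumes a taxonomy stored as a child-to-parent mapping and expands each keyword with its ancestors before counting overlaps. The names `PARENT_OF`, `expand`, `plain_similarity`, and `taxonomic_similarity`, as well as the toy taxonomy and keyword lists, are hypothetical.

```python
def ancestors(keyword, parent_of):
    """Return the keyword together with all of its ancestors in the taxonomy."""
    chain = set()
    node = keyword
    # Walk upward until the root is passed; the membership test guards against cycles.
    while node is not None and node not in chain:
        chain.add(node)
        node = parent_of.get(node)
    return chain


def expand(keywords, parent_of):
    """Extend a keyword list with every ancestor of every keyword."""
    expanded = set()
    for keyword in keywords:
        expanded |= ancestors(keyword, parent_of)
    return expanded


def plain_similarity(doc_a, doc_b):
    """Baseline measure: number of keywords the two documents share."""
    return len(set(doc_a) & set(doc_b))


def taxonomic_similarity(doc_a, doc_b, parent_of):
    """Assumed extension: number of shared keywords after taxonomic expansion."""
    return len(expand(doc_a, parent_of) & expand(doc_b, parent_of))


# Hypothetical toy taxonomy (child -> parent), for illustration only.
PARENT_OF = {
    "goalkeeper": "football",
    "penalty": "football",
    "football": "sport",
    "serve": "tennis",
    "tennis": "sport",
}

doc_a = ["goalkeeper", "penalty"]  # keyword list of a football document
doc_b = ["serve"]                  # keyword list of a tennis document

print(plain_similarity(doc_a, doc_b))                 # no common keywords
print(taxonomic_similarity(doc_a, doc_b, PARENT_OF))  # related through "sport"
```

In this sketch the two keyword lists share no keyword, yet they meet at the common ancestor "sport", which is the situation the taxonomic extension is meant to handle. Only short keyword lists would need to be exchanged between devices, which is in line with the low-traffic goal stated in the abstract.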