Vol: 57(71) No: 3 / September 2012

Projective Dimension of Text Documents in Multidimensional Space using PART Neural Network

Roman Krakovsky
Department of Informatics, Catholic University in Ruzomberok, Faculty of Pedagogy, Hrabovska cesta 1, 034 01 Ruzomberok, Slovakia, phone: (421) 44-4326844, e-mail: roman.krakovsky@ku.sk

Igor Mokris
Institute of Informatics, Slovak Academy of Sciences, Dubravska cesta 9, 845 07 Bratislava, Slovakia, e-mail: igor.mokris@savba.sk

Keywords: PART neural network, clustering, multi-dimensional space, outlier cluster

Abstract
This paper addresses the clustering of text documents with neural networks. In the proposed model, text documents are stored in the Vector Space (VS) model and described by a VS matrix. Conventional clustering algorithms have difficulty clustering in multidimensional data spaces because of the inherent sparsity of the data. The presented approach creates subspaces of the multidimensional space using the Projective Adaptive Resonance Theory (PART) neural network, which both reduces the multidimensional text document space and clusters the text documents. The efficiency of text document clustering in subspaces of the multidimensional space is influenced by the properties of PART, so its parameters have to be set optimally. With exact settings of the distance and vigilance parameters of PART, it is possible to find the clusters and their centers in the projective dimensions of the subspaces, and to create an outlier cluster for noisy datasets.

References
[1] G. Salton, A. Singhal and J. Allan, “Automatic Text Decomposition and Structuring,” Information Processing and Management, pp. 127-138, 1996.
[2] G. G. Chowdhury, Introduction to Modern Information Retrieval, Facet Publishing, 2004.
[3] C. C. Aggarwal, C. Procopiuc, J. L. Wolf, P. S. Yu and J. S. Park, “Fast Algorithms for Projected Clustering,” SIGMOD ’99, pp. 61-72, 1999.
[4] Y. Cao and J. Wu, “Projective ART for clustering datasets in high dimensional spaces,” Neural Networks, vol. 15, no. 1, pp. 105-120, January 2002.
[5] R. Ch. Chen and Ch. H. Chuang, “Automating Construction of a Domain Ontology Using a Projective Adaptive Resonance Theory Neural Network and Bayesian Network,” Expert Systems, vol. 25, no. 4, pp. 414-430, 2008.
[6] R. Krakovsky and R. Forgac, “Neural network approach to multidimensional data classification via clustering,” 9th International Symposium on Intelligent Systems and Informatics, SISY 2011, Subotica, Serbia, pp. 169-174, September 2011.
[7] J. D. Hunter, J. Wu and J. G. Milton, “Clustering Neural Spike Trains with Transient Responses,” 47th IEEE Conference on Decision and Control, Cancun, Mexico, pp. 2000-2005, December 2008.
[8] W. J. Krzanowski and F. H. Marriott, Multivariate Analysis: Classification, Covariance Structure and Repeated Measurements, Wiley-Interscience, October 1998.
[9] K. Lin and R. Kondadadi, “A similarity based soft clustering algorithm for documents,” DASFAA-2001, pp. 40-47, April 2001.
[10] L. Parsons, E. Haque and H. Liu, “Subspace Clustering for High Dimensional Data,” SIGKDD Explorations, pp. 90-105, 2004.
[11] G. A. Carpenter, S. Grossberg and D. B. Rosen, “ART2-A: An adaptive resonance algorithm for rapid category learning and recognition,” Neural Networks, vol. 4, pp. 493-504, 1991.
[12] L. Liu and L. Huang, “Projective ART with Buffers for the High Dimensional Space Clustering and an Application to Discover Stock Associations,” Neurocomputing, pp. 1283-1295, 2009.
[13] Y. Cao and J. Wu, “Dynamics of Projective Adaptive Resonance Theory Model: the Foundation of PART Algorithm,” IEEE Trans. Neural Networks, vol. 15, pp. 245-260, March 2004.
[14] C. C. Aggarwal and P. S. Yu, “Outlier Detection for High Dimensional Data,” ACM SIGMOD 2001 International Conference on Management of Data, pp. 37-46, ACM Press, 2001.
[15] S. Grossberg and G. A. Carpenter, “Adaptive resonance theory,” The Handbook of Brain Theory and Neural Networks, MIT Press, 2002.
[16] S. Grossberg and G. A. Carpenter, “The ART of adaptive pattern recognition by self-organizing neural network,” Computer, vol. 21, pp. 77-88, 1988.
[17] T. Kawamura, H. Takahashi and H. Honda, “Proposal of New Gene Filtering Method, BagPART for Gene Expression Analysis with Small Sample,” Journal of Bioscience and Bioengineering, vol. 105, no. 1, pp. 81-84, 2008.
[18] R. Krakovsky and I. Mokris, “Clustering of Text Documents by Projective Dimension of Subspaces using PART Neural Network,” 7th International Symposium on Applied Computational Intelligence and Informatics, SACI 2012, Timisoara, Romania, pp. 203-208, May 2012.
[19] R. Agrawal, J. Gehrke, D. Gunopulos and P. Raghavan, “Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications,” SIGMOD ’98, Washington, USA, pp. 94-105, 1998.
[20] R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval, Addison Wesley, 1999.
[21] G. Gan and J. Wu, “Subspace Clustering for High Dimensional Categorical Data,” ACM SIGKDD Explorations Newsletter, vol. 6, no. 2, pp. 87-94, December 2004.
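
Illustrative note: the abstract describes documents stored as a VS matrix and names the sparsity of that matrix as the obstacle for conventional clustering. The Python sketch below shows one way such a matrix could be built and its sparsity measured. It assumes raw term counts and whitespace tokenization, since the abstract does not state the weighting scheme the authors use. The function names `build_vs_matrix` and `sparsity` and the sample documents are hypothetical and do not come from the paper.

```python
from collections import Counter


def build_vs_matrix(documents):
    """Return (vocabulary, matrix); matrix[i][j] counts term j in document i.

    Assumption: raw term frequencies on lowercased, whitespace-split tokens.
    """
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix


def sparsity(matrix):
    """Return the fraction of zero entries in the matrix."""
    total = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for value in row if value == 0)
    return zeros / total if total else 0.0


# Hypothetical sample documents, used only for illustration.
docs = [
    "neural network clustering",
    "text document clustering",
    "stock market prices",
]
vocabulary, vs_matrix = build_vs_matrix(docs)
print(vocabulary)
print(vs_matrix)
print(sparsity(vs_matrix))
```

Even in this tiny example most entries are zero, because each document uses only a few of the terms in the vocabulary. With a realistic vocabulary the matrix has one dimension per term, which is the setting in which the abstract proposes clustering in subspaces rather than in the full space.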