Title Learning similarity measures in non-orthogonal space
Authors Liu, Ning
Zhang, Benyu
Yan, Jun
Yang, Qiang
Yan, Shuicheng
Chen, Zheng
Bai, Fengshan
Ma, Wei-Ying
Affiliation Department of Mathematical Science, Tsinghua University, Beijing, 100084, China
Microsoft Research Asia, 49 Zhichun Road, Beijing 100080, China
Department of Information Science, School of Mathematical Science, Peking University, Beijing 100871, China
Department of Computer Science, Hong Kong University of Science and Technology, Hong Kong, Hong Kong
Issue Date 2004
Citation CIKM 2004: Proceedings of the Thirteenth ACM Conference on Information and Knowledge Management. Washington, DC, United States.
Abstract Many machine learning and data mining algorithms rely crucially on similarity metrics. The Cosine similarity, which calculates the inner product of two normalized feature vectors, is one of the most commonly used similarity measures. However, in many practical tasks such as text categorization and document clustering, the Cosine similarity is calculated under the assumption that the input space is orthogonal, an assumption that usually cannot be satisfied because of synonymy and polysemy. Various algorithms such as Latent Semantic Indexing (LSI) have been used to address this problem by projecting the original data into an orthogonal space. However, LSI suffers from high computational cost and data sparseness, which increase computation time and storage requirements for large-scale realistic data. In this paper, we propose a novel and effective similarity metric in the non-orthogonal input space. The basic idea of our proposed metric is that the similarity of features should affect the similarity of objects, and vice versa. A novel iterative algorithm for computing non-orthogonal space similarity measures is then proposed. Experimental results on a synthetic data set, real MSN search click-through logs, and the 20NG dataset show that our algorithm outperforms the traditional Cosine similarity and is superior to LSI. Copyright 2004 ACM.
URI http://hdl.handle.net/20.500.11897/329098
Indexed EI
Appears in Collections: School of Mathematical Sciences

Files in This Work
There are no files associated with this item.

License: See PKU IR operational policies.