NDLI: Fast document clustering based on weighted comparative advantage

Content Provider	IEEE Xplore Digital Library
Author	Jie Ji; Chan, T.Y.T.; Qiangfu Zhao
Copyright Year	2009
Description	Author affiliation: Intelligent System Lab, The University of Aizu, Aizuwakamatsu, Fukushima, Japan (Jie Ji; Qiangfu Zhao) || School of Computing, The University of Akureyri, Iceland (Chan, T.Y.T.)
Abstract	Document clustering is the process of partitioning a set of unlabeled documents into clusters such that the documents within each cluster share some common concepts. To help with this analysis, concepts are conveniently represented using key terms. In a clustering algorithm, most of the CPU time is spent in the classification phase. Using words as features, text data are represented in a very high-dimensional vector space. We have previously studied a comparative advantage based algorithm for clustering sparse data in this space; it uses one “ruler” instead of k centers to identify the comparative advantage of each cluster and to define the cluster label of each document. However, that algorithm considers only the relative strength between clusters and ignores the relationship between terms. In this paper, we propose a weighted comparative advantage based clustering algorithm. Experimental results on the SMART system databases show that the new algorithm performs better than the simple comparative advantage algorithm without any extra computation time. Compared with k-means, it not only obtains comparable results but also significantly accelerates the clustering procedure.
Starting Page	541
Ending Page	546
File Size	257151
Page Count	6
File Format	PDF
ISBN	9781424427932
ISSN	1062922X
DOI	10.1109/ICSMC.2009.5346877
Language	English
Publisher	Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher Date	2009-10-11
Publisher Place	USA
Access Restriction	Subscribed
Rights Holder	Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subject Keyword	Clustering algorithms; Frequency; Intelligent systems; Partitioning algorithms; Databases; Cybernetics; USA Councils; Acceleration; Data mining; Virtual manufacturing; weighted comparative advantage; Document clustering; dimension reduction; key term extraction; sparsity; k-means
Content Type	Text
Resource Type	Article

Similar Documents

Comparative Advantage Approach for Sparse Text Data Clustering

K-means Clustering Algorithm in Projected Spaces

A study on criteria for extracting key terms in document clustering

Hybrid clustering algorithm

A Modified K-means Algorithm for Sequence Clustering

Using semantic and structural similarities for indexing and searching scientific papers

The feature extraction and dimension reduction research based on weighted FCM clustering algorithm

An intelligent Weighted Kernel K-Means algorithm for high dimension data

A study on cluster validity using intelligent evolutionary K-means approach

Fast document clustering based on weighted comparative advantage