IEcons: A New Consensus Approach Using Multi-Text Representations for Clustering Task

Karima Boutalbi; Rafika Boutalbi; Hervé Verjus; Kave Salamatian; David Telisson; Olivier Le Van

Conference paper, Year: 2024


Karima Boutalbi

Role: Author
PersonId : 1375335
ORCID : 0009-0004-9571-3325

Rafika Boutalbi

Role: Author
PersonId : 1375336
ORCID : 0000-0002-5884-2898

Hervé Verjus

Role: Author
PersonId : 752014
IdHAL : herve-verjus
ORCID : 0000-0003-3258-1826
IdRef : 147732344

Kave Salamatian

Role: Author
PersonId : 1328
IdHAL : salamatian-kave
ORCID : 0000-0001-5557-9134
IdRef : 154190500

David Telisson

Role: Author
PersonId : 1427305
IdHAL : dteli
ORCID : 0009-0003-1417-954X

Laboratoire d'Informatique, Systèmes, Traitement de l'Information et de la Connaissance

Olivier Le Van

Role: Author
PersonId : 1429129

Abstract

Many text representations are available today, from the simple bag-of-words (BOW) model to recent transformers that capture the semantic and contextual meaning of text. It has been shown that no single text representation is best for the text clustering task. Consequently, some works combine text representations using a consensus clustering approach. Two types of consensus approach exist: explicit and implicit. In explicit consensus, also known as ensemble clustering, the consensus function is applied a posteriori, after cluster labels have been obtained by clustering each text representation, which makes it possible to capture the global mutual information between the partitions of all text representations. Implicit consensus, on the other hand, uses tensor clustering to optimize the consensus partition directly over the similarity matrices of the text representations. In this paper, we propose a new consensus text clustering algorithm named IEcons (Implicit-Explicit consensus) that optimizes explicit and implicit consensus clustering simultaneously, using text embeddings together with a tensor representation of the texts built from similarity matrices. We compare our algorithm with others from the literature on five textual datasets using several performance criteria. The comparison results indicate that our algorithm is the best suited in most situations.
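To make the two kinds of consensus input more concrete, here is a minimal sketch in plain Python. It is not the IEcons algorithm and is not taken from the paper. It only illustrates, under stated assumptions, what each approach works from: cluster labels per representation (explicit) and one similarity matrix per representation stacked into a 3-way tensor (implicit). The co-association matrix and cosine similarity are assumed here as common choices; the abstract does not say which consensus function or similarity measure the authors use. All function names and the toy data are hypothetical.

```python
import math


def co_association(partitions):
    """Explicit-consensus input (assumed form): for each pair of documents,
    the fraction of partitions that place them in the same cluster."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    m = len(partitions)
    return [[c / m for c in row] for row in counts]


def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity between document vectors."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    sim = []
    for v, nv in zip(vectors, norms):
        row = []
        for w, nw in zip(vectors, norms):
            dot = sum(a * b for a, b in zip(v, w))
            # Zero vectors get similarity 0.0 to avoid division by zero.
            row.append(dot / (nv * nw) if nv and nw else 0.0)
        sim.append(row)
    return sim


def similarity_tensor(representations):
    """Implicit-consensus input (assumed form): one similarity matrix per
    text representation, stacked as slices of a 3-way tensor."""
    return [cosine_similarity_matrix(vectors) for vectors in representations]


# Toy data (hypothetical): 4 documents, 2 text representations.
partitions = [[0, 0, 1, 1], [0, 0, 0, 1]]  # labels from each representation
co = co_association(partitions)

representations = [
    [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]],
    [[1.0, 1.0], [2.0, 2.0], [1.0, 0.0], [0.0, 3.0]],
]
tensor = similarity_tensor(representations)
```

In this sketch, `co` summarizes agreement between already-computed partitions, while `tensor` keeps the raw pairwise similarities of every representation. A method that combines both views, as the abstract describes, would need access to both kinds of structure.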

Keywords

CCS concepts: Unsupervised learning • Clustering → Consensus clustering • Clustering → Graph clustering • NLP → Word embedding • Representation learning → Tensor

Clustering; Embeddings; Consensus; Implicit consensus; Tensor data; Hierarchical clustering; Tensor; Graphs; Data representation

Domains

Computer Science [cs]

Main file

CIKM_IEcons__An_Approach_Using_Multi_Text_Representations_for_Clustering_Task (2).pdf (646.07 Ko)

Origin: Files produced by the author(s)


https://hal.science/hal-04741799

Submitted on: Thursday, October 17, 2024, 15:19:54

Last modified on: Tuesday, October 22, 2024, 03:17:47

Dates and versions

hal-04741799, version 1 (17-10-2024)

Identifiers

HAL Id: hal-04741799, version 1

Cite

Karima Boutalbi, Rafika Boutalbi, Hervé Verjus, Kave Salamatian, David Telisson, et al. IEcons: A New Consensus Approach Using Multi-Text Representations for Clustering Task. CIKM24: 33rd ACM International Conference on Information and Knowledge Management, Oct 2024, Boise, United States. pp. 613-616. ⟨hal-04741799⟩


Collections

UNIV-SAVOIE LISTIC

