publications
2025
- [DSC] Dense Subgraph Clustering and a New Cluster Ensemble Method. The-Anh Vu-Le, João Alfredo Cardoso Lamy, Tomás Alessi, Ian Chen, Minhyuk Park, Elfarouk Harb, George Chacko, and Tandy Warnow. In Proceedings of 14th International Conference on Complex Networks & Their Applications, 2025
We propose DSC-Flow-Iter, a new community detection algorithm that is based on iterative extraction of dense subgraphs. Although DSC-Flow-Iter leaves many nodes unclustered, it is competitive with leading methods and has high precision and low recall, making it complementary to modularity-based methods that typically have high recall but lower precision. Based on this observation, we introduce a novel cluster ensemble technique that combines DSC-Flow-Iter with modularity-based clustering to provide improved accuracy. We show that our proposed pipeline, which uses this ensemble technique, outperforms its individual components and improves upon the baseline techniques on a large collection of synthetic networks.
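The idea of iteratively extracting dense subgraphs can be illustrated with a minimal sketch. This is not DSC-Flow-Iter itself: the abstract does not specify how dense subgraphs are found, so the sketch swaps in greedy peeling (a standard approximation for the densest subgraph under the density |E|/|V|). The function names and the `min_density` stopping rule are hypothetical choices made for illustration.

```python
def make_adj(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def densest_by_peeling(nodes, adj):
    """Greedy peeling: repeatedly delete a minimum-degree node and return
    the intermediate node set with the highest density |E| / |V|."""
    alive = set(nodes)
    deg = {u: sum(1 for v in adj[u] if v in alive) for u in alive}
    m = sum(deg.values()) // 2          # edges inside the current node set
    best_density, best_removed = -1.0, 0
    removal_order = []
    while alive:
        density = m / len(alive)
        if density > best_density:
            best_density, best_removed = density, len(removal_order)
        u = min(alive, key=deg.get)     # O(n) scan; fine for a sketch
        alive.remove(u)
        removal_order.append(u)
        m -= deg[u]
        for v in adj[u]:
            if v in alive:
                deg[v] -= 1
    kept = set(nodes) - set(removal_order[:best_removed])
    return kept, best_density


def iterative_dense_clusters(adj, min_density=1.0):
    """Extract dense subgraphs one at a time (assumed stopping rule:
    stop once the best remaining density drops below `min_density`)."""
    remaining = set(adj)
    clusters = []
    while remaining:
        sub, density = densest_by_peeling(remaining, adj)
        if density < min_density or len(sub) < 2:
            break
        clusters.append(sub)
        remaining -= sub
    return clusters, remaining          # `remaining` stays unclustered


# Toy graph: a 5-clique, a 4-clique, one bridge edge, and a two-node tail.
edges = [(u, v) for u in range(5) for v in range(u + 1, 5)]
edges += [(u, v) for u in range(5, 9) for v in range(u + 1, 9)]
edges += [(4, 5), (8, 9), (9, 10)]
clusters, unclustered = iterative_dense_clusters(make_adj(edges))
print(clusters, unclustered)
```

On this toy graph the two cliques are extracted in turn and the sparse tail is left unclustered, which mirrors the high-precision, low-recall behavior described in the abstract.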
- [SBM+WCC] Using stochastic block models for community detection. The-Anh Vu-Le*, Minhyuk Park*, Ian Chen, and Tandy Warnow. In Applied Network Science, 2025
A recent study reported by Park et al. (Improved community detection using stochastic block models, Springer, Heidelberg, 2025) in Complex Networks and their Applications 2024 showed that clusterings from three Stochastic Block Models (SBMs) in graph-tool, a popular software package, often had internally disconnected clusters when used on large real-world or synthetic networks. To address this issue, Park et al. (Improved community detection using stochastic block models, Springer, Heidelberg, 2025) presented a simple technique, Well-Connected Clusters (WCC), that repeatedly finds and removes small edge cuts of size at most $\log_{10}(n)$ in clusters, where $n$ is the number of nodes in the cluster, and showed that treatment of graph-tool SBM clusterings with WCC improves accuracy. Here we examine the question of cluster connectivity for clusterings computed using other SBM software or nested SBMs within graph-tool. Our study, using a wide range of real-world and synthetic networks ranging up to more than a million nodes, shows that all tested SBM clustering methods frequently produce communities that are disconnected, and that graph-tool improves on PySBM. We provide insight into why graph-tool degree-corrected SBM clustering produces disconnected clusters by examining the description length formula it uses, and explore the impact of modifications to the description length formula. Finally, we show that WCC generally provides an improvement in accuracy for both flat and nested SBMs, except for cases where nearly all nodes in the network are in very sparse ground-truth clusters. We also demonstrate that WCC scales to networks with millions of nodes.
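The WCC rule described above (repeatedly remove edge cuts of size at most $\log_{10}(n)$) can be sketched in a few lines. This is an illustrative reading of the abstract, not the authors' implementation: the function names are hypothetical, the minimum cut is computed with a plain Stoer-Wagner routine written for clarity rather than speed, and keeping single-node pieces in the output is an assumption.

```python
import math


def make_adj(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def components(nodes, adj):
    """Connected components of the subgraph induced by the set `nodes`."""
    seen, comps = set(), []
    for s in nodes:
        if s in seen:
            continue
        comp, stack = {s}, [s]
        seen.add(s)
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v in nodes and v not in seen:
                    seen.add(v)
                    comp.add(v)
                    stack.append(v)
        comps.append(comp)
    return comps


def stoer_wagner(nodes, adj):
    """Global minimum edge cut of a connected, unweighted induced subgraph.
    Returns (cut_size, nodes_on_one_side)."""
    # w[u][v] = number of original edges between merged groups u and v
    w = {u: {v: 1 for v in adj[u] if v in nodes and v != u} for u in nodes}
    groups = {u: {u} for u in nodes}
    best_size, best_side = math.inf, None
    while len(w) > 1:
        # One phase: grow an ordering by the most tightly connected vertex.
        start = next(iter(w))
        weights = {v: w[start].get(v, 0) for v in w if v != start}
        order = [start]
        while weights:
            t = max(weights, key=weights.get)
            cut_of_phase = weights.pop(t)
            order.append(t)
            for v, c in w[t].items():
                if v in weights:
                    weights[v] += c
        s, t = order[-2], order[-1]
        if cut_of_phase < best_size:
            best_size, best_side = cut_of_phase, set(groups[t])
        # Merge the last vertex t into the second-to-last vertex s.
        for v, c in w[t].items():
            del w[v][t]
            if v != s:
                w[s][v] = w[s].get(v, 0) + c
                w[v][s] = w[v].get(s, 0) + c
        del w[t]
        groups[s] |= groups.pop(t)
    return best_size, best_side


def wcc(cluster, adj):
    """Split `cluster` until every piece has a min cut > log10(its size)."""
    cluster = set(cluster)
    if len(cluster) < 2:
        return [cluster]                 # assumption: keep singletons as-is
    comps = components(cluster, adj)
    if len(comps) > 1:                   # a disconnected cluster has cut size 0
        return [piece for c in comps for piece in wcc(c, adj)]
    size, side = stoer_wagner(cluster, adj)
    if size > math.log10(len(cluster)):
        return [cluster]                 # well-connected: stop splitting
    # Dropping the cut edges is implicit: each side is re-examined alone.
    return wcc(side, adj) + wcc(cluster - side, adj)


# Two 5-cliques joined by one edge: n = 10, so a cut of size 1 is removed.
clique_a = [(u, v) for u in range(5) for v in range(u + 1, 5)]
clique_b = [(u, v) for u in range(5, 10) for v in range(u + 1, 10)]
adj = make_adj(clique_a + clique_b + [(4, 5)])
parts = wcc(range(10), adj)
print(parts)
```

Note that for clusters with fewer than 10 nodes the threshold is below 1, so under this reading only disconnected small clusters are split.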
- [EC-SBM] EC-SBM Synthetic Network Generator. The-Anh Vu-Le, Lahari Anne, George Chacko, and Tandy Warnow. In Applied Network Science, 2025
Generating high-quality synthetic networks with realistic community structure is vital to effectively evaluate community detection algorithms. In this study, we propose a new synthetic network generator called the Edge-Connected Stochastic Block Model (EC-SBM). The goal of EC-SBM is to take a given clustered real-world network and produce a synthetic network that resembles the clustered real-world network with respect to both network and community-specific criteria. In particular, we focus on simulating the internal edge connectivity of the clusters in the reference clustered network. Our extensive performance study on large real-world networks shows that EC-SBM has high accuracy in both network and community-specific criteria, and is generally more accurate than current alternative approaches for this problem. Furthermore, EC-SBM is fast enough to scale to real-world networks with millions of nodes.
- [RECCS] RECCS: Realistic Cluster Connectivity Simulator for Synthetic Network Generation. Lahari Anne, The-Anh Vu-Le, Minhyuk Park, Tandy Warnow, and George Chacko. In Advances in Complex Systems, 2025
The limited availability of useful ground-truth communities in real-world networks presents a challenge to evaluating and selecting a "best" community detection method for a given network or family of networks. The use of synthetic networks with planted ground truths is one way to address this challenge. While several synthetic network generators can be used for this purpose, Stochastic Block Models (SBMs), when provided input parameters from real-world networks and clusterings, are well suited to producing networks that retain the properties of the network they are intended to model. We report, however, that SBMs can produce disconnected ground-truth clusters, even under conditions where the input clusters are connected. In this study, we describe the REalistic Cluster Connectivity Simulator (RECCS), which, while retaining approximately the same quality for other network and cluster parameters, creates an SBM synthetic network and then modifies it to ensure an improved fit to cluster connectivity. We report results using parameters obtained from clustered real-world networks ranging up to 13.9 million nodes in size, and demonstrate an improvement over the unmodified use of SBMs for network generation.
2024
- [RECCS] Synthetic Networks That Preserve Edge Connectivity. Lahari Anne, The-Anh Vu-Le, Minhyuk Park, Tandy Warnow, and George Chacko. In Proceedings of 13th International Conference on Complex Networks & Their Applications, 2024
Since true communities within real-world networks are rarely known, synthetic networks with planted ground truths are valuable for evaluating the performance of community detection methods. Of the synthetic network generation tools available, Stochastic Block Models (SBMs) produce networks with ground truth clusters that well approximate input parameters from real-world networks and clusterings. However, we show that SBMs can produce disconnected ground truth clusters, even when given parameters from clusterings where all clusters are connected. Here we describe the REalistic Cluster Connectivity Simulator (RECCS), a technique that modifies an SBM synthetic network to improve the fit to a given clustered real-world network with respect to edge connectivity within clusters, while maintaining the good fit with respect to other network and cluster statistics. Using real-world networks up to 13.9 million nodes in size, we show that RECCS, applied to stochastic block models, results in synthetic networks that have a better fit to cluster edge connectivity than unmodified SBMs, while providing roughly the same quality fit for other network and clustering parameters as unmodified SBMs.
- [SBM+WCC] Improved Community Detection using Stochastic Block Models. Minhyuk Park, Daniel Wang Feng, Siya Digra, The-Anh Vu-Le, George Chacko, and Tandy Warnow. In Proceedings of 13th International Conference on Complex Networks & Their Applications, 2024
Community detection approaches resolve complex networks into smaller groups (communities) that are expected to be relatively edge-dense and well-connected. The stochastic block model (SBM) is one of several approaches used to uncover community structure in graphs. In this study, we demonstrate that SBM software applied to various real-world and synthetic networks produces poorly-connected to disconnected clusters. We present simple modifications to improve the connectivity of SBM clusters, and show that the modifications improve accuracy using simulated networks.
2023
- [GroundedBERT] Expand BERT Representation with Visual Information via Grounded Language Learning with Multimodal Partial Alignment. The-Anh Vu-Le*, Cong-Duy Nguyen*, Thong Nguyen, Tho Quan, and Anh-Tuan Luu. In Proceedings of the 31st ACM International Conference on Multimedia, 2023
Language models have been supervised with both a language-only objective and visual grounding in existing studies of visual-grounded language learning. However, due to differences in the distribution and scale of visual-grounded datasets and language corpora, the language model tends to mix up the context of the tokens that occurred in the grounded data with those that do not. As a result, during representation learning, there is a mismatch between the visual information and the contextual meaning of the sentence. To overcome this limitation, we propose GroundedBERT, a grounded language learning method that enhances the BERT representation with visually grounded information. GroundedBERT comprises two components: (i) the original BERT, which captures the contextual representation of words learned from the language corpora, and (ii) a visual grounding module, which captures visual information learned from visual-grounded datasets. Moreover, we employ Optimal Transport (OT), specifically its partial variant, to solve the fractional alignment problem between the two modalities. Our proposed method significantly outperforms the baseline language models on various language tasks of the GLUE and SQuAD datasets.
2022
- [m-POT] Improving Mini-batch Optimal Transport via Partial Transportation. Khai Nguyen*, Dang Nguyen*, The-Anh Vu-Le, Tung Pham, and Nhat Ho. In Proceedings of the 39th International Conference on Machine Learning, 2022
Mini-batch optimal transport (m-OT) has been widely used recently to deal with the memory issue of OT in large-scale applications. Despite its practicality, m-OT suffers from misspecified mappings, namely, mappings that are optimal on the mini-batch level but are partially wrong in comparison with the optimal transportation plan between the original measures. Motivated by the misspecified mappings issue, we propose a novel mini-batch method by using partial optimal transport (POT) between mini-batch empirical measures, which we refer to as mini-batch partial optimal transport (m-POT). Leveraging the insight from the partial transportation, we explain the source of misspecified mappings from the m-OT and motivate why limiting the amount of transported masses among mini-batches via POT can alleviate the incorrect mappings. Finally, we carry out extensive experiments on various applications such as deep domain adaptation, partial domain adaptation, deep generative model, color transfer, and gradient flow to demonstrate the favorable performance of m-POT compared to current mini-batch methods.
2021
- [SHREC21] SHREC 2021: Retrieval of cultural heritage objects. Ivan Sipiran, Patrick Lazo, Cristian Lopez, Milagritos Jimenez, Nihar Bagewadi, Benjamin Bustos, Hieu Dao, Shankar Gangisetty, Martin Hanik, Ngoc-Phuong Ho-Thi, Mike Holenderski, Dmitri Jarnikov, Arniel Labrada, Stefan Lengauer, Roxane Licandro, Dinh-Huan Nguyen, Thang-Long Nguyen-Ho, Luis A. Perez Rey, Bang-Dang Pham, Minh-Khoi Pham, Reinhold Preiner, Tobias Schreck, Quoc-Huy Trinh, Loek Tonnaer, Christoph Tycowicz, and The-Anh Vu-Le. In Computers & Graphics, 2021
This paper presents the methods and results of the SHREC’21 track on a dataset of cultural heritage (CH) objects. We present a dataset of 938 scanned models that have varied geometry and artistic styles. For the competition, we propose two challenges: the retrieval-by-shape challenge and the retrieval-by-culture challenge. The former aims at evaluating the ability of retrieval methods to discriminate cultural heritage objects by overall shape. The latter focuses on assessing the effectiveness of retrieving objects from the same culture. Both challenges constitute a suitable scenario to evaluate modern shape retrieval methods in a CH domain. Ten groups participated in the challenges: thirty runs were submitted for the retrieval-by-shape task, and twenty-six runs were submitted for the retrieval-by-culture task. The results show a predominance of learning methods on image-based multi-view representations to characterize 3D objects. Nevertheless, the problem presented in our challenges is far from being solved. We also identify the potential paths for further improvements and give insights into the future directions of research.
2020
- [SHREC20] SHREC 2020 Track: Extended Monocular Image Based 3D Model Retrieval. Wenhui Li, Dan Song, Anan Liu, Weizhi Nie, Ting Zhang, Xiaoqian Zhao, Mingsheng Ma, Yuqian Li, Heyu Zhou, Beibei Zhang, Shengjie Le, Dandan Wang, Tongwei Ren, Gangshan Wu, The-Anh Vu-Le, Xuan-Nhat Hoang, E-Ro Nguyen, Thang-Long Nguyen-Ho, Hai-Dang Nguyen, Trong-Le Do, and Minh-Triet Tran. In Proceedings of the 13th Eurographics Workshop on 3D Object Retrieval, 2020
Monocular image based 3D object retrieval has attracted increasing attention in the field of 3D object retrieval. However, research on 3D object retrieval based on 2D images is still challenging, mainly because of the gap between data from different modalities. To further support this research, we extend the previous track SHREC19’MI3DOR to organize this track, and we construct an expanded monocular image based 3D object retrieval benchmark. Compared with SHREC19’MI3DOR, this benchmark adds 19 categories for both 2D images and 3D models to the original 21 categories, taking into account the lack of categories for practical applications. Two groups participated, proposed three kinds of supervised methods, and submitted 20 runs in total; 7 commonly used criteria are used to evaluate the retrieval performance. The results show that supervised methods still achieve satisfying retrieval results (Best NN is 96.7% for 40 categories), which are comparable to the results of SHREC19’MI3DOR. In the future, we encourage the exploration of unsupervised methods for monocular image based 3D model retrieval.
2019
- [SHREC19] SHREC 2019 - Monocular Image Based 3D Model Retrieval. Wenhui Li, Anan Liu, Weizhi Nie, Dan Song, Yuqian Li, Weijie Wang, Shu Xiang, Heyu Zhou, Ngoc-Minh Bui, Yunchi Cen, Zenian Chen, Huy-Hoang Chung-Nguyen, Gia-Han Diep, Trong-Le Do, Eugeni L. Doubrovski, Anh-Duc Duong, Jo M. P. Geraedts, Haobin Guo, Trung-Hieu Hoang, Yichen Li, Xing Liu, Zishun Liu, Duc-Tuan Luu, Yunsheng Ma, Vinh-Tiep Nguyen, Jie Nie, Tongwei Ren, Mai-Khiem Tran, Son-Thanh Tran-Nguyen, Minh-Triet Tran, The-Anh Vu-Le, Charlie C. L. Wang, Shijie Wang, Gangshan Wu, Caifei Yang, Meng Yuan, Hao Zhai, Ao Zhang, Fan Zhang, and Sicheng Zhao. In Proceedings of the 12th Eurographics Workshop on 3D Object Retrieval, 2019
Monocular image based 3D object retrieval is a novel and challenging research topic in the field of 3D object retrieval. Given an RGB image captured in the real world, it aims to search for relevant 3D objects from a dataset. To advance this promising research, we organize this SHREC track and build the first monocular image based 3D object retrieval benchmark by collecting 2D images from ImageNet and 3D objects from popular 3D datasets such as NTU, PSB, ModelNet40 and ShapeNet. The benchmark contains 21,000 classified 2D images and 7,690 3D objects in 21 categories. This track attracted 9 groups from 4 countries and the submission of 20 runs. For a comprehensive comparison, 7 commonly used retrieval performance metrics have been used to evaluate their retrieval performance. The evaluation results show that supervised cross-domain learning achieves superior retrieval performance (Best NN is 97.4%) by bridging the domain gap with label information. However, unsupervised cross-domain learning, which is more practical for real applications, remains a big challenge (Best NN is 61.2%). Although we provided both view images and an OBJ file for each 3D model, all the participants used the view images to represent the 3D model. One interesting direction for future work is to directly use the 3D information and 2D RGB information to solve the task of monocular image based 3D model retrieval.