publications | The-Anh Vu-Le (Nah)

2025

SASCA-s

Very Large Scale Simulations of Network Growth with the Scalable Agent-Based Simulator for Citation Analysis with Sampling (SASCA-s)

Minhyuk Park, João AC Lamy, Esther CC Rodrigues, Felipe M Ferreira, The-Anh Vu-Le, Tandy Warnow, and George Chacko

In Proceedings of the 14th International Conference on Complex Networks & Their Applications, 2025

Modeling the growth of citation networks is challenging since existing theories of citation are not easy to capture quantitatively and the complex social interactions underlying citation behavior are not well captured by narrowly specified mathematical models. In this respect, agent-based models (ABM) leveraging randomness offer a complementary option. We have previously designed an ABM, implemented in Python, in which agents make citations through a combination of preferential attachment, recency, and fitness. A limitation of this ABM is that it does not scale much beyond networks of a million nodes. We have since developed the Scalable Agent-based Simulator for Citation Analysis with sampling (SASCA-s). Written in C++, SASCA-s uses a refined citation model and scales to over 140 million nodes. We present results from simulations using SASCA-s.
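To make the three citation ingredients concrete, here is a toy sketch of how an agent might pick references by combining preferential attachment, recency, and fitness. It is not the SASCA-s model, which is written in C++ and uses a refined citation model with sampling. The multiplicative score, the exponents `alpha`, `beta`, `gamma`, the fitness range, and all function names are hypothetical choices for illustration.

```python
import random


def citation_weights(in_degree, pub_year, fitness, current_year,
                     alpha=1.0, beta=1.0, gamma=1.0):
    """Hypothetical attractiveness score for every existing paper.

    Combines preferential attachment (in-degree), recency (age) and fitness
    multiplicatively; the functional form and exponents are assumptions.
    """
    weights = []
    for deg, year, fit in zip(in_degree, pub_year, fitness):
        pa = (deg + 1) ** alpha                       # +1 lets uncited papers be cited
        recency = (1 + current_year - year) ** -beta  # older papers score lower
        weights.append(pa * recency * fit ** gamma)
    return weights


def sample_citations(weights, k, rng):
    """Draw up to k distinct targets, each draw proportional to the remaining weights."""
    candidates = list(range(len(weights)))
    remaining = list(weights)
    chosen = []
    for _ in range(min(k, len(candidates))):
        idx = rng.choices(range(len(candidates)), weights=remaining, k=1)[0]
        chosen.append(candidates.pop(idx))
        remaining.pop(idx)
    return chosen


def simulate(n_seed, n_new, refs_per_paper, seed=0):
    """Grow a citation network by adding one new paper per time step."""
    rng = random.Random(seed)
    in_degree = [0] * n_seed
    pub_year = [0] * n_seed
    fitness = [rng.uniform(0.5, 1.5) for _ in range(n_seed)]  # strictly positive
    edges = []  # (citing paper, cited paper)
    for t in range(1, n_new + 1):
        new_id = len(in_degree)
        weights = citation_weights(in_degree, pub_year, fitness, current_year=t)
        for target in sample_citations(weights, refs_per_paper, rng):
            in_degree[target] += 1
            edges.append((new_id, target))
        in_degree.append(0)
        pub_year.append(t)
        fitness.append(rng.uniform(0.5, 1.5))
    return edges, in_degree
```

Every step scans all existing papers, so this toy version is quadratic overall and will not scale to large networks; it makes no attempt to reproduce the sampling that SASCA-s uses.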

DSC

Dense Subgraph Clustering and a New Cluster Ensemble Method

The-Anh Vu-Le, João Alfredo Cardoso Lamy, Tomás Alessi, Ian Chen, Minhyuk Park, Elfarouk Harb, George Chacko, and Tandy Warnow

In Proceedings of the 14th International Conference on Complex Networks & Their Applications, 2025

We propose DSC-Flow-Iter, a new community detection algorithm based on iterative extraction of dense subgraphs. Although DSC-Flow-Iter leaves many nodes unclustered, it is competitive with leading methods and has high precision and low recall, making it complementary to modularity-based methods, which typically have high recall but lower precision. Based on this observation, we introduce a novel cluster ensemble technique that combines DSC-Flow-Iter with modularity-based clustering to provide improved accuracy. We show that our proposed pipeline, which uses this ensemble technique, outperforms its individual components and improves upon the baseline techniques on a large collection of synthetic networks.
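The idea of peeling off dense subgraphs one at a time, and leaving whatever is too sparse unclustered, can be sketched as follows. The name DSC-Flow-Iter suggests a flow-based densest-subgraph step; this sketch instead swaps in Charikar's greedy peeling, a simpler 2-approximation for the densest subgraph (maximum |E|/|V|). The stopping thresholds `min_density` and `min_size` and the function names are hypothetical, and the cluster ensemble step is not shown.

```python
def peel_densest(adj):
    """Charikar's greedy peeling on an undirected graph {node: set(neighbors)}.

    Repeatedly deletes a minimum-degree node and returns the intermediate
    subgraph with the highest density |E|/|V| (a 2-approximation).
    """
    remaining = set(adj)
    deg = {v: len(adj[v]) for v in remaining}
    m = sum(deg.values()) // 2
    best_density, best_size = -1.0, 0
    order = []
    while remaining:
        density = m / len(remaining)
        if density > best_density:
            best_density, best_size = density, len(remaining)
        v = min(remaining, key=deg.__getitem__)
        remaining.remove(v)
        order.append(v)
        for u in adj[v]:
            if u in remaining:
                deg[u] -= 1
                m -= 1
    removed_before_best = set(order[: len(adj) - best_size])
    return set(adj) - removed_before_best, best_density


def iterative_dense_clusters(adj, min_density=1.0, min_size=3):
    """Extract dense subgraphs one by one; leftover nodes stay unclustered."""
    graph = {v: set(nbrs) for v, nbrs in adj.items()}
    clusters = []
    while graph:
        cluster, density = peel_densest(graph)
        if density < min_density or len(cluster) < min_size:
            break
        clusters.append(cluster)
        for v in cluster:  # delete the extracted nodes and their edges
            del graph[v]
        for v in graph:
            graph[v] -= cluster
    return clusters, set(graph)
```

On a graph made of a 5-clique, a separate 4-clique, and one extra edge, this returns the two cliques as clusters and leaves the two endpoints of the extra edge unclustered, mirroring the high-precision, low-recall behavior described above.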

SBM+WCC

Using stochastic block models for community detection

The-Anh Vu-Le*, Minhyuk Park*, Ian Chen, and Tandy Warnow

In Applied Network Science, 2025

A recent study reported by Park et al. (Improved community detection using stochastic block models, Springer, Heidelberg, 2025) in Complex Networks and their Applications 2024 showed that clusterings from three Stochastic Block Models (SBMs) in graph-tool, a popular software package, often had internally disconnected clusters when used on large real-world or synthetic networks. To address this issue, Park et al. (Improved community detection using stochastic block models, Springer, Heidelberg, 2025) presented a simple technique, Well-Connected Clusters (WCC), that repeatedly finds and removes small edge cuts of size at most $\log_{10}(n)$ in clusters, where $n$ is the number of nodes in the cluster, and showed that treatment of graph-tool SBM clusterings with WCC improves accuracy. Here we examine the question of cluster connectivity for clusterings computed using other SBM software or nested SBMs within graph-tool. Our study, using a wide range of real-world and synthetic networks ranging up to more than a million nodes, shows that all tested SBM clustering methods frequently produce communities that are disconnected, and that graph-tool improves on PySBM. We provide insight into why graph-tool degree-corrected SBM clustering produces disconnected clusters by examining the description length formula it uses, and explore the impact of modifications to the description length formula. Finally, we show that WCC generally provides an improvement in accuracy for both flat and nested SBMs, except for cases where nearly all nodes in the network are in very sparse ground-truth clusters. We also demonstrate that WCC scales to networks with millions of nodes.
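The WCC rule described above can be sketched in a few dozen lines. This is a minimal illustration rather than the published WCC implementation: it finds each cluster's global minimum edge cut with the Stoer-Wagner algorithm (our choice; the abstract does not say how cuts are found), splits the cluster whenever that cut has at most $\log_{10}(n)$ edges, and repeats on both sides. Dropping singleton pieces is an assumption, and `stoer_wagner` and `wcc` are hypothetical names. The edge list is assumed simple (no duplicates, no self-loops).

```python
import math


def stoer_wagner(nodes, edges):
    """Global minimum edge cut of an undirected, unweighted graph (Stoer-Wagner).

    Returns (cut_size, one_side). A disconnected graph yields a cut of size 0.
    """
    w = {v: {} for v in nodes}
    for u, v in edges:
        w[u][v] = w[u].get(v, 0) + 1
        w[v][u] = w[v].get(u, 0) + 1
    groups = {v: {v} for v in nodes}  # original nodes merged into each super-node
    best_cut, best_side = math.inf, None
    while len(w) > 1:
        # One phase: maximum-adjacency ordering from an arbitrary start node.
        start = next(iter(w))
        conn = {v: w[start].get(v, 0) for v in w if v != start}
        order = [start]
        while conn:
            nxt = max(conn, key=conn.get)
            cut_of_phase = conn.pop(nxt)  # weight from nxt to all earlier nodes
            order.append(nxt)
            for u, wt in w[nxt].items():
                if u in conn:
                    conn[u] += wt
        s, t = order[-2], order[-1]
        if cut_of_phase < best_cut:
            best_cut, best_side = cut_of_phase, set(groups[t])
        # Merge super-node t into s.
        for u, wt in w[t].items():
            if u != s:
                w[s][u] = w[s].get(u, 0) + wt
                w[u][s] = w[u].get(s, 0) + wt
            del w[u][t]
        del w[t]
        groups[s] |= groups.pop(t)
    return best_cut, best_side


def wcc(cluster, edges):
    """Split a cluster until every piece has min cut > log10(piece size)."""
    pending, well_connected = [set(cluster)], []
    while pending:
        c = pending.pop()
        if len(c) < 2:
            continue  # assumption: singletons are left unclustered
        sub_edges = [(u, v) for u, v in edges if u in c and v in c]
        cut, side = stoer_wagner(c, sub_edges)
        if cut <= math.log10(len(c)):
            pending.extend([side, c - side])
        else:
            well_connected.append(c)
    return well_connected
```

Because a cut of size 0 is always at most $\log_{10}(n)$ for $n \ge 2$, this rule also splits internally disconnected clusters into their connected pieces. For two 5-cliques joined by a single edge ($n = 10$, threshold 1), the bridge is cut and both cliques survive.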

EC-SBM

EC-SBM Synthetic Network Generator

The-Anh Vu-Le, Lahari Anne, George Chacko, and Tandy Warnow

In Applied Network Science, 2025

Generating high-quality synthetic networks with realistic community structure is vital to effectively evaluate community detection algorithms. In this study, we propose a new synthetic network generator called the Edge-Connected Stochastic Block Model (EC-SBM). The goal of EC-SBM is to take a given clustered real-world network and produce a synthetic network that resembles the clustered real-world network with respect to both network and community-specific criteria. In particular, we focus on simulating the internal edge connectivity of the clusters in the reference clustered network. Our extensive performance study on large real-world networks shows that EC-SBM has high accuracy in both network and community-specific criteria, and is generally more accurate than current alternative approaches for this problem. Furthermore, EC-SBM is fast enough to scale to real-world networks with millions of nodes.

RECCS

RECCS: Realistic Cluster Connectivity Simulator for Synthetic Network Generation

Lahari Anne, The-Anh Vu-Le, Minhyuk Park, Tandy Warnow, and George Chacko

In Advances in Complex Systems, 2025

The limited availability of useful ground-truth communities in real-world networks makes it challenging to evaluate and select a "best" community detection method for a given network or family of networks. The use of synthetic networks with planted ground truths is one way to address this challenge. While several synthetic network generators can be used for this purpose, Stochastic Block Models (SBMs), when provided with input parameters from real-world networks and clusterings, are well suited to producing networks that retain the properties of the networks they are intended to model. We report, however, that SBMs can produce disconnected ground-truth clusters, even when the input clusters are connected. In this study, we describe the REalistic Cluster Connectivity Simulator (RECCS), which creates an SBM synthetic network and then modifies it to improve the fit to cluster connectivity, while retaining approximately the same quality of fit for other network and cluster parameters. We report results using parameters obtained from clustered real-world networks of up to 13.9 million nodes, and demonstrate an improvement over the unmodified use of SBMs for network generation.
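The connectivity problem described above is easy to check directly. Below is a minimal sketch, assuming a plain (non-degree-corrected) SBM in which each pair of nodes in blocks r and s is joined independently with probability P[r][s]; it samples a network and then reports which planted clusters are internally disconnected. It does not implement RECCS or any particular SBM package, and the function names are hypothetical.

```python
import random
from collections import defaultdict, deque


def sample_sbm(blocks, probs, seed=0):
    """Sample an undirected SBM: blocks[i] is node i's block, probs[r][s] an edge probability."""
    rng = random.Random(seed)
    n = len(blocks)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < probs[blocks[i]][blocks[j]]:
                edges.append((i, j))
    return edges


def disconnected_clusters(blocks, edges):
    """Return the sorted ids of clusters whose induced subgraph is disconnected."""
    members = defaultdict(set)
    for v, b in enumerate(blocks):
        members[b].add(v)
    adj = defaultdict(set)
    for u, v in edges:
        if blocks[u] == blocks[v]:  # only intra-cluster edges matter
            adj[u].add(v)
            adj[v].add(u)
    bad = []
    for b, nodes in members.items():
        start = next(iter(nodes))
        seen = {start}
        queue = deque([start])
        while queue:  # breadth-first search inside the cluster
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        if len(seen) < len(nodes):
            bad.append(b)
    return sorted(bad)
```

When the within-block probabilities are small, some planted clusters can easily come out disconnected, even if the block parameters were estimated from connected clusters.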

2024

RECCS

Synthetic Networks That Preserve Edge Connectivity

Lahari Anne, The-Anh Vu-Le, Minhyuk Park, Tandy Warnow, and George Chacko

In Proceedings of the 13th International Conference on Complex Networks & Their Applications, 2024

Since true communities within real-world networks are rarely known, synthetic networks with planted ground truths are valuable for evaluating the performance of community detection methods. Of the synthetic network generation tools available, Stochastic Block Models (SBMs) produce networks with ground-truth clusters that closely approximate input parameters from real-world networks and clusterings. However, we show that SBMs can produce disconnected ground-truth clusters, even when given parameters from clusterings where all clusters are connected. Here we describe the REalistic Cluster Connectivity Simulator (RECCS), a technique that modifies an SBM synthetic network to improve the fit to a given clustered real-world network with respect to edge connectivity within clusters, while maintaining a good fit with respect to other network and cluster statistics. Using real-world networks up to 13.9 million nodes in size, we show that RECCS, applied to stochastic block models, results in synthetic networks that have a better fit to cluster edge connectivity than unmodified SBMs, while providing roughly the same quality of fit for other network and clustering parameters as unmodified SBMs.

SBM+WCC

Improved Community Detection using Stochastic Block Models

Minhyuk Park, Daniel Wang Feng, Siya Digra, The-Anh Vu-Le, George Chacko, and Tandy Warnow

In Proceedings of the 13th International Conference on Complex Networks & Their Applications, 2024

Community detection approaches resolve complex networks into smaller groups (communities) that are expected to be relatively edge-dense and well-connected. The stochastic block model (SBM) is one of several approaches used to uncover community structure in graphs. In this study, we demonstrate that SBM software applied to various real-world and synthetic networks produces clusters that are poorly connected or even disconnected. We present simple modifications to improve the connectivity of SBM clusters, and show that the modifications improve accuracy on simulated networks.

2023

GroundedBERT

Expand BERT Representation with Visual Information via Grounded Language Learning with Multimodal Partial Alignment

The-Anh Vu-Le*, Cong-Duy Nguyen*, Thong Nguyen, Tho Quan, and Anh-Tuan Luu

In Proceedings of the 31st ACM International Conference on Multimedia, 2023

Language models have been supervised with both language-only objectives and visual grounding in existing studies of visually grounded language learning. However, due to differences in the distribution and scale of visually grounded datasets and language corpora, the language model tends to mix up the context of tokens that occur in the grounded data with those that do not. As a result, during representation learning, there is a mismatch between the visual information and the contextual meaning of the sentence. To overcome this limitation, we propose GroundedBERT, a grounded language learning method that enhances the BERT representation with visually grounded information. GroundedBERT comprises two components: (i) the original BERT, which captures the contextual representation of words learned from the language corpora, and (ii) a visual grounding module, which captures visual information learned from visually grounded datasets. Moreover, we employ Optimal Transport (OT), specifically its partial variant, to solve the fractional alignment problem between the two modalities. Our proposed method significantly outperforms the baseline language models on various language tasks of the GLUE and SQuAD datasets.

2022

m-POT

Improving Mini-batch Optimal Transport via Partial Transportation

Khai Nguyen*, Dang Nguyen*, The-Anh Vu-Le, Tung Pham, and Nhat Ho

In Proceedings of the 39th International Conference on Machine Learning, 2022

Mini-batch optimal transport (m-OT) has been widely used recently to deal with the memory issue of OT in large-scale applications. Despite its practicality, m-OT suffers from misspecified mappings, namely, mappings that are optimal at the mini-batch level but are partially wrong in comparison with the optimal transportation plan between the original measures. Motivated by the misspecified mappings issue, we propose a novel mini-batch method that uses partial optimal transport (POT) between mini-batch empirical measures, which we refer to as mini-batch partial optimal transport (m-POT). Leveraging insight from partial transportation, we explain the source of misspecified mappings in m-OT and motivate why limiting the amount of mass transported among mini-batches via POT can alleviate the incorrect mappings. Finally, we carry out extensive experiments on various applications such as deep domain adaptation, partial domain adaptation, deep generative models, color transfer, and gradient flow to demonstrate the favorable performance of m-POT compared to current mini-batch methods.
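For intuition about why partial transport helps, here is a brute-force sketch for tiny mini-batches. It assumes uniform weights 1/m on two batches of equal size m and a transported mass of k/m; under those assumptions partial OT is a min-cost flow problem with an integral optimum, so an optimal plan is a cheapest matching of k disjoint pairs and can be enumerated exactly. The squared Euclidean cost, the function names, and the averaging loop are illustrative choices, not the paper's implementation; in practice a dedicated solver, such as those in the POT package's `ot.partial` module, would replace the enumeration.

```python
import itertools
import random


def sq_dist(x, y):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def partial_ot_uniform(xs, ys, k):
    """Exact partial OT cost between uniform measures on xs and ys.

    Both batches hold m points of mass 1/m each, and total mass k/m is
    transported. An optimal plan then matches k disjoint pairs, so we
    enumerate all k-matchings (brute force: tiny m only).
    """
    m = len(xs)
    assert len(ys) == m and 1 <= k <= m
    best = float("inf")
    for rows in itertools.combinations(range(m), k):
        for cols in itertools.permutations(range(m), k):
            cost = sum(sq_dist(xs[i], ys[j]) for i, j in zip(rows, cols))
            best = min(best, cost)
    return best / m


def m_pot(X, Y, batch_size, k, n_batches, seed=0):
    """Average partial OT cost over random pairs of mini-batches (m-POT style)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_batches):
        bx = rng.sample(X, batch_size)
        by = rng.sample(Y, batch_size)
        total += partial_ot_uniform(bx, by, k)
    return total / n_batches
```

For example, with xs = [(0,), (1,)] and ys = [(0,), (10,)], full transport (k = 2) is forced to send mass to the far point 10 and costs 40.5, while partial transport with k = 1 keeps only the pair (0, 0) and costs 0. This is the kind of misspecified mapping that limiting the transported mass avoids.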

2021

SHREC21

SHREC 2021: Retrieval of cultural heritage objects

Ivan Sipiran, Patrick Lazo, Cristian Lopez, Milagritos Jimenez, Nihar Bagewadi, Benjamin Bustos, Hieu Dao, Shankar Gangisetty, Martin Hanik, Ngoc-Phuong Ho-Thi, Mike Holenderski, Dmitri Jarnikov, Arniel Labrada, Stefan Lengauer, Roxane Licandro, Dinh-Huan Nguyen, Thang-Long Nguyen-Ho, Luis A. Perez Rey, Bang-Dang Pham, Minh-Khoi Pham, Reinhold Preiner, Tobias Schreck, Quoc-Huy Trinh, Loek Tonnaer, Christoph Tycowicz, and The-Anh Vu-Le

In Computers & Graphics, 2021

This paper presents the methods and results of the SHREC’21 track on a dataset of cultural heritage (CH) objects. We present a dataset of 938 scanned models that have varied geometry and artistic styles. For the competition, we propose two challenges: the retrieval-by-shape challenge and the retrieval-by-culture challenge. The former aims at evaluating the ability of retrieval methods to discriminate cultural heritage objects by overall shape. The latter focuses on assessing the effectiveness of retrieving objects from the same culture. Both challenges constitute a suitable scenario to evaluate modern shape retrieval methods in a CH domain. Ten groups participated in the challenges: thirty runs were submitted for the retrieval-by-shape task, and twenty-six runs were submitted for the retrieval-by-culture task. The results show a predominance of learning methods on image-based multi-view representations to characterize 3D objects. Nevertheless, the problem presented in our challenges is far from being solved. We also identify potential paths for further improvement and give insights into future directions of research.

2020

SHREC20

SHREC 2020 Track: Extended Monocular Image Based 3D Model Retrieval

Wenhui Li, Dan Song, Anan Liu, Weizhi Nie, Ting Zhang, Xiaoqian Zhao, Mingsheng Ma, Yuqian Li, Heyu Zhou, Beibei Zhang, Shengjie Le, Dandan Wang, Tongwei Ren, Gangshan Wu, The-Anh Vu-Le, Xuan-Nhat Hoang, E-Ro Nguyen, Thang-Long Nguyen-Ho, Hai-Dang Nguyen, Trong-Le Do, and Minh-Triet Tran

In Proceedings of the 13th Eurographics Workshop on 3D Object Retrieval, 2020

Monocular image based 3D object retrieval has attracted increasing attention in the field of 3D object retrieval. However, 3D object retrieval based on 2D images is still challenging, mainly because of the gap between data from different modalities. To further support this research, we extend the previous track SHREC19’MI3DOR to organize this track, and we construct an expanded monocular image based 3D object retrieval benchmark. Compared with SHREC19’MI3DOR, this benchmark adds 19 categories for both 2D images and 3D models to the original 21 categories, to address the lack of categories for practical applications. Two groups participated, proposing three kinds of supervised methods and submitting 20 runs in total; 7 commonly used criteria are used to evaluate retrieval performance. The results show that supervised methods still achieve satisfactory retrieval results (Best NN is 96.7% for 40 categories), which are comparable to the results of SHREC19’MI3DOR. Future work is encouraged to explore unsupervised methods for monocular image based 3D model retrieval.

iTASK

iTASK-Intelligent traffic analysis software kit

Minh-Triet Tran, Tam V. Nguyen, Trung-Hieu Hoang, Trung-Nghia Le, Khac-Tuan Nguyen, Dat-Thanh Dinh, Thanh-An Nguyen, Hai-Dang Nguyen, Trong-Tung Nguyen, Xuan-Nhat Hoang, Viet-Khoa Vo-Ho, Trong-Le Do, Lam Nguyen, Minh-Quan Le, Hoang-Phuc Nguyen-Dinh, Trong-Thang Pham, Xuan-Vy Nguyen, E-Ro Nguyen, Quoc-Cuong Tran, Hung Tran, Hieu Dao, Mai-Khiem Tran, Quang-Thuc Nguyen, The-Anh Vu-Le, Tien-Phat Nguyen, Gia-Han Diep, and Minh N. Do

In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2020

Traffic flow analysis is essential for intelligent transportation systems. In this paper, we introduce our Intelligent Traffic Analysis Software Kit (iTASK) to tackle three challenging problems: vehicle flow counting, vehicle re-identification, and abnormal event detection. For the first problem, we propose to track vehicles in real time as they move along the desired direction in the corresponding motion-of-interests (MOIs). For the second problem, we consider each vehicle as a document with multiple semantic words (i.e., vehicle attributes) and transform the given problem into classical document retrieval. For the last problem, we propose to refine anomaly detection in both the forward and backward directions, using GAN-based future prediction and backward tracking of completely stalled vehicles or sudden changes of direction, respectively. Experiments on the traffic flow analysis datasets from the AI City Challenge 2020 show competitive results, namely, an S1 score of 0.8297 for vehicle flow counting in Track 1, an mAP score of 0.3882 for vehicle re-identification in Track 2, and an S4 score of 0.9059 for anomaly detection in Track 4. All data and source code are publicly available on our project page.
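The framing of vehicle re-identification as document retrieval can be illustrated with a standard text-retrieval recipe: each vehicle becomes a bag of attribute words, words are weighted by TF-IDF, and gallery vehicles are ranked by cosine similarity to the query. TF-IDF and cosine scoring are assumptions chosen for illustration, not necessarily what iTASK uses, and the attribute words and function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """TF-IDF weights for each document, given as a list of attribute words."""
    n = len(docs)
    df = Counter(word for doc in docs for word in set(doc))  # document frequency
    return [{w: c * math.log(n / df[w]) for w, c in Counter(doc).items()} for doc in docs]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_gallery(query, gallery):
    """Return gallery indices ordered from most to least similar to the query."""
    vecs = tfidf_vectors([query] + gallery)
    q = vecs[0]
    scores = [cosine(q, g) for g in vecs[1:]]
    return sorted(range(len(gallery)), key=lambda i: -scores[i])
```

For a query described as `["red", "sedan", "sunroof"]`, a gallery vehicle with the same attributes ranks first, one sharing only "red" ranks next, and one sharing nothing ranks last.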

2019

SHREC19

SHREC 2019 - Monocular Image Based 3D Model Retrieval

Wenhui Li, Anan Liu, Weizhi Nie, Dan Song, Yuqian Li, Weijie Wang, Shu Xiang, Heyu Zhou, Ngoc-Minh Bui, Yunchi Cen, Zenian Chen, Huy-Hoang Chung-Nguyen, Gia-Han Diep, Trong-Le Do, Eugeni L. Doubrovski, Anh-Duc Duong, Jo M. P. Geraedts, Haobin Guo, Trung-Hieu Hoang, Yichen Li, Xing Liu, Zishun Liu, Duc-Tuan Luu, Yunsheng Ma, Vinh-Tiep Nguyen, Jie Nie, Tongwei Ren, Mai-Khiem Tran, Son-Thanh Tran-Nguyen, Minh-Triet Tran, The-Anh Vu-Le, Charlie C. L. Wang, Shijie Wang, Gangshan Wu, Caifei Yang, Meng Yuan, Hao Zhai, Ao Zhang, Fan Zhang, and Sicheng Zhao

In Proceedings of the 12th Eurographics Workshop on 3D Object Retrieval, 2019

Monocular image based 3D object retrieval is a novel and challenging research topic in the field of 3D object retrieval. Given an RGB image captured in the real world, it aims to search for relevant 3D objects in a dataset. To advance this promising research, we organize this SHREC track and build the first monocular image based 3D object retrieval benchmark by collecting 2D images from ImageNet and 3D objects from popular 3D datasets such as NTU, PSB, ModelNet40, and ShapeNet. The benchmark contains 21,000 classified 2D images and 7,690 3D objects in 21 categories. This track attracted 9 groups from 4 countries, which submitted 20 runs. For a comprehensive comparison, 7 commonly used retrieval performance metrics were used to evaluate retrieval performance. The evaluation results show that supervised cross-domain learning achieves superior retrieval performance (Best NN is 97.4%) by bridging the domain gap with label information. However, unsupervised cross-domain learning (Best NN is 61.2%), which is more practical for real applications, remains a big challenge. Although we provided both view images and OBJ files for each 3D model, all participants used the view images to represent the 3D models. An interesting direction for future work is to use the 3D information and 2D RGB information directly to solve the task of monocular image based 3D model retrieval.
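In SHREC evaluations, NN usually denotes nearest-neighbor precision: the fraction of queries whose top-ranked 3D model belongs to the query's category. A minimal sketch of NN and of First Tier (the fraction of the query's category retrieved within the top |C| results, where |C| is the number of gallery models in that category), computed from a query-by-gallery distance matrix, is shown below. These follow common definitions; the tracks' own evaluation code may differ in details, and the function name is hypothetical.

```python
def nn_and_first_tier(dist, query_labels, gallery_labels):
    """Nearest Neighbor (NN) and First Tier (FT), averaged over queries.

    dist[q][g] is the distance from query q to gallery item g; every query
    category is assumed to appear in the gallery at least once.
    """
    nn_hits, ft_sum = 0, 0.0
    for q, row in enumerate(dist):
        label = query_labels[q]
        ranking = sorted(range(len(row)), key=row.__getitem__)  # closest first
        n_relevant = sum(1 for g in gallery_labels if g == label)
        nn_hits += int(gallery_labels[ranking[0]] == label)
        top = ranking[:n_relevant]
        ft_sum += sum(gallery_labels[g] == label for g in top) / n_relevant
    return nn_hits / len(dist), ft_sum / len(dist)
```

With two queries ("chair", "table") against a gallery of one chair and two tables, a ranking that puts the chair first for the table query halves both that query's NN hit and its First Tier score.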