project ยท network-generation

Synthetic Network Generators

A gallery of generators walked stage by stage on the same small input. A community detector cannot be stress-tested without synthetics whose ground truth is known: each generator here takes a reference network and its community structure, and produces fresh draws keyed to that ground truth.

Given a network \(G\) (undirected, unweighted, simple) with a community structure \(\mathcal{C}\) on its nodes, we want to sample a family of networks that are statistically similar to \(G\) and \(\mathcal{C}\) without being identical. \(\mathcal{C}\) can be ground-truth or produced by a detection algorithm; the generator does not care which. What "statistically similar" means differs from one generator to the next: each freezes a different summary of \(G\) and \(\mathcal{C}\) and randomises the rest.

Scoring a community-detection algorithm requires networks whose answer can be checked. An empirical network gives only one labelling, which may itself be wrong, and a single network is a sample of one. Synthetics fix that: each generator emits a network and its ground-truth clustering together, across as many seeds, sizes, and noise levels as the test demands.

Each page below walks one generator from input to output on the same 20-node example, keeps one interactive widget per stage, and closes with a plain note on what holds and what drifts.

Two labels recur across the pages: a node is either inside a non-trivial community or standing alone, and generators treat the two cases differently.

clustered: node that belongs to a community of size ≥ 2.
outlier: node that is unclustered, or the sole member of a singleton community.

Every generator runs the same three-stage pipeline: profile the input to extract the parameters the generator needs, generate a fresh graph from those parameters, then post-process the output. Post-processing covers up to three steps. An optional match-degree rewire aligns the generated degree sequence to the input (see degree matchers). A simplify pass collapses parallel edges and drops self-loops. A singleton drop removes size-1 clusters and relabels the lone node as an outlier.

the example
20 nodes · 40 edges · 3 clusters
node breakdown
  • 18 clustered
  • 2 outliers
edge breakdown
  • 27 intra-cluster
  • 8 inter-cluster
  • 4 clustered-outlier
  • 1 outlier-outlier
cluster breakdown
C1 (8) C2 (6) C3 (4) outliers (2)
hover for stats

Block-model generators

Central objects: the cluster assignment \(\mathcal{C}\), the block-to-block edge-count matrix, and the per-node degree sequence (every variant here uses the degree-corrected SBM). Variants add further structural constraints (per-cluster edge connectivity) on top.

SBM stochastic block model
paper · code
The degree-corrected variant of the classic SBM. Keeps the cluster assignment, the block-to-block edge counts, and the per-node degree sequence; the rest is random. The baseline the other block-model variants build on.
EC-SBMv1 edge-connected SBM, v1
paper · code
SBM with an edge-connectivity guarantee: a hand-built \(k\)-edge-connected core is stitched into each cluster before sampling, so each output cluster ends up at least as edge-connected as its input counterpart. Outliers are treated as singleton clusters, which replicates the edges between them; a naive greedy matcher fills any residual degree deficit.
EC-SBMv2 edge-connected SBM, v2
unpublished · code
Same edge-connectivity guarantee as v1, with two changes. Treating outliers as singleton clusters can accidentally regenerate community-like structure among them; v2 folds all outliers into a single block instead. Residual accounting is simpler too: one SBM call, a block-preserving rewire, and a true_greedy degree-matching pass.
EC-SBMv3 edge-connected SBM, v3
unpublished · code
Same downstream stages as v2 (residual SBM + block-preserving rewire + block-aware matcher). Stage 2 swaps the rigid \(K_{k+1}\) seed for a per-cluster PSO build on a hyperbolic disk, with a per-cluster temperature search that drives the simulated clustering coefficient toward each input cluster's empirical target.

Mixing-parameter generators

Central object: a single scalar setting the fraction of edges that cross cluster boundaries. Variants differ in what drives the degrees and cluster sizes (resampled from a power law, or taken as given).

LFR Lancichinetti–Fortunato–Radicchi benchmark
paper · code
Fits power laws to the degree distribution and the cluster sizes, resamples both from scratch, and discards the original. Mixing is the mean of a per-node fraction \(\mu\). The long-standing community-detection benchmark.
ABCD artificial benchmark for community detection
paper · code
A faster, more tractable alternative to LFR. Takes the degree sequence and cluster sizes as given; a single global mixing parameter \(\xi\) sets the proportion of edges that cross cluster boundaries.
ABCD+o ABCD with outliers
paper · code
ABCD with an explicit outlier block. The sampler picks the outlier set at random; outlier-outlier edges are unlikely but not forbidden.

Geometric generators

Nodes placed in a latent geometric space; edges drawn from proximity. The only family here where clustering coefficient is a design goal rather than a side effect.

nPSO nonuniform popularity-similarity optimisation
paper · code
Embeds nodes on a hyperbolic disk and connects pairs that lie close in the geometry; a temperature knob trades clustering coefficient against randomness.