LFR
Lancichinetti–Fortunato–Radicchi benchmark
Fit one power law to the degree distribution, another to the cluster sizes. Resample both from scratch. Configuration model per cluster + global rewire to hit the per-node mixing parameter \(\mu\). The long-standing community-detection benchmark.
LFR's design choice is the opposite of ABCD's. ABCD takes your degree sequence and cluster sizes verbatim; LFR refuses to. It assumes the shapes are power laws and distils your input to two scalar exponents plus a handful of range parameters.
Stage 2 resamples new degrees from a truncated power law with the fitted exponent and range, and new cluster sizes from a second power law under LFR's size-floor constraints. The output graph shares statistics with the input but not identity: node 5 in the output is not node 5 in the input; cluster 3 is not cluster 3.
Mixing is controlled by a scalar \(\mu\) per node: LFR tries to arrange each node's neighbours so \(\mu_i = d^{\text{out}}_i / d_i \approx \mu\). A several-hundred-rewire loop reshuffles edge endpoints (never nodes) until each node's internal/external split is within a small tolerance of \(\mu\). The paper's Fig. 3 shows the resulting \(\mu_i\) distribution as bell-shaped around \(\mu\), not a delta.
LFR targets graphs with controlled degree + size distributions, sweepable across \(\mu\). ABCD fits when the input itself is the target. The two models share a configuration-model-plus-rewire backbone but differ on what counts as input.
Shared input graph: 18 clustered + 2 singletons.
The 20-node synthetic LFR profiles to derive its scalars: C1 (8), C2 (6), C3 (4), plus outliers 19 + 20 each in their own 1-node cluster. Under the wrapper's outlier_mode=singleton, the cluster-size list reads [8, 6, 4, 1, 1]; the size-1 entries are below LFR's hard \(c_{\min} = 3\) floor and will not survive the resample.
Extract the shape; throw away the identity.
Profile writes degree.csv (per-node integer degrees) the same way ABCD does, but LFR's stage 2 reads only four derived scalars from it: \(N\), \(\langle k \rangle\), \(k_{\max}\), and the power-law exponent \(t_1\) from powerlaw.Fit with auto-selected \(x_{\min}\). The sequence itself is discarded.
On the synthetic the fit is coarse because only 20 points feed it; the curve below shows the degree histogram and the fitted \(k^{-t_1}\) line. On dnc the fit lands at \(t_1 \approx 1.45\) with auto \(x_{\min} = 1\), which is a poor fit by the strict-KS test but the best the powerlaw package can do.
A second power law, with a size floor.
Profile writes cluster_sizes.csv the same way ABCD does. Stage 2 derives \(c_{\min}, c_{\max}\), and the exponent \(t_2\) fitted with xmin = max(min(cluster_sizes), 3). The floor of 3 is a hard LFR requirement: the C++ binary refuses to build clusters smaller than 3. If the input has size-1 or size-2 clusters (such as the wrapper's singleton outliers), the floor absorbs them; the fitted exponent is estimated only from sizes \(\ge 3\).
On the synthetic: cluster_sizes = [8, 6, 4, 1, 1] → \(c_{\min} = 3\) (floor applies), \(c_{\max} = 8\), \(t_2 \approx 2.23\). On dnc: 442 entries ending in 355 singletons → \(c_{\min} = 3\), \(c_{\max} = 52\), \(t_2 \approx 2.01\).
dnc appear in red; they are below the fit's \(x_{\min}\) floor and do not influence \(t_2\). The fit line starts at \(c_{\min}\).One number: \(\mu\) (node-mean, not stub-weighted).
LFR writes mixing_parameter.txt, a single float. Unlike ABCD's stub-weighted \(\xi\), LFR uses the mean of per-node ratios:
Each node contributes equally regardless of degree. A degree-1 node with one inter-cluster edge contributes \(1/1 = 1\); a degree-1000 hub with 500 inter-cluster edges contributes \(500/1000 = 0.5\). On skewed degrees the two conventions drift by a few percent. On dnc LFR's \(\mu = 0.686\) while ABCD's \(\xi = 0.718\); they are close but not equal.
| N | · | Vertex count. |
|---|---|---|
| \(\langle k \rangle\) | · | Mean degree. Passed to the binary via -k. |
| \(k_{\max}\) | · | Maximum degree. Caps the truncated power-law sampler. |
| \(c_{\min}, c_{\max}\) | · | Size bounds, \(c_{\min}\) clipped at 3. |
| \(t_1\) | · | Degree power-law exponent from powerlaw.Fit auto \(x_{\min}\). |
| \(t_2\) | · | Cluster-size power-law exponent with \(x_{\min} = c_{\min}\). |
| \(\mu\) | · | Mean per-node out-ratio. This is the target the rewire loop chases. |
ABCD's \(\xi\) sums across stubs then divides: \(\xi = (\sum_u d^{\text{out}}_u) / (\sum_u d_u)\). LFR's \(\mu\) takes per-node ratios then averages. On symmetric or uniform-degree graphs the two match; on heavy-tail graphs like dnc they diverge by a few percent. Neither generator accepts the other's convention; do not swap them.
Draw fresh sequences from the fitted power laws.
The C++ binary samples a new degree sequence from \(P(k) \propto k^{-t_1}\) truncated to \([1, k_{\max}]\) with mean constrained to \(\langle k \rangle\), and a new cluster-size list from \(P(c) \propto c^{-t_2}\) with floor \(c_{\min}\) and sum constrained to \(N\). The upper cap \(c_{\max}\) is best-effort: when the fitted exponent is poor, the binary's truncation drifts and output sizes can exceed \(c_{\max}\). Both samples differ from the input.
Press re-roll sequences to draw a new pair. The bars below overlay input (azure) vs resampled (plum) for each rank position. On the synthetic the resampled degree sequence tends to have a flatter tail: with only 20 points, the rejection sampler has little room to reproduce the input's heavy hubs.
Iterate until no homeless node remains.
The paper's step 4: walk the degree sequence and place each node in a random community whose remaining slot count exceeds the node's internal degree \(d^{\text{int}} = (1-\mu) d\). If the binary fails to place a node within \(3N\) retries, it calls change_community_size to reshuffle the cluster-size sequence and restarts the assignment from scratch. Most realistic parameter settings converge on the first attempt.
Step through the iterations below. Home position is random each time; re-roll draws a new permutation.
Configuration model to seed, targeted rewire to match \(\mu_i\).
With assignment fixed, the binary builds the edge set in two halves: a per-cluster configuration model on the internal-degree sub-sequence in each cluster, then a cross-cluster configuration model on the leftover external stubs. After the seed, a mate-rewire phase walks every same-cluster external edge and swaps an endpoint with a random outside node, dragging each node's \(\mu_i\) toward the target \(\mu\). Every swap preserves node degrees, only re-pairing endpoints.
The walker below decomposes the seed phase: each step pairs two stubs uniformly at random; collisions (self-loops, parallels) flag red-dashed and feed a 2-opt cleanup pass that swaps endpoints with a random valid edge. Highlighted nodes mark the up to 4 endpoints touched by each swap; everything else dims. The distribution of \(\mu_i\) at the end is bell-shaped around \(\mu\), not a delta: small-degree nodes have few possible ratios (degree-1 only 0 or 1; degree-2 only 0, 0.5, 1) so they cluster at discrete points. The bars below the walker plot the \(\mu_i\) distribution from a finalised sample; the slider moves the target \(\mu\), re-roll resamples the edge generation.
Before the mate-rewire kicks in, each cluster runs its own configuration-model build via buildSubgraph: a deterministic multimap seed places intra-cluster edges first, then 10 randomization passes 4-way-swap the local adjacency, and finally a multi-edge rewire breaks any duplicates that surface when each cluster is merged into the global edge set. The walker below steps through every Phase-2 swap and every Phase-3 multi-edge rewire across all clusters in cluster order. Each step boldfaces the up to 4 endpoints touched, dims everything else, then animates the cut + place.
The mate-rewire phase is what turns LFR's μ from a target into the bell-shaped distribution above. After per-cluster internal edges are placed, the binary builds a global external-edge config model and runs a phase-3 loop: for any same-cluster external edge, pick a random outside node, do a 4-way swap that pulls one endpoint out of the cluster. The walker below steps op-by-op through that loop on the current sample. Per-cluster internal edges render as faded backdrop; mate edges still inside a cluster surface red-dashed; each step ghost-flashes the cuts before settling on the placements.
What you get on the shipped example.
Default run on dnc + sbm-flat-best+cc at --seed 1 (\(N = 906\), \(\langle k \rangle = 23.02\), \(t_1 = 1.45\), \(t_2 = 2.01\), \(c_{\min} = 3\), \(c_{\max} = 52\), \(\mu = 0.686\)):
| stat | input | lfr output | note |
|---|---|---|---|
| N | 906 | 906 | Exact match (CLI arg). |
| edges | 10,429 | 10,370 | Off by 0.6%. LFR's rewire + resample lose only a handful of edges. |
| mean degree | 23.02 | 22.89 | Close but not exact: the truncated power-law sampler hits the mean in expectation only. |
| max degree | 368 | <368 | The input's heavy tail (368, 312, 256, 225, 202, ...) is often cut short: sampling from \(k^{-1.45}\) with \(k_{\max} = 368\) rarely draws the extreme quantiles without many tries. |
| \(\mu\) | 0.686 | ~0.686 | Hit via the rewire loop within the paper's tolerance; small per-node drift. |
| global cc | 0.548 | 0.252 | Not targeted. |
| clusters | 87 | 10 | LFR resamples cluster sizes from \(c^{-2.01}\) with floor 3 summing to 906. With \(t_2 = 2.01\) the sampler concentrates mass on the tail, so fewer larger clusters cover \(N\); 10 clusters for dnc is typical. |
| com.csv rows | 551 | 906 | Every node gets a cluster in LFR (no singletons to strip). Clustered count shoots up to all of \(N\) because the floor-3 constraint disallows 1-node clusters. |
LFR's output uses node IDs 1 through \(N\). Those IDs do not correspond to the input's node IDs, nor do the cluster IDs correspond. If a downstream tool asks "is the output's cluster 7 the same as the input's cluster 7", the answer is no: LFR does not see the input's clustering, only its size distribution. Anything keyed on identity needs ABCD or SBM instead.
On dnc, input has 87 real clusters (plus 355 singletons the wrapper counts separately); LFR output has 10 clusters. This is not a bug: LFR samples \(c^{-t_2}\) with floor \(c_{\min}\) and sum fixed to \(N\). With \(t_2\) near 2, the sampler draws a handful of large clusters that sum to 906 quickly, leaving no room for the many small clusters the input had. The fitted \(c_{\max} = 52\) does not constrain the output here: empirically every output cluster on dnc lands in [67, 116], so the upper bound is best-effort, not enforced. LFR is the wrong tool when a specific cluster count is required.
The wrapper's outlier_mode=singleton makes every unclustered input node a size-1 pseudo-cluster. LFR's floor \(c_{\min} = 3\) refuses them: their sizes don't survive the fit, and the output has no singletons at all. Outlier semantics are incompatible with LFR; they resolve as "every node is clustered" on LFR output.