EC-SBM v3
Edge-connected SBM, v3
Same per-cluster \(k\)-edge-connectivity floor as v1 + v2. Stage 2 swaps the \(K_{k+1}\) seed + residual-degree-weighted attach for a per-cluster PSO build on a hyperbolic disk; a 1-D temperature search drives each cluster's induced clustering coefficient toward a profile-time target. Stages 3 and 4 are inherited verbatim from v2.
v2 gets clean residual accounting and a block-aware top-up, but its stage-2 core is rigid: a \(K_{k+1}\) clique on the top-\((k+1)\) hubs of each cluster, then a residual-degree-weighted greedy attach for the rest. The shape of the core is fixed by \(k(C)\) alone โ clusters with the same \(k\) get the same triangle density, regardless of how the empirical cluster looks.
v3 replaces stage 2 with PSO. Each cluster gets its own hyperbolic-disk build with attachment count \(m = \min(\max(k, m_\text{floor}), n - 1)\): the first \(m+1\) arrivals connect to all predecessors (a \(K_{m+1}\) seed, which is at least as edge-connected as the \(k\)-floor demands), and every later arrival picks \(m\) partners from a temperature-modulated hyperbolic-distance distribution. Lower \(T\) concentrates partner picks on the closest predecessors, which raises the cluster's clustering coefficient; higher \(T\) flattens the distribution toward random.
The temperature is not free. Profile time computes a per-cluster target \(\hat c(C)\) โ the global clustering coefficient of the empirical intra-cluster induced subgraph. Stage 2 then runs a 1-D bisection-and-secant search over \(T\) for each cluster, evaluating PSO at each candidate \(T\) and keeping the run whose simulated cc is closest to \(\hat c(C)\). The per-cluster output ends up matching the input's local triangle density, not just its mincut.
Stages 3 (residual SBM over all blocks + block-preserving 2-opt rewire) and 4 (block-aware degree matcher with per-block-pair budget) are unchanged from v2. The per-cluster edge-connectivity floor still holds: \(m \ge k\) so the \(K_{m+1}\) seed plus \(m\) attachments per arrival is at least \(k\)-edge-connected by construction.
Shared input graph: 18 clustered + 2 singletons.
The 20-node synthetic this page builds on, drawn under the shared singleton view: C1 (8), C2 (6), C3 (4), plus outliers 19 + 20 each in their own 1-node cluster. EC-SBM v2 then forces outlier_mode=excluded in the next stage, dropping outliers from the profile-time partition; stage 3 folds them back as a combined block. This view is the universal baseline before either move.
Per-node block label. Outliers excluded now, folded back at stage 3.
Profile writes the same seven CSVs as v1: node_id, cluster_id, assignment, degree, edge_counts, mincut, and com. v2 also forces outlier_mode=excluded, so profile's tables cover only the clustered sub-network. The difference shows up later: at stage 3, v2 re-reads the empirical edgelist and folds every outlier into one combined pseudo-block, while v1 creates one singleton block per outlier.
outlier_mode=excluded · 18 nodes across three clusters
assignment.csv
A size-1 cluster has no internal edges (no partner inside), so giving each outlier its own block adds rows and columns to the block matrix without filling any diagonal. Under combined, every outlier shares one block; the outlier-outlier diagonal cell captures the real edges they have among themselves, and clustered-outlier cells capture the cross edges. One block means one residual-SBM cell per relation, which is all v2's stage 3 needs.
Per-node target degree, post-exclusion.
Same as v1: degree.csv carries \(k_u\) counted on the post-exclusion graph. The constructive phase uses these numbers; stage 3 then bumps per-node out_degs by each node's outlier-incident count after re-reading the original edgelist.
Edge count per block pair. 3x3 now, 4x4 at stage 3.
Same half-edge convention as v1 / SBM:
\[ e_{rr} \;=\; 2 \cdot \bigl| \{\, \{u, v\} \in E \,:\, b_u = b_v = r \,\} \bigr|. \]At profile time v2's matrix is the same compact 3x3 as v1 (outliers excluded). Stage 3 then rebuilds a 4x4 by re-reading the original edgelist and adding the combined outlier block on the fourth row/column.
One new scalar per cluster: \(k(C)\).
Same profile addition as v1. mincut.csv records the minimum edge cut of each cluster's induced subgraph in the input, measured via pymincut's Nagamochi-Ono-Ibaraki algorithm with a bucket-queue heap. Singleton clusters get \(k = 0\).
Each cluster's mincut on this synthetic, highlighted in red-dashed on the graph below. C1 has a non-trivial internal cut: two subsets of 4 nodes joined by a thin bridge of 2 edges. C2 and C3 isolate a single low-degree vertex.
| cluster | nodes | min deg in C | \(k(C)\) | one minimum cut |
|---|---|---|---|---|
| C1 | 8 | 3 | 2 | {(1, 5), (4, 8)}, splitting {1,2,3,4} from {5,6,7,8}. |
| C2 | 6 | 2 | 2 | {(9, 13), (12, 13)}. |
| C3 | 4 | 1 | 1 | {(16, 18)}. |
One new scalar per cluster: \(\hat c(C)\), the v3 stage-2 target.
v3's profile asks one extra question of every cluster: what is the global clustering coefficient of its intra-cluster induced subgraph? The pso method's stage-2 T-search drives the simulated cc toward this target, one cluster at a time.
\[ \hat c(C) \;=\; \frac{3 \cdot |\Delta(G[C])|}{|\Lambda(G[C])|}, \]where \(|\Delta(G[C])|\) is the triangle count and \(|\Lambda(G[C])|\) is the number of connected triplets (length-2 paths) of the induced subgraph \(G[C]\). The profile artifact is cluster_ccoeff.csv, one float per cluster iid, only emitted under --method pso.
| cluster | nodes | triangles | triplets | \(\hat c(C)\) |
|---|
One PSO build per cluster. T tuned to match \(\hat c(C)\).
Stage 2 dispatches one independent PSO build per cluster. Each cluster's nodes are sorted by descending profile-time degree (id-asc tiebreak), giving each node a 1-based arrival rank that fixes its radial coordinate \(r_i = 2\log i\). Angular coordinates are drawn iid from \(\mathrm{Uniform}[0, 2\pi)\). Attachment count is \(m = \min(\max(k, m_\text{floor}), n - 1)\) with default \(m_\text{floor} = 1\), so \(m \ge k\) โ the per-cluster mincut floor still holds.
The first \(m+1\) arrivals connect to all predecessors (a \(K_{m+1}\) clique seed). Each later arrival \(t\) computes hyperbolic distances \(d(t, j)\) to every \(j < t\) and picks \(m\) partners. At \(T = 0\) the picks are the \(m\) closest predecessors (deterministic). At \(T > 0\) the picks are an Efraimidis-Spirakis weighted draw from
\[ p_{tj} \;=\; \frac{1}{1 + \exp\bigl((d(t,j) - R_t) / 2T\bigr)}, \]where \(R_t\) is the temperature-dependent connection radius from the canonical PSO formula. Lower \(T\) concentrates picks on the closest predecessors (high cc); higher \(T\) flattens the distribution toward random (low cc).
Around the PSO build, stage 2 runs a 1-D bisection-and-secant search over \(T \in [t_\text{min}, t_\text{max}]\). Each iteration evaluates PSO at one \(T\), measures the cluster's induced global cc, and shrinks the bracket using the sign of \(c - \hat c(C)\). The walker below steps through every search iteration across every cluster; the chosen \(T\) per cluster is whichever evaluated PSO realisation came closest to \(\hat c(C)\).
The \(K_{m+1}\) seed plus \(m\)-attachments-per-arrival construction is at least \(m\)-edge-connected: every later arrival contributes \(m\) new incident edges, and \(m \ge k\). The cluster's induced subgraph stays at least as edge-connected as the empirical \(k(C)\), so v3 inherits v1's mincut proof under a different family of cores.
Compute what the SBM still owes each block.
Stage 3 is unchanged from v2. Before the SBM call, prepare_sbm_inputs re-reads the original edgelist and rebuilds out_degs + probs on all four blocks (three real clusters + one combined outlier block). It subtracts the stage-2 PSO contribution from out_degs and populates probs inter-cells from the original edge list. Finally it reconciles each diagonal against the block's residual stub budget:
\(D_k\) is "residual half-edges block \(k\) still owes", \(E^{\text{inter}}_k\) is "inter-block half-edges block \(k\) is already committed to". The intra-block budget the SBM still needs to sample is
\[ \text{diff}_k \;=\; D_k \;-\; E^{\text{inter}}_k. \]If \(\text{diff}_k \ge 0\), set \(\texttt{probs}[k, k] = \text{diff}_k\) (bump by 1 if odd, to satisfy graph-tool's even-half-edges-per-block rule, and add the bump back to one node's out_degs). If \(\text{diff}_k < 0\), the construction already over-spent the block's out-budget; add \(|\text{diff}_k|\) back to out_degs and set \(\texttt{probs}[k, k] = 0\) (the SBM can only add, never subtract).
On the synthetic, after the live stage-2 PSO build, the residual looks like (numbers reflect whatever PSO realisation the search settled on; the diffs depend on each cluster's stage-2 spend):
Wherever a cluster's PSO build ended up placing more intra edges than the empirical count (a lower-T realisation will tend to do this), \(\text{diff}_k\) goes to zero and the over-spent stubs are added back to out_degs. The residual SBM still sees a proper bipartite-ish problem; v3 just hands it a different stage-2 budget than v2 would.
PSO edges belong to stage 2. Residual intra + inter come from stage 3's SBM. Match-degree covers whatever the rewire cannot place. No cell is sampled twice, and nothing is double-counted because probs was decremented by exactly what stage 2 placed.
One SBM call. Block-preserving swaps repair invalid edges without losing the block count.
With the residual probs, out_degs, and the combined-outlier block assignment ready, v2 calls gt.generate_sbm(b, probs, out_degs, micro_ers=True, micro_degs=True) once. The sampler can still emit self-loops (two of a node's stubs pair together) and parallel edges (the same pair drawn twice). v1 dedupped both and accepted the degree loss. v2 tries to fix each one via a block-preserving 2-opt swap.
The swap: bucket every valid edge by its unordered block-pair \((A, B)\); for each invalid edge, pick a random valid edge in the same bucket and swap endpoints so each node stays in its home block. That preserves both the block-pair count and every node's block membership. Up to 10 retry passes with a stagnation detector. Edges still invalid after that are dropped with a WARN.
Step through; hit random to resample stage 3.
If the invalid edge \((u, v)\) has block-pair \((A, B)\) with \(A \neq B\), the partner edge \((x, y)\) must also sit in bucket \((A, B)\). Swap the \(A\)-side of each edge with the \(B\)-side of the other: \((u_A, x_B)\) and \((x_A, u_B)\). Both nodes stay in their home block, so the block-pair count is invariant. For intra-block \((A = A)\) there are two valid swaps; the code flips a coin.
Block-aware top-up by default. Logs gridlock; does not silently drop.
After the residual SBM + rewire, some nodes may still be short of their target. v2 hands the leftovers to a block-aware top-up that spends edges out of a per-block-pair budget read from the reference clustering, rather than letting stubs leak across cluster boundaries. Anything still unplaced is logged, never silently dropped the way v1's plain greedy does. The viz below runs the simpler global sibling so the heap mechanics stay readable; see the degree matcher page for the full menu.
Toggle apply match_degree to watch the per-node deficit pile shrink. Click random for a different realization.
Same downstream stages, different stage-2 family.
v3's pipeline diverges from v2 only in stage 2 (and the matching profile artifact). Everything that follows โ residual SBM, block-preserving rewire, block-aware degree matcher โ is the same code under the same flags. What changes is the stage-2 budget v3 hands stage 3: PSO can place more or fewer intra-cluster edges than v2's \(K_{k+1}\) + greedy attach, depending on the chosen \(T\) and on each cluster's empirical cc target.
The cluster-level effect is that v3's intra-cluster triangle distribution tracks the input's, while v2's is fixed by \(k(C)\) alone. The block-pair-level effect on the residual-SBM is harder to read: v3 may underspend or overspend each diagonal relative to v2, and the resulting \(\text{diff}_k\) can flip sign per block. The downstream rewire + matcher then absorb whatever stage 2 left on the table.
The T-search drives each cluster's induced cc toward \(\hat c(C)\), but the final output's global cc still depends on the residual SBM and the matcher's edge placements. Stage 2 hits its cluster-by-cluster target; stages 3 + 4 then add edges that influence the global statistic without re-checking it.