EC-SBM v2
Edge-connected SBM, v2
Same per-cluster \(k\)-edge-connectivity floor as v1. Constructive-only core, one residual SBM over all blocks (including a combined outlier block), a block-preserving 2-opt rewire, and a block-aware degree matcher that logs gridlock instead of silently dropping stubs and gates every accepted edge on a per-block-pair budget.
v1 gets the right cluster guarantees, but the bookkeeping is messy. Stage 2 builds \(K_{k+1}\) cores and runs an SBM on the decremented probs + degree arrays; the SBM can sample a pair the core already placed, dedup drops one copy, and the block's intra-count leaks below the profile. Outliers get their own separate SBM through stage 3a. Stage 4's heap-greedy matcher silently drops stubs when a hub runs out of valid partners.
v2 keeps the \(K_{k+1}\) cores and rebuilds the rest. Stage 2 is constructive-only; nothing else samples during the core phase.
Stage 3 is a single residual SBM over every block (real clusters plus a combined outlier block). Invalid edges go to a block-preserving 2-opt rewire that holds both block-pair counts and every node's block membership constant. Stage 4 offers ten matcher algorithms: five global ones, plus five block-aware counterparts that gate every accepted edge on a per-block-pair budget read from the reference clustering. The default is the dynamic-heap greedy with the block-aware filter, which logs gridlock rather than hiding it.
Every edge is owned by exactly one stage, and nothing disappears without a log line. The per-cluster edge-connectivity proof is inherited verbatim from v1.
Shared input graph: 18 clustered + 2 singletons.
The 20-node synthetic this page builds on, drawn under the shared singleton view: C1 (8), C2 (6), C3 (4), plus outliers 19 + 20 each in their own 1-node cluster. EC-SBM v2 then forces outlier_mode=excluded in the next stage, dropping outliers from the profile-time partition; stage 3 folds them back as a combined block. This view is the universal baseline before either move.
Per-node block label. Outliers excluded now, folded back at stage 3.
Profile writes the same seven CSVs as v1: node_id, cluster_id, assignment, degree, edge_counts, mincut, and com. v2 also forces outlier_mode=excluded, so profile's tables cover only the clustered sub-network. The difference shows up later: at stage 3, v2 re-reads the empirical edgelist and folds every outlier into one combined pseudo-block, while v1 creates one singleton block per outlier.
outlier_mode=excluded · 18 nodes across three clusters
assignment.csv
A size-1 cluster has no internal edges (no partner inside), so giving each outlier its own block adds rows and columns to the block matrix without filling any diagonal. Under combined, every outlier shares one block; the outlier-outlier diagonal cell captures the real edges they have among themselves, and clustered-outlier cells capture the cross edges. One block means one residual-SBM cell per relation, which is all v2's stage 3 needs.
Per-node target degree, post-exclusion.
Same as v1: degree.csv carries \(k_u\) counted on the post-exclusion graph. The constructive phase uses these numbers; stage 3 then bumps per-node out_degs by each node's outlier-incident count after re-reading the original edgelist.
Edge count per block pair. 3x3 now, 4x4 at stage 3.
Same half-edge convention as v1 / SBM:
\[ e_{rr} \;=\; 2 \cdot \bigl| \{\, \{u, v\} \in E \,:\, b_u = b_v = r \,\} \bigr|. \]At profile time v2's matrix is the same compact 3x3 as v1 (outliers excluded). Stage 3 then rebuilds a 4x4 by re-reading the original edgelist and adding the combined outlier block on the fourth row/column.
One new scalar per cluster: \(k(C)\).
Same profile addition as v1. mincut.csv records the minimum edge cut of each cluster's induced subgraph in the input, measured via pymincut's Nagamochi-Ono-Ibaraki algorithm with a bucket-queue heap. Singleton clusters get \(k = 0\).
Each cluster's mincut on this synthetic, highlighted in red-dashed on the graph below. C1 has a non-trivial internal cut: two subsets of 4 nodes joined by a thin bridge of 2 edges. C2 and C3 isolate a single low-degree vertex.
| cluster | nodes | min deg in C | \(k(C)\) | one minimum cut |
|---|---|---|---|---|
| C1 | 8 | 3 | 2 | {(1, 5), (4, 8)}, splitting {1,2,3,4} from {5,6,7,8}. |
| C2 | 6 | 2 | 2 | {(9, 13), (12, 13)}. |
| C3 | 4 | 1 | 1 | {(16, 18)}. |
Same \(K_{k+1}\) seed as v1. Nothing else samples here.
Stage 2 calls exactly the same gen_kec_core.generate_cluster as v1, with one change: v2 passes --no-sbm-overlay, so the constructive pass ends with the cores + attach-by-availability and nothing more. No gt.generate_sbm here, no overlay, no dedup. probs and deg carry exactly the budget the construction used.
The walker below is the same per-arrival walk used on the v1 page: cluster-major sort, phase-1 arrivals auto-attach, phase-2 arrivals greedy-pick \(k\) partners by residual int_deg desc. Realization 1 is byte-verified against the shipped Python at PYTHONHASHSEED=0. random step re-samples partner picks at the current arrival only, leaving every other arrival frozen; random all draws a fresh universe across every greedy arrival.
Per-cluster edge-connectivity \(\ge k(C)\) is still guaranteed by the \(K_{k+1}\) seed. What changes is that probs and deg are decremented by exactly the constructive contribution and nothing else; the residual the SBM sees next is honest.
Compute what the SBM still owes each block.
Before the SBM call, prepare_sbm_inputs re-reads the original edgelist and rebuilds out_degs + probs on all four blocks (three real clusters + one combined outlier block). It subtracts the constructive contribution from out_degs and populates probs inter-cells from the original edge list. Finally it reconciles each diagonal against the block's residual stub budget:
\(D_k\) is "residual half-edges block \(k\) still owes", \(E^{\text{inter}}_k\) is "inter-block half-edges block \(k\) is already committed to". The intra-block budget the SBM still needs to sample is
\[ \text{diff}_k \;=\; D_k \;-\; E^{\text{inter}}_k. \]If \(\text{diff}_k \ge 0\), set \(\texttt{probs}[k, k] = \text{diff}_k\) (bump by 1 if odd, to satisfy graph-tool's even-half-edges-per-block rule, and add the bump back to one node's out_degs). If \(\text{diff}_k < 0\), the construction already over-spent the block's out-budget; add \(|\text{diff}_k|\) back to out_degs and set \(\texttt{probs}[k, k] = 0\) (the SBM can only add, never subtract).
On the synthetic, after the verified stage 2 run, the residual looks like:
C1 has already spent its full intra budget during construction, so the SBM sees 0 intra residual there. C2, C3, and OUT each still owe 2 half-edges (one intra edge per block). All four diffs are even on this synthetic, so no parity bump fires; the bump path triggers only on odd diffs (e.g. when an outlier-incident OO edge mismatches block-pair parity).
Constructive edges belong to stage 2. Residual intra + inter come from stage 3's SBM. Match-degree covers whatever the rewire cannot place. No cell is sampled twice, and nothing is double-counted because probs was decremented by exactly what stage 2 placed.
One SBM call. Block-preserving swaps repair invalid edges without losing the block count.
With the residual probs, out_degs, and the combined-outlier block assignment ready, v2 calls gt.generate_sbm(b, probs, out_degs, micro_ers=True, micro_degs=True) once. The sampler can still emit self-loops (two of a node's stubs pair together) and parallel edges (the same pair drawn twice). v1 dedupped both and accepted the degree loss. v2 tries to fix each one via a block-preserving 2-opt swap.
The swap: bucket every valid edge by its unordered block-pair \((A, B)\); for each invalid edge, pick a random valid edge in the same bucket and swap endpoints so each node stays in its home block. That preserves both the block-pair count and every node's block membership. Up to 10 retry passes with a stagnation detector. Edges still invalid after that are dropped with a WARN.
Step through; hit random to resample stage 3.
If the invalid edge \((u, v)\) has block-pair \((A, B)\) with \(A \neq B\), the partner edge \((x, y)\) must also sit in bucket \((A, B)\). Swap the \(A\)-side of each edge with the \(B\)-side of the other: \((u_A, x_B)\) and \((x_A, u_B)\). Both nodes stay in their home block, so the block-pair count is invariant. For intra-block \((A = A)\) there are two valid swaps; the code flips a coin.
Block-aware top-up by default. Logs gridlock; does not silently drop.
After the residual SBM + rewire, some nodes may still be short of their target. v2 hands the leftovers to a block-aware top-up that spends edges out of a per-block-pair budget read from the reference clustering, rather than letting stubs leak across cluster boundaries. Anything still unplaced is logged, never silently dropped the way v1's plain greedy does. The viz below runs the simpler global sibling so the heap mechanics stay readable; see the degree matcher page for the full menu.
Toggle apply match_degree to watch the per-node deficit pile shrink. Click random for a different realization.
What you get on the shipped example.
Default run on dnc + sbm-flat-best+cc at seed 1, with the residual-SBM rewire and the block-aware top-up:
| stat | input | v2 output | note |
|---|---|---|---|
| N | 906 | 906 | Exact. |
| edges | 10,429 | 10,191 | Off by 2.3%. The block-pair-aware top-up spends edges only where the reference \(e_{rs}\) has budget left; once every eligible block-pair is full or every viable partner is already a neighbour, leftover stubs are logged and the run stops. |
| mean degree | 23.02 | 22.50 | Follows the edge count. |
| global cc | 0.548 | 0.555 | A hair above the reference. The \(K_{k+1}\) cores carry the bulk of the triangles and the block-pair-aware top-up adds most of its edges where the input's \(e_{rs}\) concentrates, on the dense diagonals. v1 sits at 0.424 because its overlay dedup shaves intra-block density that v2's rewire keeps intact. |
| clusters | 87 | 87 | Profile-stage passthrough. |
v1 and v2 share the same \(K_{k+1}\) cores edge-for-edge; everything else gets reshuffled. The flow chart below pairs each edge with the stage that placed it on each side.
Both versions place the same \(K_{k+1}\) cores. v1 then runs an SBM on the mutated probs and uses remove_parallel_edges for collisions; the sampler can land on core pairs, dedup drops one, and the intra-block density inside that block is shaved. v2's rewire keeps the block-pair count intact, so the core cliques' triangle structure stays in place and the global cc sits closer to the input's 0.548.
Rewire is not a complete repair. When no valid partner sits in the same bucket, an invalid edge stays invalid; after 10 retry passes or a stagnation break those unresolved edges are dropped with a warning. On dnc at this seed the total drop is tens of edges out of ten thousand: invisible in the stats table, present in the run log. The configuration-model rewire variant of stage 4 is lossy in the same way; the block-aware greedy default catches what rewire strands as long as an eligible partner is still in budget.