Graph Partitioning Algorithms: Edge-cut vs Vertex-cut and an Engineering Guide to METIS

Subtitle / Abstract
Graph partitioning is not a minor offline preprocessing trick. It is a major production performance lever in graph databases: partition incorrectly, and both query latency and network traffic go out of control. Using the ACERS template, this article explains the trade-offs of Edge-cut vs Vertex-cut, the multilevel intuition behind METIS, and the metrics that actually matter in engineering.

Estimated reading time: 18-22 minutes
Tags: graph partitioning, Edge-cut, Vertex-cut, METIS
SEO keywords: Graph Partitioning, Edge-cut, Vertex-cut, METIS, Query Latency
Meta description: From objective functions to engineering metrics, understand how graph partitioning affects query latency and network communication, with runnable code and tuning steps.

Target Audience

Backend engineers building graph databases, graph computing platforms, risk-control graphs, or recommendation graphs
Performance engineers who need to diagnose “slow queries” at the partitioning layer
Algorithm engineers who want to move from concept-level understanding to production implementation

Background / Motivation

In relational databases, performance is often improved with indexes, join reordering, and cache hit optimization. In graph databases, cross-machine edges are often the first bottleneck.
When a query path frequently crosses partitions, it triggers:

Remote RPC round trips (RTT)
Remote subgraph fetching and deserialization
Multi-partition coordination and result merge overhead

So in production, graph partitioning directly impacts two core metrics:

Query latency (p95/p99)
Network communication volume (bytes/s, cross-partition messages)

In one sentence: in production graph databases, partitioning algorithms are not optional optimization, they are a foundational capability.

Core Concepts

Graph Partitioning: split a graph into k partitions while minimizing inter-partition coupling and maintaining load balance.
Edge-cut: minimize the number of cross-partition edges, with each vertex assigned to a single partition.
Vertex-cut: partition by edges and allow vertices to be replicated across partitions; useful for reducing skew caused by hot edges.
Balance Constraint: partition load must not skew too much; common constraints include |V_i| <= (1+ε)|V|/k or edge-load constraints.
METIS (core idea): a multilevel flow (Coarsen -> Initial Partition -> Uncoarsen + Refine) that reduces global search cost by “coarsen first, refine later.”

Quick Orientation Map (60-120 Seconds)

Problem shape: split a large graph into k partitions while minimizing cross-machine access and keeping load balanced.
One-line core: choose an objective first (Edge-cut/Vertex-cut), then solve an initial partition with a multilevel method and do incremental corrections.
When to use / avoid: static or slowly changing graphs fit offline baseline partitioning; high-frequency dynamic graphs need incremental rebalancing support.
Complexity glance: optimal partitioning is combinatorially hard; engineering relies on approximate algorithms plus a monitoring feedback loop.
Common failure mode: optimizing only cut while ignoring balance can make p99 worse.

Master Mental Model

Core abstraction: graph partitioning is a constrained graph-cut optimization problem.
Problem family: combinatorial optimization + local search + multi-objective trade-offs (communication, latency, load).
Isomorphism with known templates:
- Offline stage resembles multilevel coarse-to-fine optimization.
- Online stage resembles local hill-climbing with budget-limited migrations.

Feasibility and Lower-Bound Intuition

For densely connected graphs without clear community boundaries, the theoretical lower bound of cut will not be very low.
When query templates inherently cross communities (for example, cross-domain risk-control paths), even perfect partitioning cannot reduce cross-machine access to zero.
Under strong power-law degree distributions (a few ultra-high-degree nodes), pure Edge-cut faces a hotspot lower bound:
- You may reduce cut, but it is hard to flatten hotspots at the same time.

Counterexample:
If one supernode connects to 100k edges and traffic concentrates around it, forcing strict non-replication leads to severe skew in one partition. In this case, Vertex-cut is often more realistic than Edge-cut.

Problem Modeling and Constraint Scale

In practice, explicitly decompose the objective:

\[ \text{Score} = \alpha \cdot \text{CutCost} + \beta \cdot \text{ImbalanceCost} + \gamma \cdot \text{HotspotCost} \]

Where:

CutCost: number of cross-partition edges or weighted cross-edge sum
ImbalanceCost: penalty for deviation from target partition capacity
HotspotCost: local congestion penalty caused by hot vertices or edges
α,β,γ: business weights (derived from SLA)

Scale recommendations (starting points, not hard standards):

Tens of millions of vertices and hundreds of millions of edges: prioritize offline multilevel partitioning + periodic recalibration
As partition count k increases: check network bottlenecks first, then single-machine bottlenecks; avoid blindly increasing partitions
Scan ε (load slack) typically from 0.03 to 0.10

A — Algorithm (Problem and Algorithm)

Problem Restatement (Engineering Form)

Given a large graph G=(V,E), split it into k partitions such that:

Partition load is as balanced as possible;
Frequently traversed query edges stay inside partitions;
Network communication is minimized;
Hot vertices are handled with controllable strategy (avoid blowing up one machine).

Inputs and Outputs

Name	Type	Description
`G`	graph	Production graph (optionally weighted)
`k`	int	Number of partitions
`obj`	enum	Objective: Edge-cut or Vertex-cut
`constraint`	config	Load-balance threshold, hotspot threshold
return	`part(v)` / `part(e)`	Mapping of vertex or edge to partition

Example (8 Nodes, 2 Partitions)

Community A: 0-1-2-3-0
Community B: 4-5-6-7-4
Bridge edges: (1,4), (2,5), (3,6)

If cut by communities: P0={0,1,2,3}, P1={4,5,6,7}, Edge-cut = 3
If cut randomly: Edge-cut is often >= 6

This is exactly where “query latency can differ by more than 2x”: more cross-partition edges make distributed query loops far more likely.

Deriving the Approach (From Brute Force to Practical)

Naive Brute Force

Enumerate all partition assignments, then compute cut and balance
Exponential complexity, not deployable

Key Observations

Production graphs are usually sparse but huge, so we need approximate optimality rather than global optimality
Most gains come from:
- reducing cross-partition edges
- avoiding hot partitions
Algorithm names are not first priority; objective + constraints + metric feedback loop are.

Method Selection

Edge-cut track: common for OLTP graph queries, short paths, k-hop lookup
Vertex-cut track: more stable when ultra-high-degree vertices are obvious (celebrity vertices, super accounts)
METIS idea: one of the industrial default choices for offline baseline partitioning

C — Concepts (Core Ideas)

1) Edge-cut vs Vertex-cut

Edge-cut (Unique Vertex Ownership)

Objective (simplified):

[ \min \sum_{(u,v)\in E} [part(u) \neq part(v)] ]

Pros: intuitive model, simple query routing
Cons: supernodes can pull large numbers of edges into cross-partition traffic

Vertex-cut (Edge Ownership, Vertex Replication)

A common metric is replication factor:

[ RF = \frac{1}{|V|}\sum_{v\in V} |A(v)| ]

Where A(v) is the partition set containing vertex v. Lower RF is better.

Pros: can spread high-degree vertex edges across machines
Cons: more complex replica consistency and read/write path

2) METIS Multilevel Intuition (Must Understand)

The core of METIS is not one magical formula, but a three-stage flow:

Coarsening: heavy-edge matching to shrink the graph
Initial Partition: quickly generate an initial split on the small graph
Uncoarsen + Refine: project back layer by layer and reduce cut with FM/KL-like local optimization

Engineering value: turn one huge hard problem into many small corrections, usually more stable than greedy partitioning on the original graph.

Deepening Focus (PDKH)

This article deeply expands two concepts:

Concept A: Mapping Edge-cut objective to query latency
Concept B: METIS multilevel partition workflow

Concept A: Edge-cut -> Latency

Problem Reframe: partition quality essentially compresses cross-machine hops.
Minimal Example: under the same query template, Edge-cut=3 vs Edge-cut=7 can roughly double cross-machine requests.
Invariant: under valid load constraints, reducing cross-partition edges does not increase expected remote hops.
Formalization:
- latency ≈ local_cpu + remote_rtt * cross_hops + deserialize_cost
- cross_hops is highly correlated with cut ratio.
Correctness Sketch: with a fixed query template, fewer cross-partition edges means fewer boundary events triggering remote access.
Threshold: when cut_ratio > 0.25, many online graph queries show clear p99 degradation (empirical threshold; calibrate per business).
Failure Mode: minimizing cut without load control creates hot partitions and may reduce overall throughput.
Engineering Reality: cut must be evaluated together with partition load and hotspot degree distribution; do not drive by one metric.

Concept B: METIS Multilevel Flow

Problem Reframe: not “solve once,” but “coarsen, solve coarse, then refine layer by layer.”
Minimal Example: shrink a 10M-edge graph to 200k edges, partition, then replay refinements.
Invariant: each refinement accepts only migrations that reduce objective or keep constraints balanced.
Formalization: Coarsen -> Partition -> Uncoarsen/Refine.
Correctness Sketch: not globally optimal, but monotonic local improvements ensure non-degrading objective.
Threshold: larger graphs and clearer community structure usually yield more stable multilevel gains.
Failure Mode: if graph changes too fast, offline partitioning ages quickly and gains decay.
Engineering Reality: must pair with incremental rebalance (periodic repartition + hotspot migration).

Practical Guide / Steps

Define objective first: choose Edge-cut or Vertex-cut before choosing algorithm names.
Define constraints: partition capacity, hotspot threshold, migration budget.
Get offline baseline partition: use METIS-style multilevel ideas.
Observe online metrics: cut_ratio, RF, p95/p99, cross-partition bytes.
Do local rebalance: migrate in small steps based on hotspot and cross-edge contribution; avoid full repartition.
Run regression validation: benchmark representative query templates, not only one-time batch stats.

Selection Guide

Choose objective by degree distribution:
- Smooth degree distribution: try Edge-cut first.
- Strong power-law: prioritize Vertex-cut evaluation.
Choose objective by query type:
- Shortest path / local subgraph reads: Edge-cut is easier for routing optimization.
- Batch traversal / message propagation: Vertex-cut is often more stable under hotspot pressure.
Choose strategy by machine memory:
- Tight memory: reduce replication, use Vertex-cut cautiously.
- Relatively ample memory: replication can be used to trade for throughput stability.
Choose cadence by migration budget:
- Low migration budget: local incremental correction.
- Acceptable maintenance window: offline repartition + incremental backfill.

Runnable Example (Python)

Below is a runnable local-search example with “balance constraints + cut cost” (for understanding objectives, not a full METIS implementation):

from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def edge_cut(edges: List[Edge], part: Dict[int, int]) -> int:
    return sum(1 for u, v in edges if part[u] != part[v])


def partition_sizes(part: Dict[int, int], k: int) -> List[int]:
    sizes = [0] * k
    for node in part:
        sizes[part[node]] += 1
    return sizes


def greedy_balanced_partition(
    nodes: List[int],
    edges: List[Edge],
    k: int,
    max_imbalance: float = 0.10,
    max_iter: int = 20,
) -> Dict[int, int]:
    part = {node: node % k for node in nodes}
    limit = int((1.0 + max_imbalance) * len(nodes) / k) + 1

    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    for _ in range(max_iter):
        improved = False
        sizes = partition_sizes(part, k)

        for node in nodes:
            current = part[node]
            best_part = current
            best_gain = 0

            for candidate in range(k):
                if candidate == current:
                    continue
                if sizes[candidate] + 1 > limit:
                    continue

                # Estimate cut change if node is moved (positive means lower cut).
                gain = 0
                for nei in adj[node]:
                    before_cross = 1 if part[nei] != current else 0
                    after_cross = 1 if part[nei] != candidate else 0
                    gain += (before_cross - after_cross)

                if gain > best_gain:
                    best_gain = gain
                    best_part = candidate

            if best_part != current:
                sizes[current] -= 1
                sizes[best_part] += 1
                part[node] = best_part
                improved = True

        if not improved:
            break

    return part


def main() -> None:
    nodes = list(range(8))
    edges = [
        (0, 1), (1, 2), (2, 3), (3, 0),
        (4, 5), (5, 6), (6, 7), (7, 4),
        (1, 4), (2, 5), (3, 6),
    ]
    k = 2

    init_part = {node: node % k for node in nodes}
    init_cut = edge_cut(edges, init_part)

    opt_part = greedy_balanced_partition(nodes, edges, k=k)
    opt_cut = edge_cut(edges, opt_part)

    print("init part:", init_part, "cut=", init_cut)
    print("opt part :", opt_part, "cut=", opt_cut)


if __name__ == "__main__":
    main()

Run:

python3 graph_partition_demo.py

Runnable Example 2: Vertex-cut Replication Factor Estimation

from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def replication_factor(edges: List[Edge], edge_part: Dict[Edge, int], n_nodes: int) -> float:
    node_parts = defaultdict(set)
    for (u, v), p in edge_part.items():
        node_parts[u].add(p)
        node_parts[v].add(p)
    total = sum(len(node_parts[node]) if node in node_parts else 1 for node in range(n_nodes))
    return total / n_nodes


def main() -> None:
    # Simplified example: 3 partitions.
    edges = [(0, 1), (0, 2), (0, 3), (4, 5), (5, 6), (6, 7), (3, 4)]
    edge_part = {
        (0, 1): 0,
        (0, 2): 1,
        (0, 3): 2,
        (4, 5): 1,
        (5, 6): 1,
        (6, 7): 1,
        (3, 4): 2,
    }
    rf = replication_factor(edges, edge_part, n_nodes=8)
    print("replication factor =", round(rf, 3))


if __name__ == "__main__":
    main()

This example is meant to visualize RF trend shifts: on the same graph, node replication overhead can differ significantly under different partitioning strategies.

Explanation and Principles (Why This Works)

The key value of this example is to make partition quality measurable:

You can directly observe how much cut is reduced;
You can add business query weights to give critical edges higher importance;
You can tighten balance constraints and observe the latency/throughput turning point.

In real production, METIS performs a more systematic “coarsen + replay refinement” at larger scales, but the underlying idea is still:

Have an objective;
Have constraints;
Have observable metric feedback.

Worked Example (Trace)

Below is a simplified trace of whether a partition migration is worthwhile:

Initial: cut_ratio = 0.29, p99 = 410ms, cross_bytes = 1.8GB/min
Candidate migration: move a 20k-node subgraph from P3 to P5
Estimated benefit: cut_ratio -> 0.23, P5 CPU +5%, P3 CPU -8%

Observed after execution:

Hour 1: p99 drops to 330ms, cross_bytes drops to 1.3GB/min
Hour 6: P5 load stabilizes, no hotspot alarm
Hour 24: peak p99 remains stable at 300~320ms

Conclusion: if load stays within threshold after migration, reducing cross edges usually yields stable latency gains.

Correctness (Proof Sketch)

This does not prove global optimality; it proves monotonic improvement of local migration strategy:

Invariant: every migration must satisfy capacity and hotspot constraints.
Preservation: accept migration only if Score decreases (or equal score but more stable).
Termination: local search stops when no candidate migration can further reduce Score.

So you get at least a constraint-satisfying local optimum, rather than uncontrolled random fluctuation.

Complexity and Thresholds

Offline multilevel methods are often approximately linear to sublinear scalable (depends on implementation and graph structure).
Online local migration cost per round depends on candidate set size |C| and incremental evaluation cost.
In engineering, thresholds matter more than Big-O:
- Migration window per round (for example, 5-15 minutes)
- Migration budget per round (for example, at most 0.5% of vertices)
- Rollback threshold (for example, rollback if p99 rises for 5 consecutive minutes)

Constant Factors and Engineering Reality

Serialization cost: cross-machine edges force object decode, often with high constant factors.
Cache locality: concentrated local subgraphs can significantly affect the upside via cache hit rate.
Batch window: if offline repartitioning exceeds maintenance windows, theoretical gains can be nullified.
Replica consistency: Vertex-cut write paths are more complex; be careful in mixed read-write workloads.

Production Troubleshooting Checklist (Required)

After partition rollout, do not stop at “average latency decreased.” Run a 24-hour replay checklist:

Do the four core metrics improve in the same direction?
- Are p95/p99 lower?
- Are cross-partition bytes lower?
- Are cut_ratio or RF moving toward target?
- Did per-partition CPU/memory stay under alert thresholds?
Is distribution being hidden by averages?
- Did top 10 slow query templates actually improve?
- Did long-tail queries regress?
- Is trend consistent during peak traffic windows?
Are migration side effects controlled?
- Any write jitter during migration windows?
- Did cache hit rate dip below guardrails?
- Has rollback script been drill-tested (at least once)?
Did hotspot partitions drift?
- Is today’s hottest partition the same as yesterday’s?
- Did hotspot just move from one machine to another?
- Is a dedicated hot-vertex strategy needed?
Are capacity boundaries exposed early?
- Under 7-day edge growth forecast, can current k still hold?
- Will replication-factor growth break memory budget?
- Should partition expansion windows be reserved early?

To make troubleshooting reusable, log partition changes in structured form:

{
  "change_id": "part-2026-02-09-01",
  "strategy": "edge_cut_with_balance",
  "before": {"cut_ratio": 0.27, "p99_ms": 380, "cross_bytes_mb_min": 1540},
  "after": {"cut_ratio": 0.21, "p99_ms": 305, "cross_bytes_mb_min": 1090},
  "risk": {"hot_partition_cpu_max": 0.72, "rollback_ready": true}
}

This structured record is critical for postmortems: it answers “why it worked,” “whether it is repeatable,” and “how to do it more safely next time.”

Metric Definitions (Avoid Team Misalignment)

Many partition discussions fail not because algorithms are weak, but because metric definitions differ. Standardize these:

cut ratio
- Definition: #cross-partition edges / #total edges
- Definition scope: report both on “active-subgraph edges” and “full-graph edges” separately
cross-partition bytes
- Definition: total network bytes from cross-partition requests
- Scope: split read and write paths; read-heavy workloads should prioritize read-path metrics
partition hotspot index
- Definition: max_partition_qps / avg_partition_qps
- Scope: compute on both 1-minute and 5-minute windows to capture jitter and trend
replication factor (Vertex-cut only)
- Definition: average number of replicas per vertex
- Scope: separately compute on online-active vertices to avoid risk dilution by cold data

With fixed definitions for these four metrics, partition optimization becomes an auditable engineering process instead of intuition debate.

Replay Benchmark Template (Python)

import csv
import statistics
from dataclasses import dataclass
from typing import List


@dataclass
class QuerySample:
    template: str
    latency_ms: float
    cross_bytes: int
    cross_hops: int


def load_samples(path: str) -> List[QuerySample]:
    result: List[QuerySample] = []
    with open(path, "r", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        for row in reader:
            result.append(
                QuerySample(
                    template=row["template"],
                    latency_ms=float(row["latency_ms"]),
                    cross_bytes=int(row["cross_bytes"]),
                    cross_hops=int(row["cross_hops"]),
                )
            )
    return result


def p99(values: List[float]) -> float:
    if not values:
        return 0.0
    values_sorted = sorted(values)
    idx = int(0.99 * (len(values_sorted) - 1))
    return values_sorted[idx]


def summarize(samples: List[QuerySample]) -> None:
    latency = [item.latency_ms for item in samples]
    cross_bytes = [item.cross_bytes for item in samples]
    cross_hops = [item.cross_hops for item in samples]
    print("count =", len(samples))
    print("avg_latency_ms =", round(statistics.mean(latency), 2))
    print("p99_latency_ms =", round(p99(latency), 2))
    print("avg_cross_bytes =", int(statistics.mean(cross_bytes)))
    print("avg_cross_hops =", round(statistics.mean(cross_hops), 3))


if __name__ == "__main__":
    baseline = load_samples("baseline.csv")
    candidate = load_samples("candidate.csv")
    print("baseline")
    summarize(baseline)
    print("candidate")
    summarize(candidate)

This script is suitable for minimal before/after replay comparison: same templates, same inputs, unified metrics, no guesswork conclusions.

E — Engineering (Applications)

Scenario 1: Online Graph Queries (Edge-cut Dominant)

Problem: high p99 on k-hop/path queries.
Approach: prioritize reducing cross-partition edges along common query boundaries while maintaining load balance.
Benefit: fewer cross-machine hops, more stable p95/p99.

Goal: cut_ratio from 0.31 -> 0.18
Result: path query p99 from 420ms -> 230ms (example measurement)

Scenario 2: Supernode Graphs (Vertex-cut More Stable)

Problem: a few vertices have extremely high out-degree, causing severe single-machine hotspots under Edge-cut.
Approach: edge partitioning + controlled vertex replication with RF monitoring.
Benefit: spreads hotspot writes/traversals across partitions.

Scenario 3: Sharding and Capacity Planning (METIS Baseline + Incremental Migration)

Problem: full repartition is expensive; business cannot tolerate frequent downtime migrations.
Approach: periodically recompute offline baseline, migrate only high-benefit candidate subgraphs online.
Benefit: continuously improve partition quality within migration budget.

R — Reflection (Deep Dive)

Complexity and Engineering Cost

Partitioning is combinatorially hard; pursuing global optimum is unrealistic.
Engineering should focus on sustainable optimization loops:
- baseline exists
- monitoring exists
- incremental repair exists

Alternatives and Trade-offs

Strategy	Pros	Cons	Best for
Edge-cut	Simple query routing	Supernodes can hotspot	OLTP graph queries
Vertex-cut	Better hotspot control	Replica consistency complexity	Power-law graphs
Random sharding	Simple implementation	High communication cost	Early PoC only

Quantitative Comparison (Example)

Metric	Strategy A (random)	Strategy B (Edge-cut)	Strategy C (Vertex-cut)
cut ratio	0.34	0.19	0.22
RF	1.00	1.00	1.38
query p99	480ms	260ms	290ms
network bytes	2.1GB/min	1.2GB/min	1.0GB/min

Interpretation:

Edge-cut is cleaner on read paths;
Vertex-cut can be better for hotspot and network bytes, but requires replica-management cost;
Real choice depends on read/write ratio and consistency requirements.

Common Pitfalls

Looking only at algorithm names, not objective functions: often leads to “great theory, poor production metrics.”
Optimizing cut only, ignoring balance: latency drops but throughput collapses.
One-time offline partition only: effects naturally decay in dynamic-graph settings.

Counterexample (Must Remember)

Suppose you place all hot vertices into one partition to reduce cut. Communication drops short-term, but that partition’s CPU spikes and queuing worsens p99.
This shows: partition optimization is multi-objective, not single-objective extreme optimization.

FAQ and Caveats

Can METIS directly solve online dynamic repartitioning?
No. METIS is better as an offline baseline; online operation requires incremental migration strategies.
Is Edge-cut always better than Vertex-cut?
No. Under extreme high-degree imbalance, Vertex-cut is often more stable.
How to decide when to repartition?
Watch trends, not points: rising cut_ratio, rising cross-partition bytes, rising p99 over sustained windows.
How to choose partition count k?
Set an upper bound by machine budget first, then benchmark the joint curve of k vs p95/p99 and communication volume to find turning points.

Best Practices and Recommendations

Define main query paths first, then define partition objective
Use business-weighted edges, not uniform edge weights
Set migration budget ceilings per batch to avoid global jitter
Monitor cut_ratio, RF, p99, and network bytes together; avoid single-metric decisions
Prepare dedicated strategies for hot vertices (replication, side indexes, or caching)

Migration Path (Skill Ladder)

After mastering this article, progress in this order:

Dynamic incremental partitioning: migrate only high-benefit local subgraphs
Query-aware partitioning: include query logs in partition weight modeling
Multi-tier graph storage coordination: co-optimize partitioning with hot/cold tiering and cache strategy
Online A/B validation framework: make partition strategies rollbackable, comparable, and auditable

S — Summary

Graph partitioning directly determines latency ceiling and network cost in graph databases.
Edge-cut and Vertex-cut have no universal winner; workload shape decides.
METIS’s core value is “multilevel scaling + local refinement,” not one-shot global optimality.
A production-ready partition strategy must include objective, constraints, monitoring, and incremental repair.

Closing Conclusion

One major engineering difference between graph databases and relational databases is that cross-edge communication can directly consume your performance budget.
Only by building robust partitioning capability can query performance move from “occasionally usable” to “predictably stable.”

References and Further Reading

METIS docs and paper: Karypis & Kumar, Multilevel k-way Partitioning Scheme
PowerGraph (classic Vertex-cut engineering practice)
Pregel / Giraph distributed graph computation models
Neo4j / JanusGraph sharding and query practice materials

Multi-Language Reference Implementations (Excerpt)

C++: Compute Edge-cut

#include <vector>
#include <utility>

int edgeCut(const std::vector<std::pair<int, int>>& edges,
            const std::vector<int>& part) {
    int cut = 0;
    for (const auto& edge : edges) {
        int u = edge.first;
        int v = edge.second;
        if (part[u] != part[v]) {
            cut += 1;
        }
    }
    return cut;
}

Go: Compute Partition Load

package main

func partitionSizes(part []int, k int) []int {
	sizes := make([]int, k)
	for _, partition := range part {
		sizes[partition]++
	}
	return sizes
}

JavaScript: Compute Replication Factor

function replicationFactor(edgeParts, nodeCount) {
  const nodeToParts = Array.from({ length: nodeCount }, () => new Set());
  for (const item of edgeParts) {
    const [u, v, p] = item;
    nodeToParts[u].add(p);
    nodeToParts[v].add(p);
  }
  let total = 0;
  for (const parts of nodeToParts) total += Math.max(parts.size, 1);
  return total / nodeCount;
}

Meta Information

Reading time: 18-22 minutes
Tags: graph partitioning, Edge-cut, Vertex-cut, METIS
SEO keywords: Graph Partitioning, Edge-cut, Vertex-cut, METIS, Query Latency
Meta description: How graph partitioning affects query latency and network communication, with runnable examples and an engineering tuning path.

Call to Action (CTA)

Choose one of your slowest online query templates, measure cross-partition hops and network bytes, then compare p95/p99 before and after one partition optimization pass.
You will quickly see: partition strategy is the core lever in graph-database performance engineering.

Target Audience#

Background / Motivation#

Core Concepts#

Quick Orientation Map (60-120 Seconds)#

Master Mental Model#

Feasibility and Lower-Bound Intuition#

Problem Modeling and Constraint Scale#

A — Algorithm (Problem and Algorithm)#

Problem Restatement (Engineering Form)#

Inputs and Outputs#

Example (8 Nodes, 2 Partitions)#

Deriving the Approach (From Brute Force to Practical)#

Naive Brute Force#

Key Observations#

Method Selection#

C — Concepts (Core Ideas)#

1) Edge-cut vs Vertex-cut#

Edge-cut (Unique Vertex Ownership)#

Vertex-cut (Edge Ownership, Vertex Replication)#

2) METIS Multilevel Intuition (Must Understand)#

Deepening Focus (PDKH)#

Concept A: Edge-cut -> Latency#

Concept B: METIS Multilevel Flow#

Practical Guide / Steps#

Selection Guide#

Runnable Example (Python)#

Runnable Example 2: Vertex-cut Replication Factor Estimation#

Explanation and Principles (Why This Works)#

Worked Example (Trace)#

Correctness (Proof Sketch)#

Complexity and Thresholds#

Constant Factors and Engineering Reality#

Production Troubleshooting Checklist (Required)#

Metric Definitions (Avoid Team Misalignment)#

Replay Benchmark Template (Python)#

E — Engineering (Applications)#

Scenario 1: Online Graph Queries (Edge-cut Dominant)#

Scenario 2: Supernode Graphs (Vertex-cut More Stable)#

Scenario 3: Sharding and Capacity Planning (METIS Baseline + Incremental Migration)#

R — Reflection (Deep Dive)#

Complexity and Engineering Cost#

Alternatives and Trade-offs#

Quantitative Comparison (Example)#

Common Pitfalls#

Counterexample (Must Remember)#

FAQ and Caveats#

Best Practices and Recommendations#

Migration Path (Skill Ladder)#

S — Summary#

Closing Conclusion#

References and Further Reading#

Multi-Language Reference Implementations (Excerpt)#

C++: Compute Edge-cut#

Go: Compute Partition Load#

JavaScript: Compute Replication Factor#

Meta Information#

Call to Action (CTA)#