Figure 1: A directed graph with three vertices (blue circles) and three edges (black arrows).

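In computer science, a graph is an abstract data type that is meant to implement the undirected graph and directed graph concepts from the field of graph theory within mathematics.

A graph data structure consists of a finite (and possibly mutable) set of vertices (also called nodes or points), together with a set of unordered pairs of these vertices for an undirected graph, or a set of ordered pairs for a directed graph. These pairs are known as edges (also called links or lines) and, for a directed graph, are also known as arrows. The vertices may be part of the graph structure, or may be external entities represented by integer indices or references.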

A graph data structure may also associate to each edge some edge value, such as a symbolic label or a numeric attribute (cost, capacity, length, etc.).


Operations


The basic operations provided by a graph data structure G usually include:


  • adjacent(G, x, y): tests whether there is an edge from the vertex x to the vertex y;
  • neighbors(G, x): lists all vertices y such that there is an edge from the vertex x to the vertex y;
  • add_vertex(G, x): adds the vertex x, if it is not there;
  • remove_vertex(G, x): removes the vertex x, if it is there;
  • add_edge(G, x, y): adds the edge from the vertex x to the vertex y, if it is not there;
  • remove_edge(G, x, y): removes the edge from the vertex x to the vertex y, if it is there;
  • get_vertex_value(G, x): returns the value associated with the vertex x;
  • set_vertex_value(G, x, v): sets the value associated with the vertex x to v.

Structures that associate values to the edges usually also provide:


  • get_edge_value(G, x, y): returns the value associated with the edge (x, y);
  • set_edge_value(G, x, y, v): sets the value associated with the edge (x, y) to v.
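
The operations listed above can be sketched as a small Python class. This is a minimal illustration, not a reference implementation: it assumes a directed graph stored as a dictionary of dictionaries, the class name `Graph` and its attributes are hypothetical, and having `add_edge` create missing endpoints is a design choice of this sketch rather than something the list prescribes.

```python
class Graph:
    """Directed graph stored as a dict of dicts: succ[x][y] holds the edge value."""

    def __init__(self):
        self.succ = {}     # vertex -> {successor vertex: edge value}
        self.values = {}   # vertex -> vertex value

    def adjacent(self, x, y):
        # True if there is an edge from x to y.
        return y in self.succ.get(x, {})

    def neighbors(self, x):
        # All y such that there is an edge from x to y.
        return list(self.succ.get(x, {}))

    def add_vertex(self, x):
        # No effect if x is already present.
        self.succ.setdefault(x, {})
        self.values.setdefault(x, None)

    def remove_vertex(self, x):
        if x in self.succ:
            del self.succ[x]
            del self.values[x]
            for out_edges in self.succ.values():
                out_edges.pop(x, None)  # drop edges that pointed at x

    def add_edge(self, x, y, v=None):
        # Assumption of this sketch: missing endpoints are created on demand.
        self.add_vertex(x)
        self.add_vertex(y)
        self.succ[x].setdefault(y, v)

    def remove_edge(self, x, y):
        self.succ.get(x, {}).pop(y, None)

    def get_vertex_value(self, x):
        return self.values[x]

    def set_vertex_value(self, x, v):
        if x not in self.succ:
            raise KeyError(x)
        self.values[x] = v

    def get_edge_value(self, x, y):
        return self.succ[x][y]

    def set_edge_value(self, x, y, v):
        self.succ[x][y] = v


g = Graph()
g.add_edge("a", "b", 3)
g.add_edge("b", "c", 5)
g.add_edge("c", "a", 1)
print(g.neighbors("a"), g.adjacent("b", "a"))
```

With dictionaries, the adjacency test and edge removal run in expected constant time; a plain list of neighbors per vertex would instead need a scan of that list.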

Representations

Different data structures for the representation of graphs are used in practice:


Adjacency list


Vertices are stored as records or objects, and every vertex stores a list of adjacent vertices. This data structure allows the storage of additional data on the vertices. Additional data can be stored if edges are also stored as objects, in which case each vertex stores its incident edges and each edge stores its incident vertices.
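
As a rough sketch of the variant in which edges are also stored as objects, the following Python code lets each vertex keep its incident edges and each edge keep its two endpoints. The names `Vertex`, `Edge` and `connect` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare by identity; avoids recursing through vertex/edge cycles
class Vertex:
    name: str
    data: object = None
    incident: list = field(default_factory=list)  # Edge objects touching this vertex


@dataclass(eq=False)
class Edge:
    source: Vertex
    target: Vertex
    data: object = None  # additional data stored on the edge, e.g. a cost


def connect(source, target, data=None):
    """Create an edge and register it with both of its endpoints."""
    edge = Edge(source, target, data)
    source.incident.append(edge)
    if target is not source:  # a self-loop is registered only once
        target.incident.append(edge)
    return edge


a, b = Vertex("a"), Vertex("b")
e = connect(a, b, data={"cost": 4})
print([edge.target.name for edge in a.incident])
```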


Adjacency matrix


A two-dimensional matrix, in which the rows represent source vertices and columns represent destination vertices. Data on edges and vertices must be stored externally. Only the cost for one edge can be stored between each pair of vertices.
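
A minimal Python sketch of this representation follows. It assumes vertices are numbered 0 to n-1 and that absent edges are encoded as infinity; the function names are hypothetical.

```python
import math


def adjacency_matrix(n, edges):
    """Build an n x n cost matrix; matrix[x][y] is the cost of the edge x -> y."""
    matrix = [[math.inf] * n for _ in range(n)]
    for x, y, cost in edges:
        # Only one cost fits per vertex pair: a repeated pair overwrites the earlier entry.
        matrix[x][y] = cost
    return matrix


def is_adjacent(matrix, x, y):
    """Constant-time edge lookup."""
    return matrix[x][y] != math.inf


m = adjacency_matrix(3, [(0, 1, 2.5), (1, 2, 1.0), (2, 0, 4.0)])
print(is_adjacent(m, 0, 1), is_adjacent(m, 1, 0))
```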


Incidence matrix


A two-dimensional Boolean matrix, in which the rows represent the vertices and columns represent the edges. The entries indicate whether the vertex at a row is incident to the edge at a column.
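
The following Python sketch builds such a Boolean matrix. It assumes vertices numbered 0 to n-1 and edges without self-loops; a plain Boolean entry does not record edge direction, so the adjacency test here is undirected. The function names are hypothetical.

```python
def incidence_matrix(n, edges):
    """Rows are vertices 0..n-1, columns are edges; True where vertex and edge are incident."""
    matrix = [[False] * len(edges) for _ in range(n)]
    for j, (x, y) in enumerate(edges):
        matrix[x][j] = True
        matrix[y][j] = True
    return matrix


def is_adjacent(matrix, x, y):
    """Scan every edge column for one incident to both x and y (assumes x != y)."""
    return any(at_x and at_y for at_x, at_y in zip(matrix[x], matrix[y]))


m = incidence_matrix(3, [(0, 1), (1, 2)])
print(is_adjacent(m, 0, 1), is_adjacent(m, 0, 2))
```

The adjacency test has to look at every column, which is why this lookup costs time proportional to the number of edges.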


The following table gives the time complexity cost of performing various operations on graphs, for each of these representations, with |V| the number of vertices and |E| the number of edges. In the matrix representations, the entries encode the cost of following an edge. The cost of edges that are not present is assumed to be ∞.

| Operation | Adjacency list | Adjacency matrix | Incidence matrix |
|---|---|---|---|
| Store graph | [math]\displaystyle{ O(|V|+|E|) }[/math] | [math]\displaystyle{ O(|V|^2) }[/math] | [math]\displaystyle{ O(|V|\cdot|E|) }[/math] |
| Add vertex | [math]\displaystyle{ O(1) }[/math] | [math]\displaystyle{ O(|V|^2) }[/math] | [math]\displaystyle{ O(|V|\cdot|E|) }[/math] |
| Add edge | [math]\displaystyle{ O(1) }[/math] | [math]\displaystyle{ O(1) }[/math] | [math]\displaystyle{ O(|V|\cdot|E|) }[/math] |
| Remove vertex | [math]\displaystyle{ O(|E|) }[/math] | [math]\displaystyle{ O(|V|^2) }[/math] | [math]\displaystyle{ O(|V|\cdot|E|) }[/math] |
| Remove edge | [math]\displaystyle{ O(|V|) }[/math] | [math]\displaystyle{ O(1) }[/math] | [math]\displaystyle{ O(|V|\cdot|E|) }[/math] |
| Are vertices x and y adjacent (assuming that their storage positions are known)? | [math]\displaystyle{ O(|V|) }[/math] | [math]\displaystyle{ O(1) }[/math] | [math]\displaystyle{ O(|E|) }[/math] |
| Remarks | | Slow to add or remove vertices, because the matrix must be resized/copied | Slow to add or remove vertices and edges, because the matrix must be resized/copied |


Adjacency lists are generally preferred because they efficiently represent sparse graphs. An adjacency matrix is preferred if the graph is dense, that is, if the number of edges |E| is close to the number of vertices squared, |V|^2, or if one must be able to quickly look up whether there is an edge connecting two vertices.[5][6]

Parallel Graph Representations

The parallelization of graph problems faces significant challenges: data-driven computations, unstructured problems, poor locality, and a high ratio of data access to computation. The graph representation used for parallel architectures plays a significant role in facing those challenges. Poorly chosen representations may unnecessarily drive up the communication cost of the algorithm, which will decrease its scalability. In the following, shared and distributed memory architectures are considered.

Shared memory

In the case of a shared memory model, the graph representations used for parallel processing are the same as in the sequential case, since parallel read-only access to the graph representation (e.g. an adjacency list) is efficient in shared memory.


Distributed memory

In the distributed memory model, the usual approach is to partition the vertex set [math]\displaystyle{ V }[/math] of the graph into [math]\displaystyle{ p }[/math] sets [math]\displaystyle{ V_0, \dots, V_{p-1} }[/math]. Here, [math]\displaystyle{ p }[/math] is the number of available processing elements (PEs). Each vertex set partition is then distributed, together with its corresponding edges, to the PE with the matching index. Every PE has its own subgraph representation, in which edges with an endpoint in another partition require special attention. For standard communication interfaces like MPI, the ID of the PE owning the other endpoint has to be identifiable. During computation in distributed graph algorithms, passing information along these edges implies communication.
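
The idea can be sketched in Python without any actual message passing. The sketch assumes that an edge is stored on the PE that owns its source vertex and that `owner` is a lookup table from vertex to PE index; both the table and the function name `local_subgraph` are hypothetical.

```python
def local_subgraph(edges, owner, rank):
    """Split the edges held by PE `rank` into local edges and cut edges.

    `owner` maps each vertex to the index of the PE holding it.
    Assumption of this sketch: an edge lives on the PE owning its source vertex.
    """
    local, cut = [], []
    for x, y in edges:
        if owner[x] != rank:
            continue  # this edge belongs to another PE
        if owner[y] == rank:
            local.append((x, y))
        else:
            # Keep the ID of the PE owning the other endpoint, so that a
            # message along this edge can be addressed to it.
            cut.append((x, y, owner[y]))
    return local, cut


edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
owner = {0: 0, 1: 0, 2: 1, 3: 1}
print(local_subgraph(edges, owner, 0))
```

Only the cut edges cause communication; edges in the local list can be processed without leaving the PE.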


Partitioning the graph needs to be done carefully: there is a trade-off between low communication and evenly sized partitions. However, graph partitioning is an NP-hard problem, so computing an optimal partition is generally not feasible. Instead, the following heuristics are used.

1D partitioning: Every processor gets [math]\displaystyle{ n/p }[/math] vertices and the corresponding outgoing edges, where [math]\displaystyle{ n }[/math] is the number of vertices. This can be understood as a row-wise or column-wise decomposition of the adjacency matrix. For algorithms operating on this representation, this requires an All-to-All communication step as well as [math]\displaystyle{ \mathcal{O}(m) }[/math] message buffer sizes, with [math]\displaystyle{ m }[/math] the number of edges, as each PE potentially has outgoing edges to every other PE.
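
A small Python sketch of 1D partitioning follows. It assumes vertices numbered 0 to n-1 are assigned to PEs in contiguous blocks of ceil(n/p) vertices; that block layout and the function names are assumptions of the sketch, and no real communication library is used.

```python
def block_owner(v, n, p):
    """Index of the PE owning vertex v when 0..n-1 is split into p contiguous blocks."""
    block = -(-n // p)  # ceil(n / p) vertices per PE
    return v // block


def outgoing_buffers(edges, n, p, rank):
    """Bucket the outgoing edges of PE `rank` by destination PE, one buffer per PE."""
    buffers = [[] for _ in range(p)]
    for x, y in edges:
        if block_owner(x, n, p) == rank:
            buffers[block_owner(y, n, p)].append((x, y))
    return buffers


edges = [(0, 1), (0, 4), (1, 5), (2, 3), (4, 0)]
print(outgoing_buffers(edges, n=6, p=3, rank=0))
```

Every PE needs one such buffer per possible destination, which is what makes an All-to-All exchange necessary in the general case.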

2D partitioning: Every processor gets a submatrix of the adjacency matrix. Assume the processors are aligned in a rectangle [math]\displaystyle{ p = p_r \times p_c }[/math], where [math]\displaystyle{ p_r }[/math] and [math]\displaystyle{ p_c }[/math] are the number of processing elements in each row and column, respectively. Then each processor gets a submatrix of the adjacency matrix of dimension [math]\displaystyle{ (n/p_r)\times(n/p_c) }[/math]. This can be visualized as a checkerboard pattern in a matrix.[11] Therefore, each processing unit can only have outgoing edges to PEs in the same row and column. This bounds the number of communication partners for each PE to [math]\displaystyle{ p_r + p_c - 1 }[/math] out of [math]\displaystyle{ p = p_r \times p_c }[/math] possible ones.
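
The checkerboard layout can be illustrated with a short Python sketch. It assumes vertices numbered 0 to n-1, submatrices of ceil(n/p_r) rows by ceil(n/p_c) columns, and PEs addressed by grid coordinates (row, column); the function names are hypothetical.

```python
def edge_owner(x, y, n, p_r, p_c):
    """Grid coordinates (row, col) of the PE holding adjacency-matrix entry (x, y)."""
    rows_per_pe = -(-n // p_r)  # ceil(n / p_r) matrix rows per PE
    cols_per_pe = -(-n // p_c)  # ceil(n / p_c) matrix columns per PE
    return x // rows_per_pe, y // cols_per_pe


def communication_partners(row, col, p_r, p_c):
    """All PEs sharing a grid row or column with PE (row, col), itself included."""
    same_row = {(row, c) for c in range(p_c)}
    same_col = {(r, col) for r in range(p_r)}
    return same_row | same_col


print(edge_owner(5, 3, n=8, p_r=2, p_c=4))
print(len(communication_partners(1, 1, p_r=2, p_c=4)))
```

Counting the PE itself, the partner set of this sketch has p_r + p_c - 1 members, since the PE is the only element common to its grid row and its grid column.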

See also

  • Graph traversal for graph walking strategies
  • Graph database for graph (data structure) persistency
  • Graph rewriting for rule based transformations of graphs (graph data structures)
  • Graph drawing software for software, systems, and providers of systems for drawing graphs

References

