Line 416:
Line 416:
| The parallelization of graph problems faces significant challenges: data-driven computations, unstructured problems, poor locality, and a high data-access-to-computation ratio. The graph representation used for parallel architectures plays a significant role in facing those challenges. Poorly chosen representations may unnecessarily drive up the communication cost of the algorithm, which will decrease its scalability. In the following, shared and distributed memory architectures are considered. | | The parallelization of graph problems faces significant challenges: data-driven computations, unstructured problems, poor locality, and a high data-access-to-computation ratio. The graph representation used for parallel architectures plays a significant role in facing those challenges. Poorly chosen representations may unnecessarily drive up the communication cost of the algorithm, which will decrease its scalability. In the following, shared and distributed memory architectures are considered. |
| | | |
− | The parallelization of graph problems faces significant challenges: data-driven computations, unstructured problems, poor locality, and a high ratio of data access to computation. The graph representation used for parallel architectures plays an important role in facing these challenges. A poorly chosen representation may unnecessarily increase the algorithm's communication cost and thus reduce its scalability. In the following, we consider shared and distributed memory architectures. | + | The parallelization of graph problems faces significant challenges: data-driven computations, unstructured problems, poor locality, and a high ratio of data access to computation. The graph representation used for parallel architectures plays an important role in facing these challenges. A poorly chosen representation may unnecessarily increase the algorithm's communication cost and thus reduce its scalability. In the following paragraphs, we focus on shared and distributed memory architectures. |
| | | |
| | | |
Line 427:
Line 427:
| In the case of a shared memory model, the graph representations used for parallel processing are the same as in the sequential case, since parallel read-only access to the graph representation (e.g. an adjacency list) is efficient in shared memory. | | In the case of a shared memory model, the graph representations used for parallel processing are the same as in the sequential case, since parallel read-only access to the graph representation (e.g. an adjacency list) is efficient in shared memory. |
| | | |
− | In the case of the shared memory model, the graph representations used for parallel processing are the same as in the sequential case, because parallel read-only access to the graph representation (e.g. an adjacency list) is efficient in shared memory. | + | Under the shared memory model, the graph representations used for parallel processing are the same as in the sequential case, because parallel read-only access to the graph representation (e.g. an adjacency list) is efficient in shared memory. |
| | | |
| --[[用户:趣木木|趣木木]]([[用户讨论:趣木木|讨论]]) Read the whole passage through once; watch out for stray punctuation (e.g. "。邻接表") | | --[[用户:趣木木|趣木木]]([[用户讨论:趣木木|讨论]]) Read the whole passage through once; watch out for stray punctuation (e.g. "。邻接表") |
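To make the point about read-only access concrete, here is a minimal sketch (not part of the article; the four-vertex graph and the out-degree computation are illustrative assumptions) in C++ with OpenMP. Several threads traverse a shared adjacency list concurrently; because no thread mutates the structure, no locking is needed.

<syntaxhighlight lang="cpp">
// Sketch: concurrent read-only access to a shared adjacency list.
// Hypothetical example data; compile with: g++ -std=c++17 -fopenmp
#include <cstdio>
#include <vector>
#include <omp.h>

int main() {
    // Shared adjacency list: vertex i -> its out-neighbors.
    std::vector<std::vector<int>> adj = {{1, 2}, {2, 3}, {3}, {0}};
    std::vector<int> degree(adj.size());

    // Read-only accesses to 'adj' need no synchronization in shared
    // memory; each thread writes only its own entries of 'degree'.
    #pragma omp parallel for
    for (int i = 0; i < (int)adj.size(); ++i)
        degree[i] = (int)adj[i].size();

    for (std::size_t i = 0; i < degree.size(); ++i)
        std::printf("deg(%zu) = %d\n", i, degree[i]);
}
</syntaxhighlight>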
Line 448:
Line 448:
| Partitioning the graph needs to be done carefully: there is a trade-off between low communication and even size partitioning. But partitioning a graph is an NP-hard problem, so it is not feasible to compute optimal partitions. Instead, the following heuristics are used. | | Partitioning the graph needs to be done carefully: there is a trade-off between low communication and even size partitioning. But partitioning a graph is an NP-hard problem, so it is not feasible to compute optimal partitions. Instead, the following heuristics are used. |
| | | |
− | Partitioning the graph needs to be done carefully: there is a trade-off between low communication and evenly sized partitions. But graph partitioning is an '''<font color="#ff8000"><big>NP</big>-hard problem</font>''', so computing optimal partitions is not feasible. But we can use the following heuristics. | + | Partitioning the graph needs to be done carefully: there is a trade-off between low communication and evenly sized partitions. But graph partitioning is an '''<font color="#ff8000"><big>NP</big>-hard problem</font>''', so computing optimal partitions is not feasible. However, we can use the following heuristics. |
| | | |
| | | |
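To make the trade-off concrete, the following sketch (an illustration, not from the article; the edge list and vertex-to-block assignment are made up) computes the two competing quantities for a given partition: the edge cut, a proxy for communication volume, and the block-size imbalance.

<syntaxhighlight lang="cpp">
// Sketch: measuring both sides of the partitioning trade-off.
// Hypothetical example data; compile with: g++ -std=c++17
#include <algorithm>
#include <cstdio>
#include <utility>
#include <vector>

int main() {
    int n = 6, blocks = 2;
    std::vector<std::pair<int, int>> edges = {
        {0, 1}, {1, 2}, {2, 3}, {3, 4}, {4, 5}, {5, 0}, {1, 4}};
    std::vector<int> part = {0, 0, 0, 1, 1, 1};  // vertex -> block

    // Edge cut: edges whose endpoints land in different blocks
    // (each such edge forces communication between two PEs).
    int cut = 0;
    for (auto [u, v] : edges)
        if (part[u] != part[v]) ++cut;

    // Imbalance: largest block size relative to the ideal n / blocks.
    std::vector<int> size(blocks, 0);
    for (int b : part) ++size[b];
    double imbalance =
        *std::max_element(size.begin(), size.end()) / (double(n) / blocks);

    std::printf("edge cut = %d, imbalance = %.2f\n", cut, imbalance);
}
</syntaxhighlight>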
Line 464:
Line 464:
| 2D partitioning: Every processor gets a submatrix of the adjacency matrix. Assume the processors are aligned in a rectangle <math>p = p_r \times p_c</math>, where <math>p_r</math> and <math>p_c</math> are the number of processing elements in each row and column, respectively. Then each processor gets a submatrix of the adjacency matrix of dimension <math>(n/p_r)\times(n/p_c)</math>. This can be visualized as a checkerboard pattern in a matrix. Therefore, each processing unit can only have outgoing edges to PEs in the same row and column. This bounds the number of communication partners for each PE to <math>p_r + p_c - 1</math> out of <math>p = p_r \times p_c</math> possible ones. | | 2D partitioning: Every processor gets a submatrix of the adjacency matrix. Assume the processors are aligned in a rectangle <math>p = p_r \times p_c</math>, where <math>p_r</math> and <math>p_c</math> are the number of processing elements in each row and column, respectively. Then each processor gets a submatrix of the adjacency matrix of dimension <math>(n/p_r)\times(n/p_c)</math>. This can be visualized as a checkerboard pattern in a matrix. Therefore, each processing unit can only have outgoing edges to PEs in the same row and column. This bounds the number of communication partners for each PE to <math>p_r + p_c - 1</math> out of <math>p = p_r \times p_c</math> possible ones. |
| | | |
− | 2D partitioning: Every processor gets a submatrix of the adjacency matrix. Assume the processors are aligned in a rectangle <math>p = p_r \times p_c</math>, where <math>p_r</math> and <math>p_c</math> are the number of processing elements in each row and column, respectively. Then each processor gets a [[submatrix]] of the adjacency matrix of dimension <math>(n/p_r)\times(n/p_c)</math>. This can be visualized as a [[checkerboard]] pattern in a matrix.<ref name=":2" /> Therefore, each processing unit can only have outgoing edges to PEs in the same row and column. This bounds the number of communication partners for each PE to <math>p_r + p_c - 1</math> out of <math>p = p_r \times p_c</math> possible ones. | + | 2D partitioning: Every processor gets a submatrix of the adjacency matrix. Assume the processors are aligned in a rectangle <math>p = p_r \times p_c</math>, where |
| + | <math>p_r</math> and <math>p_c</math> are the number of processing elements in each row and column, respectively. Then each processor gets a [[submatrix]] of the adjacency matrix of dimension <math>(n/p_r)\times(n/p_c)</math>. This can be visualized as a [[checkerboard]] pattern in a matrix.<ref name=":2" /> Therefore, each processing unit can only have outgoing edges to PEs in the same row and column. This bounds the number of communication partners for each PE to <math>p_r + p_c - 1</math> out of <math>p = p_r \times p_c</math> possible ones. |
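A minimal sketch of the index arithmetic behind this checkerboard layout, assuming <math>n</math> is divisible by both <math>p_r</math> and <math>p_c</math> (the concrete numbers below are illustrative, not from the article): PE <math>(i, j)</math> owns an <math>(n/p_r)\times(n/p_c)</math> submatrix, so the owner of adjacency-matrix entry <math>(u, v)</math> follows by integer division.

<syntaxhighlight lang="cpp">
// Sketch: which PE owns edge (u, v) under 2D / checkerboard partitioning.
// Hypothetical example data; compile with: g++ -std=c++17
#include <cstdio>

int main() {
    int n = 12, p_r = 3, p_c = 2;        // 12 vertices on a 3 x 2 PE grid
    int rows = n / p_r, cols = n / p_c;  // submatrix dimensions: 4 x 6

    // Edge (u, v) lives in row block u / rows and column block v / cols.
    int u = 7, v = 10;
    std::printf("edge (%d,%d) -> PE (%d,%d)\n", u, v, u / rows, v / cols);

    // Each PE exchanges data only within its grid row and column, so it
    // has at most p_r + p_c - 1 communication partners of p_r * p_c PEs.
    std::printf("partners per PE: %d of %d\n", p_r + p_c - 1, p_r * p_c);
}
</syntaxhighlight>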