Introduction to Graph Data
I. Concept of Graph
1. Origin of Graph problem
[Seven Bridges of Knigsberg, 1736] The problem is to devise a walk across each of the seven bridges once and only once to touch every part of the town, or this walk does not exist.
Solution: Seven Bridge Problem is to find the Euler Circuit in a graph. It is easier than finding a Hamilton Circuit.
2. Concept of Graph
A graph is a set of nodes and edges that connect them.
(1) Difference between Network and Graph
① Network: Topology structure in a real system.
- Examples: Web, Social Network and Metabolic Network.
- Terminology: Network, Node, Link.
② Graph: Mathematical representation of network.
- Examples: Web Graph, Social Graphs, Knowledge Graph.
- Terminology: Graph, Vertex, Edge.
(2) Application Field
- Social Network
- Citation Network
- Road Network
- Protein Network
- Knowledge Graph
- Internet
- …
(3) Why Graph So Important?
- Graph is a general data structure to model relationships between different entities
- Graph provides a universal language to describe complex data
- Problem: Data availability and computational challenges
- Graph bridges Big Data and Artificial Intelligence
II. Terminology in Graph
1. Directed/Undirected Graph and Vertex Degree
Difference: Whether the edge has its direction.
(1) Undirected Graph
- Properties of Edge: Symmetrical and reciprocal, i.e. (u,v)≡ (v,u)
- Examples: Wechat graph, Collaboration Graph
- Degree(v): the number of edges adjacent to vertex v.
–
–
(2)Directed Graph
Properties of Edge: Directed
Examples: Weibo Following graph, Phone Call Graph
Degree(v): Divide into in-degree and out-degree. And
- , and
- Source Vertex is the vertex with , and Sink Vertex is the vertex with .
2. Special Graphs
(1) Clique(Complete Graph)
Clique, a.k.a Complete Graph, is the undirected graph whose vertices are all connected.
- Number of Edges: In clique, assuming vertices, the total edge number is: . It is also the maximum of number of edge in a graph with vertices.
(2) Bipartite Graph
Bipartite Graph is a graph whose vertices can be divided into two disjoint sets U and V, and every edge connect between U and V.
- Examples: Authors-to-Papers, Buyers-to-Products
3. Representing Graphs
(1) Adjacency Matrix
① Undirected Graph:
- Symmetric: The matrix is symmetric, that is,
- If Non-cyclic: for non-cyclic graph.
- Degree of vertex i:
- Total Edge of Graph:
② Directed Graph:
- Non-symmetric: The matrix is not symmetric, that is, .
- If Non-cyclic: for non-cyclic graph
- Degree of vertex *i*:
- Total Edge of Graph:
③ Disadvantages: Space Complexity: . However, the adjacency matrix is a sparse matrix. It is too much space costing. In graph processing system, adjacency matrix is totally abandoned for its expensive cost for storing.
(2)Adjacency List
Source Vertex | Dst. |
---|---|
1 | (Null) |
2 | 3,4 |
3 | 2,4 |
4 | 5 |
5 | 1,2 |
- Advantages: Easier to work when graph is large and sparse.
- Disadvantages: The time complexity of query a vertex is , which is time-expensive. Besides, point chasing happens when we want to find a path.
(3) Compressed Sparse Representation
Vertex | Offset | Edges |
---|---|---|
1 | 0 | 3 |
2 | 0 | 4 |
3 | 2 | 2 |
4 | 4 | 4 |
5 | 5 | 5 |
1 | ||
2 |
Offset of corresponding Vertex points to the position of vertex’s adjacency edge in Edges.
- Advantages: It is the most compact form of graph representation.
- Disadvantages: Leaving no space for dynamic graph processing.
(4) Sparsity of Graph
Most of graphs in real world are sparse.
Conclusion: It is impossible to use adjacency matrix in graph database or processing system. In real graph computing system, we usually use compressed format to store the large-scale graph. However, it brings new challenge of building dynamic graph computing system for programmers and developers. Thus, *dynamic graph processing system* is one of the future direction for architecture researchers.
Choice of the proper network representation of a given domain/problem determines our ability to use network successfully.
- In some cases, there is a unique, unambiguous representation
- In some cases, the representation is by no means of unique
- The way you assign links will determine the nature of the question you study.
4. Topology Feature of Graph
(1) Self-loop and Multigraph
Note: In adjacency matrix, if there are 1 elements in diagonal, it is a self-loop graph. If there are , then the graph is a multigraph.
(2) Connectivity
① Connected Undirected Graph
② Conncted Directed Graph
③ Strong/Weak Connected Components
- Weak Connected Components(WCC): Despite of direction, if the subgraph is a connected graph, then the subgraph can be called as a WCC of original Graph.
5. Some Real Graph Categories
- 本文作者: Zhang Xinmiao
- 本文链接: https://recoderchris.github.io/2021/02/04/图数据管理笔记1/
- 版权声明: 本博客所有文章除特别声明外,均采用 MIT 许可协议。转载请注明出处!