The Language of Networks: Graph Theory

Social Network Analysis

Termeh Shafie

Network Representations

graph based representation (informally):

a network can be represented by a graph

which is a set of nodes joined by a set of lines known as undirected edges

or a set of arrows known as directed edges or arcs

Nodes and Edges

we conceive of a network as a relation defined on a set of individuals

undirected ties

  • attended meeting with
  • communicates with

directed ties

  • go to for advice
  • lent money to

Strength of a Tie

we can attach values to ties, representing quantitative attributes

  • strength of relationship
  • information capacity of tie
  • rates of flow or traffic across tie
  • distances between nodes
  • probabilities
  • frequency of interaction

Node Attributes

the actors (nodes) in the network are individuals with attitudes, behaviors, and attributes which may

  • guide them in their choices of partners
  • be shaped (influenced) by their partners

Graphs

we conceive of a network as a graph: \(G(V, E)\)

on individuals or nodes or vertices: \(V = \{1, 2, ..., n \}\)

relations or ties or edges: \(E \subseteq \{(i, j) : i, j \in V\}\)

Graphs

we conceive of a network as a graph: \(G(V, E)\)

on individuals or nodes or vertices: \(V = \{1, 2, ..., n \}\)

relations or ties or edges: \(E \subseteq \{(i, j) : i, j \in V\}\)

individuals: \(V = \{1, 2, 3, 4\}\)

relations: \(E = \{(1, 2), (1, 3), (3, 2), (4, 2)\}\)

Graphs

we conceive of a network as a graph: \(G(V, E)\)

on individuals or nodes or vertices: \(V = \{1, 2, ..., n \}\)

relations or ties or edges: \(E \subseteq \{(i, j) : i, j \in V\}\)

individuals: \(V = \{1, 2, 3, 4\}\)

relations: \(E = \{(1, 2), (1, 3), (3, 2), (4, 2)\}\)

Graphs

we conceive of a network as a graph: \(G(V, E)\)

on individuals or nodes or vertices: \(V = \{1, 2, ..., n \}\)

relations or ties or edges: \(E \subseteq \{(i, j) : i, j \in V\}\)

the subset \(A = \{(1, 2), (1, 3), (3, 2)\}\) is induced by removing {4} from \(V\)

Graphs

we conceive of a network as a graph: \(G(V, E)\)

on individuals or nodes or vertices: \(V = \{1, 2, ..., n \}\)

relations or ties or edges: \(E \subseteq \{(i, j) : i, j \in V\}\)

How similar are these two graphs?

Network Representation

Adjacency Matrix

We conceive of a graph as a collection of tie variables \(\{X_{ij} : i, j \in V\}\)

\[ X_{ij} = \begin{cases} x_{ij} & \text{if } i \rightarrow j \\ 0 & \text{otherwise} \end{cases} \]

\[ \left( \begin{array}{c|cccc} & 1 & 2 & 3 & 4 \\ \hline 1 & x_{11} & x_{12} & x_{13} & x_{14} \\ 2 & x_{21} & x_{22} & x_{23} & x_{24} \\ 3 & x_{31} & x_{32} & x_{33} & x_{34} \\ 4 & x_{41} & x_{42} & x_{43} & x_{44} \\ \end{array} \right) \]


Network Representation

Adjacency Matrix

We conceive of a graph as a collection of tie variables \(\{X_{ij} : i, j \in V\}\)

\[ X_{ij} = \begin{cases} 1 & \text{if } i \rightarrow j \\ 0 & \text{otherwise} \end{cases} \]

\[ \left( \begin{array}{c|cccc} & 1 & 2 & 3 & 4 \\ \hline 1 & - & 1 & 1 & 0 \\ 2 & 0 & - & 0 & 0 \\ 3 & 0 & 1 & - & 0 \\ 4 & 0 & 1 & 0 & - \\ \end{array} \right) \]


Graph Structures and Structural Metrics


  • graph structures used to isolate graph’s interesting/important sections

  • structural metrics give measurements of a graph’s structural property

    • global metrics refer to a whole graph
    • local metrics refer to induced subgraphs of different order (order 1 = nodes, order 2 = dyads, order 3 = triads, etc.)

Network Representation

Adjacency Matrix


useful for summarizing structural information and patterns of a network

  • degree
  • number of nodes/edges
  • undirected/directed networks
  • density
  • shortest paths
  • etc.

Example: Rise of the Medici


data("flo_marriage")
as_adjacency_matrix(flo_marriage, sparse = FALSE)
#>              Acciaiuoli Albizzi Barbadori Bischeri Castellani Ginori Guadagni
#> Acciaiuoli            0       0         0        0          0      0        0
#> Albizzi               0       0         0        0          0      1        1
#> Barbadori             0       0         0        0          1      0        0
#> Bischeri              0       0         0        0          0      0        1
#> Castellani            0       0         1        0          0      0        0
#> Ginori                0       1         0        0          0      0        0
#> Guadagni              0       1         0        1          0      0        0
#> Lamberteschi          0       0         0        0          0      0        1
#> Medici                1       1         1        0          0      0        0
#> Pazzi                 0       0         0        0          0      0        0
#> Peruzzi               0       0         0        1          1      0        0
#> Pucci                 0       0         0        0          0      0        0
#> Ridolfi               0       0         0        0          0      0        0
#> Salviati              0       0         0        0          0      0        0
#> Strozzi               0       0         0        1          1      0        0
#> Tornabuoni            0       0         0        0          0      0        1
#>              Lamberteschi Medici Pazzi Peruzzi Pucci Ridolfi Salviati Strozzi
#> Acciaiuoli              0      1     0       0     0       0        0       0
#> Albizzi                 0      1     0       0     0       0        0       0
#> Barbadori               0      1     0       0     0       0        0       0
#> Bischeri                0      0     0       1     0       0        0       1
#> Castellani              0      0     0       1     0       0        0       1
#> Ginori                  0      0     0       0     0       0        0       0
#> Guadagni                1      0     0       0     0       0        0       0
#> Lamberteschi            0      0     0       0     0       0        0       0
#> Medici                  0      0     0       0     0       1        1       0
#> Pazzi                   0      0     0       0     0       0        1       0
#> Peruzzi                 0      0     0       0     0       0        0       1
#> Pucci                   0      0     0       0     0       0        0       0
#> Ridolfi                 0      1     0       0     0       0        0       1
#> Salviati                0      1     1       0     0       0        0       0
#> Strozzi                 0      0     0       1     0       1        0       0
#> Tornabuoni              0      1     0       0     0       1        0       0
#>              Tornabuoni
#> Acciaiuoli            0
#> Albizzi               0
#> Barbadori             0
#> Bischeri              0
#> Castellani            0
#> Ginori                0
#> Guadagni              1
#> Lamberteschi          0
#> Medici                1
#> Pazzi                 0
#> Peruzzi               0
#> Pucci                 0
#> Ridolfi               1
#> Salviati              0
#> Strozzi               0
#> Tornabuoni            0

adjacency matrix is symmetric for an undirected network

Adjacency Matrix: Undirected Network


Adjacency Matrix: Directed Network


Adjacency Matrix: Weighted Network


Adjacency Matrix: Signed Network


Adjacency Matrix

A <- matrix(
  c(0, 1, 1,
    1, 0, 1,
    1, 1, 0),
  nrow = 3, ncol = 3, byrow = TRUE)
rownames(A) <- c("Bob","Ann","Steve")
colnames(A) <- c("Bob","Ann","Steve")
A
#>       Bob Ann Steve
#> Bob     0   1     1
#> Ann     1   0     1
#> Steve   1   1     0
  • \(A_{ij}=1\) if there is an edge between \(i\) and \(j\)
  • \(A\) is symmetric for undirected networks
  • If \(A_{ij}>1\) then the values are interpreted as weights

Edge List

an edge list is a \(n × 2\) matrix containing a row for each edge where n is the number of nodes in the graph

  • the first column contains the node of origin
  • the second column the node of destination
  • in weighted networks, a third column indicates the edge weight

Edge List

Edge List


el <- matrix(
  c("Bob","Ann",
    "Bob","Steve",
    "Ann","Steve"),
  nrow = 3,ncol = 2, byrow = TRUE)
el
#>      [,1]  [,2]   
#> [1,] "Bob" "Ann"  
#> [2,] "Bob" "Steve"
#> [3,] "Ann" "Steve"

more efficient for sparse data (null edges aren’t stored)

Some Special Graphs

Empty and Complete Graphs

generally in graphs with no loops: number of possible edges is given by

  • undirected: \(\binom{n}{2}= \frac{n(n-1)}{2}\)

  • directed: \(n(n-1)\)


In R

g3 <- make_full_graph(n = 10)
g4 <- make_ring(n = 10)
g5 <- make_empty_graph(n = 10)

ls("package:igraph",pattern = "make_*")

Subgraphs

Induced Subgraph

a subgraph of a graph induced by a subset of nodes is the subset of nodes and all the edges that connect them


Example: subset {4, 5, 6, 7} induces….

:::

Dyads and Triads

  • a dyad consists of any two nodes in a graph and the ties that may (or may not) exist between them

  • a triad consists of any three nodes in a graph and the ties that may (or may not) exist between them

Undirected Dyads and Triads

two different dyads:

four different triads:

Directed Dyads and Triads

three different dyads (MAN)

16 different triads (MAN-labeled)

Cliques

a clique of a graph is any maximal complete subgraph

Cliques

a clique of a graph is any maximal complete subgraph

Cliques

a clique of a graph is any maximal complete subgraph

Cliques

all induced subgraphs of a complete graph are cliques

Cliques

all induced subgraphs of a complete graph are cliques

Cliques

all induced subgraphs of a complete graph are cliques

Degree of Node

the degree of a node is the number of edges attached to it

for directed networks we talk of in-degree and out-degree


out-degree: \([d_1,d_2,d_3,d_4] = [2,0,1,1]\)

in-degree: \([d_1,d_2,d_3,d_4] = [0,3,0,1]\)

Degree of Node

using adjacency matrix instead:

column sums give in-degree


\[ \begin{array}{c|cccc|c} & 1 & 2 & 3 & 4 & \sum \\ \hline 1 & - & 1 & 1 & 0 & \mathbf{2} \\ 2 & 0 & - & 0 & 0 & \mathbf{0} \\ 3 & 0 & 1 & - & 0 & \mathbf{1} \\ 4 & 0 & 1 & 0 & - & \mathbf{1} \\ \end{array} \]

Degree of Node

using adjacency matrix instead:

row sums give out-degree


\[ \begin{array}{c|cccc|c} & 1 & 2 & 3 & 4 & \sum \\ \hline 1 & - & 1 & 1 & 0 & \mathbf{2} \\ 2 & 0 & - & 0 & 0 & \mathbf{0} \\ 3 & 0 & 1 & - & 0 & \mathbf{1} \\ 4 & 0 & 1 & 0 & - & \mathbf{1} \\ \hline \sum & \mathbf{0} & \mathbf{3} & \mathbf{1} & \mathbf{0} & \\ \end{array} \]

Degree of Node

using adjacency matrix instead:

sum over whole matrix gives total number of edges (directed edges)


\[ \begin{array}{c|cccc|c} & 1 & 2 & 3 & 4 & \sum \\ \hline 1 & - & 1 & 1 & 0 & \mathbf{2} \\ 2 & 0 & - & 0 & 0 & \mathbf{0} \\ 3 & 0 & 1 & - & 0 & \mathbf{1} \\ 4 & 0 & 1 & 0 & - & \mathbf{1} \\ \hline \sum & \mathbf{0} & \mathbf{3} & \mathbf{1} & \mathbf{0} & \mathbf{4} \\ \end{array} \]

Degree of Node

out-degree
\(x_{i+} = \displaystyle \sum_j x_{ij}\)

in-degree
\(x_{+j} = \displaystyle \sum_i x_{ij}\)


\[ \begin{array}{c|cccc|c} & {i \rightarrow} \\ j \downarrow & 1 & 2 & 3 & 4 \\ \hline 1 & x_{11} & x_{12} & x_{13} & x_{14} \\ 2 & x_{21} & x_{22} & x_{23} & x_{24} \\ 3 & x_{31} & x_{32} & x_{33} & x_{34} \\ 4 & x_{41} & x_{42} & x_{43} & x_{44} \\ \end{array} \]

Degree Distribution

since we can calculate the degree of every nodes we can list how many nodes have a certain degree d

Degree Distribution


\[ \begin{array}{c|ccccccc|c} & d_1 & d_2 & d_3 & d_4 & d_5 & d_6 & d_7 & \sum \\ \hline d_1 & - & 1 & 0 & 0 & 0 & 0 & 0 & \mathbf{1} \\ d_2 & 1 & - & 1 & 1 & 1 & 0 & 0 & \mathbf{4} \\ d_3 & 0 & 1 & - & 0 & 1 & 0 & 0 & \mathbf{2} \\ d_4 & 0 & 1 & 0 & - & 1 & 0 & 0 & \mathbf{2} \\ d_5 & 0 & 1 & 1 & 1 & - & 1 & 1 & \mathbf{5} \\ d_6 & 0 & 0 & 0 & 0 & 1 & - & 1 & \mathbf{2} \\ d_7 & 0 & 0 & 0 & 0 & 1 & 1 & - & \mathbf{2} \\ \hline \sum & \mathbf{1} & \mathbf{4} & \mathbf{2} & \mathbf{2} & \mathbf{5} & \mathbf{2} & \mathbf{2} & \mathbf{18} \\ \end{array} \]

Degree Distribution


degree 0 1 2 3 4 5 \(\sum = 18\)
# nodes (freq. of degree) 0 1 4 0 1 1 \(\sum = 7\)

Degree Distribution

since we can calculate the degree of every nodes we can list how many nodes have a certain degree d


but what can the degree distribution tell us?

  • is there heterogeneity?
  • is there centralization?
  • “the rich get richer”?
  • etc.

Density of a Graph

the total number of ties present out of the total possible

number of present edges: \(\sum_{i=1}^n d_i /2 = 18/2 = 9\)

number of possible edges: \(\frac{n(n-1)}{2} = \frac{7(6)}{2}= 21\) (complete graph)

density: \(9/21 \approx 0.43\)

Paths

a path from one node a to another node b is an ordered sequence of distinct nodes in which adjacent pairs are linked by an edge


the length of the path is the number of edges traversed


Paths

a path from one node a to another node b is an ordered sequence of distinct nodes in which adjacent pairs are linked by an edge


the length of the path is the number of edges traversed



a path from 1 to 2 of length 1 = {1,2}

Paths

a path from one node a to another node b is an ordered sequence of distinct nodes in which adjacent pairs are linked by an edge


the length of the path is the number of edges traversed



a path from 1 to 4 of length 2 = {1,2,4}

Paths

a path from one node a to another node b is an ordered sequence of distinct nodes in which adjacent pairs are linked by an edge


the length of the path is the number of edges traversed



a path from 1 to 4 of length 3 = {1,3,5,4}

Paths

a path from one node a to another node b is an ordered sequence of distinct nodes in which adjacent pairs are linked by an edge


the length of the path is the number of edges traversed



a path from 1 to 6 of length 4 = {1,2,4,5,6}

Paths, Trails, Walks


  • path: can’t repeat node
    • ex: 1-2-3-4-5-6-7-8
    • not 1-7-2-1-8
  • trail: can’t repeat edge
    • ex: 1-2-3-1-7-8
    • not 1-7-2-1-7-8
  • walk: unrestricted
    • ex: 1-2-3-1-2-7-1-8

Geodesic Distance and Graph Diameter

the geodesic distance between nodes a and b is the length of the shortest path between them (if no such path exists it is infinite)


Geodesic Distance

the geodesic distance between nodes a and b is the length of the shortest path between them (if no such path exists it is infinite)



the geodesic distance is 3 between node 2 and 7

Geodesic Distance

the geodesic distance between nodes a and b is the length of the shortest path between them (if no such path exists it is infinite)



the geodesic distance is 3 between node 1 and 7

Graph Diameter

the maximum geodesic distance is called the diameter of the graph





can you figure out the diameter of the graph?

Graph Diameter

the maximum geodesic distance is called the diameter of the graph


  1. Let \(A\) be your adjacency matrix.
  2. Compute powers \(A^k\) (matrix multiplication). \(A^k[i][j] \neq 0\) means there is a path of length \(≤ k\) from i to j.
  3. Track the smallest k such that all non-diagonal entries become non-zero.
  4. That smallest such k is your diameter.

\(A^k\) tells you which nodes are reachable in exactly k steps.

Reachability and Connectedness

if there is a path from node a to b, then a is reachable from b


if each node of graph G is reachable from every other node, then G is connected


a component of graph G is maximal connected subgraph

Reachability and Connectedness



Reachability and Connectedness



14 is reachable from 9

Reachability and Connectedness



14 is not reachable from 5

Reachability and Connectedness



the graph is not connected

it has 3 components

Cutpoints

a node is a cutpoint if its removal increases the number of components



is 5 a cutpoint?

Cutpoints

a node is a cutpoint if its removal increases the number of components



is 5 a cutpoint?

yes, removing it leads to two components

Cutpoints

a node is a cutpoint if its removal increases the number of components



is 4 a cutpoint?

Cutpoints

a node is a cutpoint if its removal increases the number of components



is 4 a cutpoint?

no, removing it doesn’t increase number of components

Bridges

an edge is a bridge if its removal increases the number of components


is {1,8} a bridge?

Bridges

an edge is a bridge if its removal increases the number of components


is {1,8} a bridge?

yes, removing it leads to two components

Relating to Theory…

  • cohesion,embeddedness
    • clustering and degree to which nodes are embedded in cliques
  • homophily
    • ’birds of the same feather’
  • transitivity
    • open and closed triads ’friend of a friend is my friend’
  • the Matthew effect
    • ’the rich get richer’ and other degree based effects
  • structural holes
    • do some nodes bridge clustered regions?
  • structural balance
    • ’the enemy of my enemy is my friend’
  • influence, selection
    • are people in a particular position because of their ’behaviour’?
    • or the other way around?