Social Network Analysis
A measure of centrality is an index that assigns a numeric value to each node of the network. The higher the value, the more central the node.
“Being central” is a very ambiguous term, hence there exists a large variety of indices that assess centrality based on very different structural properties.
Degree
Number of direct neighbors (“popularity”)
Closeness
Reciprocal of the sum of the lengths of the shortest paths to all other nodes
Betweenness
Number of shortest paths that pass through a node (“brokerage”)
Eigenvector
Being central means being connected to other central nodes
PageRank
Similar to eigenvector centrality, but for directed networks
Degree can easily be calculated via row/column sums of the adjacency matrix: row sums for undirected networks, row sums (out-degree) and column sums (in-degree) for directed networks
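The row-sum computation can be sketched in a few lines of Python (a made-up 4-node graph; the slides themselves use R/igraph):

```python
# Degree centrality from the adjacency matrix of an undirected network:
# the degree of node i is the sum of row i (equivalently, column i).
# Hypothetical 4-node graph, not from the slides.
A = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]

degree = [sum(row) for row in A]
print(degree)  # [3, 2, 2, 1] -- node 0 is the most central
```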
How close is a node to all other nodes?
How often does a node lie on the shortest path between two other nodes?
An index of the potential for access to the diversity of what flows through the network
Iterative version of degree centrality: a node’s centrality is proportional to the sum of the centralities of those it has ties to
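The iterative idea can be illustrated with power iteration; a minimal Python sketch on a hypothetical graph (igraph’s `eigen_centrality()` computes this for you):

```python
# Power iteration for eigenvector centrality: repeatedly set each node's
# score to the sum of its neighbours' scores, then renormalise by the maximum.
# Hypothetical 4-node undirected graph, not from the slides.
A = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
n = len(A)
x = [1.0] * n
for _ in range(100):
    y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    m = max(y)
    x = [v / m for v in y]  # rescale so the largest score is 1

print([round(v, 3) for v in x])  # node 0 ends up with the highest score
```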
Which node has the highest eigenvector centrality?
PageRank works by counting the number and quality of links to a page to determine a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites.
Random surfer
A person following links at random. PageRank is related to the probability of a random surfer ending up at a certain page.
Damping factor
At each step, the random surfer follows a link with probability \(d\) or jumps to a random page with probability \(1-d\).
Empirically, \(d \approx 0.85\)
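The random-surfer iteration can be sketched in Python on a hypothetical 3-page web (igraph’s `page_rank()` implements this properly, with many refinements):

```python
# PageRank by iteration: with probability d the surfer follows a random
# outgoing link, with probability 1 - d they jump to a uniformly random page.
# Hypothetical directed 3-page web: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
links = {0: [1, 2], 1: [2], 2: [0]}
n, d = 3, 0.85
pr = [1.0 / n] * n
for _ in range(100):
    new = [(1 - d) / n] * n
    for page, outs in links.items():
        for target in outs:
            new[target] += d * pr[page] / len(outs)
    pr = new

print([round(v, 3) for v in pr])  # page 2 collects the most "importance"
```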
igraph
contains the following centrality indices:
degree()
strength()
betweenness()
closeness()
eigen_centrality()
alpha_centrality()
power_centrality()
page_rank()
eccentricity()
authority_score() and hub_score()
subgraph_centrality()
The sna package implements roughly the same indices but adds:
flowbet()
loadcent()
gilschmidt()
infocent()
stresscent()
In the context of efficiency, brokerage, and social capital, there are two important and interrelated network theories:
Actors connected via strong ties are more likely to be in the same social environment (connected to transitivity)
Structural hole theory is concerned with ego networks: networks that consist of a focal actor (the ego) and the other actors directly connected to the ego (the alters), while also including the ties among the alters.
A structural hole is the gap arising from an absent tie between two neighbors of an actor
More structural holes
\(\implies\) more opportunities to gain novel information
\(\implies\) more social capital
Can be quantified by betweenness centrality for ego networks
Cohesive subgroups are subsets of actors among whom there are relatively strong, direct, intense, frequent, or positive ties.
Methods that formalize the intuitive and theoretical notion of social group using social network properties
A clique in a network is a set of nodes that forms a complete subnetwork within the network (called a complete subgraph).
A maximal clique is a clique that cannot be extended to a bigger clique by adding more nodes to it.
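The definition translates directly into a brute-force Python sketch (feasible only for tiny graphs; the edge list here is made up for illustration):

```python
from itertools import combinations

# Brute-force maximal clique enumeration on a hypothetical 5-node graph:
# two triangles {1,2,3} and {3,4,5} sharing node 3.
edges = {(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (3, 5)}
nodes = {v for e in edges for v in e}

def adj(u, v):
    return (u, v) in edges or (v, u) in edges

def is_clique(s):
    # every pair of nodes in s must be connected
    return all(adj(u, v) for u, v in combinations(s, 2))

cliques = [set(s) for r in range(1, len(nodes) + 1)
           for s in combinations(sorted(nodes), r) if is_clique(s)]
# a clique is maximal if no other clique strictly contains it
maximal = [c for c in cliques if not any(c < d for d in cliques)]
print(maximal)  # [{1, 2, 3}, {3, 4, 5}]
```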
Maximal cliques can be calculated with max_cliques()
# only return cliques with three or more nodes
cl <- max_cliques(clique_graph,min = 3)
cl
#> [[1]]
#> + 3/30 vertices, from 0193e05:
#> [1] 9 17 18
#>
#> [[2]]
#> + 3/30 vertices, from 0193e05:
#> [1] 7 4 5
#>
#> [[3]]
#> + 3/30 vertices, from 0193e05:
#> [1] 7 4 8
#>
#> [[4]]
#> + 3/30 vertices, from 0193e05:
#> [1] 10 2 11
#>
#> [[5]]
#> + 3/30 vertices, from 0193e05:
#> [1] 16 12 15
#>
#> [[6]]
#> + 3/30 vertices, from 0193e05:
#> [1] 6 1 5
#>
#> [[7]]
#> + 4/30 vertices, from 0193e05:
#> [1] 12 13 15 14
#>
#> [[8]]
#> + 3/30 vertices, from 0193e05:
#> [1] 12 2 1
#>
#> [[9]]
#> + 5/30 vertices, from 0193e05:
#> [1] 1 2 5 4 3
A k-core is a subgraph in which every node has at least k neighbors within the subgraph. A k-core is thus a relaxed version of a clique.
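The standard way to find a k-core is iterative peeling: repeatedly delete nodes with fewer than k neighbors until none remain. A Python sketch on a made-up graph (igraph’s `coreness()` does this for you):

```python
# k-core by peeling: delete nodes of degree < k until everything that
# survives has at least k neighbours inside the remaining subgraph.
# Hypothetical graph: a K4 on nodes 1-4 plus a pendant node 5.
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5)]

def k_core(edges, k):
    nodes = {v for e in edges for v in e}
    while True:
        deg = {v: 0 for v in nodes}
        for u, w in edges:
            if u in nodes and w in nodes:
                deg[u] += 1
                deg[w] += 1
        drop = {v for v in nodes if deg[v] < k}
        if not drop:
            return nodes
        nodes -= drop

print(k_core(edges, 3))  # {1, 2, 3, 4}: node 5 has only one neighbour
```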
Minimum-cut method
cut the graph into partitions such that some metric is minimized
Hierarchical clustering
Agglomerative/Divisive methods to build a hierarchy of clusters
based on node similarity
Modularity Maximization
Modularity is defined as the fraction of edges that fall within given groups minus the expected fraction if edges were placed at random
Statistical inference
stochastic blockmodeling based on generative models
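The modularity definition above can be written out directly; a toy Python implementation of \(Q = \frac{1}{2m}\sum_{ij}\left(A_{ij} - \frac{k_i k_j}{2m}\right)\delta(c_i, c_j)\) on a hypothetical graph of two triangles joined by a bridge:

```python
# Modularity of a fixed partition (naive O(n^2 * m) sketch).
# Hypothetical graph: triangle {0,1,2}, triangle {3,4,5}, bridge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
membership = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}

def modularity(edges, membership):
    m = len(edges)
    deg = {v: 0 for v in membership}
    for u, w in edges:
        deg[u] += 1
        deg[w] += 1
    q = 0.0
    for u in membership:
        for w in membership:
            if membership[u] != membership[w]:
                continue
            # A_uw counted once per ordered pair
            a = sum(1 for e in edges if e in ((u, w), (w, u)))
            q += a - deg[u] * deg[w] / (2 * m)
    return q / (2 * m)

print(round(modularity(edges, membership), 4))  # 0.3571
```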
min_cut(g,value.only = FALSE)
#> $value
#> [1] 5
#>
#> $cut
#> + 5/92 edges from 89bbe6f (vertex names):
#> [1] 1--15 4--15 5--15 6--11 7--16
#>
#> $partition1
#> + 10/20 vertices, named, from 89bbe6f:
#> [1] 15 20 16 19 12 11 17 13 18 14
#>
#> $partition2
#> + 10/20 vertices, named, from 89bbe6f:
#> [1] 1 2 3 4 5 6 7 8 9 10
Really only works in very obvious cases.
ls("package:igraph",pattern = "cluster_")
#> [1] "cluster_edge_betweenness" "cluster_fast_greedy"
#> [3] "cluster_fluid_communities" "cluster_infomap"
#> [5] "cluster_label_prop" "cluster_leading_eigen"
#> [7] "cluster_leiden" "cluster_louvain"
#> [9] "cluster_optimal" "cluster_spinglass"
#> [11] "cluster_walktrap"
# compute clustering
clu <- cluster_louvain(karate)
# cluster membership vector
mem <- membership(clu)
mem
#> [1] 1 1 1 1 2 2 2 1 3 1 2 1 1 1 3 3 2 1 3 1 3 1 3 4 4 4 3 4 4 3 3 4 3 3
# clusters as list
com <- communities(clu)
com
#> $`1`
#> [1] 1 2 3 4 8 10 12 13 14 18 20 22
#>
#> $`2`
#> [1] 5 6 7 11 17
#>
#> $`3`
#> [1] 9 15 16 19 21 23 27 30 31 33 34
#>
#> $`4`
#> [1] 24 25 26 28 29 32
imc <- cluster_infomap(karate)
lec <- cluster_leading_eigen(karate)
loc <- cluster_louvain(karate)
sgc <- cluster_spinglass(karate)
wtc <- cluster_walktrap(karate)
scores <- c(infomap = modularity(karate,membership(imc)),
eigen = modularity(karate,membership(lec)),
louvain = modularity(karate,membership(loc)),
spinglass = modularity(karate,membership(sgc)),
walk = modularity(karate,membership(wtc)))
scores
#> infomap eigen louvain spinglass walk
#> 0.4020381 0.3934089 0.4197896 0.4197896 0.3532216
clu_leiden <- cluster_leiden(K50,objective_function = "CPM",resolution_parameter = 0.5)
table(membership(clu_leiden))
#>
#> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
#> 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
#> 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
#> 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
cohesive subgroups: dense groups sparsely connected to each other
core-periphery: dense core and a sparse, unconnected periphery
Partition the adjacency matrix into a core and a periphery such that the number of edges within the core is maximized and the number of edges within the periphery is minimized
Discrete model implemented in the package netUtils
library(netUtils)
data("core_graph")
core_periphery(core_graph)
#> $vec
#> [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> [38] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#>
#> $corr
#> [1] 1
rgraph <- sample_gnp(100,0.2)
core_periphery(rgraph)
#> $vec
#> [1] 0 1 1 0 0 1 0 1 1 1 0 1 0 0 1 1 1 0 0 1 0 0 0 1 1 0 0 0 1 0 1 1 0 0 1 0 1
#> [38] 1 0 1 0 0 0 1 1 1 0 0 0 0 1 0 1 0 1 1 0 0 1 1 1 0 1 0 0 0 0 1 0 0 1 0 1 0
#> [75] 0 1 1 1 0 0 1 1 1 0 0 0 0 0 1 0 1 0 1 1 0 1 0 1 0 1
#>
#> $corr
#> [1] 0.1466399
higher corr
⇒ better fit with an idealized core-periphery structure
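The fit statistic can be sketched in Python in the spirit of the Borgatti-Everett discrete model (the exact pattern matrix used by netUtils may differ; this variant puts 1 for core-core pairs and 0 elsewhere, and the graph is made up):

```python
# Correlate the observed adjacency with an idealized core-periphery
# pattern. Toy graph: nodes 0-2 form a clique (core), 3-5 are pendants.
edges = {(0, 1), (0, 2), (1, 2), (0, 3), (1, 4), (2, 5)}
core = {0, 1, 2}
n = 6

obs, ideal = [], []
for i in range(n):
    for j in range(i + 1, n):          # each unordered pair once
        obs.append(1 if (i, j) in edges or (j, i) in edges else 0)
        ideal.append(1 if i in core and j in core else 0)

mo, mi = sum(obs) / len(obs), sum(ideal) / len(ideal)
cov = sum((o - mo) * (p - mi) for o, p in zip(obs, ideal))
var_o = sum((o - mo) ** 2 for o in obs)
var_i = sum((p - mi) ** 2 for p in ideal)
corr = cov / (var_o * var_i) ** 0.5
print(round(corr, 3))  # 0.612 -- decent but imperfect core-periphery fit
```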
How similar are two nodes in terms of patterns of connectivity with others?
Partition the nodes into so-called equivalence or similarity classes
Nodes in same equivalence class occupy the same position in the network
Actors are structurally equivalent if they have the same ties to the same other actors, that is, if they are completely substitutable. Of particular interest are equivalence classes with more than one actor.
Actors are regularly equivalent if they have the same ties to actors that are themselves equivalent (not necessarily the same actors).
In most real-world applications, the standard definition of structural equivalence is much too strong.
We need a measure of position that allows for “more or less” rather than “yes” and “no.”
This is what is called structural similarity: two nodes are structurally similar if they have similar patterns of connectivity with the same others.
Outline the structural neighborhood pattern of actors
Decide on a rule which governs the assignment of a zero or one to the relational tie between positions in the model
Image matrix
Reduced graph
Check for certain structural patterns based on the partitioning
(core-periphery, cohesive subgroups or factions, group brokerage, etc)
Correlation compares the row (or column) vectors of two nodes in a graph and returns a number between -1 and +1
Ex. Structural equivalence:
Pairs of structurally equivalent nodes get a +1
Nodes that are completely different from one another (that is, connect to completely disjoint sets of neighbors) get a -1
Nodes that have an even combination of similarities and differences in their pattern of connectivity to others get a number close to zero
The correlation distance between two nodes \(k\) and \(l\): \[
d^{\text{corr}}_{k,l} = \frac{\sum_j \left( a_{(k)j} - \bar{a}_{(k)j} \right) \left( a_{(l)j} - \bar{a}_{(l)j} \right)}
{\sqrt{ \sum_j \left( a_{(k)j} - \bar{a}_{(k)j} \right)^2 \sum_j \left( a_{(l)j} - \bar{a}_{(l)j} \right)^2 }}
\] \(a_{(k)j}\) is the row (or column) vector for node \(k\) in the adjacency matrix
\(a_{(l)j}\) is the row (or column) vector for node \(l\) in the adjacency matrix
and \[\bar{a}_{(k)j} = \frac{\sum_j a_{kj}}{N}\]
| | A | B | C | D | E | F |
---|---|---|---|---|---|---|
A | 0 | 0 | 1 | 1 | 0 | 0 |
B | 0 | 0 | 1 | 1 | 0 | 0 |
C | 1 | 1 | 0 | 0 | 1 | 0 |
D | 1 | 1 | 0 | 0 | 1 | 0 |
E | 0 | 0 | 1 | 1 | 0 | 1 |
F | 0 | 0 | 0 | 0 | 1 | 0 |
Let’s look at the row vector for node A
0 | 0 | 1 | 1 | 0 | 0 |
\[ \implies \sum_j a_{(A)j} = 0 + 0 + 1 + 1 + 0 + 0 = 2 \]
And we know that \(N = 6\), so row mean is
\[ \overline{a}_{(A)j} = \frac{\sum_j a_{(A)j}}{N} = \frac{2}{6}=0.33 \]
Vector of row entries minus the row mean:
(0 - 0.33) | (0 - 0.33) | (1 - 0.33) | (1 - 0.33) | (0 - 0.33) | (0 - 0.33) |
Result of subtracting row mean from row entries:
-0.33 | -0.33 | 0.67 | 0.67 | -0.33 | -0.33 |
Which implies: \[ \sum_j (a_{(A)j} - \overline{a}_{(A)j}) = -0.33 -0.33 + 0.67 + 0.67 -0.33 -0.33 = 0.02 \approx 0 \] (with the unrounded mean \(1/3\), the deviations sum to exactly zero; the 0.02 is a rounding artifact)
similarly compute the rest…
When we do that for each pair of nodes in the example, we end up with the structural similarity matrix:
| | A | B | C | D | E | F |
---|---|---|---|---|---|---|
A | – | 1 | -0.66 | -0.66 | 0.71 | -0.32 |
B | 1 | – | -0.66 | -0.66 | 0.71 | -0.32 |
C | -0.66 | -0.66 | – | 1 | -0.93 | 0.08 |
D | -0.66 | -0.66 | 1 | – | -0.93 | 0.08 |
E | 0.71 | 0.71 | -0.93 | -0.93 | – | -0.45 |
F | -0.32 | -0.32 | 0.08 | 0.08 | -0.45 | – |
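The matrix above can be approximated with plain Pearson correlation of the adjacency rows; a Python sketch (some entries in the slide’s matrix appear to use a variant treatment of the diagonal, so individual values can differ slightly):

```python
# Pearson correlation of row vectors of the example adjacency matrix.
A = [
    [0, 0, 1, 1, 0, 0],  # A
    [0, 0, 1, 1, 0, 0],  # B
    [1, 1, 0, 0, 1, 0],  # C
    [1, 1, 0, 0, 1, 0],  # D
    [0, 0, 1, 1, 0, 1],  # E
    [0, 0, 0, 0, 1, 0],  # F
]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sum((a - mx) ** 2 for a in x) *
                  sum((b - my) ** 2 for b in y)) ** 0.5

print(round(pearson(A[0], A[1]), 2))  # A and B have identical rows: 1.0
print(round(pearson(A[0], A[4]), 2))  # A and E: 0.71, as in the matrix
```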
What if we want to find the correlation distance of a correlation distance matrix?
second iteration
| | A | B | C | D | E | F |
---|---|---|---|---|---|---|
A | – | 1 | -0.94 | -0.94 | 0.97 | -0.65 |
B | 1 | – | -0.94 | -0.94 | 0.97 | -0.65 |
C | -0.94 | -0.94 | – | 1 | -0.98 | 0.47 |
D | -0.94 | -0.94 | 1 | – | -0.98 | 0.47 |
E | 0.97 | 0.97 | -0.98 | -0.98 | – | -0.64 |
F | -0.65 | -0.65 | 0.47 | 0.47 | -0.64 | – |
third iteration
| | A | B | C | D | E | F |
---|---|---|---|---|---|---|
A | – | 1 | -1 | -1 | 1 | -0.93 |
B | 1 | – | -1 | -1 | 1 | -0.93 |
C | -1 | -1 | – | 1 | -1 | 0.89 |
D | -1 | -1 | 1 | – | -1 | 0.89 |
E | 1 | 1 | -1 | -1 | – | -0.93 |
F | -0.93 | -0.93 | 0.89 | 0.89 | -0.93 | – |
fourth iteration
| | A | B | C | D | E | F |
---|---|---|---|---|---|---|
A | – | 1 | -1 | -1 | 1 | -1 |
B | 1 | – | -1 | -1 | 1 | -1 |
C | -1 | -1 | – | 1 | -1 | 1 |
D | -1 | -1 | 1 | – | -1 | 1 |
E | 1 | 1 | -1 | -1 | – | -1 |
F | -1 | -1 | 1 | 1 | -1 | – |
structurally equivalent nodes have exactly the same pattern of +1 and -1 across rows
| | A | B | C | D | E | F |
---|---|---|---|---|---|---|
A | – | 1 | -1 | -1 | 1 | -1 |
B | 1 | – | -1 | -1 | 1 | -1 |
C | -1 | -1 | – | 1 | -1 | 1 |
D | -1 | -1 | 1 | – | -1 | 1 |
E | 1 | 1 | -1 | -1 | – | -1 |
F | -1 | -1 | 1 | 1 | -1 | – |
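The iterate-until-±1 procedure (the idea behind CONCOR) can be sketched in Python: start from the Pearson correlations of the adjacency rows and repeatedly correlate the rows of the result, which drives every off-diagonal entry to +1 or -1 and reveals the two blocks {A, B, E} and {C, D, F}:

```python
# Iterated "correlation of correlations" on the example adjacency matrix.
A = [
    [0, 0, 1, 1, 0, 0],  # A
    [0, 0, 1, 1, 0, 0],  # B
    [1, 1, 0, 0, 1, 0],  # C
    [1, 1, 0, 0, 1, 0],  # D
    [0, 0, 1, 1, 0, 1],  # E
    [0, 0, 0, 0, 1, 0],  # F
]
n = len(A)

def pearson(x, y):
    m = len(x)
    mx, my = sum(x) / m, sum(y) / m
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sum((a - mx) ** 2 for a in x) *
                  sum((b - my) ** 2 for b in y)) ** 0.5

M = [[float(v) for v in row] for row in A]
for _ in range(50):
    M = [[pearson(M[i], M[j]) for j in range(n)] for i in range(n)]

# every off-diagonal entry has been driven to (numerically) +1 or -1
converged = all(abs(abs(M[i][j]) - 1) < 1e-6
                for i in range(n) for j in range(n) if i != j)
print(converged)
```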
(Breiger, Boorman, and Arabie 1975)