Network Concepts & Descriptives I

Social Network Analysis

Termeh Shafie

First some reminders in R

Creating simple networks

g1 <- make_graph(c(1,2, 1,3, 2,3, 2,4, 3,5, 4,5), n = 5, dir = FALSE)
g2 <- graph_from_literal(Joey-Chandler:Monica-Ross, Joey-Ross-Rachel)
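
The colon in graph_from_literal() denotes a vertex set: every vertex in the set on one side of a dash is tied to every vertex on the other side. A minimal sketch that checks this by writing the same ties out one by one (the object names g2_short and g2_long are only illustrative):

```r
library(igraph)

# "Chandler:Monica" is a vertex set: Joey is tied to both, and both are tied to Ross
g2_short <- graph_from_literal(Joey-Chandler:Monica-Ross, Joey-Ross-Rachel)

# the same ties written out one by one
g2_long <- graph_from_literal(Joey-Chandler, Joey-Monica, Chandler-Ross,
                              Monica-Ross, Joey-Ross, Ross-Rachel)

c(vcount(g2_short), ecount(g2_short))   # 5 nodes, 6 edges
isomorphic(g2_short, g2_long)           # same structure
```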

Special Graphs

g3 <- make_full_graph(n = 10)
g4 <- make_ring(n = 10)
g5 <- make_empty_graph(n = 10)

ls("package:igraph", pattern = "^make_")

Random Graphs

g6 <- sample_gnp(n = 100,p = 0.1)
g7 <- sample_pa(n = 100, power = 1.5, m = 1, directed = FALSE)

ls("package:igraph", pattern = "^sample_")

igraph Objects

g2
#> IGRAPH 7de36f8 UN-- 5 6 -- 
#> + attr: name (v/c)
#> + edges from 7de36f8 (vertex names):
#> [1] Joey    --Chandler Joey    --Monica   Joey    --Ross     Chandler--Ross    
#> [5] Monica  --Ross     Ross    --Rachel


library(netUtils)
str(g2)
#> -----------------------------------------------------------
#> UNNAMED NETWORK (undirected, unweighted, one-mode network)
#> -----------------------------------------------------------
#> Nodes: 5, Edges: 6, Density: 0.6, Components: 1, Isolates: 0
#> -Vertex Attributes:
#>  name(c): Joey, Chandler, Monica, Ross, Rachel ...
#> ---
#> -Edges: 
#>  Joey--Chandler Joey--Monica Joey--Ross Chandler--Ross Monica--Ross
#> Ross--Rachel

Attributes

node attributes

V(g2)$name
#> [1] "Joey"     "Chandler" "Monica"   "Ross"     "Rachel"
V(g2)$gender <- c("M","M","F","M","F") 
# g2 <- set_vertex_attr(g2, "gender", value = c("M","M","F","M","F"))

edge attributes

E(g2)
#> + 6/6 edges from 7de36f8 (vertex names):
#> [1] Joey    --Chandler Joey    --Monica   Joey    --Ross     Chandler--Ross    
#> [5] Monica  --Ross     Ross    --Rachel
E(g2)$weight <- sample(1:5,size = 6, replace = TRUE)
# g2 <- set_edge_attr(g2, "weight", value = sample(1:5,size = 6, replace = TRUE))

Attributes

g2
#> IGRAPH 7de36f8 UNW- 5 6 -- 
#> + attr: name (v/c), gender (v/c), weight (e/n)
#> + edges from 7de36f8 (vertex names):
#> [1] Joey    --Chandler Joey    --Monica   Joey    --Ross     Chandler--Ross    
#> [5] Monica  --Ross     Ross    --Rachel


str(g2)
#> ---------------------------------------------------------
#> UNNAMED NETWORK (undirected, weighted, one-mode network)
#> ---------------------------------------------------------
#> Nodes: 5, Edges: 6, Density: 0.6, Components: 1, Isolates: 0
#> -Vertex Attributes:
#>  name(c): Joey, Chandler, Monica, Ross, Rachel ...
#>  gender(c): M, M, F, M, F ...
#> ---
#> -Edge Attributes:
#>  weight(n): 4, 2, 4, 2, 5, 2 ...
#> ---
#> -Edges: 
#>  Joey--Chandler Joey--Monica Joey--Ross Chandler--Ross Monica--Ross
#> Ross--Rachel

Network Representations: Adjacency Matrix

A <- matrix(
  c(0, 1, 1,
    1, 0, 1,
    1, 1, 0),
  nrow = 3, ncol = 3, byrow = TRUE)
rownames(A) <- c("Bob","Ann","Steve")
colnames(A) <- c("Bob","Ann","Steve")
A
#>       Bob Ann Steve
#> Bob     0   1     1
#> Ann     1   0     1
#> Steve   1   1     0

Network Representation: Edgelist


el <- matrix(
  c("Bob","Ann",
    "Bob","Steve",
    "Ann","Steve"),
  nrow = 3,ncol = 2, byrow = TRUE)
el
#>      [,1]  [,2]   
#> [1,] "Bob" "Ann"  
#> [2,] "Bob" "Steve"
#> [3,] "Ann" "Steve"

more efficient for sparse data (null edges aren’t stored)
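
A small sketch of the storage difference, assuming a ring of 100 nodes as a stand-in for a sparse network (the ring is an illustrative choice, not data from the course):

```r
library(igraph)

# sparse example network: 100 nodes, each tied to its two neighbours
g <- make_ring(100)

# adjacency matrix: one cell per pair of nodes, zeros included
A <- as_adjacency_matrix(g, sparse = FALSE)

# edgelist: one row per existing tie
el <- as_edgelist(g)

length(A)   # number of cells stored in the adjacency matrix
nrow(el)    # number of rows stored in the edgelist
```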

Networks from Matrices and Lists

adjacency matrix

graph_from_adjacency_matrix(
  A,
  mode = "undirected",
  weighted = NULL,
  diag = FALSE)
#> IGRAPH bc25b8a UN-- 3 3 -- 
#> + attr: name (v/c)
#> + edges from bc25b8a (vertex names):
#> [1] Bob--Ann   Bob--Steve Ann--Steve

edgelist

graph_from_edgelist(el, directed = FALSE)
#> IGRAPH 4bcdd7a UN-- 3 3 -- 
#> + attr: name (v/c)
#> + edges from 4bcdd7a (vertex names):
#> [1] Bob--Ann   Bob--Steve Ann--Steve
ls("package:igraph", pattern = "^graph_from_")

Reading Network Data


Data is already in R (e.g. networkdata)
No extra work

Data was processed in another SNA tool

read_graph(file, format = c("edgelist", "pajek", "ncol", "lgl",
  "graphml", "dimacs", "graphdb", "gml", "dl"), ...)

Some extra work (with some issues)

Data is in a csv/spreadsheet/..
read.table(), read.csv(), readxl, readr,…

Preparing Network Data

adjacency matrix

Does the matrix have row/col names?

tab <- read.csv(file, header = TRUE, row.names = 1) 
A <- as.matrix(tab)

Is the network directed/undirected?

graph_from_adjacency_matrix(A, mode = "undirected")  # or mode = "directed"

Is the network weighted/unweighted?

graph_from_adjacency_matrix(A, weighted = TRUE)  # weighted = NULL for unweighted

Does the network contain loops?

graph_from_adjacency_matrix(A, diag = FALSE)  # diag = TRUE keeps loops

Preparing Network Data

edgelist

tab <- read.csv(file, header = TRUE)  # header = FALSE if there is no header row
el <- as.matrix(tab)

Is the network directed/undirected?

g <- graph_from_edgelist(el, directed = FALSE)  # or directed = TRUE

Is the network weighted/unweighted?

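# assuming the weights are the 3rd column in el
g <- graph_from_edgelist(el[,1:2], directed = FALSE)  # or directed = TRUE
# as.matrix() on mixed columns yields a character matrix, so convert the weights
E(g)$weight <- as.numeric(el[,3])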

Preparing Network Data



some stumbling blocks

  • Are weights actually weights or different types of edges?
  • You can’t always tell from an edgelist if the network is directed
  • Isolated nodes are lost if an edgelist is used
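
The last point can be sketched with a toy example. The names and the actors vector are hypothetical; the idea is only that anyone without a tie never shows up in the edgelist and has to be added back by hand:

```r
library(igraph)

# hypothetical data: "Dana" has no ties, so she never appears in the edgelist
el <- matrix(c("Ann", "Bob",
               "Bob", "Cem"), ncol = 2, byrow = TRUE)
actors <- c("Ann", "Bob", "Cem", "Dana")

g <- graph_from_edgelist(el, directed = FALSE)
vcount(g)   # only the three actors that have ties

# add the actors that are absent from the edgelist as isolated nodes
missing <- setdiff(actors, V(g)$name)
g_full <- add_vertices(g, length(missing), name = missing)

V(g_full)$name
degree(g_full)   # Dana has degree 0
```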

Preparing Network Data with Attributes

Organize network data in two separate files

edgelist

from             to
Arizona Robbins  Leah Murphy
Alex Karev       Leah Murphy
Arizona Robbins  Lauren Boswell
Arizona Robbins  Callie Torres
Erica Hahn       Callie Torres
Alex Karev       Callie Torres

node attributes

name                sex  birthyear
Addison Montgomery  F    1967
Adele Webber        F    1949
Teddy Altman        F    1969
Amelia Shepherd     F    1981
Arizona Robbins     F    1976
Rebecca Pope        F    1975


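# el: edge table, vertices: node attribute table (first column = node names)
graph_from_data_frame(el, directed = FALSE, vertices = vertices)  # or directed = TRUE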
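
A minimal sketch of the two-table workflow with made-up data (the data frames edges and nodes and all values in them are hypothetical). The first column of the node table has to contain every name used in the edge table; the remaining columns become vertex attributes:

```r
library(igraph)

# hypothetical edge table and node table
edges <- data.frame(from = c("Ann", "Bob"), to = c("Bob", "Cem"))
nodes <- data.frame(name = c("Ann", "Bob", "Cem", "Dana"),
                    sex = c("F", "M", "M", "F"),
                    birthyear = c(1976, 1981, 1969, 1975))

g <- graph_from_data_frame(edges, directed = FALSE, vertices = nodes)

vertex_attr_names(g)   # name plus the extra columns of the node table
V(g)$birthyear
vcount(g)              # nodes without ties are kept as well
```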

From Graph Theory to Network Concepts to Descriptives

  • Social network concepts and their theoretical underpinnings
  • Span several branches of the social sciences (e.g. sociology, psychology, criminology, economics, etc.)

these are crucial for understanding

  • The presence and absence of ties
  • How networks emerge and function
  • Which social processes and phenomena are implied
  • How to connect a research question to a network research design

Homophily

Actors are more likely to connect to other actors with similar attributes


The homophily principle

The phenomenon of attribute similarity producing ties among actors

people’s personal networks are homogeneous with regard to many sociodemographic, behavioral, and intrapersonal characteristics (McPherson et al., 2001)

Also known as assortativity

The opposite: heterophily

Homophily

social selection: the tendency for people to connect to others with similar behavior/beliefs/attributes

social influence: the tendency for people to change their behavior/beliefs/attributes according to the behavior/beliefs/attributes of those that they are tied to

Social Selection vs. Social Influence

can we separate social influence and selection for cross-sectional data?

Social Selection vs. Social Influence

YES!

  • by logical reasoning (fixed covariates)

    • for example ethnicity, age, etc
  • by theory

    • we can make an a priori choice

but always beware of confounders!

  • proximity/social context (foci)
  • contextual factors
  • anything that affects both attributes and ties

Assortativity in R

data("greys")

Assortativity in R

Measures the level of homophily based on some node labeling or values.


Assortativity based on numerical attribute

assortativity(greys,degree(greys))
#> [1] -0.2900461

A negative value means that nodes with high degree tend to connect to nodes with low degree.
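
An extreme case makes the sign easier to read. The sketch below uses a star graph, an illustrative choice rather than course data: every edge joins the hub (highest degree) to a leaf (degree 1), so degree assortativity reaches its minimum.

```r
library(igraph)

# star: one hub tied to five leaves, so every edge joins degree 5 to degree 1
star <- make_star(6, mode = "undirected")

r_star <- assortativity_degree(star)
r_star   # perfectly disassortative by degree
```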

Assortativity in R

data("s50")

Assortativity in R

Measures the level of homophily based on some node labeling or values.


Assortativity based on nominal attribute

data(s50)

assortativity_nominal(s50[[3]],V(s50[[3]])$smoke)
#> [1] 0.2942478

A positive value means that connected nodes tend to share the same label, in this case the same smoking behavior.
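
To see how the labeling drives the value, here is a small sketch on a toy network of two separate triangles. The labels are hypothetical and coded as integers starting at 1, as assortativity_nominal() expects:

```r
library(igraph)

# two separate triangles: nodes 1-3 and nodes 4-6
g <- disjoint_union(make_full_graph(3), make_full_graph(3))

# labels follow the triangles: every tie connects two nodes with the same label
r_same <- assortativity_nominal(g, c(1, 1, 1, 2, 2, 2))

# labels alternate: most ties now connect nodes with different labels
r_mixed <- assortativity_nominal(g, c(1, 2, 1, 2, 1, 2))

c(r_same, r_mixed)   # maximal homophily vs. a negative value
```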

Reciprocity and Social Exchange

Reciprocity

when a tie between two actors in a directed network is reciprocated in the reverse direction


crucial concept in social exchange theories:

social processes arising from actors’ cost and benefit analysis of social outcomes in terms of opportunities and constraints

Reciprocity and Social Exchange

  • friendship network: ties are likely to be reciprocated when they reflect trust and emotional support
  • hierarchical network (e.g. leadership networks): reciprocity is unlikely

Reciprocity and Social Exchange

The norm of reciprocity (Gouldner, 1960)

The obligation to reciprocate depends on the value of the benefit received

Benefits are more valued (higher chance of reciprocating) when

  • the recipient is in greater need
  • the donor can barely afford to give the benefit
  • the donor provides the benefit in the absence of self-interest
  • the donor was not required to give the benefit

Reciprocity in R

# grooming behavior of monkeys
data("rhesus")
reciprocity(rhesus)
#> [1] 0.7567568

About 76% of edges are reciprocated in the network

The dyad census includes the count of reciprocated ties

dyad_census(rhesus)
#> $mut
#> [1] 42
#> 
#> $asym
#> [1] 27
#> 
#> $null
#> [1] 51
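
The reciprocity value can be recovered from the dyad census. A short sketch using the counts printed above, assuming the default definition of reciprocity() as the share of directed edges that are reciprocated:

```r
# dyad census of the rhesus network, copied from the output above
mut  <- 42   # mutual dyads (ties in both directions)
asym <- 27   # asymmetric dyads (tie in one direction only)

# each mutual dyad contributes two reciprocated edges,
# each asymmetric dyad contributes one unreciprocated edge
recip <- 2 * mut / (2 * mut + asym)
round(recip, 7)
#> [1] 0.7567568
```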

Transitivity

Transitivity/Triadic Closure
when looking at three actors A, B and C, if actors A and B are connected, and A and C are connected, then actors B and C are also connected


the friend of my friend is my friend

Triad Census

The triad census consists of a classification of all directed triads into one of 16 different categories. The distribution can be compared against null models to test for the presence of configural biases (e.g., transitivity bias)

Triad Census in R

triad_census(rhesus)
#>  [1]  49  72 115  16  12  11  50  50   2   0  54  13  12   7  58  39
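
The 16 counts are easier to read with their MAN labels (number of Mutual, Asymmetric and Null dyads, plus a letter for the orientation). The sketch below attaches the labels in the order documented for triad_census(). The node count of 16 is inferred from the dyad census total of 120 = choose(16, 2), not stated in the slides.

```r
# triad census of the rhesus network, copied from the output above
tc <- c(49, 72, 115, 16, 12, 11, 50, 50, 2, 0, 54, 13, 12, 7, 58, 39)

# MAN labels in the order igraph reports them
names(tc) <- c("003", "012", "102", "021D", "021U", "021C", "111D", "111U",
               "030T", "030C", "201", "120D", "120U", "120C", "210", "300")

sum(tc)               # every triple of nodes is classified exactly once
choose(16, 3)         # number of triples among 16 nodes
tc[c("003", "300")]   # empty triads and fully mutual triads
```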

Transitivity

Transitivity is also a measure of the degree to which nodes in a graph tend to cluster together. This is also called the clustering coefficient.

local
gives an indication of the embeddedness of single nodes

global
indication of the clustering in the network

\[ \frac{3 \times \text{number of triangles}}{\text{number of connected triplets}} \]
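
A worked example of the global formula on a toy graph (a triangle with one extra node attached, chosen only for illustration). Counting triangles and connected triplets by hand gives the same value as transitivity():

```r
library(igraph)

# a triangle (1, 2, 3) with one pendant node (4) attached to node 3
g <- make_graph(c(1,2, 1,3, 2,3, 3,4), directed = FALSE)

triangles <- sum(count_triangles(g)) / 3   # each triangle is counted at its 3 corners
triplets  <- sum(choose(degree(g), 2))     # connected triplets centered on each node

by_hand <- 3 * triangles / triplets        # 3 * 1 / 5
c(by_hand, transitivity(g, type = "global"))
```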

Transitivity in R

data("coleman")
g <- as.undirected(coleman[[1]])
transitivity(g, type = "global")
#> [1] 0.4400826
round(transitivity(g, type = "local", isolates = "zero"),2)
#>    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16 
#> 0.40 1.00 0.00 0.50 0.67 1.00 0.00 0.00 0.53 0.00 0.40 1.00 0.33 0.33 0.00 0.33 
#>   17   18   19   20   21   22   23   24   25   26   27   28   29   30   31   32 
#> 0.00 0.40 0.33 0.43 0.30 0.38 0.31 0.00 0.00 0.27 0.33 0.00 0.00 1.00 0.17 0.40 
#>   33   34   35   36   37   38   39   40   41   42   43   44   45   46   47   48 
#> 0.70 1.00 0.33 0.70 0.67 0.29 0.20 0.20 0.47 0.60 0.27 0.00 0.70 0.50 0.60 0.47 
#>   49   50   51   52   53   54   55   56   57   58   59   60   61   62   63   64 
#> 0.67 0.57 0.33 0.57 0.67 0.27 0.36 0.67 0.60 1.00 0.50 0.60 0.67 0.50 0.80 0.70 
#>   65   66   67   68   69   70   71   72   73 
#> 0.60 0.71 0.87 0.24 0.71 0.38 0.49 0.00 0.00

In empirical networks, we often observe a tendency towards high transitivity (“the friend of a friend is a friend”)

What do you think is the (local/global) transitivity of greys?

Transitivity


The value itself is not always enough to judge the level of transitivity of a network. We should also check if it deviates significantly from what would be expected by randomness (more on this later in the course)

deg <- degree(g)
# keep density fixed
mean(replicate(500,transitivity(rewire(g,each_edge(1)),type = "global")))
#> [1] 0.06761784

# keep degree sequence fixed
mean(replicate(500,transitivity(sample_degseq(deg),type = "global")))
#> [1] 0.07526258

Structural Balance


A special case of transitivity applied to signed networks


  • Fritz Heider: underpinnings of structural balance theory (1940s)
  • Dorwin Cartwright & Frank Harary: formalised with graph theory (1950s)

Structural Balance

A triangle is balanced if

  • all ties are positive (“the friend of a friend is a friend”)
  • only one tie is positive (“the enemy of my enemy is my friend”)

The remaining configurations are said to be unbalanced.
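
Both balanced cases have an even number of negative ties, so the product of the three signs is positive. A minimal sketch of that rule, where is_balanced() is a hypothetical helper and not an igraph function:

```r
# signs of the three ties in a triangle: +1 = positive, -1 = negative
is_balanced <- function(signs) {
  stopifnot(length(signs) == 3, all(signs %in% c(-1, 1)))
  prod(signs) > 0   # TRUE for an even number of negative ties
}

is_balanced(c( 1,  1,  1))   # all positive: balanced
is_balanced(c( 1, -1, -1))   # one positive: balanced
is_balanced(c( 1,  1, -1))   # two positive: unbalanced
is_balanced(c(-1, -1, -1))   # none positive: unbalanced
```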

Structural Balance

Extension: A network is balanced if, among other equivalent characterizations, it can be partitioned into two vertex subsets such that all intra-group edges are positive and all inter-group edges are negative.

Degree distribution


The degree of a node in a network is the number of connections it has to other nodes.

The degree distribution is the probability distribution of the degrees over the whole network.

Empirical degree distributions are generally right skewed:
(many nodes have a few connections and few have many)

  • “preferential attachment”
  • “Matthew effect”
  • “the rich get richer”

Scale Free Networks

A scale-free network is a network whose degree distribution follows a power law (asymptotically). The fraction \(P(k)\) of nodes in the network having degree \(k\) is given by \(P(k) \sim k^{-\gamma}\).

Degree distribution

Degree distributions are informative about

  • homogeneity/heterogeneity
    the extent to which all actors have similar or dissimilar degrees

  • centralization
    the extent to which the network is dominated by a single actor
    (the extent to which a network looks like a star network)
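
The two extremes of degree centralization can be sketched with a star and a ring (illustrative graphs, not course data). Setting loops = FALSE is a choice made here so that the theoretical maximum refers to networks without self-ties:

```r
library(igraph)

star <- make_star(10, mode = "undirected")   # one actor holds all ties
ring <- make_ring(10)                         # every actor has degree 2

c_star <- centr_degree(star, loops = FALSE)$centralization
c_ring <- centr_degree(ring, loops = FALSE)$centralization

c(c_star, c_ring)   # maximally centralized vs. not centralized at all
```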

Degree distribution in R


er <- sample_gnp(n = 5000, p = 0.01)
pa <- sample_pa(n = 5000, power = 1.5, m = 2, directed = FALSE)


plot(degree_distribution(er))

plot(degree_distribution(pa),log = "xy")

Density in R


The density of a network is defined as the fraction of the potential edges in a network that are actually present.

# edge_density() is the current name of the deprecated graph.density()
c(edge_density(make_empty_graph(10)), 
  edge_density(greys), 
  edge_density(make_full_graph(10)))
#> [1] 0.00000000 0.03983229 1.00000000
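
For an undirected network without loops, the number of potential edges is n(n - 1)/2. A short sketch that computes the density by hand on a ring of 10 nodes (an illustrative graph) and compares it with edge_density():

```r
library(igraph)

g <- make_ring(10)
n <- vcount(g)   # number of nodes
m <- ecount(g)   # number of edges actually present

# undirected network without loops: n * (n - 1) / 2 potential edges
by_hand <- m / (n * (n - 1) / 2)
c(by_hand, edge_density(g))
```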

Small World Phenomenon


One can find a short chain of acquaintances (i.e. a shortest path), often of no more than a handful of individuals, connecting almost any two people on the planet.

  • Milgram’s small-world experiment: 6 (“6 degrees of separation”)
  • Average distance between users on Facebook (2016): 4.57

Small World Experiment (Milgram)


Erdős number

Paul Erdős published ~1525 papers with ~500 collaborators
The Erdős number of an individual is their distance to Paul Erdős in the co-authorship network

Bacon number

The Bacon number of an actor is their distance to Kevin Bacon in the movie co-appearance network.

Oracle of Bacon

Erdős-Bacon-Sabbath number

Zlatan number

The Zlatan number is the distance of football players to Zlatan Ibrahimovic in the “squad network” (blog post)

Diameter

The length of the longest shortest path is called the diameter of the network.

diameter(greys)
#> [1] 8

Shortest Paths in R

A shortest path is a path that connects two nodes in a network with a minimal number of edges. The length of a shortest path is called the distance between two nodes.

shortest_paths(greys,from = "Alex Karev",to = "Owen Hunt",output = "vpath")
#> $vpath
#> $vpath[[1]]
#> + 5/54 vertices, named, from f7716f1:
#> [1] Alex Karev         Addison Montgomery Mark Sloan         Teddy Altman      
#> [5] Owen Hunt         
#> 
#> 
#> $epath
#> NULL
#> 
#> $predecessors
#> NULL
#> 
#> $inbound_edges
#> NULL

Distances in R

distances(greys)[1:5,1:5]
#>                    Addison Montgomery Adele Webber Teddy Altman Amelia Shepherd
#> Addison Montgomery                  0          Inf            2               2
#> Adele Webber                      Inf            0          Inf             Inf
#> Teddy Altman                        2          Inf            0               2
#> Amelia Shepherd                     2          Inf            2               0
#> Arizona Robbins                     3          Inf            3               3
#>                    Arizona Robbins
#> Addison Montgomery               3
#> Adele Webber                   Inf
#> Teddy Altman                     3
#> Amelia Shepherd                  3
#> Arizona Robbins                  0

The Grey’s Anatomy network is disconnected (4 connected components)
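
The Inf entries and the component count belong together. A minimal sketch on a hypothetical network with two components shows how distances(), components() and diameter() behave when the network is disconnected:

```r
library(igraph)

# hypothetical network: a path 1-2-3-4 and a separate pair 5-6
g <- make_graph(c(1,2, 2,3, 3,4, 5,6), directed = FALSE)

d <- distances(g)
d[1, 4]   # three steps along the path
d[1, 5]   # Inf: no path between the two components

n_comp <- components(g)$no   # number of connected components
diam   <- diameter(g)        # longest finite shortest path (default setting)
c(n_comp, diam)
```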

Next Time

  • Centrality

  • Connectivity and Social Cohesion

  • Network Positions and Social Roles