Graphs are an excellent way to gain a deeper understanding of large systems of information as they provide us a flexible and intuitive way to generate insight through visualizing the relationships within the data. In this tutorial, we’ll focus specifically on undirected graphs. Both Facebook and LinkedIn connections can be illustrated with undirected graphs because a connection between two people always goes in both directions. Such is the case of the reciprocal nature of these websites (friendships must be mutual, invitations must be accepted, etc.), and unlike platforms such as Twitter where you can follow someone but they don’t necessarily have to return the follow–e.g. Béyonce still hasn’t followed me back.
But what do graphs have to do with matrices, you could ask? Well, in the world of data science, you cannot escape matrices–try as you might! That is because matrices are an excellent way of representing data in a compact manner that your computer and inner statistician will love (HELLO GRAPH ANALYTICS!) So, let’s learn how to take a visually interpretable graph, and give it a compact representation which you can use for generating graph metrics!
Example 1:
We’ll start by creating the following graph with the visNetwork package. To isolate a node and its relationships within the graph, simply click on a node or select it from the drop-down menu in the upper left corner. You can also move the graph and zoom in and out.
library(visNetwork)
# Create nodes dataframe for visNetwork.
nodes <- data.frame (id = 1:6, label = 1:6, color = rep(‘#7D9CB8’, 6))
# Create edges dataframe for visNetwork.
edges <- data.frame(from = c(1, 2, 3, 3, 4, 5), to = c(2, 3, 4, 5, 5, 6))
# Plot network using visNetwork.
visNetwork(nodes, edges) %>%
visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE)
To represent this graph as the adjacency matrix A, we’ll let rows and columns represent nodes, or vertices. For the current example, we’ll have 6 rows (representing nodes 1-6) and 6 columns (again, representing nodes 1-6). We should always have a square matrix! Each entry represents the presence or absence of an edge, or relationship, between the two nodes–1 indicates an edge, 0 no edge.
For the graph above, the adjacency matrix looks like this:
Since there’s an edge going from node 1 to 2, we see a 1 in both (row 1, column 2) and
(row 2, column 1). The lack of directionality in the graph results in a symmetric matrix.
Also notice that the diagonal consists entirely of zeros. That’s because there are no edges from any node to itself. This is an easy way to check for loops!
However, real life often has loops, and nodes can even have more than one edge between them. So, let’s now look at an example with loops and multi-edges.
Example 2:
# Create new edges dataframe for visNetwork.
edgesMessy <- data.frame(from = c(1, 2, 3, 3, 4, 5, 1, 2, 5, 5),
to = c(2, 3, 4, 5, 5, 6, 1, 3, 6, 6))
# Plot network using visNetwork.
visNetwork(nodes, edgesMessy) %>%
visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE)
The adjacency matrix looks as follows:
Notice that a loop is represented as a 2. For undirected graphs, each loop adds 2 since it counts each time the edge meets the node. (If there were two loops for node 1, the entry would be 4.) We can also see that there are two edges between nodes 2 and 3. Therefore, and
are now represented by a 2. The edge number between nodes 5 and 6 has also changed accordingly.
To recap:
- adjacency matrices are always square
- adjacency matrices for undirected graphs are symmetric
- an undirected graph with no loops will have zeros along the diagonal
- each loop in an undirected graph is represented by a 2
- adjacency matrices can account for multi-edges
To download an R notebook containing this lecture and all code, click here.
If you found this article helpful, one-time donations are very much appreciated!
Choose an amount
Or enter a custom amount
Your contribution is genuinely appreciated.
Donate
Awesome post, thanks for the help!
Very helpful. This makes sense to me now!
Wow this helped a lot. Thanks!