---
title: "Graphs with the gRbase package"
author: "Søren Højsgaard"
output:
  html_document:
    toc: true
    number_sections: true
vignette: >
  %\VignetteIndexEntry{graphs: Graphs with the gRbase package}
  %\VignettePackage{gRbase}
  %\VignetteEngine{knitr::knitr}
  %\VignetteEncoding{UTF-8}
---

```{r include=FALSE}
library(knitr)
opts_chunk$set(
    fig.path='fig/GRAPH',
    tidy=FALSE
)
```

```{r echo=F}
options("width"=80)
library(igraph)
ps.options(family="serif")
```

# Graphs and Conditional Independence {#chap:graph}

As of major version 2 of `gRbase` (that is, versions 2.x.y), `gRbase` no longer depends on the packages `graph`, `Rgraphviz`, and `RBGL`. Graph functionality in `gRbase` now relies either on the `igraph` package or on graph algorithms implemented in `gRbase` itself. This document reflects these changes.

* As a consequence, this document provides an up-to-date version of Chapter 1 in the book Graphical Models with R (2012); hereafter abbreviated GMwR, see @hojsgaard:etal:12.
* This document also reflects that, since GMwR was published in 2012, some packages mentioned in GMwR are no longer on CRAN. These include the packages `lcd` and `sna`.
* In this document we indicate whether a function is imported from `igraph` or is native to `gRbase` by writing `igraph::this_function()` and `gRbase::this_function()`.
* One notable feature that is not available in this version of `gRbase` is the set of functions related to maximal prime subgraph decomposition. They may be reimplemented at a later stage.

## Introduction {#sec:graph:intro}

A graph as a *mathematical* object may be defined as a pair $\cal G = (V, E)$, where $V$ is a set of *vertices* or *nodes* and $E$ is a set of *edges*. Each edge is associated with a pair of nodes, its *endpoints*. Edges may in general be directed, undirected, or bidirected. Graphs are typically visualized by representing nodes by circles or points, and edges by lines, arrows, or bidirected arrows.
We use the notation $\alpha-\beta$, $\alpha\to\beta$, and $\alpha\leftrightarrow\beta$ to denote edges between $\alpha$ and $\beta$.

Graphs are useful in a variety of applications, and a number of packages for working with graphs are available in R. In statistical applications we are particularly interested in two special graph types: undirected graphs and directed acyclic graphs (often called DAGs). The `gRbase` package supplements `igraph` by implementing some algorithms useful in graphical modelling. `gRbase` also provides two wrapper functions, `ug()` and `dag()`, for easily creating undirected graphs and DAGs represented either as `igraph` objects or as adjacency matrices.

The first sections of this chapter describe some of the most useful functions available when working with graphical models. These come variously from the `gRbase` and `igraph` packages, but it is not usually necessary to know which.

As *statistical* objects, graphs are used to represent models, with nodes representing model variables (and sometimes model parameters) in such a way that the independence structure of the model can be read directly off the graph. Accordingly, a section of this chapter is devoted to a brief description of the key concept of conditional independence and explains how this is linked to graphs. Throughout the book we shall repeatedly return to this in more detail.

## Graphs

Our graphs have a finite node set $V$ and for the most part they are *simple graphs* in the sense that they have neither loops nor multiple edges. Two vertices $\alpha$ and $\beta$ are said to be *adjacent*, written $\alpha\sim \beta$, if there is an edge between $\alpha$ and $\beta$ in $\cal G$, i.e. if either $\alpha - \beta$, $\alpha\to\beta$, or $\alpha\leftrightarrow\beta$.
\index{adjacent nodes}
\index{simple graphs}

In this chapter we primarily represent graphs as `igraph` objects, and except where stated otherwise, the functions we describe operate on these objects.
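As a minimal sketch of the definitions above, and independently of `igraph` and `gRbase`, a small undirected graph can be stored as a vertex set plus an edge list and turned into an adjacency matrix in base R. The names `V_`, `E_`, `adj_mat` and `is_adjacent()` are introduced here for illustration only; they are not part of any package.

```{r }
## Hypothetical example: a vertex set and an undirected edge list
V_ <- c("a", "b", "c", "d", "e")
E_ <- list(c("a", "b"), c("b", "c"), c("b", "d"), c("c", "d"))

## Adjacency matrix: a symmetric pair of entries represents an undirected edge
adj_mat <- matrix(0L, length(V_), length(V_), dimnames = list(V_, V_))
for (e in E_) {
  adj_mat[e[1], e[2]] <- 1L
  adj_mat[e[2], e[1]] <- 1L
}

## alpha ~ beta if there is an edge in either direction
is_adjacent <- function(alpha, beta, m) {
  m[alpha, beta] != 0 || m[beta, alpha] != 0
}

is_adjacent("a", "b", adj_mat)
is_adjacent("a", "d", adj_mat)
```

The same edges are used for the graph `ug0` in the next section, so the matrix can be compared with the output of `ug(..., result="matrix")` there.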
### Undirected Graphs {#sec:graph:UG}

The following forms are equivalent:

```{r }
library(gRbase)
ug0 <- gRbase::ug(~a:b, ~b:c:d, ~e)
ug0 <- gRbase::ug(~a:b + b:c:d + e)
ug0 <- gRbase::ug(~a*b + b*c*d + e)
ug0 <- gRbase::ug(c("a", "b"), c("b", "c", "d"), "e")
ug0
```

```{r }
plot(ug0)
```

The default size of vertices and their labels is quite small. This is easily changed by setting certain attributes on the graph, see Sect. \@ref(sec:graph:igraph) for examples. However, to avoid changing these attributes for all the graphs shown in the following we have defined a small plot function `myplot()`. There are also various facilities for controlling the layout. For example, we may use a layout algorithm called `layout.fruchterman.reingold` as follows:

```{r }
myplot <- function(x, layout=layout.fruchterman.reingold(x), ...) {
  V(x)$size <- 30
  V(x)$label.cex <- 3
  plot(x, layout=layout, ...)
  return(invisible())
}
```

The graph `ug0` is then displayed with:

```{r }
myplot(ug0)
```

By default the `ug()` function returns an `igraph` object, but the option `result="matrix"` leads it to return an adjacency matrix instead. For example,

```{r }
ug0i <- gRbase::ug(~a:b + b:c:d + e, result="matrix")
ug0i
```

Different representations of a graph can be obtained by coercion:

```{r }
as(ug0, "matrix")
as(ug0, "dgCMatrix")
as(ug0i, "igraph")
```

Edges can be added and deleted using `addEdge()` and `removeEdge()`:

```{r eval=T}
## Using gRbase
ug0a <- gRbase::addEdge("a", "c", ug0)
ug0a <- gRbase::removeEdge("c", "d", ug0)
```

```{r }
## Using igraph
ug0a <- igraph::add_edges(ug0, c("a", "c"))
ug0a <- igraph::delete_edges(ug0, c("c|d"))
```

The nodes and edges of a graph can be retrieved with the `gRbase::nodes()` and `gRbase::edges()` functions.
```{r eval=T}
## Using gRbase
gRbase::nodes(ug0)
gRbase::edges(ug0) |> str()
```

```{r }
## Using igraph
igraph::V(ug0)
igraph::V(ug0) |> attr("names")
igraph::E(ug0)
igraph::E(ug0) |> attr("vnames")
```

A subset of vertices is *complete* if all pairs of its vertices are adjacent, and a *clique* is a maximal complete subset. The cliques of a graph can be found as follows:

```{r }
gRbase::maxClique(ug0) ## |> str()
gRbase::get_cliques(ug0) |> str()
## Using igraph
igraph::max_cliques(ug0) |>
    lapply(function(x) attr(x, "names")) |> str()
```

A *path* (of length $n$) between $\alpha$ and $\beta$ in an undirected graph is a sequence of vertices $\alpha=\alpha_0,\alpha_1,\dots,\alpha_n=\beta$ where $\alpha_{i-1}-\alpha_i$ for $i=1,\dots, n$. If a path $\alpha=\alpha_0,\alpha_1,\dots,\alpha_n=\beta$ has $\alpha=\beta$ then the path is said to be a *cycle* of length $n$.

A subset $D \subset V$ in an undirected graph is said to *separate* $A \subset V$ from $B \subset V$ if every path between a vertex in $A$ and a vertex in $B$ contains a vertex from $D$.

```{r }
gRbase::separates("a", "d", c("b", "c"), ug0)
```

This shows that $\{b,c\}$ separates $\{a\}$ and $\{d\}$.

The graph $\cal G_0=(V_0,E_0)$ is said to be a *subgraph* of $\cal G=(V,E)$ if $V_0\subseteq V$ and $E_0\subseteq E$. For $A \subseteq V$, let $E_A$ denote the set of edges in $E$ between vertices in $A$. Then $\cal G_A=(A, E_A)$ is the *subgraph induced by* $A$. For example:

```{r }
ug1 <- gRbase::subGraph(c("b", "c", "d", "e"), ug0)
```

```{r }
ug12 <- igraph::subgraph(ug0, c("b", "c", "d", "e"))
```

```{r }
par(mfrow=c(1,2), mar=c(0,0,0,0))
myplot(ug1); myplot(ug12)
```

\index{boundary}
\index{neighbours}
\index{closure}
The *boundary* $\bound(\alpha)=\adj(\alpha)$ is the set of vertices adjacent to $\alpha$, and for undirected graphs the boundary is equal to the set of *neighbours* $\nei(\alpha)$. The *closure* $\clos(\alpha)$ is $\bound(\alpha)\cup \{\alpha\}$.
```{r }
gRbase::adj(ug0, "c")
gRbase::closure("c", ug0)
```

### Directed Acyclic Graphs {#sec:graph:DAG}

A *directed graph* as a mathematical object is a pair $\cal G = (V, E)$ where $V$ is a set of vertices and $E$ is a set of directed edges, normally drawn as arrows. A directed graph is *acyclic* if it has no directed cycles, that is, cycles with the arrows pointing in the same direction all the way around. A *DAG* is a directed graph that is acyclic.

A DAG may be created using the `dag()` function. The graph can be specified by a list of formulas or by a list of vectors. The following statements are equivalent:

```{r }
dag0 <- gRbase::dag(~a, ~b*a, ~c*a*b, ~d*c*e, ~e*a, ~g*f)
dag0 <- gRbase::dag(~a + b*a + c*a*b + d*c*e + e*a + g*f)
dag0 <- gRbase::dag(~a + b|a + c|a*b + d|c*e + e|a + g|f)
dag0 <- gRbase::dag("a", c("b", "a"), c("c", "a", "b"), c("d", "c", "e"),
                    c("e", "a"), c("g", "f"))
dag0
```

Note that `~a` means that `"a"` has no parents while `~d*b*c` means that `"d"` has parents `"b"` and `"c"`. Instead of `*`, a `:` can be used in the specification. If the specified graph contains cycles then `dag()` returns `NULL`.

By default the `dag()` function returns an `igraph` object, but the option `result="matrix"` leads it to return an adjacency matrix instead.

```{r ,echo=T}
myplot(dag0)
```

```{r }
gRbase::nodes(dag0)
gRbase::edges(dag0) |> str()
```

Alternatively, a list of (ordered) pairs can be obtained with `edgeList()`:

```{r }
edgeList(dag0) |> str()
```

The `vpar()` function returns a list, with an element for each node together with its parents:

```{r }
vpardag0 <- gRbase::vpar(dag0)
vpardag0 |> str()
vpardag0$c
```

\index{path}
\index{parents}
\index{children}
\index{ancestors}
\index{ancestral set}
\index{ancestral graph}
A *path* (of length $n$) from $\alpha$ to $\beta$ is a sequence of vertices $\alpha=\alpha_0, \dots, \alpha_n=\beta$ such that $\alpha_{i-1}\to\alpha_i$ is an edge in the graph.
If there is a path from $\alpha$ to $\beta$ we write $\alpha\mapsto\beta$. The *parents* $\parents(\beta)$ of a node $\beta$ are those nodes $\alpha$ for which $\alpha \rightarrow \beta$. The *children* $\child(\alpha)$ of a node $\alpha$ are those nodes $\beta$ for which $\alpha \rightarrow \beta$. The *ancestors* $\anc(\beta)$ of a node $\beta$ are the nodes $\alpha$ such that $\alpha\mapsto\beta$. The *ancestral set* $\anc(A)$ of a set $A$ is the union of $A$ with its ancestors. The *ancestral graph* of a set $A$ is the subgraph induced by the ancestral set of $A$.

```{r }
gRbase::parents("d", dag0)
gRbase::children("c", dag0)
gRbase::ancestralSet(c("b", "e"), dag0)
ag <- gRbase::ancestralGraph(c("b", "e"), dag0)
myplot(ag)
```

\index{moralization}
An important operation on DAGs is to (i) add edges between the parents of each node, and then (ii) replace all directed edges with undirected ones, thus returning an undirected graph. This operation is used in connection with independence interpretations of the DAG, see Sect. \@ref(sec:graph:CI), and is known as *moralization*. This is implemented by the `moralize()` function:

```{r }
dag0m <- gRbase::moralize(dag0)
myplot(dag0m)
```

### Mixed Graphs {#sec:graph:chaingraphs}

\index{mixed graphs}
Although the primary focus of this book is on undirected graphs and DAGs, it is also useful to consider *mixed graphs*. These are graphs with at least two types of edges, for example directed and undirected, or directed and bidirected.

A sequence of vertices $v_1, v_2, \dots, v_{k}, v_{k+1}$ is called a *path* if for each $i=1, \dots, k$, either $v_i - v_{i+1}$, $v_i \leftrightarrow v_{i+1}$ or $v_i \rightarrow v_{i+1}$. If $v_i - v_{i+1}$ for each $i$ the path is called *undirected*, if $v_i \rightarrow v_{i+1}$ for each $i$ it is called *directed*, and if $v_i \rightarrow v_{i+1}$ for at least one $i$ it is called *semi-directed*. If $v_1 = v_{k+1}$ it is called a *cycle*.
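The path definitions above can be sketched in base R. The helper `path_type()` and the matrix `m_mixed` are hypothetical and exist only for this illustration. The sketch assumes an adjacency-matrix encoding in which a symmetric pair of entries stands for an undirected edge and a single entry `m[i, j] == 1` stands for $i \rightarrow j$; bidirected edges are not represented. It reports the most specific label that applies, so a path reported as directed is also semi-directed in the sense of the definition.

```{r }
## Hypothetical mixed graph: a - b (undirected), b -> c, c -> d (directed)
nodes_ <- c("a", "b", "c", "d")
m_mixed <- matrix(0L, 4, 4, dimnames = list(nodes_, nodes_))
m_mixed["a", "b"] <- m_mixed["b", "a"] <- 1L
m_mixed["b", "c"] <- 1L
m_mixed["c", "d"] <- 1L

## Classify a vertex sequence under the assumed encoding
path_type <- function(path, m) {
  from <- head(path, -1)
  to   <- tail(path, -1)
  fwd  <- m[cbind(from, to)] == 1   # mark from v_i to v_{i+1}
  bwd  <- m[cbind(to, from)] == 1   # mark from v_{i+1} to v_i
  if (!all(fwd)) return(NA_character_)   # some step is not an edge: not a path
  directed <- fwd & !bwd                 # steps of the form v_i -> v_{i+1}
  if (!any(directed)) "undirected"
  else if (all(directed)) "directed"
  else "semi-directed"
}

path_type(c("a", "b"), m_mixed)
path_type(c("b", "c", "d"), m_mixed)
path_type(c("a", "b", "c"), m_mixed)
path_type(c("c", "b"), m_mixed)
```

The last call returns `NA` because the edge between `b` and `c` cannot be traversed against its direction.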
\index{directed path}
\index{undirected path}
\index{path}
\index{cycle}
\index{semi-directed path}

Mixed graphs are represented in the `igraph` package as directed graphs with multiple edges. In this sense they are not simple. A convenient way of defining them (in lieu of model formulae) is to use adjacency matrices. We can construct such a matrix as follows:

```{r }
adjm <- matrix(c(0, 1, 1, 1,
                 1, 0, 0, 1,
                 1, 0, 0, 1,
                 0, 1, 0, 0), byrow=TRUE, nrow=4)
rownames(adjm) <- colnames(adjm) <- letters[1:4]
adjm
```

Note that `igraph` interprets symmetric entries as double-headed arrows and thus does not distinguish between bidirected and undirected edges. However, we can persuade `igraph` to display undirected instead of bidirected edges:

```{r }
gG1 <- gG2 <- as(adjm, "igraph")
lay <- layout.fruchterman.reingold(gG1)
E(gG2)$arrow.mode <- c(2,0)[1+is.mutual(gG2)]
```

```{r }
par(mfrow=c(1,2), mar=c(0,0,0,0))
myplot(gG1, layout=lay); myplot(gG2, layout=lay)
```

\index{chain graphs}
A *chain graph* is a mixed graph with no bidirected edges and no semi-directed cycles. Such graphs form a natural generalisation of undirected graphs and DAGs, as we shall see later. The following example is from @Frydenberg1990:

```{r }
d1 <- matrix(0, 11, 11)
d1[1,2] <- d1[2,1] <- d1[1,3] <- d1[3,1] <- d1[2,4] <- d1[4,2] <- d1[5,6] <- d1[6,5] <- 1
d1[9,10] <- d1[10,9] <- d1[7,8] <- d1[8,7] <- d1[3,5] <- d1[5,10] <- d1[4,6] <- d1[4,7] <- 1
d1[6,11] <- d1[7,11] <- 1
rownames(d1) <- colnames(d1) <- letters[1:11]
cG1 <- as(d1, "igraph")
E(cG1)$arrow.mode <- c(2,0)[1+is.mutual(cG1)]
myplot(cG1, layout=layout.fruchterman.reingold)
```

\index{chain graph components}
\index{component DAG}
The *components* of a chain graph $\cal G$ are the connected components of the graph formed after removing all directed edges from $\cal G$.
All edges within a component are undirected, and all edges between components are directed. Also, all arrows between any two components have the same direction. The graph constructed by identifying its nodes with the components of $\cal G$, and joining two nodes with an arrow whenever there is an arrow between the corresponding components in $\cal G$, is a DAG, the so-called *component DAG* of $\cal G$, written $\cal G_{C}$.

% The \comics{is.chaingraph()}{lcd} function in the \lcd\ package
% determines whether a mixed graph is a chain graph. It takes an
% adjacency matrix as input. For example, the above graph is indeed a
% chain graph:
% ```{r eval=F}
% library(lcd)
% is.chaingraph(as(cG1, "matrix"))
% @
% Here `vert.order` gives an ordering of the vertices, from which the
% connected components may be identified using `chain.size`.

\index{anterior set}
\index{anterior graph}
\index{moralization}
The *anterior set* of a vertex set $S \subseteq V$ is defined in terms of the component DAG. Write the set of components of $\cal G$ containing $S$ as $S_c$. Then the anterior set of $S$ in $\cal G$ is defined as the union of the components in the ancestral set of $S_c$ in $\cal G_{C}$. The *anterior graph* of $S \subseteq V$ is the subgraph of $\cal G$ induced by the anterior set of $S$.

The *moralization* operation is also important for chain graphs. Similar to DAGs, unmarried parents of the same chain components are joined and directions are then removed.

% The operation
% is implemented in the \comics{moralize()}{lcd} function in the
% \lcd\ package, which uses the adjacency matrix representation.
% For example,
% ```{r , eval=F}
% ## cGm <- as(moralize(as(cG1, "matrix")), "graphNEL")
% cGm <- moralize(cG1)
% plot(cGm)
% @
% ```{r echo=F, eval=F}
% detach(package:lcd)
% @

## Conditional Independence and Graphs {#sec:graph:CI}

\index{conditional independence}
The concept of statistical independence is presumably familiar to all readers but that of *conditional independence* may be less so. Suppose that we have a collection of random variables $(X_v)_{v \in V}$ with a joint density. Let $A$, $B$ and $C$ be subsets of $V$ and let $X_A=(X_v)_{v \in A}$ and similarly for $X_B$ and $X_C$. Then the statement that $X_A$ and $X_B$ are conditionally independent given $X_C$, written $A \cip B \cd C$, means that for each possible value $x_C$ of $X_C$, $X_A$ and $X_B$ are independent in the conditional distribution given $X_C=x_C$. So if we write $f()$ for a generic density or probability mass function, then one characterization of $A \cip B \cd C$ is that
\[
f(x_A,x_B \cd x_C) = f(x_A \cd x_C) f(x_B \cd x_C).
\]
An equivalent characterization [@Dawid1998] is that the joint density of $(X_A, X_B, X_C)$ factorizes as
\begin{equation}
  \label{eqn:graph:factcrit}
  f(x_A,x_B,x_C) = g(x_A,x_C)h(x_B,x_C),
\end{equation}
that is, as a product of two functions $g()$ and $h()$, where $g()$ does not depend on $x_B$ and $h()$ does not depend on $x_A$. This is known as the *factorization criterion*.
\index{factorization criterion}

Parametric models for $(X_v)_{v \in V}$ may be thought of as specifying a set of joint densities (one for each admissible set of parameters). These may admit factorisations of the form just described, giving rise to conditional independence relations between the variables. Some models give rise to patterns of conditional independences that can be represented as an undirected graph. More specifically, let $\cal G=(V,E)$ be an undirected graph with cliques $C_1, \dots, C_k$. Consider a joint density $f()$ of the variables in $V$.
If this admits a factorization of the form
\[
f(x_V) = \prod_{i=1}^k g_i(x_{C_i})
\]
for some functions $g_1(), \dots, g_k()$ where $g_j()$ depends on $x$ only through $x_{C_j}$ then we say that $f()$ factorizes according to $\cal G$.

\index{global Markov property}
If all the densities under a model factorize according to $\cal G$, then $\cal G$ encodes the conditional independence structure of the model, through the following result (the *global Markov property*): whenever sets $A$ and $B$ are separated by a set $C$ in the graph, then $A \cip B \cd C$ under the model. Thus for example

```{r }
myplot(ug0)
```

```{r }
gRbase::separates("a", "d", "b", ug0)
```

shows that under a model with this dependence graph, $a \cip d \cd b$.

If we want to find out whether two variable sets are marginally independent, we ask whether they are separated by the empty set, which we specify using a character vector of length zero:

```{r }
gRbase::separates("a", "d", character(0), ug0)
```

Model families that admit suitable factorizations are described in later chapters in this book. These include: log-linear models for multivariate discrete data, graphical Gaussian models for multivariate Gaussian data, and mixed interaction models for mixed discrete and continuous data.

Other models give rise to patterns of conditional independences that can be represented by DAGs. These are models for which the variable set $V$ may be ordered in such a way that the joint density factorizes as follows
\begin{equation}
  \label{eq:graph:dagfact}
  f(x_V) = \prod_{v \in V} f(x_v \cd x_{\parents(v)})
\end{equation}
for some variable sets $\{\parents(v)\}_{v \in V}$ such that the variables in $\parents(v)$ precede $v$ in the ordering. Again the vertices of the graph represent the random variables, and we can identify the sets $\parents(v)$ with the parents of $v$ in the DAG.

\index{d-separation}
With DAGs, conditional independence is represented by a property called *d-separation*.
That is, whenever sets $A$ and $B$ are $d$-separated by a set $C$ in the graph, then $A \cip B \cd C$ under the model. The notion of $d$-separation can be defined in various ways, but one characterisation is as follows: $A$ and $B$ are $d$-separated by a set $C$ if and only if they are separated in the graph formed by moralizing the ancestral graph of $A \cup B \cup C$. So we can easily define a function to test this:

```{r eval=T}
d_separates <- function(a, b, c, dag_) {
    ##ag <- ancestralGraph(union(union(a, b), c), dag_)
    ag <- ancestralGraph(c(a, b, c), dag_)
    separates(a, b, c, moralize(ag))
}
d_separates("c", "e", "a", dag0)
```

So under `dag0` it holds that $c \cip e \cd a$.

% Alternatively, we can use the function \comics{dSep()}{ggm} in the \ggm\ package:
% ```{r }
% library(ggm)
% dSep(as(dag0, "matrix"), "c", "e", "a")
% @
% ```{r echo=F}
% detach(package:ggm)
% @

Still other models correspond to patterns of conditional independences that can be represented by a chain graph $\cal G$. There are several ways to relate Markov properties to chain graphs. Here we describe the so-called LWF Markov properties, associated with Lauritzen, Wermuth and Frydenberg. For these there are two levels to the factorization requirements. Firstly, the joint density needs to factorize in a way similar to a DAG, i.e.
\[
f(x_V) = \prod_{C \in \calC} f(x_C \cd x_{\parents(C)})
\]
where $\calC$ is the set of components of $\cal G$. In addition, each conditional density $f(x_C \cd x_{\parents(C)})$ must factorize according to an undirected graph constructed in the following way. First form the subgraph of $\cal G$ induced by $C \cup \parents(C)$, drop directions, and then complete $\parents(C)$ (that is, add edges between all vertices in $\parents(C)$).

\index{c-separation}
For densities which factorize as above, conditional independence is related to a property called *c-separation*: that is, $A \cip B \cd C$ whenever sets $A$ and $B$ are $c$-separated by $C$ in the graph.
The notion of $c$-separation in chain graphs is similar to that of $d$-separation in DAGs. $A$ and $B$ are $c$-separated by a set $C$ if and only if they are separated in the graph formed by moralizing the anterior graph of $A \cup B \cup C$.

% The \comics{is.separated()}{lcd} function in the \lcd\ package can
% be used to query a given chain graph for $c$-separation. For example,
% ```{r eval=F}
% library(lcd)
% is.separated("e", "g", c("k"), as(cG1,"matrix"))
% @
% ```{r echo=F, eval=F}
% detach(package:lcd)
% @
% \noindent
% implies that $e \not \negthinspace \negthinspace \negthinspace \cip g \cd k$ for the chain graph `cG1`
% we considered previously.

## More About Graphs

### Special Properties {#sec:graph:properties}

A node in an undirected graph is *simplicial* if its boundary is complete.
\index{simplicial node}

```{r }
gRbase::is.simplicial("b", ug0)
gRbase::simplicialNodes(ug0)
```

To obtain the *connected components* of a graph:
\index{connected components}

```{r }
gRbase::connComp(ug0) |> str()
## Using igraph
igraph::components(ug0) |> str()
```

\index{chordless cycles}
\index{triangulated graphs}
\index{chordal graphs}
If a cycle $\alpha=\alpha_0,\alpha_1,\dots,\alpha_n=\alpha$ has adjacent elements $\alpha_i \sim \alpha_j$ with $j \not \in \{i-1,i+1\}$ then it is said to have a *chord*. If it has no chords it is said to be *chordless*. A graph with no chordless cycles of length $\ge 4$ is called *triangulated* or *chordal*:

```{r eval=T}
gRbase::is.triangulated(ug0)
```

```{r }
igraph::is_chordal(ug0)
```

Triangulated graphs are of special interest for graphical models as they admit closed-form maximum likelihood estimates and allow considerable computational simplification by decomposition.

A triple $(A,B,D)$ of non-empty disjoint subsets of $V$ is said to *decompose* $\cal G$ into $\cal G_{A\cup D}$ and $\cal G_{B\cup D}$ if $V = A \cup B \cup D$ where $D$ is complete and separates $A$ and $B$.
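The three conditions in this definition can be checked by hand on a small example. The following is a base R sketch under stated assumptions: the matrix `m_ug0` encodes the edges of `ug0` (a-b, b-c, b-d, c-d, with `e` isolated), and the helpers `is_complete_set()`, `reachable()`, `separates_set()` and `decomposes()` are hypothetical names written for this illustration, not functions from `gRbase` or `igraph`. The sketch does not verify that the three sets are disjoint and non-empty.

```{r }
## Adjacency matrix with the same edges as ug0 (assumed encoding)
nodes_ <- c("a", "b", "c", "d", "e")
m_ug0 <- matrix(0L, 5, 5, dimnames = list(nodes_, nodes_))
for (e in list(c("a", "b"), c("b", "c"), c("b", "d"), c("c", "d"))) {
  m_ug0[e[1], e[2]] <- m_ug0[e[2], e[1]] <- 1L
}

## D is complete if every pair of vertices in D is adjacent
is_complete_set <- function(S, m) {
  sub <- m[S, S, drop = FALSE]
  all(sub[upper.tri(sub)] == 1)
}

## Vertices reachable from `start` by following edges of m
reachable <- function(start, m) {
  seen <- start
  frontier <- start
  while (length(frontier) > 0) {
    nb <- colnames(m)[colSums(m[frontier, , drop = FALSE]) > 0]
    frontier <- setdiff(nb, seen)
    seen <- union(seen, frontier)
  }
  seen
}

## D separates A and B if no vertex of B is reachable from A once D is removed
separates_set <- function(A, B, D, m) {
  keep <- setdiff(rownames(m), D)
  length(intersect(reachable(A, m[keep, keep, drop = FALSE]), B)) == 0
}

decomposes <- function(A, B, D, m) {
  setequal(c(A, B, D), rownames(m)) &&
    is_complete_set(D, m) &&
    separates_set(A, B, D, m)
}

decomposes("a", "d", c("b", "c"), m_ug0)          # e is not covered
decomposes(c("a", "e"), "d", c("b", "c"), m_ug0)  # all of V is covered
```

The first call fails only on the covering condition $V = A \cup B \cup D$; adding the isolated vertex `e` to $A$ makes all three conditions hold.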
\index{decomposition}

```{r }
gRbase::is.decomposition("a", "d", c("b", "c"), ug0)
```

Note that although $\{b,c\}$ is complete and separates $\{a\}$ and $\{d\}$ in `ug0`, the condition fails because $V \neq \{a,b,c,d\}$.

\index{decomposable graphs}
\index{perfect vertex ordering}
\index{maximum cardinality search}
A graph is *decomposable* if it is complete or if it can be decomposed into decomposable subgraphs. A graph is decomposable if and only if it is triangulated.

An ordering of the nodes in a graph is called a *perfect ordering* if $\bound(i)\cap\{1,\dots,i-1\}$ is complete for all $i$. Such an ordering exists if and only if the graph is triangulated. If the graph is triangulated, then a perfect ordering can be obtained with the *maximum cardinality search* (or mcs) algorithm. The `mcs()` function will produce such an ordering if the graph is triangulated; otherwise it will return `NULL`.

```{r }
myplot(ug0)
```

```{r }
gRbase::mcs(ug0)
```

```{r }
igraph::max_cardinality(ug0)
igraph::max_cardinality(ug0)$alpham1 |> attr("names")
```

Sometimes it is convenient to have some control over the ordering given to the variables:

```{r }
gRbase::mcs(ug0, root=c("d", "c", "a"))
```

Here `mcs()` tries to follow the ordering given and succeeds for the first two variables but then fails afterwards.

\index{running intersection property}
\index{RIP ordering}
\index{separators}
The cliques of a triangulated undirected graph can be ordered as $(C_1, \dots, C_Q)$ to have the *running intersection property* (also called a *RIP ordering*).
The running intersection property is that $C_j \cap (C_1 \cup \dots \cup C_{j-1}) \subset C_i$ for some $i < j$.

```{r }
myplot(ug2)
```

```{r }
ug3 <- gRbase::triangulate(ug2)
gRbase::is.triangulated(ug3)
```

```{r }
zzz <- igraph::is_chordal(ug2, fillin=TRUE, newgraph=TRUE)
V(ug2)[zzz$fillin]
ug32 <- zzz$newgraph
```

```{r }
par(mfrow=c(1,3), mar=c(0,0,0,0))
lay <- layout.fruchterman.reingold(ug2)
myplot(ug2, layout=lay);
myplot(ug3, layout=lay);
myplot(ug32, layout=lay)
```

% Recall that an undirected graph $\cal G$ is triangulated (or chordal) if it
% has no cycles of length $\geq 4$ without a chord.
% A graph is triangulated if and only if there exists a perfect ordering of its vertices.
% Any undirected graph $\cal G$ can be triangulated by adding edges to
% the graph, so called fill-ins, resulting in a graph $\cal G^*$, say.
% Some of the fill-ins on $\cal G^*$ may be superfluous
% in the sense that they could be removed and still give a triangulated graph. A
% triangulation with no superfluous fill-ins is called a *minimal* triangulation. In
% general this is not unique. This should be distinguished from a *minimum*
% triangulation which is a graph with the smallest number of
% fill-ins. Finding a minimum
% triangulation is known to be NP-hard. The function \comics{minimalTriang()}{gRbase}
% finds a minimal triangulation. Consider the following:
% \index{minimal triangulation} \index{minimum triangulation}
% ```{r mintri, , include=F, eval=F}
% G1 <- gRbase::ug(~a:b+b:c+c:d+d:e+e:f+a:f+b:e)
% mt1.G1 <- minimalTriang(G1)
% G2 <- gRbase::ug(~a:b:e:f+b:c:d:e)
% mt2.G1<-minimalTriang(G1, TuG=G2)
% par(mfrow=c(2,2))
% plot(G1, sub="G1")
% plot(mt1.G1, sub="mt1.G1")
% plot(G2, sub="G2")
% plot(mt2.G1, sub="mt2.G1")
% @
% % \includegraphics*[width=3.5in, height=3.5in]{fig/GRAPH-mintri}
% The graph `G1` is not triangulated;
% `mt1.G1` is a minimal triangulation of `G1`.
% Furthermore, `G2` is a triangulation of `G1`, but it is not
% a minimal triangulation.
% Finally, `mt2.G1` is a minimal
% triangulation of `G1` formed on the basis of `G2`.
% \index{maximal prime subgraph decomposition}
% The *maximal prime subgraph decomposition* of an undirected graph
% is the smallest subgraphs into which the graph can be
% decomposed.
% Consider the following code fragment:
% ```{r mps, , include=F, eval=F}
% G1 <- gRbase::ug(~a:b+b:c+c:d+d:e+e:f+a:f+b:e)
% G1.rip <- mpd(G1)
% G1.rip
% par(mfrow=c(1,3))
% plot(G1, main="G1")
% plot(subGraph(G1.rip$cliques[[1]], G1), main="subgraph 1")
% plot(subGraph(G1.rip$cliques[[2]], G1), main="subgraph 2")
% @
% @
% ```{r echo=F, eval=F}
% pdf(file="fig/GRAPH-mps.pdf",width=8,height=5/1.7)
% <>
% graphics.off()
% @
% % \includegraphics*[width=4in, height=2in]{fig/GRAPH-mps}
% Here \verb-G1- is not decomposable but the graph can be
% decomposed. The function \comics{mpd()}{gRbase} returns a junction
% RIP-order representation of the maximal prime subgraph
% decomposition. The subgraphs of \verb-G1- defined by the cliques listed
% in \verb-G1.rip- are the smallest subgraphs into which \verb'G1' can
% be decomposed.

\index{Markov blanket}
The *Markov blanket* of a vertex $v$ in a DAG may be defined as the minimal set that $d$-separates $v$ from the remaining variables. It is easily derived as the set of neighbours to $v$ in the moral graph of $\cal G$. For example, the Markov blanket of vertex `e` in `dag0` is

```{r eval=F}
adj(moralize(dag0), "e")
```

It is easily seen that the Markov blanket of $v$ is the union of $v$'s parents, $v$'s children, and the parents of $v$'s children.

% \subsection{Graph Layout in \pkg{Rgraphviz}}
% {#sec:graph:layout}
% [THIS SECTION HAS BEEN REMOVED]
% Although the way graphs are displayed on the page or screen has no
% bearing on their mathematical or statistical properties, in practice
% it is helpful to display them in a way that clearly reveals their
% structure.
The \Rgraphviz\ package implements several methods for % automatically setting graph layouts. We sketch these very briefly % here: for more detailed information see the online help files, for % example, type `?dot`. % \begin{itemize} % - The *dot* method, which is default, is intended for drawing % DAGs or hierarchies such as organograms or phylogenies. % - The *twopi* method is suitable for connected graphs: it % produces a circular layout with one node placed at the centre and % others placed on a series of concentric circles about the centre. % - The *circo* method also produces a circular layout. % - The *neato* method is suitable for undirected graphs: an % iterative algorithm determines the coordinates of the nodes so that % the geometric distance between node-pairs approximates their path % distance in the graph. % - Similarly, the *fdp* method is based on an iterative % algorithm due to @Fruchterman1991, in which adjacent nodes are % attracted and non-adjacent nodes are repulsed. % \end{itemize} % The graphs displayed using `Rgraphviz` can also be embellished in % various ways: the following example displays the text in red and fills % the nodes with light grey. % @ % ```{r } % plot(dag0, attrs=list(node = list(fillcolor="lightgrey",fontcolor="red"))) % @ % Graph layouts can be reused: this can be useful, for example to would-be authors of % books on graphical modelling who would like to compare alternative models for the same dataset. % We illustrate how to plot a graph and the graph obtained by removing an edge using the same layout. % To do this, we use the \comics{agopen()}{Rgraphviz} function generate an `Ragraph` object, which is a representation % of the layout of a graph (rather than of the graph as a mathematical object). % From this we remove the required edge. 
% ```{r , eval=F}
% edgeNames(ug3)
% ng3 <- agopen(ug3, name="ug3", layoutType="neato")
% ng4 <- ng3
% AgEdge(ng4) <- AgEdge(ng4)[-3]
% plot(ng3)
% @
% ```{r , eval=F}
% plot(ng4)
% @
% The following example illustrates how individual edge and node attributes may be set. We use the
% chain graph `cG1` described above.
% \setkeys{Gin}{width=0.7\textwidth, height=0.7\textwidth} % set figure size in Sweave
% ```{r , eval=F}
% cG1a <- as(cG1, "graphNEL")
% nodes(cG1a) <- c("alpha","theta","tau","beta","pi","upsilon","gamma",
%                  "iota","phi","delta","kappa")
% edges <- buildEdgeList(cG1a)
% for (i in 1:length(edges)) {
%   if (edges[[i]]@attrs$dir=="both") edges[[i]]@attrs$dir <- "none"
%   edges[[i]]@attrs$color <- "blue"
% }
% nodes <- buildNodeList(cG1a)
% for (i in 1:length(nodes)) {
%   nodes[[i]]@attrs$fontcolor <- "red"
%   nodes[[i]]@attrs$shape <- "ellipse"
%   nodes[[i]]@attrs$fillcolor <- "lightgrey"
%   if (i <= 4) {
%     nodes[[i]]@attrs$fillcolor <- "lightblue"
%     nodes[[i]]@attrs$shape <- "box"
%   }
% }
% cG1al <- agopen(cG1a, edges=edges, nodes=nodes, name="cG1a", layoutType="neato")
% plot(cG1al)
% @
% \setkeys{Gin}{width=0.45\textwidth, height=0.45\textwidth} % set figure size in Sweave

### The `igraph` package {#sec:graph:igraph}

It is possible to create `igraph` objects using the `graph.formula()` function:

```{r }
ug4 <- graph.formula(a -- b:c, c--b:d, e -- a:d)
ug4
myplot(ug4)
```

The same graph may be created from scratch as follows:

```{r eval=T}
ug4.2 <- graph.empty(n=5, directed=FALSE)
V(ug4.2)$name <- V(ug4.2)$label <- letters[1:5]
ug4.2 <- add.edges(ug4.2, 1+c(0,1, 0,2, 0,4, 1,2, 2,3, 3,4))
ug4.2
```

The graph is displayed using the `plot()` function, with a layout determined using the `graphopt` method. A variety of layout algorithms are available: type `?layout` for an overview. Note that by default nodes without names are labelled $1, 2, \dots$ and so forth. We show how to modify this shortly.
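The dotted function names used above (`graph.formula()`, `graph.empty()`, `add.edges()`) are older spellings; current `igraph` releases document the same operations as `graph_from_literal()`, `make_empty_graph()` and `add_edges()`. The sketch below rebuilds the two graphs with those names and checks that they have the same edges. The objects `g_lit` and `g_man` and the helper `edge_set()` are introduced here for illustration only.

```{r }
library(igraph)

## Same graph as ug4, written with the underscore-style constructors
g_lit <- graph_from_literal(a -- b:c, c -- b:d, e -- a:d)

## Same graph as ug4.2, built from an empty graph with one-based vertex ids
g_man <- make_empty_graph(n = 5, directed = FALSE)
V(g_man)$name <- letters[1:5]
g_man <- add_edges(g_man, c(1,2, 1,3, 1,5, 2,3, 3,4, 4,5))

## Hypothetical helper: edges as sorted "x-y" strings, ignoring orientation
edge_set <- function(g) {
  sort(apply(as_edgelist(g), 1, function(e) paste(sort(e), collapse = "-")))
}

ecount(g_lit)
identical(edge_set(g_lit), edge_set(g_man))
```

Comparing sorted edge labels avoids depending on the order in which each constructor stores its edges.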
As mentioned previously we have created a custom function
\comics{myplot()}{gRbase} which creates somewhat more readable plots:

```{r eval=F}
myplot(ug4, layout=layout.graphopt)
```

Graphs in \igraph\ are defined in terms of node and edge lists. In
addition, they have *attributes*: these belong to the vertices, the
edges or to the graph itself. The following example sets a graph
attribute, *layout*, and four vertex attributes, *label*, *color*,
*size* and *label.cex*. These are used when the graph is plotted. The
*name* attribute contains the node names.

```{r }
ug4$layout <- layout.graphopt(ug4)
V(ug4)$label <- V(ug4)$name
V(ug4)$color <- "red"
V(ug4)[1]$color <- "green"
V(ug4)$size <- 40
V(ug4)$label.cex <- 3
plot(ug4)
```

Note the use of array indices to access the attributes of the
individual vertices. The indices are one-based, so that `V(ug4)[1]`
refers to the first node (`a`). Edge attributes are accessed
similarly, using a container structure `E(ug4)`, whose indices are
also one-based.

It is easy to extend `igraph` objects by defining new attributes. In
the following example we define a new vertex attribute, `discrete`,
and use this to color the vertices.

```{r }
ug5 <- set.vertex.attribute(ug4, "discrete", value=c(TRUE, TRUE, FALSE, FALSE, TRUE))
V(ug5)[discrete]$color <- "green"
V(ug5)[!discrete]$color <- "red"
plot(ug5)
```

A useful interactive drawing facility is provided with the
\comics{tkplot()}{igraph} function. This causes a pop-up window to
appear in which the graph can be manually edited. One use of this is
to edit the layout of the graph: the new coordinates can be extracted
and re-used by the \comics{plot()}{igraph} function. For example

\begin{verbatim}
> tkplot(ug4)
2
\end{verbatim}

\includegraphics{figman/tkplot}

The \comics{tkplot()}{igraph} function returns a window id (here 2).
While the pop-up window is open, the current layout can be obtained by
passing the window id to the \comics{tkplot.getcoords()}{igraph}
function, as for example

```{r eval=F}
xy <- tkplot.getcoords(2)
plot(ug4, layout=xy)
```

It is straightforward to reuse layout information with `igraph`
objects. The layout functions, when applied to graphs, return a matrix
of $(x,y)$ coordinates, with one row per node:

```{r }
layout.fruchterman.reingold(ug4)
```

Most layout algorithms use a random generator to choose an initial
configuration. Hence if we set the layout attribute to be a layout
function, repeated calls to plot will use different layouts. For
example, after

```{r }
ug4$layout <- layout.fruchterman.reingold
```

repeated invocations of `plot(ug4)` will use different layouts. In
contrast, after

```{r }
ug4$layout <- layout.fruchterman.reingold(ug4)
```

the layout will be fixed. The following code fragment illustrates how
two graphs with the same vertex set may be plotted using the same
layout.

```{r samelay, include=T, eval=T}
ug5 <- gRbase::ug(~A*B*C + B*C*D + D*E)
ug6 <- gRbase::ug(~A*B + B*C + C*D + D*E)
lay.fr <- layout.fruchterman.reingold(ug5)
ug6$layout <- ug5$layout <- lay.fr
V(ug5)$size <- V(ug6)$size <- 50
V(ug5)$label.cex <- V(ug6)$label.cex <- 3
par(mfrow=c(1,2), mar=c(0,0,0,0))
plot(ug5); plot(ug6)
```

An overview of attributes used in plotting can be obtained by typing
`?igraph.plotting`.
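The idea of sharing one layout between graphs can also be sketched with `igraph` alone. The example below is a minimal illustration under two assumptions: a recent `igraph` in which `layout.fruchterman.reingold()` is available as `layout_with_fr()`, and hypothetical object names `g_dense` and `g_sparse` whose edges mirror `ug5` and `ug6`. Because the rows of a layout matrix follow vertex order, the sketch first checks that both graphs list their vertices in the same order. Calling `set.seed()` beforehand is intended to make the randomly initialised layout reproducible.

```r
library(igraph)

# Two graphs on the same vertex set (edges chosen to mirror ug5 and ug6)
g_dense  <- graph_from_literal(A-B, A-C, B-C, B-D, C-D, D-E)
g_sparse <- graph_from_literal(A-B-C-D-E)

# Rows of a layout matrix follow vertex order, so the orders must agree
stopifnot(identical(V(g_dense)$name, V(g_sparse)$name))

# Fix the seed before computing the randomly initialised layout
set.seed(2012)
lay <- layout_with_fr(g_dense)   # one row of (x, y) coordinates per vertex
dim(lay)

# Storing the matrix (not the layout function) pins the layout for both graphs
g_dense$layout  <- lay
g_sparse$layout <- lay

par(mfrow = c(1, 2), mar = c(0, 0, 0, 0))
plot(g_dense); plot(g_sparse)
```

Storing the coordinate matrix rather than the layout function is the design choice that matters here: each node then occupies the same position in both panels, so the removed edges are easy to spot.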
A final example illustrates how more complex graphs can be displayed:

```{r , eval=T}
em1 <- matrix(c(0, 1, 1, 0,
                0, 0, 0, 1,
                1, 0, 0, 1,
                0, 1, 0, 0), nrow=4, byrow=TRUE)
iG <- graph.adjacency(em1)
V(iG)$shape <- c("circle", "square", "circle", "square")
V(iG)$color <- rep(c("red", "green"), 2)
V(iG)$label <- c("A", "B", "C", "D")
E(iG)$arrow.mode <- c(2,0)[1 + is.mutual(iG)]
E(iG)$color <- rep(c("blue", "black"), 3)
E(iG)$curved <- c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)
iG$layout <- layout.graphopt(iG)
myplot(iG)
```

### Operations on Graphs in Different Representations {#sec:graph:querygraph}

\index{adjacency matrix}

The \grbase\ package has a function \comics{querygraph()}{gRbase}
which provides a common interface to the graph operations for
undirected graphs and DAGs illustrated above. Moreover,
\comics{querygraph()}{gRbase} works on graphs represented either as
`igraph` objects or as adjacency matrices. The general syntax is

```{r }
args(querygraph)
```

For example, we obtain:

```{r eval=T}
ug_ <- gRbase::ug(~a:b + b:c:d + e)
gRbase::separates("a", "d", c("b", "c"), ug_)
gRbase::querygraph(ug_, "separates", "a", "d", c("b", "c"))
gRbase::qgraph(ug_, "separates", "a", "d", c("b", "c"))
```

\bibliographystyle{unsrtnat}
\bibliography{./GMwR_strings,./GMwR}
\printindex
\end{document}