---
title: "Introduction to PairViz"
author: "Catherine B. Hurley and R.W. Oldford"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to PairViz}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options: 
  chunk_output_type: console
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
PairViz is an R package which provides orderings of  objects for visualisation purposes.
The problem of constructing an ordering can be regarded as traversing possibly weighted graphs.
In this vignette, we focus on graphs and their traversal. Applications to data visualisation 
are given in accompanying vignettes.


## PairViz installation
PairViz is installed from CRAN in the usual way.
```{r eval=FALSE}
install.packages("PairViz")
```

PairViz uses the graph data structure provided by the graph package which is located on the Bioconductor repository rather than CRAN. To install this, use
```{r eval=FALSE}
if (!requireNamespace("graph", quietly = TRUE)){
    install.packages("BiocManager")
    BiocManager::install("graph")
}
```






In this vignette, we will also show plots of the graph structure, which requires package
igraph from CRAN. The following utility function will be helpful:


```{r }
requireNamespace("igraph")
if (!requireNamespace("igraph", quietly = TRUE)){
    install.packages("igraph")
}


igplot <- function(g,weights=FALSE,layout=igraph::layout_in_circle, 
                   vertex.size=60, vertex.color="lightblue",...){
    g <- igraph::graph_from_graphnel(as(g, "graphNEL"))
    op <- par(mar=c(1,1,1,1))
    if (weights){
      ew <- round(igraph::get.edge.attribute(g,"weight"),3)  
      igraph::plot.igraph(g,layout=layout,edge.label=ew,vertex.size=vertex.size,vertex.color=vertex.color,...)
    }
    else
    igraph::plot.igraph(g,layout=layout,vertex.size=vertex.size,vertex.color=vertex.color,...)
    par(op)
}
```
Alternatively, graphs may be plotted using package Rgraphviz
which is installed from Bioconductor:

```{r eval=FALSE}

if (!requireNamespace("Rgraphviz", quietly = TRUE)){
    install.packages("BiocManager")
    BiocManager::install("Rgraphviz")
}

# For a graph g use
plot(g)
```


## Graphs 

A graph is a mathematical structure made up of nodes, and edges joining those nodes.
We consider only graphs where edges are undirected. In some cases, edges are 
assigned weights, reflecting some measure of importance.

A complete graph is a graph where there is an edge connecting every pair of nodes. Here are two such
complete graphs, k4 with four nodes and k5 with five nodes.

```{r fig.show='hold'}
suppressPackageStartupMessages(library(PairViz))
k4 <- mk_complete_graph(4)
k5 <- mk_complete_graph(5)
igplot(k4)
igplot(k5)
```

## Traversing graph edges

We will first construct graph traversals that visit every edge of the graph at least once. The PairViz function
`eulerian()` gives such traversals:
```{r}
eulerian(k5)
```
Mathematically speaking, an Eulerian path of a graph is a path that visits every edge of a graph exactly once. 
An Eulerian tour of a graph is an Eulerian path that is closed, i.e. ends up at the starting node.
For the graph `k5`, one such Eulerian tour  goes from 1 ->2 -> 3 -> 1 and so  on until it ends back at node 1,
as given by `eulerian(k5)`. It is well-known that a graph has an Eulerian tour if every node has an even number of edges. This condition holds for a complete graph with an odd number of nodes, such as `k5`.

Similarly, a graph has an Eulerian path if all but two nodes have an even number of edges.
If we remove one edge, say that connecting nodes 1 and 2 from `k5`, these two nodes have
an odd number of edges while other nodes are even.
```{r fig.show='hold', fig.align='center'}
k5a <- removeEdge("1", "2", k5)
igplot(k5a)
eulerian(k5a)
```
The Eulerian for `k5a` starts at one of the odd nodes (here "1") and visits all edges ending at "2", the
other odd node.

Most graphs are not Eulerian, that is they do not meet the conditions for an Eulerian path to exist. The graph `k4`
for instance, has four nodes and all have three edges. In this case, any path visiting all edges must
visit some edges more than once. This is what `eulerian(k4)` does:
```{r}
eulerian(k4)
```
If you look closely you will see the edge connecting nodes "3" and "4" is visited twice.
An other way to think of it is that extra edges need to be added to
a graph so that all but two nodes become even. For `k4` only one extra edge is needed, and
in the example above that extra edge connects nodes "3" and "4".

In summary, for any graph, the function `eulerian()` returns a path visiting all edges at least once. 
If the graph is Eulerian, the returned path visits each edge  exactly once. Otherwise, some edges are visited twice. For more details of this algorithm, see Hurley and Oldford (2010, 2011).




## Edge traversal of a weighted graph

First construct a graph with weighted edges.
```{r}
n <- LETTERS[1:5]
g <- new("graphNEL",nodes=n)
efrom <- n[c(1,1,2,2,2,4)]
eto <- n[c(2:3,3:5,5)]
ew <- c(8,9,5:7,1)
g <- addEdge(efrom, eto, g, ew) 
```



To plot the graph use
```{r,fig.align="center"}
igplot(g, weights=TRUE,edge.label.color="black")
```


```{r}
eulerian(g)
```
As the graph `g` is weighted, the eulerian algorithm starts at the edge with the lowest weight, and
then proceeds to visit every edge, preferring lower weight edges whenever there is a choice of
edge to be visited next. If the `eulerian()` function is called with the option `weighted=FALSE`, then
weights are ignored. 
```{r}
eulerian(g, weighted=FALSE)
```
In this case, the Eulerian path visits edges going next to the first available node
in `nodes(g)`.


## Edge traversal of a complete graph


For complete  graphs, PairViz includes some alternative constructions of Eulerian (or nearly Eulerian) paths.

- The function `eseq()` uses a recursive algorithm,  and forms a path on 1,..,n  by appending extra edges on to the tour on 1,...,(n-2). For more details of the `eseq` algorithm, see Hurley and Oldford (2011).

- The function `eseqa()` gives a different sequence. For odd n, the tour starts at 1,
then takes steps of size 1,2,...,m repeatedly, where m is (n-1)/2. For even n, the path constructed
is formed as `eseqa(n+1)`, followed by dropping node n+1.  (Note that any permulation of the
step sizes 1,2,...,m would do just as well).

Two other functions are

- `kntour_drop()` takes an Euler tour on the complete graph with nodes 1,2,..n as input for odd n, and 
removes all occurences of n from the sequence. The result is a nearly Eulerian path on the
complete graph with n-1 nodes.

- `kntour_add()` takes an Euler tour on the complete graph with nodes 1,2,..n as input for odd n, and 
appends a path joining node n+1 to each of nodes 1,2,...,n . The result is a nearly Eulerian path on the
complete graph with n-1 nodes.



## Edge traversal of a complete graph via Hamiltonian decompositions
A Hamiltonian path of a graph is a path that visits every node of a graph exactly once.
For a general graph, determining whether such a path exists is difficult, and
in fact is an NP-hard problem. For a complete graph of n nodes, the path 1,2,...,n
is a Hamiltonian path.

Consider a complete graph with an odd number of nodes such as `k5`.
```{r fig.show='hold'}
ew <- rep(1, length(edgeNames(k5)))
s1 <- c(1,5,9,10)
ew[s1]<- 5
ec <- rep("grey40", length(ew))
ec[s1]<- "cyan"
igplot(k5, edge.width=ew, edge.color=ec)

ew[]<- 1
s2 <- c(2,6:8)
ew[s2]<- 5
ec[] <- "grey40"
ec[s2]<- "magenta"
igplot(k5, edge.width=ew, edge.color=ec)
```

The blue hamiltonian path visits the nodes in order 1,2,3,5,4.
The pink hamiltonian path visits the nodes in order 1,3,4,2,5.
For a complete graph with an odd number of nodes such as `k5`, it is possible to
construct an Eulerian tour by concatenating together a number of Hamiltonians.
This construction is known as a Hamiltonian decomposition.

```{r fig.align='center'}
ew[]<- 5
ec[s1]<- "cyan"
s3 <- 3:4
ec[s3]<- "rosybrown1"
igplot(k5, edge.width=ew, edge.color=ec)
igplot(k5, edge.width=ew, edge.color=ec)
```
In the figure above, the light red edge from 4 to 1 connects the two hamiltonians,
and the remaining light red edge from 5 to 1 connects the last node of the pink path to
the first node from the blue path to complete the tour.

In PairViz, the two hamiltonian paths above are constructed by
```{r}
hpaths(5)
```
If you want the eulerian composed of the two hamiltonians, use
```{r}
hpaths(5, matrix=FALSE)
```

The above constructions produce paths on complete graphs with an even number of nodes
which are nearly eulerian.
Look at
```{r}
hpaths(6)
hpaths(6, matrix=FALSE)
```
Each row of the result of hpaths produces a hamiltonian on k6 (a complete graph with 6 nodes). When the hamiltonians
are concatenated, all edges on k6 are visited.
With a close inspection, we see the edges connecting the rows (here 4-2 and 5-3)
are visted twice.
There is no need to connect the first node of row 1 to the last node of row 3, as
this edge has already been visited.

Other isomorphic decompositions are obtained by supplying the first Hamiltonian as an argument to hpaths. Then all node labels are rearranged accordingly, as in the code snippet below.
```{r}
hpaths(1:5)
```


## Edge traversal of a weighted complete graph via hamiltonian decompositions

<!-- Consider the complete graph k7 whose edge weights are given by -->
<!-- ```{r, fig.align="center"} -->
<!-- set.seed(123) -->
<!-- k7 <- mk_complete_graph(7) -->
<!-- em <- edgeMatrix(k7) -->
<!-- ew <- sample(ncol(em),ncol(em)) -->
<!-- edgeData(k7, as.character(em[1,]), as.character(em[2,]), "weight") <- ew -->
<!-- igplot(k7, edge.label=ew,edge.label.color="black") -->
<!-- ``` -->

<!-- We could just as well represent the graph k7 by a distance matrix: -->
<!-- ```{r} -->
<!-- d7 <- matrix(0,7,7) -->
<!-- d7[t(em)]<- ew -->
<!-- d7[t(em)[,2:1]]<- ew -->
<!-- d7 -->
<!-- ``` -->

Consider the complete graph `k7` whose edge weights are given by
```{r}
set.seed(123)
k7 <- mk_complete_graph(7)
ew <- sample(numEdges(k7),numEdges(k7)) # a vector of edgeweights
```

The easiest way to assign weights to edges is to use a distance matrix:
```{r}
d7 <- matrix(0,7,7)
d7[lower.tri(d7)] <- ew
d7[upper.tri(d7)]<-  t(d7)[upper.tri(d7)]
d7 
# or using the shortcut function edge2dist from PairViz
#d7 <- as.matrix(edge2dist(ew))
```
          

Now `d7` is a symmetric matrix of distances, where
the values in `ew` specify the distances in order 1-2, 1-3, ...., 1-7,
2-7,..., 3-7, ...,6-7.

```{r fig.align="center",fig.width=6, fig.height=6}
k7 <- mk_complete_graph(d7)
igplot(k7, weights=TRUE,edge.label.color="black", vertex.label.cex=2,vertex.size=30)

# Unfortunately, plot.igraph does not show graph edge weights  automatically, you have to 
# input them as above. You might want to check that the igraph
# matches that of ew.
igraph::E(igraph::graph_from_graphnel(k7))

```



Like `hpaths()`, the function `weighted_hpaths()` finds Hamiltonians which when concatenated
visit all edges at least once.
However `weighted_hpaths()` uses a greedy algorithm to make the first
Hamiltonian low weight with weights increasing for successive Hamiltonians.


```{r}
weighted_hpaths(d7)
# this version returns the eulerian
weighted_hpaths(d7, matrix=FALSE)
```
We can easily calculate the edge weights and their sum for the first
Hamiltonian.




```{r}
o <- weighted_hpaths(d7, matrix=FALSE)
o1 <- o[1:8] # include the 8th to form the tour
d7e <- dist2edge(d7) 
# d7e is a vector giving edge weights in order (1,1)... (1,7), (2,3),.. (2,7) etc
h1weights <- path_weights(d7e, o1) # the edge weights for o1
# the same as
d7[cbind(head(o1,-1), o1[-1])]

h1weights
sum(h1weights)
```
The second and third Hamiltonians have edges whose sums are
```{r}
o2 <- o[8:15]
sum(path_weights(d7e, o2))
o3 <- o[15:22]
sum(path_weights(d7e, o3))
```

The `weighted_hpaths()` function uses `order_tsp()`, a TSP solver (by default nearest insertion) to come up with
the first Hamiltonian. This algorithm uses heuristics, and may
not find the overall winner.
As the graph has just 7 nodes, it is possible
to use brute force evaluation to find the shortest hamiltonian tour.

```{r}
order_best(d7,cycle=TRUE)
order_tsp(d7,cycle=TRUE)
```
In this example, we can confirm that `order_tsp` finds the shortest Hamiltonian tour.
As `order_best` is computationally highly demanding,
if you try it for graphs with
more that `maxexact` (defaults to 9) nodes, only a sample of permutations is evaluated.


## Comparisions of edge traversals on an unweighted complete graph
Consider the graph `k7`.
At his stage, we have four algorithms for forming Eulerians which do not use weights.
These are
```{r}
e1 <- eseq(7)
e2 <- eseqa(7)
e3 <- eulerian(7) # same path as eulerian(k7, weighted=FALSE)
h1 <- hpaths(7, matrix=FALSE)
```




```{r  fig.width=7, fig.height=6, echo=FALSE}
par(mfrow=c(2,2))
par(mar=c(2,2,3,1))

plot(e1, type="n",main="eseq(7)")
lines(e1[1:4], col="tan1", lwd=1.5)
lines(4:11,e1[4:11], col="cyan", lwd=1.5)
lines(11:22,e1[11:22], col="magenta", lwd=1.5)

points(e1, pch=20); grid()

plot(e2, type="n",main="eseqa(7)")
e2x <- rep(NA, length(e2))
gap1 <- seq(1, length(e2), by=3)
gap1 <- sort(c(gap1, gap1+1))

e2x[gap1]<- e2[gap1]
lines(e2x, col="tan1", lwd=1.5)

e2x <- rep(NA, length(e2))
gap2 <- gap1+1
e2x[gap2]<- e2[gap2]
lines(e2x, col="cyan", lwd=1.5)

e3x <- rep(NA, length(e2))
gap3 <- gap2+1
e3x[gap3]<- e2[gap3]
lines(e3x, col="magenta", lwd=1.5)
points(e2, pch=20);grid()


plot(e3, type="n", main="eulerian(7)")


e3x <- rep(NA, length(e3))
#indy <- which(head(e3,-1) <= 1 | e3[-1] <= 1)
indy <- which(pmin(head(e3,-1), e3[-1]) <= 1)
e3x[indy] <- e3[indy]
e3x[indy+1] <- e3[indy+1]
lines(e3x, col="tan1", lwd=1.5)


e3x <- rep(NA, length(e3))
#indy <- which(head(e3,-1) %in% c(2,3) | e3[-1] %in% c(2,3))
indy <- which(pmin(head(e3,-1), e3[-1]) %in% c(2,3))

e3x[indy] <- e3[indy]
e3x[indy+1] <- e3[indy+1]
lines(e3x, col="cyan", lwd=1.5)



e3x <- rep(NA, length(e3))
#indy <- which(head(e3,-1) >=4 & e3[-1] >=4)
indy <- which(pmin(head(e3,-1), e3[-1]) >=4)
e3x[indy] <- e3[indy]
e3x[indy+1] <- e3[indy+1]
lines(e3x, col="magenta", lwd=1.5)

points(e3, pch=20);grid()

plot(h1, type="n", main="hpaths(7)"); 
lines(1:8,h1[1:8], col="cyan", lwd=1.5); lines(8:15,h1[8:15], col="magenta", lwd=1.5)
lines(15:22,h1[15:22], col="tan1", lwd=1.5)
points(h1, pch=20);grid()
```

The plot of `eseq(7)` is coloured to show its recursive construction. `eseq(3)`
is in tan, the additional edges added for `eseq(5)` are in blue, while those for `eseq(7)` are in pink.

The plot of `eseqa(7)` shows its construction; edges are a distance 1 (in tan), 2 (in blue) and 3
(in pink) apart repeatedly, considering the last node 7 and the first to be a distance 1 apart.

In `eulerian(7)`, edges connecting to node 1 are in tan, edges from nodes 2 and 3 are in blue
excepting those connecting node 1, while remaining edges involving nodes 4,5, 6 and 7 
only are in pink. This display illustrates that the eulerian algorithm always
moves to the lowest available node.

In the plot of `hpaths(7)` the three concatenated hamiltonians are coloured blue, pink and
tan.

From these displays, if you want an eulerian visiting the
low numbered nodes first, use `eseq()` or `eulerian()`. 
If you want an Eulerian where visits to a node are spread out, pick `eseqa()` or `hpaths()`.
And of course, if the Hamiltonian property is important, then `hpaths()` is the best choice.

## Comparisions of edge traversals on an weighted complete graph


We have two different Eulerians for an edge-weighted `k7`.
These are
```{r}
e4 <- eulerian(d7) # same path as eulerian(k7)
h2 <- weighted_hpaths(d7, matrix=FALSE)
```
which are plotted below.
The colouring scheme used here is the same as that for the plots
of `eulerian(7)` and `hpaths(7)`.

```{r  fig.width=7, fig.height=3, echo=FALSE}
par(mfrow=c(1,2))
par(mar=c(2,2,3,1))



plot(e4, type="n", main="eulerian(d7)")

e4x <- rep(NA, length(e4))
indy <- which(pmin(head(e4,-1), e4[-1]) %in% c(2,3))
e4x[indy] <- e4[indy]
e4x[indy+1] <- e4[indy+1]
lines(e4x, col="cyan", lwd=1.5)

e4x <- rep(NA, length(e4))
indy <- which(pmin(head(e4,-1), e4[-1]) <= 1)

e4x[indy] <- e4[indy]
e4x[indy+1] <- e4[indy+1]
lines(e4x, col="tan1", lwd=1.5)

e4x <- rep(NA, length(e4))
indy <-  which(pmin(head(e4,-1), e4[-1]) >=4)

e4x[indy] <- e4[indy]
e4x[indy+1] <- e4[indy+1]
lines(e4x, col="magenta", lwd=1.5)



points(e4, pch=20);grid()

plot(h2, type="n", main="weighted_hpaths(d7)"); 
lines(1:8,h2[1:8], col="cyan",lwd=1.5); lines(8:15,h2[8:15], lwd=1.5,col="magenta")
lines(15:22,h2[15:22], col="tan1", lwd=1.5)
points(h2, pch=20);grid()
```

Recall that the goal of our traversals of edge-weights graphs was to visit
low-weight edges early in the path.
To check this, we can find the edges weights
for the paths using

```{r}
d7e <- dist2edge(d7) 
path_weights(d7e, e4) # the edge weights for e4
path_weights(d7e, h2) # the edge weights for h2
```

Plotting these, we see that the edge weights for both `e4` and `h2`
increase as the index increases.
The path based on Hamiltonians is less successful at ordering the edge weights
as it must visit a sequence of Hamiltonians. The edge weights for the first Hamiltonian
are in blue, for the second in magenta and the third in tan.

```{r  fig.width=7, fig.height=3, echo=FALSE}
par(mfrow=c(1,2))
par(mar=c(2,2,3,1))

e4w <- path_weights(d7e, e4)
plot(e4w, type="l", col="grey80", main="edge weights of eulerian(d7)", ylab="weight")
points(e4w, pch=21, bg="cyan");grid()

h2w <- path_weights(d7e, h2)
plot(h2w, type="l", col="grey80", main="edge weights of hpaths(d7)", ylab="weight")

points(1:7,h2w[1:7], pch=21, bg="cyan", col="grey50")
points(8:14,h2w[8:14], pch=21, bg="magenta", col="grey50")
points(15:21,h2w[15:21], pch=21, bg="tan1",col="grey50")
grid()

```

## References
C.B. Hurley and R.W. Oldford,
Pairwise display of high dimensional information via 
Eulerian tours and Hamiltonian decompositions. Journal of Computational and Graphical Statistics.
19(10), pp. 861--886, 2010.

C.B. Hurley and R.W. Oldford, 
Eulerian tour algorithms for data visualization and the PairViz package. 
Computational Statistics, 26(4), pp 613--633, 2011. 



<!-- You can enable figure captions by `fig_caption: yes` in YAML: -->

<!--     output: -->
<!--       rmarkdown::html_vignette: -->
<!--         fig_caption: yes -->

<!-- Then you can use the chunk option `fig.cap = "Your figure caption."` in **knitr**. -->

<!-- ## More Examples -->

<!-- You can write math expressions, e.g. $Y = X\beta + \epsilon$, footnotes^[A footnote here.], and tables, e.g. using `knitr::kable()`. -->

<!-- ```{r, echo=FALSE, results='asis'} -->
<!-- knitr::kable(head(mtcars, 10)) -->
<!-- ``` -->

<!-- Also a quote using `>`: -->

<!-- > "He who gives up [code] safety for [code] speed deserves neither." -->
<!-- ([via](https://twitter.com/hadleywickham/status/504368538874703872)) -->