---
title: "Probabilistic Centrality"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{07 probabilistic centrality}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

This vignette describes methods to analyse all possible centrality rankings
of a network at once. To do so, a partial rankings as computed
from [neighborhood-inclusion](neighborhood_inclusion.html) or, more general,
[positional dominance](positional_dominance.html) is needed. In this vignette we 
focus on neighborhood-inclusion but note that all considered methods are readily
applicable for positional dominance. For more examples consult the tutorial.

________________________________________________________________________________

## Theoretical Background

Neighborhood-inclusion or induces a partial ranking on the vertices of a graph $G=(V,E)$. 
We write $u\leq v$ if $N(u)\subseteq N[v]$ holds for two vertices $u,v \in V$. 
From the fact that
$$
u\leq v \implies c(u) \leq c(v)
$$
holds for any centrality index $c:V\to \mathbb{R}$, we can characterize the set of all *possible*
centrality based node rankings. Namely as the set of rankings that extend the partial ranking
"$\leq$" to a (complete) ranking.  
\
A node ranking can be defined as a mapping 
$$rk: V \to \{1,\ldots,n\},$$
where we use the convention that $u$ is the top ranked node if $rk(u)=n$ and the 
bottom ranked one if $rk(u)=1$. The set of all possible rankings can then be characterized as
$$
\mathcal{R}(\leq)=\{rk:V \to \{1,\ldots,n\}\; : \; u\leq v \implies rk(u)\leq rk(v)\}.
$$
This set contains **all** rankings that could be obtained with a centrality index.  
\
Once $\mathcal{R}(\leq)$ is calculated, it can be used for a probabilistic assessment of centrality,
analyzing all possible rankings at once. Examples include *relative rank probabilities*
(How likely is it, that a node $u$ is more central than another node $v$?) or
*expected ranks* (How central do we expect a node $u$ to be).  
\
It most be noted though, that deriving the set $\mathcal{R}(\leq)$ quickly becomes 
infeasible for larger networks, and one has to resort to approximation methods.
These and more theoretical details can be found in

> Schoch, David. (2018). Centrality without Indices: Partial rankings and rank
Probabilities in networks. *Social Networks*, **54**, 50-60.([link](https://doi.org/10.1016/j.socnet.2017.12.003))

```{r global_options, include=FALSE}
knitr::opts_chunk$set(fig.width=5,fig.align = 'center')
```

________________________________________________________________________________

## Exact Probabilities in the `netrankr` Package


```{r setup-blind,include=FALSE}
library(Matrix)
```

```{r setup, warning=FALSE,message=FALSE}
library(netrankr)
library(igraph)
library(magrittr)
```

Before calculating any probabilities consider the following example graph and
the rankings induced by various centrality indices, shown as rank intervals 
(consult [this](partial_centrality.html) vignette for details).

```{r pos_dom}
data("dbces11")
g <- dbces11

#neighborhood inclusion 
P <- g %>% neighborhood_inclusion(sparse = FALSE)

#without %>% operator:
# P <- neighborhood_inclusion(g)

cent_scores <- data.frame(
   degree=degree(g),
   betweenness=round(betweenness(g),4),
   closeness=round(closeness(g),4),
   eigenvector=round(eigen_centrality(g)$vector,4),
   subgraph=round(subgraph_centrality(g),4))

plot(rank_intervals(P),cent_scores = cent_scores)
```

Notice how all five indices rank a different vertex as the most central one.  
\
In the following subsections the output of the function `exact_rank_prob()` 
are described which may help to circumvent the potential arbitrariness of index induced rankings.
But first, let us briefly look at all the return values.
```{r calc_probs}
res <- exact_rank_prob(P)
res
```

The function returns an object of type \emph{netrankr_full} which contains the result of a full probabilistic rank analysis.
The specific list entries are discussed in the following subsections.

### Rank Probabilities

Instead of insisting on fixed ranks of nodes as given by indices, we can use *rank probabilities*
to assess the likelihood of certain rank. Formally, rank probabilities are simply defined as
$$
P(rk(u)=k)=\frac{\lvert \{rk \in  \mathcal{R}(\leq) \; : \; rk(u)=k\} \rvert}{\lvert \mathcal{R}(\leq) \rvert}.
$$
Rank probabilities are given by the return value `rank.prob` of the `exact_rank_prob()`
function.

```{r rk_probs}
rp <- round(res$rank.prob,2)
rp
```

Entries `rp[u,k]` correspond to $P(rk(u)=k)$.  
\
The most interesting probabilities are certainly $P(rk(u)=n)$, that is how likely
is it for a node to be the most central.

```{r rk_top}
rp[,11]
```
Recall from the previous section that we found five indices that ranked $6,7,8,10$
and $11$ on top. The probability tell us now, how likely it is to find an index that
rank these nodes on top. In this case, node $11$ has the highest probability to be 
the most central node.

### Relative Rank Probabilities
In some cases, we might not necessarily be interested in a complete ranking of nodes, 
but only in the relative position of a subset of nodes. This idea leads to 
*relative rank probabilities*, that is formally defined as
$$
P(rk(u)\leq rk(v))=\frac{\lvert \{rk \in  \mathcal{R}(\leq) \; : \; rk(u)\leq rk(v)\} \rvert}{\lvert \mathcal{R}(\leq) \rvert}.
$$
Relative rank probabilities are given by the return value `relative.rank` of the `exact_rank_prob()`
function.

```{r rrp}
rrp <- round(res$relative.rank,2)
rrp
```

Entries `rrp[u,v]` correspond to $P(rk(u)\leq rk(v))$.  
\
The more a value `rrp[u,v]` deviates from $0.5$ towards $1$, the more confidence we gain
that a node $v$ is more central than a node $u$.

###Expected Ranks
The *expected rank* of a node in centrality rankings is defined as the expected 
value of the rank probability distribution. That is,
$$
\rho(u)=\sum_{k=1}^n k\cdot P(rk(u)=k).
$$
Expected ranks are given by the return value `expected.rank` of the `exact_rank_prob()`
function.

```{r rk_exp}
ex_rk <- round(res$expected.rank,2)
ex_rk
```

As a reminder, the higher the numeric rank, the more central a node is. In this case,
node $11$ has the highest expected rank in any centrality ranking.