---
title: "Graphic Comparison Of Discrete Kernels" 
output: rmarkdown::html_vignette 
vignette: >
  %\VignetteIndexEntry{Graphic Comparison Of Discrete Kernels} 
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: references.bib
link-citations: TRUE
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(kernopt)
```

## Introduction

This vignette illustrates some functions provided by the *kernopt* package that implements discrete symmetric kernels, which were recently developed for modeling count data distributions.

The *kernopt* package computes that kernel at a target `x` for various values of observations `z` and a fixed bandwidth `h` by using the function `discrete_kernel()` included in this package.

## Discrete Epanechnikov Kernel

For a positive integer $h\in\mathbb{N}\setminus\{0\}$, the first example is a discrete symmetric *Epanechnikov* kernel with $\mathcal{S}_{x}=\{x,...,x\pm h\}$ given by:

\begin{equation}
K^{Epan}_{x,h}(y)= \left(a\bigg(\frac{x-y}{h}\bigg)^{2} + b\right){\bf 1}_{\{x,x\pm 1,...,x\pm h\}}(y),
\end{equation}

with $a=-b=3h/(1-4h^{2})$ [@chu2017]

That kernel only involves the parameter $h$, which is a positive integer that scales the distance between $x$ and $y$. The discrete *Epanechnikov* kernel is less often used in the literature, compared to its continuous counterpart.

Below is an illustration of the discrete *Epanechnikov* kernel for different values of `h` using the `discrete_kernel()`function.

```{r, fig.cap='Distribution of discrete symmetric Epanechnikov kernel at the target point $x=5$ for various bandwidth parameters $h$.'}
x <- 5 # Target
z <- 0:10 # Observations (?)
h <- c(1, 2, 3, 4) # Set of Bandwidths
K_epan <- matrix(
  data = 0,
  nrow = length(z),
  ncol = length(h)
)
for (i in 1:length(h))
{
  K_epan[, i] <- discrete_kernel(kernel = "epanech", x, z, h[i])
}
plot(
  z,
  K_epan[, 1],
  xlab = "x",
  ylab = "Probability",
  ylim = c(0, 1),
  pch = 1
)
lines(z, K_epan[, 1], lty = 1)
for (i in 2:length(h))
{
  points(z, K_epan[, i], xlab = "z", pch = i)
  lines(z, K_epan[, i], lty = i)
}
legend(
  "topleft",
  c("h=1", "h=2", "h=3", "h=4"),
  lty = 1:4,
  pch = 1:4,
  cex = 1.6
)
```

## Discrete Triangular Kernel

For $h>0$ and non-negative integer $a\in\mathbb{N}$, the second example is the discrete symmetric triangular kernel with $\mathcal{S}_{x}=\{x,x\pm 1,...,x\pm a\}$, which has a pmf defined by \begin{equation*}\label{eq_TriangKern}
K^{Triang}_{a;x,h}(y)= \frac{(a+1)^{h}}{C(a,h)}\left(1 - \frac{|y-x|^{h}}{(a+1)^{h}}\right){\bf 1}_{\{x,x\pm 1,...,x\pm a\}}(y),
\end{equation*} with $C(a,h)=(2a+1)(a+1)^{h} - 2\sum_{j=0}^{a}j^{h}$ [@KokonZocSenga2007]. That kernel depends on two parameters $h$ and $a$. That kernel was implemented in the \pkg{kernopt} package through the function \fct{discrete\_triang} with its specific arguments as follows:

```{r, fig.cap="Distribution of discrete symmetric triangular kernel ($a=1$) at the target point $x=5$ for various bandwidth parameters $h$."}
x <- 5
z <- 0:10
h <- c(0.1, 0.4, 1, 2)
a <- 1

K_trg <- matrix(
  data = 0,
  nrow = length(z),
  ncol = length(h)
)

for (i in 1:length(h))
{
  K_trg[, i] <- discrete_kernel(kernel = "triang", x, z, h[i], a)
}
plot(
  z,
  K_trg[, 1],
  xlab = "x",
  ylab = "Probability",
  ylim = c(0, 1),
  pch = 1
)
lines(z, K_trg[, 1], lty = 1)
for (i in 2:length(h))
{
  points(z, K_trg[, i], xlab = "z", pch = i)
  lines(z, K_trg[, i], lty = i)
}
legend(
  "topleft",
  c("h=0.1", "h=0.4", "h=1", "h=2"),
  lty = 1:4,
  pch = 1:4,
  cex = 1.6
)
```

The discrete symmetric triangular kernel is also available in the *Ake* package in R [@Wansouwe2016].

## Discrete Optimal Kernel
The Discrete Optimal Kernel was proposed in  [@SengaDurrieu2024].

For $x\in \mathcal{S}$ and a fixed integer $k\geq1$, the discrete symmetric "optimal" kernel on $\mathcal{S}_{x}=\{x,x\pm 1,\ldots,x\pm k\}$ that minimises the mean integrate squared error of the estimator $\widehat{f}_{K,h}$ in (\ref{eq:estim_fn}) is given by 
\begin{equation*}\label{eq:OptKern_k}
 K_{k;x,h}^{opt}(y)
 =\lambda\bigg(\frac{3k^{2}+3k-1}{5} -(x-y)^{2} \bigg)+\frac{h}{2k+1}, y\in\mathcal{S}_{x},  
\end{equation*}
where $\lambda=15(1-h)/\big((2k+1)(4k^{2}+4k-3)\big)>0$ and $3/5(1-1/k)<h<1$. 


The bandwidth parameter $h$ of the ``optimal'' kernels $K_{k;x,h}^{opt}$ is bounded with a lower bound $h_{lower}= 3/5(1-1/k)\geq0$, which particularly depends on $k\geq 1$. The distribution of that kernel can be plotted as follows:

```{r, fig.cap='Distribution of discrete symmetric "optimal" kernel ($k=1$) at the target point $x=5$ for various bandwidth parameters $h$.'}
x <- 5
z <- 0:10
h <- c(0.1, 0.4, 0.7, 0.9)
k <- 1

K_opt <- matrix(
  data = 0,
  nrow = length(z),
  ncol = length(h)
)

for (i in 1:length(h))
{
  K_opt[, i] <- discrete_kernel(kernel = "optimal", x, z, h[i], k)
}
plot(
  z,
  K_opt[, 1],
  xlab = "x",
  ylab = "Probability",
  ylim = c(0, 1),
  pch = 1
)
lines(z, K_opt[, 1], lty = 1)
for (i in 2:length(h))
{
  points(z, K_opt[, i], xlab = "z", pch = i)
  lines(z, K_opt[, i], lty = i)
}
legend(
  "topleft",
  c("h=0.1", "h=0.4", "h=0.7", "h=0.9"),
  lty = 1:4,
  pch = 1:4,
  cex = 1.6
)
```


## Graphical comparison of the discrete kernels


```{r}
oldpar <- par(mfrow = c(2, 2), mar = c(1, 1, 1, 1)) # 2 x 2 pictures on one plot
# 1,1 - Optimal (k=1, h) and Epanechnikov (h=1)
plot(
  x = z,
  # y = K_opt[, 1],
  xlab = "z",
  ylab = "Probability",
  ylim = c(0, 1),
  main = "Optimal (k=1, h) and Epanechnikov (h=1)",
  type = "o",
  pch = 1,
  # Symbol
  lty = 1,
  # Line type
  lwd = 2,
  # Line width
  cex = 1.6,
  # Magnification factor
  cex.axis = 1.2,
  # Magnification factor for axis
  cex.lab = 1.2,
  # Magnification factor for label
)

lines(
  z,
  K_opt[, 2],
  type = "o",
  pch = 2,
  # Symbol
  lty = 2,
  # Line type
  lwd = 2,
  # Line width
  col = "grey",
)

lines(
  z,
  K_opt[, 3],
  type = "o",
  pch = 3,
  lty = 3,
  lwd = 2
)


lines(
  z,
  K_epan[, 1],
  type = "o",
  pch = 4,
  lty = 4,
  lwd = 2,
  col = "grey"
)

legend(
  0,
  1,
  c("Opt. h=0.2", "Opt. h=0.7", "Opt. h=0.95", "Epan. h=1"),
  lty = c(1, 2, 3, 4),
  pch = c(1, 2, 3, 4),
  col = c("black", "grey", "black", "grey"),
  lwd = c(2, 2, 2, 2),
  cex = 1.2
)



# Triangular (a=1, h) and Epanechnikov (h=1)
z <- 0:10
x <- 5
a <- 1
h <- c(0.2, 0.7, 0.95)
K_trg <- matrix(0, length(z), length(h))
for (i in 1:length(h))
{
  K_trg[, i] <- discrete_triang(x, z, h[i], a)
}

plot(
  z,
  K_trg[, 1],
  xlab = "z",
  ylab = "Probability",
  main = "Triangular (a=1, h) and Epanechnikov (h=1)",
  ylim = c(0, 1),
  type = "o",
  pch = 1,
  lty = 1,
  lwd = 2,
  cex = 1.6,
  cex.axis = 1.2,
  cex.lab = 1.2
)

lines(
  z,
  K_trg[, 2],
  type = "o",
  pch = 2,
  lty = 2,
  lwd = 2,
  col = "grey"
)

lines(
  z,
  K_trg[, 3],
  type = "o",
  pch = 3,
  lty = 3,
  lwd = 2
)

lines(
  z,
  K_epan[, 1],
  type = "o",
  pch = 4,
  lty = 4,
  lwd = 2,
  col = "grey"
)

legend(
  0,
  1,
  c("Triang. h=0.2", "Triang. h=0.7", "Triang. h=0.95", "Epan. h=1"),
  lty = c(1, 2, 3, 4),
  pch = c(1, 2, 3, 4),
  col = c("black", "grey", "black", "grey"),
  lwd = c(2, 2, 2, 2),
  cex = 1.2
)

#  Optimal (k=2, h) and Epanechnikov (h=2)
# p=k=2
# opt
z <- 0:10
x <- 5
k <- 2
h <- c(0.5, 0.7, 0.95)
K_opt <- matrix(0, length(z), length(h))
for (i in 1:length(h))
{
  K_opt[, i] <- discrete_optimal(x, z, h[i], k)
}


plot(
  z,
  K_opt[, 1],
  xlab = "z",
  ylab = "Probability",
  ylim = c(0, 1),
  main = "Optimal (k=2, h) and Epanechnikov (h=2)",
  type = "o",
  pch = 1,
  lty = 1,
  lwd = 2,
  cex = 1.6,
  cex.axis = 1.2,
  cex.lab = 1.2
)

lines(
  z,
  K_opt[, 2],
  type = "o",
  pch = 2,
  lty = 2,
  lwd = 2,
  col = "grey"
)

lines(
  z,
  K_opt[, 3],
  type = "o",
  pch = 3,
  lty = 3,
  lwd = 2
)

lines(
  z,
  K_epan[, 2],
  type = "o",
  pch = 4,
  lty = 4,
  lwd = 2,
  col = "grey"
)

legend(
  0,
  1,
  c("Opt. h=0.5", "Opt. h=0.7", "Opt. h=0.95", "Epan. h=2"),
  lty = c(1, 2, 3, 4),
  pch = c(1, 2, 3, 4),
  col = c("black", "grey", "black", "grey"),
  lwd = c(2, 2, 2, 2),
  cex = 1.2
)

# Triangular (a=2, h) and Epanechnikov (h=2)
# triangular
z <- 0:10
x <- 5
a <- 2
h <- c(0.5, 0.7, 0.95)
K_trg <- matrix(0, length(z), length(h))
for (i in 1:length(h))
{
  K_trg[, i] <- discrete_triang(x, z, h[i], a)
}


plot(
  z,
  K_trg[, 1],
  xlab = "z",
  ylab = "Probability",
  main = "Triangular (a=2, h) and Epanechnikov (h=2)",
  ylim = c(0, 1),
  type = "o",
  pch = 1,
  lty = 1,
  lwd = 2,
  cex = 1.6,
  cex.axis = 1.2,
  cex.lab = 1.2
)

lines(
  z,
  K_trg[, 2],
  type = "o",
  pch = 2,
  lty = 2,
  lwd = 2,
  col = "grey"
)

lines(
  z,
  K_trg[, 3],
  type = "o",
  pch = 3,
  lty = 3,
  lwd = 2
)

lines(
  z,
  K_epan[, 2],
  type = "o",
  pch = 4,
  lty = 4,
  lwd = 2,
  col = "grey"
)

legend(
  0,
  1,
  c("Triang. h=0.5", "Triang. h=0.7", "Triang. h=0.95", "Epan. h=2"),
  lty = c(1, 2, 3, 4),
  pch = c(1, 2, 3, 4),
  col = c("black", "grey", "black", "grey"),
  lwd = c(2, 2, 2, 2),
  cex = 1.2
)

# Restore option
par(oldpar)
```