---
title: "PPQ Power Assessment Theoretical Results"
author: "Yalin Zhu  (yalin.zhu@merck.com)"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{PPQ Power Assessment Theoretical Results}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
  TeX: { equationNumbers: { autoNumber: "AMS" } }
});
</script>

# Preliminaries

Without loss of generality, suppose $n$ outcomes of the Critical Quality Attribute (CQA) are normally distributed, which is denoted by $X_i \stackrel{\text{i.i.d}}{\sim} \mathcal{N}(\mu, \sigma^2)$, where $i=1,\dots, n$, then the distributions of sample mean and standard deviation are as known: \begin{equation}
\bar{X} \sim \mathcal{N}(\mu, \frac{\sigma^2}{n})
\end{equation} 
and 
\begin{equation} \label{diststd}
\dfrac{(n-1)S^2}{\sigma^2} \sim \chi^2 (n-1).
\end{equation}
Moreover, sample mean and sample standard deviation are independent under normal distribution assumption. 

Denote the lower and upper specification limits as $L$ and $U$, respectively. The prediction or tolerance interval can be expressed by 
\begin{equation} 
\left[ Y_1, Y_2 \right] = \left[ \bar{X} - kS, \ \bar{X} + kS \right] ,
\end{equation}
where $k$ is a specific multiplier for the interval. For example, for prediction interval, $k=t_{1-\alpha/2,n-1}\sqrt{1+\frac{1}{n}}$.

# Specification test for one release batch

The outcome at release can be any one of the sample, so $X_{rl} \sim \mathcal{N}(\mu, \sigma^2)$, then the probability of passing PPQ at release should be


\begin{equation}
\begin{split}
\Pr(\text{Passing Specification for Release}) & = \Pr(L \le X_{rl} \le U) \\
& = \Phi (U) - \Phi(L)
\end{split}
\end{equation}

This probability is very easy to calculate using software, such as `pnorm()` in R.

# Test for PPQ Batches

\begin{equation} \label{probpass}
\begin{split}
\Pr(\text{Passing a Single PPQ Batch}) & = \Pr(L \le Y_1 \le Y_2 \le U) \\
& = \int_{L}^{U} \int_{L}^{y_2}f_{Y_1,Y_2}(y_1, y_2) dy_1 dy_2
\end{split}
\end{equation}

Now it is essential to obtain the bivariate joint distribution of the lower and upper prediction/tolerance interval, that is, find joint probability density function (PDF) $f_{Y_1,Y_2}(y_1,y_2)$. 

Since $Y_1=\bar{X} - k S$ and  $Y_2=\bar{X} + k S$, we can use another bivariate PDF $f_{\bar{X},S}(x,s)$ to calculate $f_{Y_1,Y_2}(y_1,y_2)$ by using Jacobian transformation.

Solve $\bar{X}$ and $S$ as $x=\dfrac{y_1+y_2}{2}$ and $s=\dfrac{y_2-y_1}{2k}$, then Jacobian of the transformation is 
\begin{equation}
|J|= \left| \begin{array}{cc}
\dfrac{\partial x}{\partial y_1} & \dfrac{\partial x}{\partial y_2}\\
\\
\dfrac{\partial s}{\partial y_1} & \dfrac{\partial s}{\partial y_2}
\end{array} \right| = \left| \begin{array}{cc}
\dfrac{1}{2} & \dfrac{1}{2}\\
\\
-\dfrac{1}{2k} & \dfrac{1}{2k}
\end{array} \right| = \dfrac{1}{2k}.
\end{equation}

Thus, (\ref{probpass}) can be calculated as

\begin{equation} \label{extend}
\begin{split}
\int_{L}^{U} \int_{L}^{y_2}f_{Y_1,Y_2}(y_1, y_2) dy_1 dy_2 & = 
\int_{L}^{U} \int_{L}^{y_2}f_{\bar{X},S}\left(\dfrac{y_1+y_2}{2}, \dfrac{y_2-y_1}{2k}\right) |J| dy_1 dy_2 \\
& = \dfrac{1}{2k} \int_{L}^{U} \int_{L}^{y_2}f_{\bar{X}}\left(\dfrac{y_1+y_2}{2}\right) f_{S}\left( \dfrac{y_2-y_1}{2k}\right) dy_1 dy_2. 
\end{split}
\end{equation}
The second equation follows from normal sample mean and standard deviation being independent.


Similarly, we can obtain the PDF of sample standard deviation $f_S(s)$. By (\ref{diststd}), let $v= \dfrac{(n-1)s^2}{\sigma^2}$, then Jacobian of the transformation is
\begin{equation}
|J| = \left|\dfrac{dv}{ds}\right| = \dfrac{2(n-1)s}{\sigma^2}.
\end{equation}
Thus, 
\begin{equation} \label{Spdf}
\begin{split}
f_S(s) & = f_V\left(\dfrac{(n-1)s^2}{\sigma^2}\right) |J| \\
& = \dfrac{2(n-1)s}{\sigma^2}f_V\left(\dfrac{(n-1)s^2}{\sigma^2}\right)
\end{split}
\end{equation}

Plug (\ref{Spdf}) in (\ref{extend}), we can get the final results.
\begin{equation} 
\begin{split}
& \Pr(\text{Passing a Single PPQ Batch}) \\ 
= & \dfrac{1}{2k} \int_{L}^{U} \int_{L}^{y_2}f_{\bar{X}}\left(\dfrac{y_1+y_2}{2}\right) \dfrac{2(n-1)\dfrac{y_2-y_1}{2k}}{\sigma^2}f_V\left\{\dfrac{(n-1)\left[\dfrac{y_2-y_1}{2k}\right]^2}{\sigma^2}\right\}  dy_1 dy_2 \\
= & \dfrac{n-1}{2k^2 \sigma^2}\int_{L}^{U} \int_{L}^{y_2}f_{\bar{X}}\left(\dfrac{y_1+y_2}{2}\right) f_V\left\{\dfrac{(n-1)(y_2-y_1)^2}{4k^2\sigma^2}\right\} (y_2-y_1) dy_1 dy_2,
\end{split}
\end{equation}
where  $\bar{X} \sim \mathcal{N}(\mu, \frac{\sigma^2}{n})$ and $V \sim \chi^2 (n-1)$. Then this quantity can be easily calculated by software, such as functions `dnorm()`, `dchisq()` and `integrate()` in R.

We can also calculate the probability of passing $m$ PPQ batches, then under the assumption of independence and similar expected performance across batches, the probability will be
$$ \Pr(\text{Passing } m \text{ batches }) = \{\Pr(\text{Passing a Single PPQ Batch}) \}^m$$