Evaluating point estimators

Evaluating the mean squared of the sample mean and the weighted sample mean

Now that we have created an optimal adaptive design, we can investigate the performance characteristics of various estimators for the mean in that design. To this end, the evaluate_estimator function can be used.

evaluate_estimator(
 score = MSE(),
 estimator = SampleMean(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = 0.3,
 sigma = 1
)
#> Design:                              TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution:                                             Normal<two-armed>
#> Estimator:                                                           Sample mean
#> Assumed sigma:                                                                 1
#> Assumed mu:                                                                  0.3
#> Results:
#>  Expectation:                                                          0.3056727
#>  Bias:                                                               0.005672677
#>  Variance:                                                            0.03777784
#>  MSE:                                                                 0.03781002

The mean squared error of the sample mean depends on the true underlying value of the paramter $\mu$ , which of course is unknown. Therefore, it may be advisable to use the evaluate_estimator function on an array of values for $\mu$ to investigate the distributional properties of an estimator.

In the following, the MSE of the sample mean vs. a weighted sample mean with fixed weights will be plotted.

mse_mle <- evaluate_estimator(
  score = MSE(),
  estimator = SampleMean(),
  data_distribution = Normal(two_armed = TRUE),
  design = get_example_design(two_armed = TRUE),
  mu = seq(-0.75, 1.32, .03),
  sigma = 1
)
mse_weighted_sample_means <- evaluate_estimator(
  score = MSE(),
  estimator = WeightedSampleMean(w1 = .8),
  data_distribution = Normal(two_armed = TRUE),
  design = get_example_design(two_armed = TRUE),
  mu = seq(-0.75, 1.32, .03),
  sigma = 1
)
plot(c(mse_mle, mse_weighted_sample_means))

Evaluating median unbiased estimators

Median unbiased estimators are estimators where the probability of overestimation is equal to the probability of underestimation. They can be derived by choosing a sample space ordering. In adestr, 5 different sample-space orderings are implemented: The MLE (maximum likelihood estimator) ordering, the LR (likelihood ratio) test ordering, the ST (score test) ordering, the SWCF (stage-wise combination function) ordering, and the NP (Neyman-Pearson) ordering. The latter (NP ordering) is only useful for the calculation of p-values, estimators derived from that ordering are usually nonsense.

Lets look at the ‘median-unbiased’ property for estimators derived from the four orderings:

evaluate_estimator(
 score = OverestimationProbability(),
 estimator = MedianUnbiasedMLEOrdering(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = 0.4,
 sigma = 1
)
#> Design:                              TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution:                                             Normal<two-armed>
#> Estimator:                                        Median unbiased (MLE ordering)
#> Assumed sigma:                                                                 1
#> Assumed mu:                                                                  0.4
#> Results:
#>  Probability of overestimation:                                        0.4988042

evaluate_estimator(
 score = OverestimationProbability(),
 estimator = MedianUnbiasedLikelihoodRatioOrdering(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = 0.4,
 sigma = 1
)
#> Design:                              TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution:                                             Normal<two-armed>
#> Estimator:                                    Median unbiased (LR test ordering)
#> Assumed sigma:                                                                 1
#> Assumed mu:                                                                  0.4
#> Results:
#>  Probability of overestimation:                                        0.4990921

evaluate_estimator(
 score = OverestimationProbability(),
 estimator = MedianUnbiasedScoreTestOrdering(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = 0.4,
 sigma = 1
)
#> Design:                              TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution:                                             Normal<two-armed>
#> Estimator:                                 Median unbiased (Score test ordering)
#> Assumed sigma:                                                                 1
#> Assumed mu:                                                                  0.4
#> Results:
#>  Probability of overestimation:                                        0.5037828

evaluate_estimator(
 score = OverestimationProbability(),
 estimator = MedianUnbiasedStagewiseCombinationFunctionOrdering(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = 0.4,
 sigma = 1
)
#> Design:                              TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution:                                             Normal<two-armed>
#> Estimator:                                       Median unbiased (SWCF ordering)
#> Assumed sigma:                                                                 1
#> Assumed mu:                                                                  0.4
#> Results:
#>  Probability of overestimation:                                        0.4976513

Compare this to the Overestimation probability of the sample mean:

evaluate_estimator(
 score = OverestimationProbability(),
 estimator = SampleMean(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = 0.4,
 sigma = 1
)
#> Design:                              TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution:                                             Normal<two-armed>
#> Estimator:                                                           Sample mean
#> Assumed sigma:                                                                 1
#> Assumed mu:                                                                  0.4
#> Results:
#>  Probability of overestimation:                                        0.5943933

Evaluating bias-reduced unbiased estimators

Apart from sample-space ordering based methods, there are various other ways of defining alternative point estimators which may have desirable properties. Here are a couple that are presented in our paper evaluated for mu=0.3 and sigma=1.

evaluate_estimator(
 score = MSE(),
 estimator = SampleMean(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = 0.3,
 sigma = 1
)
#> Design:                              TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution:                                             Normal<two-armed>
#> Estimator:                                                           Sample mean
#> Assumed sigma:                                                                 1
#> Assumed mu:                                                                  0.3
#> Results:
#>  Expectation:                                                          0.3056727
#>  Bias:                                                               0.005672677
#>  Variance:                                                            0.03777784
#>  MSE:                                                                 0.03781002
evaluate_estimator(
 score = MSE(),
 estimator = BiasReduced(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = 0.3,
 sigma = 1
)
#> Design:                              TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution:                                             Normal<two-armed>
#> Estimator:                                       Bias reduced MLE (iterations=1)
#> Assumed sigma:                                                                 1
#> Assumed mu:                                                                  0.3
#> Results:
#>  Expectation:                                                          0.3042277
#>  Bias:                                                               0.004227723
#>  Variance:                                                            0.03453116
#>  MSE:                                                                 0.03454903
evaluate_estimator(
 score = MSE(),
 estimator = PseudoRaoBlackwell(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = 0.3,
 sigma = 1
)
#> Design:                              TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution:                                             Normal<two-armed>
#> Estimator:                                              Pseudo Rao-Blackwellized
#> Assumed sigma:                                                                 1
#> Assumed mu:                                                                  0.3
#> Results:
#>  Expectation:                                                          0.3022287
#>  Bias:                                                               0.002228688
#>  Variance:                                                            0.03268444
#>  MSE:                                                                 0.03268941
evaluate_estimator(
 score = MSE(),
 estimator = RaoBlackwell(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = 0.3,
 sigma = 1
)
#> Design:                              TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution:                                             Normal<two-armed>
#> Estimator:                                                     Rao-Blackwellized
#> Assumed sigma:                                                                 1
#> Assumed mu:                                                                  0.3
#> Results:
#>  Expectation:                                                           0.300153
#>  Bias:                                                              0.0001530249
#>  Variance:                                                            0.03548938
#>  MSE:                                                                 0.03548941
evaluate_estimator(
 score = MSE(),
 estimator = AdaptivelyWeightedSampleMean(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = 0.3,
 sigma = 1
)
#> Design:                              TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution:                                             Normal<two-armed>
#> Estimator:                                AdaptivelyWeightedSampleMean(w1^2=50%)
#> Assumed sigma:                                                                 1
#> Assumed mu:                                                                  0.3
#> Results:
#>  Expectation:                                                          0.3039742
#>  Bias:                                                               0.003974222
#>  Variance:                                                            0.03817511
#>  MSE:                                                                  0.0381909

Evaluating interval estimators

Evaluating coverage

The coverage of an interval estimator is the probability of that an estimator will cover the true value of the parameter in question. It may be evaluated like this:

evaluate_estimator(
 score = Coverage(),
 estimator = NaiveCI(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = .07,
 sigma = 1
)
#> Design:                              TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution:                                             Normal<two-armed>
#> Estimator:                                                              Naive CI
#> Assumed sigma:                                                                 1
#> Assumed mu:                                                                 0.07
#> Results:
#>  Coverage:                                                              0.933259

As you can see, the naive confidence interval does not have the correct coverage of 95%. Here is an example for an estimator that achieves the correct coverage:

evaluate_estimator(
 score = Coverage(),
 estimator = LikelihoodRatioOrderingCI(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = .07,
 sigma = 1
)
#> Design:                              TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution:                                             Normal<two-armed>
#> Estimator:                                                   LR test ordering CI
#> Assumed sigma:                                                                 1
#> Assumed mu:                                                                 0.07
#> Results:
#>  Coverage:                                                             0.9500889

For the parameter of mu that we chose in this example, the naive confidence interval performs particularly bad. We can plot the coverage of the naive confidence interval for an array of values like this:

coverage_naive <- evaluate_estimator(
 score = Coverage(),
 estimator = NaiveCI(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = seq(-0.75, 1.32, .03),
 sigma = 1
)
plot(coverage_naive)

Evaluating the widht of an interval estimator

Amongst the interval estimators which have the correct coverage, one might be interested in selecting the one with the least width. We can evaluate the expected width of an interval estimator for a particular set of assumptions like this:

evaluate_estimator(
 score = Width(),
 estimator = MLEOrderingCI(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = .3,
 sigma = 1
)
#> Design:                              TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution:                                             Normal<two-armed>
#> Estimator:                                                       MLE ordering CI
#> Assumed sigma:                                                                 1
#> Assumed mu:                                                                  0.3
#> Results:
#>  Width:                                                                0.6453952
evaluate_estimator(
 score = Width(),
 estimator = LikelihoodRatioOrderingCI(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = .3,
 sigma = 1
)
#> Design:                              TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution:                                             Normal<two-armed>
#> Estimator:                                                   LR test ordering CI
#> Assumed sigma:                                                                 1
#> Assumed mu:                                                                  0.3
#> Results:
#>  Width:                                                                0.6351904
evaluate_estimator(
 score = Width(),
 estimator = ScoreTestOrderingCI(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = .3,
 sigma = 1
)
#> Design:                              TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution:                                             Normal<two-armed>
#> Estimator:                                                Score test ordering CI
#> Assumed sigma:                                                                 1
#> Assumed mu:                                                                  0.3
#> Results:
#>  Width:                                                                0.6502033
evaluate_estimator(
 score = Width(),
 estimator = StagewiseCombinationFunctionOrderingCI(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = .3,
 sigma = 1
)
#> Design:                              TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution:                                             Normal<two-armed>
#> Estimator:                                                      SWCF ordering CI
#> Assumed sigma:                                                                 1
#> Assumed mu:                                                                  0.3
#> Results:
#>  Width:                                                                0.6575988

Evaluating the centrality of a point estimator with respect to an interval estimator

When choosing a combination of point and interval estimator to report at the end of a trial, one might want the point estimator to be more or less in the middle of the respective interval. To evaluate the ‘centrality’ of an estimator, which in this case is defined as the difference between the distance of the point estimator the lower end of an interval and the upper end of an interval, one can use the following command:

evaluate_estimator(
 score = Centrality(interval = StagewiseCombinationFunctionOrderingCI()),
 estimator = SampleMean(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = .3,
 sigma = 1
)
#> Design:                              TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution:                                             Normal<two-armed>
#> Estimator:                                                           Sample mean
#> Assumed sigma:                                                                 1
#> Assumed mu:                                                                  0.3
#> Results:
#>  Centrality with respect to SWCF ordering CI:                         0.00715509
evaluate_estimator(
 score = Centrality(interval = NaiveCI()),
 estimator = SampleMean(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = .3,
 sigma = 1
)
#> Design:                              TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution:                                             Normal<two-armed>
#> Estimator:                                                           Sample mean
#> Assumed sigma:                                                                 1
#> Assumed mu:                                                                  0.3
#> Results:
#>  Centrality with respect to Naive CI:                              -4.072497e-18

Evaluating agreement with the primary test decision of the design for an interval estimator

In the framework of optimal adaptive designs, the rejection boundaries for the primary testing decision of a design are optimized directly and are not derived from a common test statistic. Therefore, confidence intervals derived from such statistics do not necessarily agree with the primary test decision. One may evaluate the chance of agreement like this:

evaluate_estimator(
 score = TestAgreement(),
 estimator = MLEOrderingCI(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = .3,
 sigma = 1
)
#> Design:                              TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution:                                             Normal<two-armed>
#> Estimator:                                                       MLE ordering CI
#> Assumed sigma:                                                                 1
#> Assumed mu:                                                                  0.3
#> Results:
#>  Agreement with test decision:                                         0.9867598
evaluate_estimator(
 score = TestAgreement(),
 estimator = LikelihoodRatioOrderingCI(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = .3,
 sigma = 1
)
#> Design:                              TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution:                                             Normal<two-armed>
#> Estimator:                                                   LR test ordering CI
#> Assumed sigma:                                                                 1
#> Assumed mu:                                                                  0.3
#> Results:
#>  Agreement with test decision:                                         0.9859001
evaluate_estimator(
 score = TestAgreement(),
 estimator = ScoreTestOrderingCI(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = .3,
 sigma = 1
)
#> Design:                              TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution:                                             Normal<two-armed>
#> Estimator:                                                Score test ordering CI
#> Assumed sigma:                                                                 1
#> Assumed mu:                                                                  0.3
#> Results:
#>  Agreement with test decision:                                         0.8269152
evaluate_estimator(
 score = TestAgreement(),
 estimator = StagewiseCombinationFunctionOrderingCI(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = .3,
 sigma = 1
)
#> Design:                              TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution:                                             Normal<two-armed>
#> Estimator:                                                      SWCF ordering CI
#> Assumed sigma:                                                                 1
#> Assumed mu:                                                                  0.3
#> Results:
#>  Agreement with test decision:                                          1.000034

The confidence interval derived from the stage-wise combination function ordering (by its construction) always agrees with the primary testing decision.

Evaluating p-values

Evaluating agreement with the primary test decision of the design for a p-value

Like confidence intervals, p-values always come with an associated test decision. However, for the same reason as described in the chapter about interval estimators, these test decision derived from various ways of calculating p-values may not necessarily agree with the primary test decision of an optimal adaptive design. One may evaluate the probability of agreement for a particular set of assumptions like this:

evaluate_estimator(
 score = TestAgreement(),
 estimator = MLEOrderingPValue(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = .3,
 sigma = 1
)
#> Design:                              TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution:                                             Normal<two-armed>
#> Estimator:                                                  MLE ordering p-value
#> Assumed sigma:                                                                 1
#> Assumed mu:                                                                  0.3
#> Results:
#>  Agreement with test decision:                                         0.9846693
evaluate_estimator(
 score = TestAgreement(),
 estimator = LikelihoodRatioOrderingPValue(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = .3,
 sigma = 1
)
#> Design:                              TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution:                                             Normal<two-armed>
#> Estimator:                                              LR test ordering p-value
#> Assumed sigma:                                                                 1
#> Assumed mu:                                                                  0.3
#> Results:
#>  Agreement with test decision:                                         0.9859001
evaluate_estimator(
 score = TestAgreement(),
 estimator = ScoreTestOrderingPValue(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = .3,
 sigma = 1
)
#> Design:                              TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution:                                             Normal<two-armed>
#> Estimator:                                           Score test ordering p-value
#> Assumed sigma:                                                                 1
#> Assumed mu:                                                                  0.3
#> Results:
#>  Agreement with test decision:                                         0.8269606
evaluate_estimator(
 score = TestAgreement(),
 estimator = StagewiseCombinationFunctionOrderingPValue(),
 data_distribution = Normal(two_armed = TRUE),
 design = get_example_design(two_armed = TRUE),
 mu = .3,
 sigma = 1
)
#> Design:                              TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution:                                             Normal<two-armed>
#> Estimator:                                                 SWCF ordering p-value
#> Assumed sigma:                                                                 1
#> Assumed mu:                                                                  0.3
#> Results:
#>  Agreement with test decision:                                          1.000034

Again, we see that only the p-value derived from the stage-wise combination function ordering always agrees with the primary testing decision.

Conducting larger scale investigations in the performance characteristics of end-of-trial statistics

So far we have only looked at evaluating performance characteristics for a particular set of assumptions, a choice of performance characteristics and a choice of a few estimators. However, when designing a trial, one might want to produce results for a variety of different scenarios, which can be time consuming. The evaluation of performance characteristics in adestr can run in parallel using the future framework.

library(future.apply)
#> Loading required package: future
# Change to e.g. plan(multisession) for parallel computing
plan(sequential)

# Scenario 1:
scores1 <- list(MSE(), OverestimationProbability())
estimators1 <- list(SampleMean(), BiasReduced())
dist1 <- list(Normal(two_armed = TRUE))
designs1 <- list(get_example_design(two_armed = TRUE))
mu1 <- seq(-1,1,.5)
sigma1 <- 1

# Scenario 2:
scores2 <- list(Coverage(), Width())
estimators2 <- list(NaiveCI(), StagewiseCombinationFunctionOrderingCI())
dist2 <- list(Normal(two_armed = TRUE))
designs2 <- list(get_example_design(two_armed = TRUE))
mu2 <- seq(-1,1,.5)
sigma2 <- 1

# Evaluate in parallel
res <- evaluate_scenarios_parallel(
  score_lists = list(scores1, scores2),
  estimator_lists =  list(estimators1, estimators2),
  data_distribution_lists = list(dist1, dist2),
  design_lists =  list(designs1, designs2),
  mu_lists = list(mu1, mu2),
  sigma_lists = list(sigma1, sigma2)
)

res[[1]]
#>                          estimator data_distribution
#> 1  Bias reduced MLE (iterations=1) Normal<two-armed>
#> 2  Bias reduced MLE (iterations=1) Normal<two-armed>
#> 3  Bias reduced MLE (iterations=1) Normal<two-armed>
#> 4  Bias reduced MLE (iterations=1) Normal<two-armed>
#> 5  Bias reduced MLE (iterations=1) Normal<two-armed>
#> 6                      Sample mean Normal<two-armed>
#> 7                      Sample mean Normal<two-armed>
#> 8                      Sample mean Normal<two-armed>
#> 9                      Sample mean Normal<two-armed>
#> 10                     Sample mean Normal<two-armed>
#>                                         design   mu sigma Expectation
#> 1  TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80> -0.5     1 -0.49939216
#> 2  TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80> -1.0     1 -0.99999772
#> 3  TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>  0.0     1 -0.01606505
#> 4  TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>  0.5     1  0.51833127
#> 5  TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>  1.0     1  0.99918705
#> 6  TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80> -0.5     1 -0.50010511
#> 7  TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80> -1.0     1 -0.99999807
#> 8  TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>  0.0     1 -0.02491862
#> 9  TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>  0.5     1  0.52702761
#> 10 TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>  1.0     1  1.00028935
#>    error_Expectation functionEvaluations_Expectation          Bias   error_Bias
#> 1       1.960169e-05                              96  6.078357e-04 1.960169e-05
#> 2       9.514441e-04                              62  2.284733e-06 9.514441e-04
#> 3       1.777337e-05                             213 -1.606505e-02 1.777337e-05
#> 4       4.691721e-04                             591  1.833127e-02 4.691721e-04
#> 5       1.116012e-05                             383 -8.129478e-04 1.116012e-05
#> 6       8.315412e-06                              96 -1.051082e-04 8.315412e-06
#> 7       9.514289e-04                              62  1.932112e-06 9.514289e-04
#> 8       1.594939e-05                             213 -2.491862e-02 1.594939e-05
#> 9       2.894096e-04                             523  2.702761e-02 2.894096e-04
#> 10      9.213421e-06                             383  2.893473e-04 9.213421e-06
#>    functionEvaluations_Bias   Variance error_Variance
#> 1                        96 0.03584328   1.763680e-05
#> 2                        62 0.03550108   6.144227e-09
#> 3                       213 0.02797111   7.090928e-06
#> 4                       591 0.02881243   2.126220e-05
#> 5                       383 0.03583686   1.865170e-05
#> 6                        96 0.03539081   9.104316e-06
#> 7                        62 0.03549953   4.460077e-09
#> 8                       213 0.02779093   7.299455e-06
#> 9                       523 0.02945197   9.840547e-06
#> 10                      383 0.03520539   1.135277e-05
#>    functionEvaluations_Variance        MSE    error_MSE functionEvaluations_MSE
#> 1                            96 0.03584365 3.723849e-05                     192
#> 2                           122 0.03550108 9.514503e-04                     184
#> 3                           247 0.02822920 2.486430e-05                     460
#> 4                           707 0.02914847 4.904343e-04                    1298
#> 5                           213 0.03583752 2.981182e-05                     596
#> 6                            96 0.03539082 1.741973e-05                     192
#> 7                           122 0.03549953 9.514333e-04                     184
#> 8                           247 0.02841187 2.324884e-05                     460
#> 9                           553 0.03018246 2.992501e-04                    1076
#> 10                          213 0.03520547 2.056619e-05                     596
#>    Probability of overestimation error_Probability of overestimation
#> 1                      0.5001404                        2.251825e-04
#> 2                      0.5000217                        3.779495e-04
#> 3                      0.5055955                        1.861311e-03
#> 4                      0.5141305                        6.286754e-03
#> 5                      0.4997007                        3.746175e-04
#> 6                      0.4999989                        2.248484e-05
#> 7                      0.5000217                        3.779495e-04
#> 8                      0.4736938                        1.457061e-03
#> 9                      0.5550895                        1.395147e-02
#> 10                     0.4999743                        1.249999e-04
#>    functionEvaluations_Probability of overestimation
#> 1                                                528
#> 2                                                362
#> 3                                               1393
#> 4                                               1333
#> 5                                                407
#> 6                                                528
#> 7                                                362
#> 8                                               1303
#> 9                                               1363
#> 10                                               377
#>                                                                                                                    EstimatorScoreResult
#> 1  <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>, <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>
#> 2  <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>, <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>
#> 3  <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>, <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>
#> 4  <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>, <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>
#> 5  <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>, <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>
#> 6  <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>, <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>
#> 7  <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>, <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>
#> 8  <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>, <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>
#> 9  <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>, <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>
#> 10 <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>, <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>
res[[2]]
#>           estimator data_distribution
#> 1          Naive CI Normal<two-armed>
#> 2          Naive CI Normal<two-armed>
#> 3          Naive CI Normal<two-armed>
#> 4          Naive CI Normal<two-armed>
#> 5          Naive CI Normal<two-armed>
#> 6  SWCF ordering CI Normal<two-armed>
#> 7  SWCF ordering CI Normal<two-armed>
#> 8  SWCF ordering CI Normal<two-armed>
#> 9  SWCF ordering CI Normal<two-armed>
#> 10 SWCF ordering CI Normal<two-armed>
#>                                         design   mu sigma  Coverage
#> 1  TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80> -0.5     1 0.9500604
#> 2  TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80> -1.0     1 0.9499647
#> 3  TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>  0.0     1 0.9449795
#> 4  TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>  0.5     1 0.9372047
#> 5  TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>  1.0     1 0.9500391
#> 6  TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80> -0.5     1 0.9499836
#> 7  TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80> -1.0     1 0.9499647
#> 8  TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>  0.0     1 0.9500699
#> 9  TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>  0.5     1 0.9499538
#> 10 TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>  1.0     1 0.9498600
#>    error_Coverage functionEvaluations_Coverage     Width  error_Width
#> 1    0.0008288183                          712 0.7383716 9.444517e-06
#> 2    0.0008820986                          482 0.7384423 3.414488e-04
#> 3    0.0010153071                         1273 0.6918503 2.783059e-05
#> 4    0.0034991405                         1573 0.6778015 1.804778e-04
#> 5    0.0008299689                         1453 0.7382501 1.050093e-05
#> 6    0.0008194698                          542 0.7384927 8.305015e-06
#> 7    0.0008820986                          482 0.7384423 3.414488e-04
#> 8    0.0021817246                         1273 0.7163671 1.228666e-04
#> 9    0.0006645727                         1063 0.6996062 1.999098e-04
#> 10   0.0008090573                          467 0.7384132 1.013179e-05
#>    functionEvaluations_Width
#> 1                        164
#> 2                         62
#> 3                        179
#> 4                        421
#> 5                        383
#> 6                         62
#> 7                         62
#> 8                        281
#> 9                        455
#> 10                       383
#>                                                                                                                    EstimatorScoreResult
#> 1  <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>, <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>
#> 2  <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>, <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>
#> 3  <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>, <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>
#> 4  <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>, <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>
#> 5  <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>, <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>
#> 6  <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>, <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>
#> 7  <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>, <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>
#> 8  <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>, <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>
#> 9  <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>, <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>
#> 10 <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>, <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>

You will get back a list of data.frames containing the results for each scenario. Note that within one scenario, the evaluation will take place for the cross-product of all arguments that are supplied! This means that within one scenario, every estimator is evaluated with every score at every point of mu for every sigma and every design. Depending on your settings, this can get very time-consuming very fast, making parallelization essential.

Analyzing datasets

Next, let us look at how to the package can be used to calculate estimates after data has been collected.

The first stage data of a trial might look like this:

set.seed(321)
dat <- data.frame(
 endpoint = c(rnorm(56, .3, 1), rnorm(56, 0, 1)),
 group = factor(rep(c("trt", "ctl"),
                    c(56,56)), levels = c("trt", "ctl")),
 stage = rep(1, 112)
)
head(dat)
#>      endpoint group stage
#> 1  2.00490322   trt     1
#> 2 -0.41203856   trt     1
#> 3  0.02201509   trt     1
#> 4  0.18035098   trt     1
#> 5  0.17603938   trt     1
#> 6  0.56818377   trt     1

analyze(data = dat,
        statistics = list(),
        data_distribution = Normal(two_armed = TRUE),
        design = get_example_design(two_armed = TRUE),
        sigma = 1)
#> Design:                              TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution:                                             Normal<two-armed>
#> Observed number of stages:                                                     1
#> Observed n1 (group 1)                                                         56
#> Observed n1 (group 2)                                                         56
#> Observed n1 (total)                                                          112
#> Z1                                                                          1.75
#> Interim decision:                                       continue to second stage
#> Calculated n2(Z1) (per group)                                           46.99923
#> Calculated c2(Z1)                                                           1.14

The results suggest recruiting 23 more patients per group for the second stage.

We will simulate 47 more patients per group:

dat <- rbind(dat,
             data.frame(
               endpoint = c(rnorm(47, .3, 1), rnorm(47, 0, 1)),
               group = factor(rep(c("trt", "ctl"),
                                  c(47, 47)), levels = c("trt", "ctl")),
               stage = rep(2, 94)
             ))

Now, we can use the analyze function to evaluate a selection of estimators on the complete dataset:

analyze(
 data = dat,
 statistics = c(
   SampleMean(),
   BiasReduced(),
   PseudoRaoBlackwell(),
   MedianUnbiasedStagewiseCombinationFunctionOrdering(),
   StagewiseCombinationFunctionOrderingCI(),
   StagewiseCombinationFunctionOrderingPValue()
   ),
 data_distribution = Normal(two_armed = TRUE),
 sigma = 1,
 design = get_example_design(two_armed = TRUE)
)
#> Design:                              TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution:                                             Normal<two-armed>
#> Observed number of stages:                                                     2
#> Observed n1 (group 1)                                                         56
#> Observed n1 (group 2)                                                         56
#> Observed n1 (total)                                                          112
#> Z1                                                                          1.75
#> Interim decision:                                       continue to second stage
#> Calculated n2(Z1) (per group)                                           46.99923
#> Calculated c2(Z1)                                                           1.14
#> Observed n2 (group 1)                                                         47
#> Observed n2 (group 2)                                                         47
#> Observed n2 (in total)                                                        94
#> Z2                                                                          2.71
#> Final test decision:                                                 reject null
#> 
#> Stage 2 results:
#>  Sample mean:                                                           0.434684
#>  Bias reduced MLE (iterations=1):                                      0.4221533
#>  Pseudo Rao-Blackwellized:                                             0.2658506
#>  Median unbiased (SWCF ordering):                                      0.3047428
#>  SWCF ordering CI:                                       [0.04435513, 0.5484439]
#>  SWCF ordering p-value:                                               0.01097266

The estimates presented here are for the difference in means of the two normal distributions. Keep in mind that a difference of $\mu=0.3$ was used in the simulation.

Note that while the median unbiased estimator performs well in this particular example, this is not universally true.

Introduction to adestr

Jan Meis

2024-07-12

Introduction

Fitting a design with adoptr