---
title: "Exploring Clinical Trial Data with clintrialx"
author: "Indraneel Chakraborty"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Exploring Clinical Trial Data with clintrialx}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

## Introduction

Welcome to the `clintrialx` vignette! This package simplifies the process of fetching and analyzing clinical trial data. In this guide, we'll demonstrate how to use `clintrialx` alongside popular R packages to examine and visualize clinical trial data specifically for cancer studies in India. 🚀

## Setup

To start, load the necessary libraries. We use `suppressPackageStartupMessages` to keep the output clean:

```{r, eval=FALSE}
# Load required libraries
invisible(suppressPackageStartupMessages({
  library(clintrialx)  # For fetching clinical trial data
  library(ggplot2)     # For data visualization
  library(plotly)      # For interactive plots
  library(dplyr)       # For data manipulation
  library(lubridate)   # For date handling
}))
```

## Fetching Data

Retrieve clinical trial data related to cancer studies in India using the `ctg_bulk_fetch` function:

```{r, eval=FALSE}
# Fetch cancer study data in India
df <- ctg_bulk_fetch(condition = "cancer", location = "India")
```
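
Before plotting, it can help to confirm that the fetched data frame contains the columns the rest of this guide relies on. The sketch below uses a small hand-made stand-in for `df` so it runs without a network call. The values are illustrative, not real trial records, and `check_columns()` is a hypothetical helper, not part of `clintrialx`:

```{r, eval=FALSE}
# Hypothetical stand-in for the result of ctg_bulk_fetch(); values are made up
df_example <- data.frame(
  `Study Status` = c("COMPLETED", "RECRUITING"),
  Phases = c("PHASE2", "PHASE3"),
  Enrollment = c(120, 450),
  check.names = FALSE,       # keep column names that contain spaces
  stringsAsFactors = FALSE
)

# Columns used by the plots later in this vignette
required_cols <- c("Study Status", "Phases", "Enrollment",
                   "Start Date", "Completion Date",
                   "Funder Type", "Study Type")

# Hypothetical helper: return the required columns missing from a data frame
check_columns <- function(data, required) {
  setdiff(required, names(data))
}

missing_cols <- check_columns(df_example, required_cols)
if (length(missing_cols) > 0) {
  message("Missing columns: ", paste(missing_cols, collapse = ", "))
}
```

Running the same check on the real `df` should report no missing columns if the fetch returned the fields used below.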

## Visualizing Study Status Distribution

Understand the distribution of study statuses by creating a bar plot:

```{r, eval=FALSE}
# Create a table of study statuses
status_counts <- table(df$`Study Status`)

# Convert the table to a data frame
status_df <- data.frame(status = names(status_counts), count = as.numeric(status_counts))

# Generate the bar plot
ggplotly(ggplot(status_df, aes(x = reorder(status, -count), y = count)) +
  geom_bar(stat = "identity", fill = "orange") +
  theme_minimal() +
  labs(title = "Distribution of Study Statuses",
       x = "Study Status",
       y = "Count") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  geom_text(aes(label = count), vjust = -0.5))
```

![image](https://github.com/user-attachments/assets/633ed6a3-7fe9-4044-92c4-4ec4c26f4cf6)

This plot provides an overview of the number of studies in each status category. 📉
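
The same counts can also be read as a sorted table with percentage shares, which is handy when bar heights are close. This is a minimal base R sketch on a made-up status vector; the statuses and the `status_df_example` name are assumptions for illustration:

```{r, eval=FALSE}
# Made-up status values standing in for df$`Study Status`
statuses <- c("COMPLETED", "RECRUITING", "COMPLETED", "UNKNOWN", "COMPLETED")

# table() counts each status; as.data.frame() turns the counts into columns
status_df_example <- as.data.frame(
  table(status = statuses),
  responseName = "count",
  stringsAsFactors = FALSE
)

# Sort from most to least frequent and add each status's share in percent
status_df_example <- status_df_example[order(-status_df_example$count), ]
status_df_example$pct <- round(100 * status_df_example$count /
                                 sum(status_df_example$count), 1)

print(status_df_example)
```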

## Analyzing Enrollment by Study Phase

Compare enrollment numbers across different study phases using an interactive box plot:

```{r, eval=FALSE}
# Create an interactive box plot of enrollment by study phase
ggplotly(ggplot(df, aes(x = Phases, y = Enrollment)) +
  geom_boxplot(fill = "lightblue", outlier.colour = "red", outlier.shape = 1) +
  geom_jitter(color = "darkblue", size = 0.5, alpha = 0.5, width = 0.2) +
  theme_minimal(base_size = 14) +
  labs(title = "Enrollment by Study Phase",
       x = "Study Phase",
       y = "Enrollment") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 12),
        plot.title = element_text(hjust = 0.5)))
```

![image](https://github.com/user-attachments/assets/3aef0bff-792b-4689-a80e-ef56b3074765)

This interactive plot allows you to explore enrollment numbers across different phases and identify trends. 🔍
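
Enrollment figures tend to be skewed by a few very large studies, so a median per phase can complement the box plot. The sketch below assumes `Enrollment` may arrive as text and converts it first; the toy data frame `trials` is made up for illustration:

```{r, eval=FALSE}
# Made-up records standing in for the Phases and Enrollment columns of df
trials <- data.frame(
  Phases = c("PHASE1", "PHASE1", "PHASE2", "PHASE2", "PHASE3"),
  Enrollment = c("30", "50", "120", NA, "900"),
  stringsAsFactors = FALSE
)

# Convert enrollment to numeric so it can be summarized
trials$Enrollment <- as.numeric(trials$Enrollment)

# Median enrollment per phase; the formula interface drops rows with NA
phase_summary <- aggregate(Enrollment ~ Phases, data = trials, FUN = median)

print(phase_summary)
```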

## Visualizing Study Duration Timeline

Examine the timeline of studies with a scatter plot:

```{r, eval=FALSE}
# Convert date strings to Date objects
df$start_date <- as.Date(df$`Start Date`, format = "%Y-%m-%d")
df$completion_date <- as.Date(df$`Completion Date`, format = "%Y-%m-%d")

# Create a scatter plot with a dashed horizontal reference line at 2024-01-01
ggplot(df, aes(x = start_date, y = completion_date, color = `Study Status`)) +
  geom_point(alpha = 0.6) +
  geom_hline(yintercept = as.Date("2024-01-01"), linetype = "dashed", color = "blue") +
  theme_minimal() +
  labs(title = "Study Duration Timeline",
       x = "Start Date",
       y = "Completion Date") +
  scale_color_brewer(palette = "Set1")
```

![image](https://github.com/user-attachments/assets/353f12d2-1a08-4e30-879b-bbac1d5d9651)

This scatter plot shows each study's start and completion dates, colored by status. The dashed line marks 1 January 2024, so points above it belong to studies that complete in 2024 or later. ⏳
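
If some dates in the fetched data are recorded as year and month only (an assumption about the data, not something this guide confirms), `as.Date()` with the `"%Y-%m-%d"` format returns `NA` for them and those studies drop out of the plot. The sketch below shows one way to handle that case; `parse_trial_date()` is a hypothetical helper, and treating a missing day as the first of the month is a simplifying choice:

```{r, eval=FALSE}
# Hypothetical helper: parse "YYYY-MM-DD" and "YYYY-MM" strings into Dates
parse_trial_date <- function(x) {
  x <- as.character(x)
  # Pad year-month values with the first day of the month
  month_only <- !is.na(x) & grepl("^[0-9]{4}-[0-9]{2}$", x)
  x[month_only] <- paste0(x[month_only], "-01")
  as.Date(x, format = "%Y-%m-%d")
}

# Made-up date strings standing in for the Start Date and Completion Date columns
raw_start <- c("2019-05-14", "2021-03", NA)
raw_end   <- c("2022-05-14", "2023-09", "2024-01-31")

start_date <- parse_trial_date(raw_start)
completion_date <- parse_trial_date(raw_end)

# Study duration in days; NA where either date is missing
duration_days <- as.numeric(completion_date - start_date)
```

A duration column like this can also be plotted directly, for example as a histogram, when the length of studies matters more than their calendar position.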

## Analyzing Funding Sources and Study Types

Examine the relationship between funding sources and study types using a grouped bar plot:

```{r, eval=FALSE}
# Count studies per funder type and study type, then compute each
# study type's share within its funder type
df_summary <- df %>%
  count(`Funder Type`, `Study Type`) %>%
  group_by(`Funder Type`) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

ggplotly(ggplot(df_summary, aes(x = `Funder Type`, y = prop, fill = `Study Type`)) +
  geom_bar(stat = "identity", position = "dodge") +
  theme_minimal() +
  labs(title = "Funding Sources and Study Types",
       x = "Funder Type",
       y = "Proportion") +
  scale_fill_brewer(palette = "Set2") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)))
```

![image](https://github.com/user-attachments/assets/ec71369c-491f-4b26-9b65-cbde1f4e99f0)

This plot uncovers patterns in how different funders support various study types. 💡
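
Because `prop` is computed after grouping by funder type, the bars for a single funder should sum to 1. A quick base R cross-check of that logic on made-up data, using `prop.table()` rather than the `dplyr` pipeline above, might look like this:

```{r, eval=FALSE}
# Made-up records standing in for the Funder Type and Study Type columns of df
funding <- data.frame(
  funder = c("INDUSTRY", "INDUSTRY", "INDUSTRY", "OTHER", "OTHER"),
  study_type = c("INTERVENTIONAL", "INTERVENTIONAL", "OBSERVATIONAL",
                 "INTERVENTIONAL", "OBSERVATIONAL"),
  stringsAsFactors = FALSE
)

# Cross-tabulate funder by study type
counts <- table(funding$funder, funding$study_type)

# margin = 1 divides each row by its total: shares within each funder
props <- prop.table(counts, margin = 1)

# Each funder's shares should add up to 1
print(rowSums(props))
```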

## Conclusion

Using the `clintrialx` package along with visualization tools like `ggplot2` and `plotly`, you can extract valuable insights from clinical trial data. This vignette has illustrated techniques for analyzing cancer clinical trials in India, and these methods are adaptable to other datasets fetched with `clintrialx`. Happy analyzing! 😊