---
title: "Exploring Clinical Trial Data with clintrialx"
author: "Indraneel Chakraborty"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Exploring Clinical Trial Data with clintrialx}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

## Introduction

Welcome to the `clintrialx` vignette! This package simplifies the process of fetching and analyzing clinical trial data. In this guide, we'll demonstrate how to use `clintrialx` alongside popular R packages to examine and visualize clinical trial data specifically for cancer studies in India. 🚀

## Setup

To start, load the necessary libraries. We use `suppressPackageStartupMessages` to keep the output clean:

```{r, eval=FALSE}
# Load required libraries
invisible(suppressPackageStartupMessages({
  library(clintrialx) # For fetching clinical trial data
  library(ggplot2)    # For data visualization
  library(plotly)     # For interactive plots
  library(dplyr)      # For data manipulation
  library(lubridate)  # For date handling
}))
```

## Fetching Data

Retrieve clinical trial data related to cancer studies in India using the `ctg_bulk_fetch` function:

```{r, eval=FALSE}
# Fetch cancer study data in India
df <- ctg_bulk_fetch(condition = "cancer", location = "India")
```

## Visualizing Study Status Distribution

Understand the distribution of study statuses by creating a bar plot:

```{r, eval=FALSE}
# Create a table of study statuses
status_counts <- table(df$`Study Status`)

# Convert the table to a data frame
status_df <- data.frame(
  status = names(status_counts),
  count  = as.numeric(status_counts)
)

# Generate the bar plot
ggplotly(
  ggplot(status_df, aes(x = reorder(status, -count), y = count)) +
    geom_bar(stat = "identity", fill = "orange") +
    theme_minimal() +
    labs(title = "Distribution of Study Statuses",
         x = "Study Status",
         y = "Count") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    geom_text(aes(label = count), vjust = -0.5)
)
```
![image](https://github.com/user-attachments/assets/633ed6a3-7fe9-4044-92c4-4ec4c26f4cf6)

This plot provides an overview of the number of studies in each status category. 📉

## Analyzing Enrollment by Study Phase

Compare enrollment numbers across different study phases using an interactive box plot:

```{r, eval=FALSE}
# Create an interactive box plot of enrollment by study phase
ggplotly(
  ggplot(df, aes(x = Phases, y = Enrollment)) +
    geom_boxplot(fill = "lightblue", outlier.colour = "red", outlier.shape = 1) +
    geom_jitter(color = "darkblue", size = 0.5, alpha = 0.5, width = 0.2) +
    theme_minimal(base_size = 14) +
    labs(title = "Enrollment by Study Phase",
         x = "Study Phase",
         y = "Enrollment") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 12),
          plot.title = element_text(hjust = 0.5))
)
```

![image](https://github.com/user-attachments/assets/3aef0bff-792b-4689-a80e-ef56b3074765)

This interactive plot lets you compare the spread of enrollment across phases and spot unusually large or small studies. 🔍

## Visualizing Study Duration Timeline

Examine the timeline of studies with a scatter plot of completion date against start date:

```{r, eval=FALSE}
# Convert date strings to Date objects
# (strings that do not match the format become NA)
df$start_date <- as.Date(df$`Start Date`, format = "%Y-%m-%d")
df$completion_date <- as.Date(df$`Completion Date`, format = "%Y-%m-%d")

# Create a scatter plot with a dashed reference line at 2024-01-01
ggplot(df, aes(x = start_date, y = completion_date, color = `Study Status`)) +
  geom_point(alpha = 0.6) +
  geom_hline(yintercept = as.Date("2024-01-01"), linetype = "dashed", color = "blue") +
  theme_minimal() +
  labs(title = "Study Duration Timeline",
       x = "Start Date",
       y = "Completion Date") +
  scale_color_brewer(palette = "Set1")
```

![image](https://github.com/user-attachments/assets/353f12d2-1a08-4e30-879b-bbac1d5d9651)

This scatter plot shows each study's start and completion dates, colored by status. Points above the dashed line are studies that complete after January 1, 2024.
⏳

## Analyzing Funding Sources and Study Types

Examine the relationship between funding sources and study types using a grouped bar plot:

```{r, eval=FALSE}
# Count studies per funder type and study type, then compute each
# study type's share within its funder type
df_summary <- df %>%
  count(`Funder Type`, `Study Type`) %>%
  group_by(`Funder Type`) %>%
  mutate(prop = n / sum(n))

# Plot the proportions as side-by-side bars
ggplotly(
  ggplot(df_summary, aes(x = `Funder Type`, y = prop, fill = `Study Type`)) +
    geom_bar(stat = "identity", position = "dodge") +
    theme_minimal() +
    labs(title = "Funding Sources and Study Types",
         x = "Funder Type",
         y = "Proportion") +
    scale_fill_brewer(palette = "Set2") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
)
```

![image](https://github.com/user-attachments/assets/ec71369c-491f-4b26-9b65-cbde1f4e99f0)

This plot shows, for each funder type, the proportion of its studies that fall into each study type. 💡

## Conclusion

Using the `clintrialx` package along with visualization tools like `ggplot2` and `plotly`, you can extract valuable insights from clinical trial data. This vignette has illustrated techniques for analyzing cancer clinical trials in India, and these methods are adaptable to other datasets fetched with `clintrialx`. Happy analyzing! 😊
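
To try the counting and date-handling steps from this vignette without a network call, here is a minimal sketch in base R. The `toy` data frame and all of its values are made up for illustration. Only the column names (`Study Status`, `Start Date`, `Completion Date`) are taken from the code above, and the `duration_years` column is a hypothetical addition, not something `clintrialx` returns.

```r
# Toy data with made-up values; column names mirror those used in this vignette
toy <- data.frame(
  `Study Status`    = c("COMPLETED", "RECRUITING", "COMPLETED", "TERMINATED"),
  `Start Date`      = c("2015-03-01", "2021-06-15", "2018-01-10", "2019-09"),
  `Completion Date` = c("2018-03-01", "2025-12-31", "2020-01-10", "2020-05-20"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Count studies per status, most frequent first
status_counts <- sort(table(toy$`Study Status`), decreasing = TRUE)
status_df <- data.frame(
  status = names(status_counts),
  count  = as.numeric(status_counts)
)

# Parse dates; strings that do not match the format (here "2019-09") become NA
start_date      <- as.Date(toy$`Start Date`, format = "%Y-%m-%d")
completion_date <- as.Date(toy$`Completion Date`, format = "%Y-%m-%d")

# Study duration in years (NA where either date could not be parsed)
toy$duration_years <-
  as.numeric(difftime(completion_date, start_date, units = "days")) / 365.25

print(status_df)
print(toy[, c("Study Status", "duration_years")])
```

Checking how many rows end up with `NA` dates before plotting is a cheap way to see whether a fetched dataset mixes full dates with partial ones.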