---
title: "Getting started"
description: "Introduction to topics in R."
author: ""
opengraph:
  image: 
    src: "http://r-text.org/articles/text_files/figure-html/unnamed-chunk-5-1.png"
  twitter:
    card: summary_large_image
    creator: "@oscarkjell"
output: github_document #rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{text}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
evaluate = FALSE
```
The *topics*-package enables Differential Language Analysis using words, phrases and topics

Please reference our tutorial article when using the package: **Language visualisation methods for psychological assessments** and Ackermann L., Zhuojun G. & Kjell O.N.E. (2024). An R-package for visualizing text in topics. https://github.com/theharmonylab/topics. `DOI:zenodo.org/records/11165378`..

This Getting Started tutorial is going through the most central *topics* functions.  

## Usage
In an example where the topics are used to predict the PHQ-9 score, the pipeline can be run as follows:

**1. Data Preprocessing**<br>
To preprocess the data, run the following command:

```{r dtm, eval = TRUE, warning=TRUE, message=TRUE}

library(topics)

dtm <- topicsDtm(
  data = dep_wor_data$Depword)

# Check the results from the dtm and refine stopwords and removal rates if necessary
dtm_evaluation <- topicsDtmEval(
  dtm)
dtm_evaluation$frequency_plot
```

**2. Model Training**<br>
To train the LDA model, run the following command:

```{r model, eval = TRUE, warning=FALSE, message=FALSE}

model <- topicsModel(
  dtm = dtm,
  num_topics = 20,
  num_iterations = 1000)

```

**3. Model Inference**<br>
To infer the topic term distribution of the documents, run the following command:
```{r preds, eval = TRUE, warning=FALSE, message=FALSE}

preds <- topicsPreds(
  model = model,
  data = dep_wor_data$Depword)

```


**4. Statistical Analysis**<br>
To analyze the relationship between the topics and the prediction variable, run the following command:
```{r test, eval = TRUE, warning=FALSE, message=FALSE}

test <- topicsTest(
  data = dep_wor_data,
  model = model,
  preds = preds,
  x_variable = "PHQ9tot",
  controls = c("Age"),
  test_method = "linear_regression")

```

**5. Visualization**<br>
To visualize the significant topics as wordclouds, run the following command:
```{r plot_list, eval = TRUE, warning=FALSE, message=FALSE}

plot_list <- topicsPlot(
  model = model,
  test = test,
  figure_format = "png")

# showing some of the plots
plot_list$square1
```



### Articles using the topics-package
Differentiating balance and harmony through natural language analysis: A cross-national exploration of two understudied wellbeing-related concepts


### Other relevant references
The below list consists of papers analyzing human language in a similar fashion that is possible in *topics*.

***Methods Articles***

[Gaining insights from social media language: Methodologies and challenges.](http://www.peggykern.org/uploads/5/6/6/7/56678211/kern_2016_-_gaining_insights_from_social_media_language-_methodologies_and_challenges.pdf).  
*Kern et al., (2016). Psychological Methods.*


***Computer Science: Python Software***  

[DLATK: Differential language analysis toolkit.](https://aclanthology.org/D17-2010/)
*Schwartz, H. A., Giorgi, et al., (2017).  In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations* 


[DLATK](https://github.com/dlatk/dlatk)