---
title: "PubMedMining-vignette"
author: "Jeff DIDIER"
output: rmarkdown::html_vignette
description: >
  A simple function for text-mining the PubMed repository based on defined sets of terms. The relationship between fix-terms (terms related to your research topic) and pub-terms (terms that pivot around your research focus) is calculated using the pointwise mutual information (PMI) algorithm. A text file with the PMI scores is generated for each fix-term. Then, for each collocation pair (a fix-term + a pub-term), a text file is generated with the related article titles and publishing years. An additional author section will follow in a future version.
vignette: >
  %\VignetteIndexEntry{PubMedMining-vignette}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r, setup, echo=FALSE}
library(PubMedMining)
```

This package was created for easy and fast term-based text mining of the broad PubMed article repository. To find articles relevant to your research topic, you need to:  

* Identify the main terms of your research focus (here, the fix-terms)  
* Identify important terms that might pivot around your focus (here, the pub-terms)  
* (Optional) Define an output path for the result files (default: current location)  
* Have a stable internet connection  

The terms are stored as character strings in the corresponding variables `fixterms` and `pubterms`. The desired output path can be stored in the `output` variable.

```{r, example, eval=FALSE}
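# Main terms of the research focus
fixterms <- c("bike", "downhill")
# Terms that pivot around the research focus
pubterms <- c("dangerous", "extreme", "injuries")
# Folder for the result files
output <- getwd()  # or "YOUR/DESIRED/PATHWAY"
pubmed_textmining(fixterms, pubterms, output)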
```

The function generates two kinds of result files (.txt):  

* PMI scores: for each fix-term, a table of pointwise mutual information scores, one per pub-term  
* Relevant articles: for each fix-term + pub-term pair, a text file with the titles and publishing years of the relevant articles  

__Definition of Pointwise Mutual Information (PMI) scoring:__  
Good collocation pairs have a high PMI because the probability of co-occurrence is only slightly lower than the probability of occurrence of each word. Conversely, a pair of words whose individual probabilities of occurrence are considerably higher than their probability of co-occurrence gets a small PMI score. If PMI = -Inf, no articles were found for the respective collocation pair.
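
To make the scoring more concrete, the standard definition of PMI for two terms $x$ and $y$ is

$$\mathrm{pmi}(x; y) = \log \frac{p(x, y)}{p(x)\,p(y)}$$

The chunk below is a minimal sketch of that formula, not the package's internal code. The helper `pmi_from_counts()`, its arguments, the base-2 logarithm and the article counts are all assumptions made for illustration; `pubmed_textmining()` may estimate the probabilities and choose the logarithm base differently.

```{r, pmi-sketch, eval=FALSE}
# Hypothetical helper (not part of PubMedMining): PMI from raw article counts.
# n_xy    = articles mentioning both terms
# n_x     = articles mentioning the fix-term
# n_y     = articles mentioning the pub-term
# n_total = total number of articles considered
pmi_from_counts <- function(n_xy, n_x, n_y, n_total) {
  p_xy <- n_xy / n_total  # probability of co-occurrence
  p_x  <- n_x / n_total   # probability of the fix-term
  p_y  <- n_y / n_total   # probability of the pub-term
  log2(p_xy / (p_x * p_y))
}

# Made-up counts: the terms co-occur far more often than chance would predict
pmi_from_counts(n_xy = 50, n_x = 100, n_y = 200, n_total = 10000)

# No co-occurring articles: the ratio is 0, so the score is -Inf
pmi_from_counts(n_xy = 0, n_x = 100, n_y = 200, n_total = 10000)
```

With these made-up counts the first call gives `log2(25)`, roughly 4.64, and the second gives `-Inf`, which matches the "no articles found" case described above.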