---
title: "Regular expression logic"
author: "Laurent R. Bergé"
date: "`r Sys.Date()`"
output:
  html_document:
    theme: journal
    highlight: haddock
    number_sections: yes
    toc: yes
    toc_depth: 3
    toc_float:
      collapsed: no
      smooth_scroll: no
  pdf_document:
    toc: yes
    toc_depth: 3
vignette: >
  %\VignetteIndexEntry{ref_regex_logic}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(stringmagic)
```

In `stringmagic`, any time you use a regular expression (regex) to *detect a pattern* in
a character string, you can use regex logic. The syntax to logically combine
regular expressions is intuitive: simply use regular logical operators and it will work!

The functions for which regex logic is available are: a) pattern detection functions 
(`string_is`, `string_get`, etc.), and b) string replacement functions (`string_clean`, 
`string_replace`) with the `total` flag (see the [vignette on regex flags](https://lrberge.github.io/stringmagic/articles/ref_regex_flags.html)).


# Logically combining regex patterns {#detect_logic}

Assume `"pat1"` and `"pat2"` are two regular expression patterns and we want 
to test whether the string `x` contains a combination of these patterns. 
Then:

- `"pat1 & pat2"` = `x` contains `pat1` AND `x` contains `pat2`
- `"pat1 | pat2"` = `x` contains `pat1` OR `x` contains `pat2`
- `"!pat1"` = `x` does not contain `pat1`
- `"!pat1 & pat2"` = `x` does not contain `pat1` AND `x` contains `pat2`

Hence the three logical operators are:

- `" & "`: logical AND, it **must** be a space + an ampersand + a space 
(just the `&` *does not work*)
- `" | "`: logical OR, it **must** be a space + a pipe + a space 
(just the `|` *does not work*)
- `"!"`: logical NOT, it works only when it is the first character of the pattern.
Note that anything after it (including spaces and other `!`) *is part of the regular expression*

The parsing of the logical elements is done before any regex interpretation. 
The logical evaluations are done from left to right and are sequentially combined. 

Ex: selecting cars.
```{r}
cars = row.names(mtcars)
print(cars)

# which one...
# ... contains all letters 'a', 'e', 'i' AND 'o'?
string_get(cars, "a & e & i & o")

# ... does NOT contain any digit?
string_get(cars, "!\\d")
```
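
The cars example covers AND and NOT. As a minimal sketch of OR and of the left-to-right rule, the chunk below uses an illustrative vector `x` (not part of the package) and compares `string_is` with base R equivalents built from `grepl`. If evaluation is sequential as described above, `"ph | mm & ^g"` should behave like `(ph OR mm) AND ^g`, which differs from R's own operator precedence, where `&` binds tighter than `|`.

```{r}
library(stringmagic)

# illustrative data, chosen so that the two readings differ on "alpha"
x = c("alpha", "beta", "gamma", "delta")

# logical OR: x contains "ph" OR x contains "mm"
string_is(x, "ph | mm")

# three sub-patterns, combined from left to right
res_logic = string_is(x, "ph | mm & ^g")

# base R, sequential reading: (ph OR mm) AND ^g
res_seq = (grepl("ph", x) | grepl("mm", x)) & grepl("^g", x)

# base R, default precedence: ph OR (mm AND ^g)
res_prec = grepl("ph", x) | grepl("mm", x) & grepl("^g", x)

# side-by-side comparison of the three results
data.frame(x, res_logic, res_seq, res_prec)
```

When a different grouping is needed, one option is to run several detections and combine the resulting logical vectors directly in R.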

You **cannot** combine logical statements with parentheses. 

For example, `"hello | (world & my lady)"` is read as: `x` contains `"hello"` OR contains `"(world"`, 
AND contains `"my lady)"`. The latter two are invalid regexes, but they can make sense if you
have the flag "fixed" turned on. To escape the meaning of the logical operators, 
see the [dedicated section](#logical_escape).

The logical `"not"` always applies to a single sub-pattern and **not** to the full pattern.
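
A small sketch of this point, using an illustrative vector `x`: the `!` in `"!cat & dog"` negates only `cat`. To negate a whole combination, one option (an assumption about usage, not a package feature) is to negate the returned logical vector in R.

```{r}
library(stringmagic)

# illustrative data
x = c("cat", "dog", "cat and dog", "bird")

# x does NOT contain "cat", AND x contains "dog"
res_not_first = string_is(x, "!cat & dog")
res_not_first

# negating the full combination is done outside the pattern, with R's own `!`
res_not_all = !string_is(x, "cat & dog")
res_not_all
```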

### Escaping the meaning of the logical operators {#logical_escape}

There are two ways to escape the meaning of the logical operators:

- use two backslashes just before the operator: `"a \\& b"` means `x`
contains `"a & b"`
- use a regex hack: the previous example is equivalent to `"a [&] b"` in regex parlance
and won't be parsed as a logical AND

The two solutions work for the three operators: `" & "`, `" | "` and `"!"`.
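
The chunk below is a minimal sketch of both escaping methods, on illustrative vectors `x` and `y`. Unescaped, `"a & b"` is a logical AND of two patterns. Escaped, it should only detect the literal text `a & b`.

```{r}
library(stringmagic)

# illustrative data
x = c("a & b", "a and b", "b then a")

# unescaped: x contains "a" AND x contains "b"
res_and = string_is(x, "a & b")
res_and

# escaped with two backslashes: x contains the literal "a & b"
res_backslash = string_is(x, "a \\& b")
res_backslash

# escaped with a character class: same meaning
res_bracket = string_is(x, "a [&] b")
res_bracket

# the same trick for a leading "!"
y = c("!important", "important")
string_is(y, "!important")    # y does NOT contain "important"
string_is(y, "[!]important")  # y contains the literal "!important"
```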

### How do regex flags work with logically combined regexes? {#logical_flags}

All `stringmagic` regexes accept optional flags. Please see the [associated vignette](https://lrberge.github.io/stringmagic/articles/ref_regex_flags.html).

When you add flags to a pattern, they apply to *all* regex sub-patterns. 
For example, `"f/( | )"` applies the "fixed" flag to both sub-patterns, so each parenthesis is matched literally. 
*You cannot add flags specific to a single sub-pattern.*
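
As a short sketch, the pattern `"f/( | )"` can be tried on an illustrative vector `x`. The base R line is a rough equivalent written with `grepl(..., fixed = TRUE)` for comparison.

```{r}
library(stringmagic)

# illustrative data
x = c("f(x)", "no parentheses", "closing)")

# the "fixed" flag applies to both sub-patterns:
# x contains a literal "(" OR x contains a literal ")"
res_flag = string_is(x, "f/( | )")
res_flag

# rough base R equivalent
res_base = grepl("(", x, fixed = TRUE) | grepl(")", x, fixed = TRUE)
res_base
```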