---
title: "Introduction to the IRISSeismic Package"
author:
- Jonathan Callahan (jonathan@mazamascience.com), Mazama Science
- Robert Casey (rob.casey@earthscope.org), EarthScope
- Mary Templeton (mary.templeton@earthscope.org), EarthScope
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to the IRISSeismic Package}
  %\VignetteEngine{knitr::rmarkdown}
  %\usepackage[utf8]{inputenc}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

## Introduction

The `IRISSeismic` package for seismic data analysis was developed by 
[Mazama Science](http://mazamascience.com) for [EarthScope](https://www.earthscope.org/) (formerly IRIS Data Management Center)
This development is part of the **MUSTANG** 
project for automated QC of seismic data.

The goal of this package is to make it easy to obtain and work with data from
EarthScope [web services](https://service.earthscope.org/).  This introduction
will demonstrate some of the core functionality of the `IRISSeismic` package
and how it can be used in interactive sessions.  Detailed information about 
object properties and function arguments can be found in the package documentation.

The core objects in this package, especially `Trace` and `Stream` objects,
borrow heavily from concepts and features found in the Python [ObsPy](https://github.com/obspy/obspy/wiki/) package.  References to specific **ObsPy** classes can be found in
the source code.

## Getting Started

For those who are not used to working with `R`, the [Using R](https://web.archive.org/web/20120409050248/http://mazamascience.com/WorkingWithData/?series=using-r) series of
blog posts offers tips on how to get started and includes links to other introductory
documentation.

### RStudio

Users of the `IRISSeismic` package are encouraged to first download and install
the [RStudio](https://posit.co/) integrated development environment for 
**R**.  Newcomers to `R` will find **RStudio** a much friendlier 
environment in which to work.

```{r, out.width = "600px", echo=FALSE}
knitr::include_graphics("rstudio-IRISSeismic.png")
```

### First Example
Once you have an **R** environment up and running, the first step is to 
load the `IRISSeismic` package.  Then you can create a new 
`IrisClient` object that will be responsible for all subsequent communication
with DMC provided web services.

```{r first, results="hide"}
library(IRISSeismic)
iris <- new("IrisClient")
```

In order to get data from one of the EarthScope web services we must specify all 
the information needed to create a webservice request:  `network, station, 
location, channel, starttime, endtime`.  Each unique combination of these
elements is known as a *SNCL*.  These elements are passed to the
`getDataselect()` method of the `IrisClient` as a series of character 
strings except for the times which are of type `POSIXct`.  The user is
responsible for creating datetime objects of class `POSIXct`.

The first three commands in the following code chunk use the `IrisClient`
object to communicate with web services and return a `Stream` object full
of data from EarthScope.  The fourth line checks to see how many distinct
chunks of seismic data exist.  The last line passes this `Stream` object to
a function that will plot the times at which this channel was collecting data.

```{r second, fig.height=4, fig.width=5}
starttime <- as.POSIXct("2002-04-20", tz="GMT")
endtime <- as.POSIXct("2002-04-21", tz="GMT")
result <- try(st <- getDataselect(iris,"US","OXF","","BHZ",starttime,endtime))
if (inherits(result,"try-error")) {
  message(geterrmessage())
} else {
  length(st@traces)
  plotUpDownTimes(st, min_signal=1, min_gap=1)
}
```


This station had a few minor data dropouts, causing the data to be broken up into
several separate signals that the `IRISSeismic` package stores in `Trace` 
objects.

We can get more information on the gaps between traces with the `getGaps()`
function.  The duration (secs) of gaps between traces is displayed along with the
number of samples that were missed during the gap.

```{r third}
if (exists("st")){
  getGaps(st)
}
```

Next we can examine various statistics for each individual trace with
the following `parallel-` functions.

```{r fourth}
if (exists("st")){
  parallelLength(st)
  parallelMax(st)
  parallelSd(st)
}
```

It looks like the third trace, with a larger maximum and standard deviation,
might have a signal.  Metadata for this trace is stored in the `stats`
slot of the `Trace` object.

```{r fifth}
if (exists("st")){
  tr <- st@traces[[3]]
  tr@stats
}
```

Finally, we can look at the seismic signal with the `plot` method.

```{r sixth, fig.height=4, fig.width=6}
if (exists("tr")){
  plot(tr)
}
```

This small seismic signal was recorded in Oxford, Mississippi and is from a quake
that occurred in New York state

**Note:**  By default, data are subsampled before plotting to *greatly!* improve plotting speed. You can sometimes improve the appearance of a plot by reducing the amount of
subsampling used. The `plot` method accepts a `subsampling` parameter
to specify this.

### Understanding `Stream` and `Trace` objects

In order to work effectively with the `IRISSeismic` package you must first
understand the structure of the new `S4` objects it defines. 
The package documentation gives a full description of each object but we can
also interrogate them using the `slotNames()` function.

```{r seventh}
if (exists("st")){
  slotNames(st)
}
```

The `Stream` object has the following *slots* (aka *properties* or *attributes*):

  * `url` -- full web services URL used to obtain data
  * `requestedStarttime` -- POSIXct datetime of the requested start
  * `requestedEndtime` -- POSIXct datetime of the requested end
  * `act_flags` -- activity flags from the miniSEED record
  * `io_flags` -- I/O flags from the miniSEED record
  * `dq_flags` -- data quality flags from the miniSEED record
  * `timing_qual` -- timing quality from the miniSEED record
  * `traces` -- list of `Trace` objects

When in doubt about what a particular *slot* contains, it is always a good
idea to ask what type of object it is.

```{r eighth}
if (exists("st")){
  class(st@url)
  class(st@requestedStarttime)
  class(st@traces)
}
```

The next code chunk examines the first `Trace` in our `Stream`.

**Note:**  `R` uses double square brackets, `[[...]]` to access list items.

```{r ninth}
if (exists("st")){
  slotNames(st@traces[[1]])
}
```

```{r include=FALSE}
if (exists("st")) { rm(st) }
if (exists("tr")) { rm(tr) }
```

The `Trace` object has the following slots:

  * `id` -- SNCLQ identifier of the form "US.OXF..BHZ.M"
  * `stats` -- `TraceHeader` object (metadata from the trace)
  * `Sensor` -- instrument equipment name
  * `InstrumentSensitivity` -- instrument total sensitivity (stage 0 gain)
  * `SensitivityFrequency` -- the frequency (Hz) at which the `InstrumentSensitivity` is correct
  * `InputUnits` -- instrument data acquisition input units
  * `data` -- vector of `numeric` data (the actual signal)


The `TraceHeader` metadata and the actual signal come from the
[dataselect webservice](https://service.iris.edu/fdsnws/dataselect/).
The instrument metadata are obtained from the
[station webservice](https://service.iris.edu/fdsnws/station/).

### Be careful with times

Time stamps associated with seismic data should be given as "Universal" or "GMT" times.
When specifying times to be used with methods of the `IRISSeismic` package you must
be careful to specify the timezone as R assumes the local timezone by default.

Also, R assumes that datetime strings are formatted with a space
separating date and time as opposed to the
[ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) 'T' separator.  If  an 
ISO 8601 character string is provided without specific formatting
instructions, the time portion of the string will be lost *without any warning*!
So it is very important to be careful and consistent if you write code that 
converts ASCII strings into times.

A few examples will demonstrate the issues:

```{r tenth}
as.POSIXct("2010-02-27", tz="GMT") # good
as.POSIXct("2010-02-27 04:00:00", tz="GMT") # good
as.POSIXct("2010-02-27T04:00:00", tz="GMT",
           format="%Y-%m-%dT%H:%M:%OS") # good

as.POSIXct("2010-02-27") # BAD -- no timezone
as.POSIXct("2010-02-27T04:00:00", tz="GMT") # BAD -- no formatting
```

## Example Operations

The example at the beginning of this vignette already demonstrated how to obtain 
seismic data from DMC web services, how to learn about the number and size of 
individual traces within the requested time range and how to generate a first
plot of the seismic signal.  This section will introduce more use cases that
delve further into the capabilities of the `IRISSeismic` package.  For complete
details on available functions, please see the package documentation.

```{r eleventh, results="hide"}
help("IRISSeismic",package="IRISSeismic")
```

### Closer examination of a seismic signal

Once seismic data are in memory, performing mathematical analysis on those data can
be very fast.  All mathematical operations are performed on every data point.  
But plotting can still be a slow process.

**Note:**  The `plot()` method of `Stream` objects deals with gaps by
first calling `mergeTraces()` to fill all gaps with missing values (`NA`).
Then the single, merged trace is plotted with the `plot()` method for `Trace` objects.
Any gaps of a significant size will be now visible in the resulting plot.

By default, the `plot()` method of `Trace` and `Stream` objects subsamples the data
so that approximately 5,000 points are used in the plot.  This dramatically speeds
up plotting.  One of the first things 
you will want to do with a full day's worth of seismic signal is clip it to a 
region of interest.  One way to do that would be to  modify the `starttime`
and `endtime` parameters to `getDataselect` and then make a data request 
covering a shorter period of time.  A simpler technique, if the signal is already
in memory, is to use the `slice()` method.

```{r twelfth,fig.height=8, fig.width=6}
starttime <- as.POSIXct("2010-02-27", tz="GMT")
endtime <- as.POSIXct("2010-02-28", tz="GMT")
result <- try(st <- getDataselect(iris,"IU","ANMO","00","BHZ",starttime,endtime))
if (inherits(result,"try-error")) {
  message(geterrmessage())
} else {
  start2 <- as.POSIXct("2010-02-27 06:40:00", tz="GMT")
  end2 <- as.POSIXct("2010-02-27 07:40:00", tz="GMT")

  tr1 <- st@traces[[1]]
  tr2 <- slice(tr1, start2, end2)

  layout(matrix(seq(2)))        # layout a 2x1 matrix
  plot(tr1)
  plot(tr2)
  layout(1)                     # restore original layout
}
```
```{r include=FALSE}
if (exists("st")) { rm(st) }
if (exists("tr1"))  { rm(tr1) }
if (exists("tr2"))  { rm(tr2) }
```

### Detecting events with STA/LTA

Access to triggering algorithms for detecting events is provided by the
`STALTA()` method of `Trace` objects.
( *cf* [A Comparison of Select Trigger Algorithms for Automated Global Seismic 
Phase and Event Detection](https://pubs.geoscienceworld.org/ssa/bssa/article-abstract/88/1/95/102726/A-comparison-of-select-trigger-algorithms-for)). The `STALTA()` method has the following
arguments and defaults:

  * `x` -- `Trace` being analyzed
  * `staSecs` -- size of the short window in secs
  * `ltaSecs` -- size of the long window in secs
  * `algorithm` -- named algorithm (default="classic_LR")
  * `demean` -- should the signal have the mean removed (default=`TRUE`)
  * `detrend` -- should the signal have the trend removed (default=`TRUE`)
  * `taper` -- portion of the seismic signal to be tapered at each end (default=0.0)
  * `increment` -- integer increment to use when sliding the averaging windows to the next location (default=1)

The `STALTA()` method returns a *picker*, a vector of numeric values, one
for every value in the `Trace@data` slot.  Note that this is a fairly 
compute-intensive operation.  This picker can then be used with the `triggerOnset()`
function to return the **approximate** start of the seismic signal.

We'll test this with our original seismic signal.

```{r thirteenth}
starttime <- as.POSIXct("2002-04-20", tz="GMT")
endtime <- as.POSIXct("2002-04-21", tz="GMT")
result <- try(st <- getDataselect(iris,"US","OXF","","BHZ",starttime,endtime))
if (inherits(result,"try-error")) {
  message(geterrmessage())
} else {
  tr <- st@traces[[3]]
  picker <- STALTA(tr,3,30)
  threshold <- quantile(picker,0.99999,na.rm=TRUE)
  to <- triggerOnset(tr,picker,threshold)
}
```

**NOTE:**  The `STALTA()` method is intended to be used for crude, automatic event 
detection, not precise determination of signal arrival.  Optimal values 
for the  arguments to the `STALTA()` method will depend on the details 
of the seismic signal.

The `eventWindow()` method allows you to focus on the region identified by 
the picker by automatically finding the trigger onset time and then slicing out 
the region of the trace centered on that time.  This method has the following 
arguments and defaults:

  * `x` -- `Trace` being analyzed
  * `picker` -- picker returned by `STALTA()`
  * `threshold` -- threshold value as used in `triggerOnset()`
  * `windowSecs` -- total window size (secs)
  
```{r fourteenth, fig.height=12, fig.width=6}
if (exists("tr")){
  layout(matrix(seq(3)))        # layout a 3x1 matrix
  closeup1 <- eventWindow(tr,picker,threshold,3600)
  closeup2 <- eventWindow(tr,picker,threshold,600)
  plot(tr)
  abline(v=to, col='red', lwd=2)
  plot(closeup1)
  abline(v=to, col='red', lwd=2)
  plot(closeup2)
  abline(v=to, col='red', lwd=2)
  layout(1)                     # restore original layout
}
```
```{r include=FALSE}
if (exists("st")) { rm(st) }
if (exists("tr")) { rm(tr) }
```

### Data availability

The `IrisClient` also provides functionality for interacting with
other web services at the DMC.  The `getAvailability()` method allows users
to query what SNCLs are available, obtaining that information from the
[station webservice](https://service.iris.edu/fdsnws/station/).

Information is returned as a dataframe containing all the information
returned by ws-availability.  Standard
DMC webservice wildcards can be used as in the example below which tells us
what other 'B' channels are available at our station of interest during
the time of the big quake above.

```{r fiftheenth}
starttime <- as.POSIXct("2010-02-27", tz="GMT")
endtime <- as.POSIXct("2010-02-28", tz="GMT")
result <- try(availability <- getAvailability(iris,"IU","ANMO","*","B??",starttime,endtime))
if (inherits(result,"try-error")) {
  message(geterrmessage())
} else {
  availability
}
```

The `getAvailability()` method accepts the following arguments:

  * `obj` -- an `IrisClient` object 
  * `network` -- network code 
  * `station` -- station code 
  * `location` -- location code 
  * `channel` -- channel code 
  * `starttime` -- POSIXct starttime (GMT) 
  * `endtime` -- POSIXct endtime (GMT) 
  * `includerestricted` -- optional flag whether to report on restricted data (default=`FALSE`) 
  * `latitude` -- optional latitude when specifying location and radius 
  * `longitude` -- optional longitude when specifying location and radius 
  * `minradius` -- optional minimum radius when specifying location and radius 
  * `maxradius` -- optional maximum radius  when specifying location and radius 

### Other EarthScope web services

Several methods of the `IrisClient` class work very similarly to the
`getAvailability()` method in that they return dataframes of information obtained
from web services of the same name.  The suite of methods returning dataframes includes:

  * `getAvailability`
  * `getChannel`
  * `getDataselect`
  * `getEvalresp`
  * `getEvent`
  * `getNetwork`
  * `getSNCL`
  * `getStation`
  * `getTraveltime`
  * `getUnavailability`

The following example demonstrates the use of several of these services together to do the following:

  1. find seismic events on a particular day
  2. find available US network BHZ channels in the hour after the biggest event that day
  3. determine the easternmost of those channels
  4. get the P and S travel times to that station
  5. plot the seismic signal detected at that station with markers for P and S arrival times

```{r sixteenth, fig.height=4, fig.width=6}
# Open a connection to EarthScope webservices
iris <- new("IrisClient")

# Two days around the "Nisqually Quake"
starttime <- as.POSIXct("2001-02-27", tz="GMT")
endtime <- starttime + 3600 * 24 *2

# Find biggest seismic event over these two days -- it's the "Nisqually"
result <- try(events <- getEvent(iris, starttime, endtime, minmag=5.0))
if (inherits(result,"try-error")) {
  message(geterrmessage())
} else {
  bigOneIndex <- which(events$magnitude == max(events$magnitude))
  bigOne <- events[bigOneIndex[1],]
}

# Find US stations that are available within 10 degrees of arc of the 
# event location during the 15 minutes after the event
if (exists("bigOne")){
  start <- bigOne$time
  end <- start + 900
  result <- try(av <- getAvailability(iris, "US", "", "", "BHZ", start, end,
                      latitude=bigOne$latitude, longitude=bigOne$longitude,
                      minradius=0, maxradius=10))
  if (inherits(result,"try-error")) {
    message(geterrmessage())
  } else {
    
# Get the station the furthest East
    minLonIndex <- which(av$longitude == max(av$longitude))
    snclE <- av[minLonIndex,]
  }
}

# Get travel times to this station
result <- try(traveltimes <- getTraveltime(iris, bigOne$latitude, bigOne$longitude, bigOne$depth,
                             snclE$latitude, snclE$longitude))
if (inherits(result,"try-error")) {
  message(geterrmessage())
} else {

# Look at the list                             
  traveltimes

# Find the P and S arrival times
  pArrival <- start + traveltimes$travelTime[traveltimes$phaseName=="P"]
  sArrival <- start + traveltimes$travelTime[traveltimes$phaseName=="S"] 

# Get the BHZ signal for this station
  result <- try(st <- getDataselect(iris,snclE$network,snclE$station,
                    snclE$location,snclE$channel,
                    start,end))
  if (inherits(result,"try-error")) {
    message(geterrmessage())
  } else {

# Check that there is only a single trace
    length(st@traces)
# Plot the seismic trace and mark the "P" and "S" arrival times
    tr <- st@traces[[1]]
    plot(tr, subsampling=1) # need subsampling=1 to add vertical lines with abline()
    abline(v=pArrival, col='red')
    abline(v=sArrival, col='blue')
  }
}

```