---
title: "3. Analysis of acoustic-Doppler data"
author: "Dan Kelley (https://orcid.org/0000-0001-7808-5911)"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: true
    number_sections: true
    fig_caption: yes
    fig_width: 3.5
    fig_height: 3
    dpi: 72
vignette: >
  %\VignetteIndexEntry{3. Analysis of acoustic-Doppler data}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

<!-- HOW TO BUILD THE VIGNETTE. -->
<!-- Look in oce.Rmd for instructions. -->

```{r, echo=FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```

**Abstract.** This vignette explains the basics of using the oce package to
handle measurements made by both mooring and ship based acoustic Doppler
profilers.  As in the related vignette about working with [data-quality
flags](E_flags.html), the material is organized to reflect typical workflow.  See
[the main oce vignette](A_oce.html) for general information about oce.  The
sample code provided here relies on the oce package, loaded as follows.

```{r}
library(oce)
```

# Reading adp files

Acoustic Doppler profiling instruments are called ADCPs by some manufacturers
and ADPs by others.  The class name `adp` is used in oce, to handle both types.
It is important to realize that there are *many* data formats for data of this
type, and that the files are normally in a dense binary format that cannot be
read with a text editor. It is necessary to use software to work with such
files. Although oce can handle many file formats, and although its authors do
their best to keep up with changes to file formats (at least for popular
instruments), users will sometimes be forced to use manufacturer software to
read the data, or at least convert to a format that can be manually read into
R, and then converted to an `adp` object using the `as.adp()` function.

Suppose `f` is a character string naming a data file. In many cases,
```{r eval=FALSE}
d <- read.oce(f)
```
will read the file, because `read.oce` will look inside the file to try to
guess the format. If this does not work, one ought to try e.g.
```{r eval=FALSE}
d <- read.adp(f)
```
which, again, will try to determine which subtype of adp file has been
provided.  If that fails, it might help to try a specialized function, such as
`read.adp.rdi()` for an RDI file, `read.adp.nortek()` for a Nortek file, or
`read.adp.sontek()` for a Sontek file. (Actually, each of these can handle a
variety of sub-formats).

If any of these `read.` functions is called with just a single argument naming
the file, then the entire dataset will be read in. This can be slow for large
files (e.g. taking of order 0.1s per Mb), and so it can help to use the `by`
argument, e.g. for a 138Mb file on the author's computer,
```{r eval=FALSE}
f <- "/data/archive/sleiwex/2008/moorings/m09/adp/rdi_2615/raw/adp_rdi_2615.000"
dall <- read.oce(f)
```
takes 15.2s of user time, but
```{r eval=FALSE}
d100 <- read.oce(f, by = 100)
```
takes 0.2s. It is also possible to window the data, using the `from` and `to`
arguments, which (like `by`) can be integers that count ensembles (or "profiles")
or a POSIX time (see `?read.adp`). For example, the
adp dataset provided by oce was read with
```{r eval=FALSE}
read.oce(f,
    from = as.POSIXct("2008-06-26", tz = "UTC"),
    to = as.POSIXct("2008-06-27", tz = "UTC"),
    by = "60:00",
    latitude = 47.88126, longitude = -69.73433
)
```
with `f` as defined above. This also illustrates that it is possibly to specify
the location of an instrument; the further processing that was done to create
`data(adp)` will be explained in due course.

+**Note**: Users who are not familiar with ADP data may noticed that there is a
combination of many file types when dealing with ADP data. Some of these
include: .ENR, which are raw data only received from the ADP, .ENS, similar to
.ENR but also contain data merged from other file types without processing,
.ENX, merged data that are processed including calculations to transformed data
including east, north, and vertical velocities, .STA, which are a short term
average of measurements, and .LTA, which are a long term average of
measurements.


# Summarizing adp data

The generic function `summary` provides a useful overview of an adp data
object.  For example,
```{r results="hide"}
data(adp)
summary(adp)
```
(results not shown \ldots the goal is to encourage the user to try this!)
shows the sort of information normally present in an RDI file. Other
manufacturer types will have somewhat different summaries, because both the
metadata and the data differ from instrument to instrument. In all cases, there
will be a vector holding time (named `time`), another holding the distance from
instrument to the centre of a measurement bin, measured along a line
equidistant between the beams (named `distance`), along with an array
holding velocity (named `v`). (Other arrays may include, as in this RDI case,
`q` for data quality, `a` for backscatter amplitude, and `g` for percentage
goodness.)

Note that the summary reveals that instrument had been set up to record in
`beam` coordinates, but that processing had been done to convert to `enu`
(east-north-up) coordinates.

**Exercise 1.** Determine the name of the `metadata` item that holds the
coordinate system of an adp object.

After completing Exercise 1, the user will notice that both `latitude` and
`longitude` are available in the metadata for this built-in, moored, adp
dataset. It should be noted that if this dataset was a ship-based adp dataset,
then `latitude` and `longitude` would not be included. Instead, the data would
include `firstLatitude`, `firstLongitude`, `lastLatitude`, and `lastLongitude`,
which provide an indication of the ship drift during sampling.

# Changing coordinate systems

It is critical for analysts to be aware of the coordinate system of an adp
dataset, simply because those systems are so different. In what oce calls the
`beam` system, velocities in each of the acoustic beams are measured along
the slant angle of the beam. These data may be combined (using a
transformation matrix which is part of the metadata displayed with the
`summary()` function) into what oce calls a `xyz` system that is Cartesian and
fixed to the instrument. Since instruments record heading, pitch, and roll, the
`xyz` velocities can be converted easily to `enu` velocities.

In some applications, the scientific focus is primarily on the `enu` data, but
it is a good idea to calculate and study the data at all stages of processing.
Thus, although the `toEnu()` function will convert data measured in any
coordinate system into an `enu` variant, many analysts will do something along
the lines of
```{r eval=FALSE}
beam <- read.oce(f)
xyx <- beamToXyz(beam)
enu <- xyzToEnu(xyz, declination = -18.1)
```

where the declination illustrated is the value used in creating the `data(adp)`
dataset. (Compensating for declination is necessary because compasses are
calibrated at a particular place and time, and compasses have no way of
"knowing" the local angle between true north and magnetic north.)

# Plotting

The generic `plot()` function is specialized to handle adp objects in a variety
of simple but helpful ways. The type of plot is determined by the `which`
argument. The default for `which` is to create a multi-panel image plot that
shows time-space dependence for each velocity component. The number of panels
depends on the number of beams used by the instrument, and each one has a label
indicating the component name. For example,
```{r fig.height=7, fig.width=4.5, dev.args=list(pointsize=13)}
#| fig.alt: >
#|   Default plot of an adp object.
plot(adp)
```

has velocity panels named `east`, `north` and `up`, along with one
named `error`, which is an estimate of the velocity-inference error.

Options to this plot variant can be used to control how distance from
instrument is shown (which can be useful in visualizing data from upward- and
downward-aligned instruments), what colours are used to represent the signals,
etc.

Dozens of other plot types are also provided; see `?"plot,adp-method"` for
details -- but be warned that a complex plot requires many arguments!

When dealing with ship-based adp data, in order to get the true `east`,
`north`, and `up` velocity, the user must remove the speed of the ship. To do
this, use the `subtractBottomVelocity()` function, which subtracts the bottom
tracking velocities from an adp object.

**Exercise 2.** Create a u-v scatterplot.

**Exercise 3.** Write the steps to plot the `east`, `north`, and `up` velocity
without the speed of the ship for a file named
`COR2019002_20190818T064815_007_000000.ENS`.

# Subsetting

Adp data can be subsetted in a wide variety of ways, e.g.
```{r fig.height=7, fig.width=4.5, dev.args=list(pointsize=13)}
#| fig.alt: >
#|   Plot of a subset of an adp object.
plot(subset(adp, distance < 20))
```

plots only data within 20m of the instrument.  See `?"subset,adp-method"` for
more on subsetting.

**Exercise 4.** Plot the adp data for the first half of the sampling interval.

# Accessing data within adp objects

All oce objects share a common framework, with so-called "slots" for
metadata, data, and processing log. It is possible to examine, and modify, the
elements of each of these slots using base R constructs, but it is usually
better to use oce accessor functions.  The details are provided with
`?"[[,adp-method"` and `?"[[<-,adp-method"`, but a few examples should suffice
to sketch the general pattern.

For example,
```{r}
time <- adp[["time"]]
distance <- adp[["distance"]]
```
extracts the times and distances of the profiles within `data(adp)`, while the
array holding the velocities is extract with
```{r}
v <- adp[["v"]]
```

Altering values uses the `[[` to the left of the assignment operator, but this
is not commonly required in simple work, so it is best for readers to focus
first on accessing data. The exercises may help with this.

Some of the fields contained in adp objects, particularly the amplitude,
correlation, and percent good fields (`a`, `q`, and `g`) are stored as
`raw`-type. This is done primarily to save memory space, as expanded all of
those fields into `numeric` types can be slow and they get large quickly.
However, to actually work with the numbers, users can extract a "numeric"
version by supplying the second argument to the `[[` function, e.g.
```{r}
a <- adp[["a", "numeric"]]
```

This has the advantage of returning an object with the same matrix dimensions
as the original (note that functions like `as.numeric()` typically discard
dimension information, thereby converting an array to a vector).

**Exercise 5.** Plot a time series of mid-depth and depth-averaged eastward velocities.

**Exercise 6.** Perform an eigen-analysis of eastward and northward velocity.

**Exercise 7.** Plot a time series of pressure variations, to reveal tidal
height.


# Solutions to exercises

**Exercise 1.** Determine the name of the `metadata` item that holds the
coordinate system of an adp object.

**Solution.**

With adp as loaded in the text, we may the names in the metadata with:
```{r results="hide"}
sort(names(adp[["metadata"]]))
```
(results not shown), after which a keen eye will notice that two elements
relate to coordinates:
```{r}
adp[["originalCoordinate"]]
adp[["oceCoordinate"]]
```
and comparison with the output of `summary(adp)` reveals that the former is the
coordinate system in which the data were originally measured, while the
latter is the coordinate system of the transformed data. Using
```{r}
processingLogShow(adp)
```
reveals the steps in the data transformation.


**Exercise 2.** Create a u-v scatterplot.

**Solution.**

```{r fig.width=3, fig.height=3, dev.args=list(pointsize=9)}
#| fig.alt: >
#|   A u-v scatterplot.
plot(adp, which = "uv")
```

**Exercise 3.** Write the steps to plot the `east`, `north`, and `up` velocity
without the speed of the ship for a file named
`COR2019002_20190818T064815_007_000000.ENS`.

**Solution.**

```{r, eval=FALSE, message=FALSE, warning=FALSE, error=FALSE}
#| fig.alt: >
#|   Plot velocities after removing ship motion.
library(oce)
adcp <- read.adp("COR2019002_20190818T064815_007_000000.ENS")
enu <- toEnu(adcp)
removeShipSpeed <- subtractBottomVelocity(enu)
plot(removeShipSpeed, which = 1:3)
```


**Exercise 4.** Plot the adp data for the first half of the sampling interval.

**Solution.** (results not shown)

```{r eval=FALSE}
#| fig.alt: >
#|   Plot just the first part of an adp record.
plot(subset(adp, time < median(adp[["time"]])))
```


**Exercise 5.** Plot a time series of mid-depth and depth-averaged eastward velocities.

**Solution.**
```{r fig.height=2, dev.args=list(pointsize=9)}
#| fig.alt: >
#|   Plot mid-depth and depth-averaged eastward velocities.
time <- adp[["time"]]
v <- adp[["v"]]
# The second index is for bin number, the third for beam number
midIndex <- dim(v)[2] / 2
eastMid <- v[, midIndex, 1] # third index is beam
distance <- adp[["distance"]][midIndex]
oce.plot.ts(time, eastMid, ylab = "Eastward velocity [m/s]")
# Depth mean; note that na.rm, is passed by apply() to mean()
eastMean <- apply(v[, , 1], 1, mean, na.rm = TRUE)
lines(time, eastMean, col = 2)
```

Here, `oce.plot.ts()` is used instead of the basic R function `plot()`, because
it provides a margin note on the detailed time interval.


**Exercise 6.** Perform an eigen-analysis of eastward and northward velocity.

**Solution.**
```{r}
u <- adp[["v"]][, , 1]
v <- adp[["v"]][, , 2]
ok <- is.finite(u) & is.finite(v) # remove NA values
u <- u[ok]
v <- v[ok]
eigen(cov(data.frame(u, v)))
```

Note that this action is equivalent to principal component analysis, and it
might be wiser to use
```{r}
pr <- prcomp(data.frame(u, v))
```

to do the analysis, which yields the same principal axes
```{r}
pr
```

but has the advantage of having associated methods, including for plotting.



**Exercise 7.** Plot a time series of pressure variations, to reveal tidal
height.

**Solution.**
```{r fig.height=2, dev.args=list(pointsize=9)}
#| fig.alt: >
#|   Plot pressure time series.
time <- adp[["time"]]
pressure <- adp[["pressure"]]
oce.plot.ts(time, pressure)
```

As a matter of interest, a tidal analysis may be done with
```{r}
m <- tidem(as.sealevel(pressure, time))
```

where the list of undetermined tidal constituents is long, because this tidal
record is so short (see `?tidem`). The fitted constituents are as follows
```{r}
summary(m)
```

(Note that it fitted for M2, but not S2, because the Rayleigh criterion
prevents inferring both, and the conventional tidal analysis favours M2 over S2
if the time series is too short to fit for more than one semi-diurnal
constituent.)

A useful step might be to draw a tidal fit along with the data. It makes sense
to construct a finer time grid, to get a smooth curve, as in the following (see
`?predict.tidem`).
```{r fig.height=2, dev.args=list(pointsize=9)}
#| fig.alt: >
#|   Plot tidal prediction.
oce.plot.ts(time, pressure, type = "p", col = "blue")
timePredict <- seq(min(time), max(time), length.out = 200)
pressurePredict <- predict(m, timePredict)
lines(timePredict, pressurePredict, col = "red")
```