---
title: "State Space Reconstruction (SSR)"
author: "Wenbo Lv"
date: "2025-05-16"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{SSR}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---



## Spatial State Space Reconstruction (S-SSR)

Takens theory proves that for a dynamical system $\phi$, if its trajectory converges to an attractor manifold $M$â€”a bounded and invariant set of statesâ€”then there exists a smooth mapping between the system $\phi$ and its attractor $M$. Consequently, the time series observations of $\phi$ can be used to reconstruct the structure of $M$ through delay embedding.

According to the generalized embedding theorem, for a compact $d$-dimensional manifold $M$ and a set of observation functions $\langle h_1, h_2, \ldots, h_L \rangle$, the mapping $\psi_{\phi,h}(x) = \langle h_1(x), h_2(x), \ldots, h_L(x) \rangle$ is an embedding of $M$ when $L \geq 2d + 1$. Here, *embedding* refers to a one-to-one map that resolves all singularities of the original manifold. The observation functions $h_i$ can take the form of time-lagged values from a single time series, lags from multiple time series, or even completely distinct measurements. The former two are simply special cases of the third.

This embedding framework can be extended to *spatial cross-sectional data*, which lack temporal ordering but are observed over a spatial domain. In this context, the observation functions can be defined using the values of a variable at a focal spatial unit and its surrounding neighbors (known as *spatial lags* in spatial statistics). Specifically, for a spatial location $s$, the embedding can be written as:

$$
\psi(x, s) = \langle h_s(x), h_{s(1)}(x), \ldots, h_{s(L-1)}(x) \rangle,
$$

where $h_{s(i)}(x)$ denotes the observation function of the $i$-th order spatial lag unit relative to $s$. These spatial lags provide the necessary diversity of observations for effective manifold reconstruction. In practice, if a given spatial lag order involves multiple units, summary statistics such as the mean or directionally-weighted averages can be used as the observation function to maintain a one-to-one embedding.

## Usage examples

### An example of spatial lattice data

Load the `spEDM` package and its county-level population density data:


``` r
library(spEDM)

popd_nb = spdep::read.gal(system.file("case/popd_nb.gal",package = "spEDM"))
## Warning in spdep::read.gal(system.file("case/popd_nb.gal", package = "spEDM")):
## neighbour object has 4 sub-graphs
popd = readr::read_csv(system.file("case/popd.csv",package = "spEDM"))
## Rows: 2806 Columns: 7
## â”€â”€ Column specification â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€â”€
## Delimiter: ","
## dbl (7): x, y, popd, elev, tem, pre, slope
## 
## â„¹ Use `spec()` to retrieve the full column specification for this data.
## â„¹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
popd_sf = sf::st_as_sf(popd, coords = c("x","y"), crs = 4326)
popd_sf
## Simple feature collection with 2806 features and 5 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 74.9055 ymin: 18.2698 xmax: 134.269 ymax: 52.9346
## Geodetic CRS:  WGS 84
## # A tibble: 2,806 Ã— 6
##     popd  elev   tem   pre slope          geometry
##  * <dbl> <dbl> <dbl> <dbl> <dbl>       <POINT [Â°]>
##  1  780.     8  17.4 1528. 0.452 (116.912 30.4879)
##  2  395.    48  17.2 1487. 0.842 (116.755 30.5877)
##  3  261.    49  16.0 1456. 3.56  (116.541 30.7548)
##  4  258.    23  17.4 1555. 0.932  (116.241 30.104)
##  5  211.   101  16.3 1494. 3.34   (116.173 30.495)
##  6  386.    10  16.6 1382. 1.65  (116.935 30.9839)
##  7  350.    23  17.5 1569. 0.346 (116.677 30.2412)
##  8  470.    22  17.1 1493. 1.88  (117.066 30.6514)
##  9 1226.    11  17.4 1526. 0.208 (117.171 30.5558)
## 10  137.   598  13.9 1458. 5.92  (116.208 30.8983)
## # â„¹ 2,796 more rows
```

Embedding the variable `popd` from county-level population density:


``` r
v = embedded(popd_sf,"popd",E = 9,tau = 0,trend.rm = FALSE)
v[1:5,c(2,5,6)]
##           [,1]      [,2]     [,3]
## [1,]  440.7237  962.7204 1664.756
## [2,]  500.0166  919.6000 2408.766
## [3,]  494.4070 1435.0165 1958.686
## [4,] 1520.0814 1488.2727 2066.748
## [5,]  298.5497 2326.8429 1290.188
```


``` r
plot3D::scatter3D(v[,2], v[,5], v[,6], colvar = NULL, pch = 19,
                  col = "red", theta = 45, phi = 10, cex = 0.45,
                  bty = "f", clab = NA, tickmarks = FALSE)
## Warning: no DISPLAY variable so Tk is not available
```

![**Figure 1**. The reconstructed shadow manifolds for `popd`.](../man/figures/ssr/fig1-1.png)

### An example of spatial grid data

Load the `spEDM` package and its soil pollution data:


``` r
library(spEDM)

cu = terra::rast(system.file("case/cu.tif", package="spEDM"))
cu
## class       : SpatRaster 
## dimensions  : 131, 125, 3  (nrow, ncol, nlyr)
## resolution  : 5000, 5000  (x, y)
## extent      : 360000, 985000, 1555000, 2210000  (xmin, xmax, ymin, ymax)
## coord. ref. : North_American_1983_Albers 
## source      : cu.tif 
## names       :         cu, industry, ntl 
## min values  :   1.201607,    0.000,   0 
## max values  : 319.599274,    0.615,  63
```

Embedding the variable `cu` from soil pollution data:


``` r
r = spEDM::embedded(cu,"cu",E = 9,tau = 0,trend.rm = FALSE)
r[1:5,c(1,5,9)]
##          [,1]     [,2]     [,3]
## [1,] 14.80663 13.84810 15.40681
## [2,] 14.06487 14.07629 15.67661
## [3,] 14.07302 14.15578 15.78888
## [4,] 13.38009 14.11540 15.63029
## [5,] 13.81006 14.25782 15.30874
```


``` r
plot3D::scatter3D(r[,1], r[,5], r[,9], colvar = NULL, pch = 19,
                  col = "#e77854", theta = 45, phi = 10, cex = 0.45,
                  bty = "f", clab = NA, tickmarks = FALSE)
```

![**Figure 2**. The reconstructed shadow manifolds for `cu`.](../man/figures/ssr/fig2-1.png)

### Determining Optimal Embedding Dimension Using False Nearest Neighbours

The false nearest neighbours (FNN) method helps identify the appropriate embedding dimension for reconstructing the state space of a time series or spatial cross-sectional. A low embedding dimension that minimizes false neighbours is considered optimal.

Below are two examples applying the `fnn()` function. In both cases, `eps` is set to one-tenth of the standard deviation of the variable of interest, providing a relative threshold for identifying false neighbours.


``` r
fnn(popd_sf,"popd",
    eps = stats::sd(popd_sf$popd) / 10)
##       E:1       E:2       E:3       E:4       E:5       E:6       E:7       E:8 
## 0.9707769 0.6404134 0.4885959 0.4012830 0.3346401 0.3296507 0.3260870 0.3150392 
##       E:9 
## 0.3150392
```


``` r
fnn(cu,"cu",
    eps = stats::sd(terra::values(cu[["cu"]]),na.rm = TRUE) / 10)
##        E:1        E:2        E:3        E:4        E:5        E:6        E:7 
## 0.98577099 0.56042748 0.13215267 0.08103817 0.07731298 0.07517557 0.06998473 
##        E:8        E:9 
## 0.06632061 0.05673282
```