
Spatial derivatives of Gaussian process models

Collin Erickson

2017-09-10

This vignette covers the derivatives of the surface created by a Gaussian process model with respect to the spatial dimensions. The other vignette has derivatives of the deviance (likelihood) with respect to the parameters.

Gradient of mean function

The mean function is $\hat{y}(x) = \hat{\mu} + r^T R^{-1}(Y - 1\hat{\mu})$. $r$ is the only part that depends on $x$, and is defined below, where $K$ is the correlation function.

$$r = (r_1(x), \ldots, r_n(x))^T$$

$$r_i(x) = K(z(x), z(u_i))$$

$$\frac{\partial \hat{y}(x)}{\partial x_i} = \frac{\partial r}{\partial x_i}^T R^{-1} (Y - 1\hat{\mu})$$

Here $u_1, \ldots, u_n$ are the $n$ design points.

$$\frac{\partial r_j}{\partial x_i} = \frac{\partial K(z(x), z(u_j))}{\partial x_i}$$

This value depends on the correlation function $K$. This gives the vector $\frac{\partial r}{\partial x_i}$, which determines a single partial derivative; collecting these over $i$ gives the gradient $\nabla_x \hat{y}(x)$.
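As a concrete sketch of this formula in R, assuming the Gaussian correlation function derived in the section below; the design matrix `U`, responses `Y`, and parameters `theta` and `mu` are made-up toy values, not from any particular dataset:

```r
# Gaussian correlation between a point x and each design row of U:
# K(x, u_j) = exp(-sum_l theta_l (x_l - u_jl)^2)
corr_vec <- function(x, U, theta) {
  apply(U, 1, function(u) exp(-sum(theta * (x - u)^2)))
}

# Gradient of the GP mean: the ith partial is (dr/dx_i)^T R^{-1} (Y - 1*mu)
mean_gradient <- function(x, U, Y, theta, mu) {
  n <- nrow(U)
  R <- outer(1:n, 1:n, Vectorize(
    function(j, k) exp(-sum(theta * (U[j, ] - U[k, ])^2))))
  Rinv_resid <- solve(R, Y - mu)
  r <- corr_vec(x, U, theta)
  # dr_j/dx_i = -2 theta_i (x_i - u_ji) r_j for the Gaussian correlation
  sapply(seq_along(x), function(i) {
    sum(-2 * theta[i] * (x[i] - U[, i]) * r * Rinv_resid)
  })
}

set.seed(1)
U <- matrix(runif(20), ncol = 2)          # 10 made-up design points in 2-D
Y <- sin(2 * pi * U[, 1]) + U[, 2]^2      # toy responses
mean_gradient(c(0.5, 0.5), U, Y, theta = c(5, 5), mu = mean(Y))
```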

Hessian of mean function

The second derivatives can be calculated similarly:

$$\frac{\partial^2 \hat{y}(x)}{\partial x_i \partial x_k} = \frac{\partial^2 r}{\partial x_i \partial x_k}^T R^{-1} (Y - 1\hat{\mu})$$

Each element of this matrix is

$$\frac{\partial^2 r_j}{\partial x_i \partial x_k} = \frac{\partial^2 K(z(x), z(u_j))}{\partial x_i \partial x_k}$$
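A corresponding R sketch for the Hessian, again assuming the Gaussian correlation and the second-derivative formulas from the next section (same made-up toy data as above, restated so the block stands alone):

```r
set.seed(1)
theta <- c(5, 5)
U <- matrix(runif(20), ncol = 2)          # 10 made-up design points in 2-D
Y <- sin(2 * pi * U[, 1]) + U[, 2]^2      # toy responses
# Correlation matrix via theta-weighted squared distances
R <- exp(-as.matrix(dist(sweep(U, 2, sqrt(theta), "*")))^2)
a <- solve(R, Y - mean(Y))                # R^{-1}(Y - 1*mu)

x <- c(0.5, 0.5)
r <- exp(-colSums(theta * (t(U) - x)^2))  # r_j = K(x, u_j)

H <- matrix(0, 2, 2)
for (i in 1:2) for (k in 1:2) {
  d2r <- if (i == k) {
    # pure second derivative of the Gaussian correlation
    (-2 * theta[i] + 4 * theta[i]^2 * (x[i] - U[, i])^2) * r
  } else {
    # cross derivative of the Gaussian correlation
    4 * theta[i] * theta[k] * (x[i] - U[, i]) * (x[k] - U[, k]) * r
  }
  H[i, k] <- sum(d2r * a)   # (d^2 r / dx_i dx_k)^T R^{-1}(Y - 1*mu)
}
H
```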

Gaussian correlation derivative

The equations above work for any correlation function, but we still need the derivatives of the correlation function with respect to the spatial variables.

$$K(z(x), z(u_j)) = \exp\left(-\sum_{\ell=1}^{d} \theta_\ell (x_\ell - u_{j\ell})^2\right)$$

$$\frac{\partial K(z(x), z(u_j))}{\partial x_i} = -2\theta_i (x_i - u_{ji}) \exp\left(-\sum_{\ell=1}^{d} \theta_\ell (x_\ell - u_{j\ell})^2\right)$$

The second derivative with respect to the same dimension is

$$\frac{\partial^2 K(z(x), z(u_j))}{\partial x_i^2} = \left(-2\theta_i + 4\theta_i^2 (x_i - u_{ji})^2\right) \exp\left(-\sum_{\ell=1}^{d} \theta_\ell (x_\ell - u_{j\ell})^2\right)$$

The cross derivative is

$$\frac{\partial^2 K(z(x), z(u_j))}{\partial x_i \partial x_k} = 4\theta_i \theta_k (x_i - u_{ji})(x_k - u_{jk}) \exp\left(-\sum_{\ell=1}^{d} \theta_\ell (x_\ell - u_{j\ell})^2\right)$$
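A quick finite-difference check of these three formulas in R; the values of `theta`, `x`, and `u` are arbitrary made-up numbers:

```r
theta <- c(2, 3); x <- c(0.3, 0.6); u <- c(0.7, 0.2)   # made-up values
K <- function(x) exp(-sum(theta * (x - u)^2))
eps <- 1e-4

# First derivative in x_1: analytic vs. central difference
c(-2 * theta[1] * (x[1] - u[1]) * K(x),
  (K(x + c(eps, 0)) - K(x - c(eps, 0))) / (2 * eps))

# Second derivative in x_1: analytic vs. central difference
c((-2 * theta[1] + 4 * theta[1]^2 * (x[1] - u[1])^2) * K(x),
  (K(x + c(eps, 0)) - 2 * K(x) + K(x - c(eps, 0))) / eps^2)

# Cross derivative in x_1 and x_2: analytic vs. central difference
c(4 * theta[1] * theta[2] * (x[1] - u[1]) * (x[2] - u[2]) * K(x),
  (K(x + c(eps, eps)) - K(x + c(eps, -eps)) -
   K(x + c(-eps, eps)) + K(x - c(eps, eps))) / (4 * eps^2))
```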

Gradient distribution

A big problem with using the gradient of the mean function of a GP is that it doesn't give an idea of its distribution/randomness. The gradient could be zero in a region where the surface is not flat simply because we don't have any information in that region yet.

Can we find the distribution of the gradient by taking a limit?

In $d$ dimensions we can work one component at a time. First start with the directional derivative in the direction of the $i$th component:

$$\frac{\partial \hat{y}(x)}{\partial x_i} = \lim_{\delta \to 0} \frac{\hat{y}(x + \delta e_i) - \hat{y}(x)}{\delta}$$

Now we find the joint distribution of $\hat{y}(x + \delta e_i)$ and $\hat{y}(x)$.

Joint distribution

If $(y, z) \sim N((\mu_1, \mu_2), \Sigma)$, then $z|y \sim N(\mu_{z|y}, \Sigma_{z|y})$, where

$$\mu_{z|y} = \mu_2 + \Sigma_{zy}\Sigma_{yy}^{-1} (y - \mu_1)$$

$$\Sigma_{z|y} = \Sigma_{zz} - \Sigma_{zy}\Sigma_{yy}^{-1}\Sigma_{yz}$$
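These are the standard multivariate normal conditioning formulas. As a sanity check, here is a small R simulation in the bivariate case, with an arbitrary made-up mean and covariance:

```r
mu <- c(1, 2); Sigma <- matrix(c(1, 0.8, 0.8, 2), 2)   # made-up parameters
L <- chol(Sigma)                               # Sigma = t(L) %*% L
set.seed(2)
draws <- matrix(rnorm(2e6), ncol = 2) %*% L    # rows ~ N(0, Sigma)
draws <- sweep(draws, 2, mu, "+")
y0 <- 1.5
near <- abs(draws[, 1] - y0) < 0.01            # condition on y ~= y0
c(sim_mean = mean(draws[near, 2]),
  formula_mean = mu[2] + Sigma[2, 1] / Sigma[1, 1] * (y0 - mu[1]))
c(sim_var = var(draws[near, 2]),
  formula_var = Sigma[2, 2] - Sigma[2, 1]^2 / Sigma[1, 1])
```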

Linear combination of normal variables is normal

$$E[\hat{y}(x + \delta e_i) - \hat{y}(x)] = (\Sigma_{z_1 y} - \Sigma_{z_2 y})\Sigma_{yy}^{-1} (y - \mu_1) = (r_1 - r_2) R^{-1} (y - \mu_1)$$

$$E\left[\frac{\hat{y}(x + \delta e_i) - \hat{y}(x)}{\delta}\right] = \left(\frac{\Sigma_{z_1 y} - \Sigma_{z_2 y}}{\delta}\right)\Sigma_{yy}^{-1} (y - \mu_1) = \left(\frac{r_1 - r_2}{\delta}\right) R^{-1} (y - \mu_1)$$

$$\frac{\Sigma_{z_1 y} - \Sigma_{z_2 y}}{\delta} = \frac{\hat{\sigma}^2 (r_1 - r_2)}{\delta}$$

The $j$th element of this is

$$\lim_{\delta \to 0} \frac{R(x + \delta e_i, X_j) - R(x, X_j)}{\delta} = \frac{\partial R(x, X_j)}{\partial x_i} = \frac{\partial r_{(j)}}{\partial x_i}$$

So the mean of the gradient distribution is

$$\frac{\partial r}{\partial x_i}^T R^{-1} (y - \mu_1)$$

This is the same as the derivative of the mean function.
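A one-dimensional R check, with made-up design points and toy responses, that the scaled difference matches the analytic derivative of the mean function:

```r
theta <- 3; u <- c(0.1, 0.4, 0.7, 0.9)   # made-up 1-D design points
y <- sin(2 * pi * u); mu <- mean(y)      # toy responses
R <- exp(-theta * outer(u, u, "-")^2)
a <- solve(R, y - mu)                    # R^{-1}(y - mu)
x <- 0.5; delta <- 1e-6
r1 <- exp(-theta * (x + delta - u)^2)
r2 <- exp(-theta * (x - u)^2)
fd   <- sum((r1 - r2) / delta * a)       # finite-difference mean slope
grad <- sum(-2 * theta * (x - u) * r2 * a)  # analytic dr/dx formula
c(fd = fd, analytic = grad)              # the two should agree closely
```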

The variance is the more difficult part.

Variance of the linear combination

$$Var\left[\frac{\hat{y}(x + \delta e_i) - \hat{y}(x)}{\delta}\right] = \frac{1}{\delta^2} Var\left[\hat{y}(x + \delta e_i) - \hat{y}(x)\right] = \frac{1}{\delta^2}\left(Var[\hat{y}(x + \delta e_i)] + Var[\hat{y}(x)] - 2\, Cov[\hat{y}(x + \delta e_i), \hat{y}(x)]\right)$$

The conditional moments are

$$Var[\hat{y}(x + \delta e_i) \mid Y] = \Sigma_{11} - \Sigma_{1Y}\Sigma_{YY}^{-1}\Sigma_{Y1}$$

$$Var[\hat{y}(x) \mid Y] = \Sigma_{22} - \Sigma_{2Y}\Sigma_{YY}^{-1}\Sigma_{Y2}$$

$$Cov[\hat{y}(x + \delta e_i), \hat{y}(x) \mid Y] = Cov[\hat{y}(x + \delta e_i), \hat{y}(x)] - Cov[\hat{y}(x + \delta e_i), Y]\, \Sigma_{YY}^{-1}\, Cov[Y, \hat{y}(x)] = \Sigma_{12} - \Sigma_{1Y}\Sigma_{YY}^{-1}\Sigma_{Y2} = \hat{\sigma}^2\left(R[\hat{y}(x + \delta e_i), \hat{y}(x)] - r_1^T R^{-1} r_2\right)$$

$$= \frac{1}{\delta^2}\left(\Sigma_{11} - \Sigma_{1Y}\Sigma_{YY}^{-1}\Sigma_{Y1} + \Sigma_{22} - \Sigma_{2Y}\Sigma_{YY}^{-1}\Sigma_{Y2} - 2\left(\Sigma_{12} - \Sigma_{1Y}\Sigma_{YY}^{-1}\Sigma_{Y2}\right)\right)$$

$$= \frac{1}{\delta^2}\left(\Sigma_{11} + \Sigma_{22} - 2\Sigma_{12} - (\Sigma_{1Y} - \Sigma_{2Y})\Sigma_{YY}^{-1}(\Sigma_{Y1} - \Sigma_{Y2})\right)$$

The three quadratic terms collapse because each is a scalar and $\Sigma_{YY}^{-1}$ is symmetric, so $\Sigma_{1Y}\Sigma_{YY}^{-1}\Sigma_{Y2} = \Sigma_{2Y}\Sigma_{YY}^{-1}\Sigma_{Y1}$.
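To see numerically that this quantity approaches a finite limit as $\delta \to 0$, here is an R sketch with the Gaussian correlation and $\hat{\sigma}^2 = 1$ (same made-up 1-D setup as above, restated so the block stands alone):

```r
theta <- 3; u <- c(0.1, 0.4, 0.7, 0.9); x <- 0.5   # made-up 1-D setup
R <- exp(-theta * outer(u, u, "-")^2)
var_fd <- function(delta) {
  r1 <- exp(-theta * (x + delta - u)^2)   # Sigma_{1Y} with sigma^2 = 1
  r2 <- exp(-theta * (x - u)^2)           # Sigma_{2Y} with sigma^2 = 1
  S12 <- exp(-theta * delta^2)            # correlation of the two points
  d <- r1 - r2
  # (Sigma11 + Sigma22 - 2*Sigma12 - d' R^{-1} d) / delta^2
  (2 - 2 * S12 - drop(t(d) %*% solve(R, d))) / delta^2
}
sapply(c(1e-1, 1e-2, 1e-3), var_fd)       # stabilizes at a finite value
```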

The derivation from here follows McHutchon's note Differentiating Gaussian Processes: http://mlg.eng.cam.ac.uk/mchutchon/DifferentiatingGPs.pdf


I could have just used this formula: for $a = (1, -1)^T$ and $z = (z_1, z_2)^T$,

$$Var[a^T z] = a^T Cov(z)\, a$$
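A tiny R illustration of that identity with a made-up covariance matrix:

```r
a <- c(1, -1)
Sigma <- matrix(c(1, 0.8, 0.8, 2), 2)   # made-up Cov(z1, z2)
drop(t(a) %*% Sigma %*% a)              # Sigma11 + Sigma22 - 2*Sigma12 = 1.4
```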

With $z_1 = x + \delta e_i$ and $z_2 = x$, the Gaussian correlation gives $R(z_1, z_2) = \exp(-\theta_i \delta^2)$, so

$$\frac{\partial R(z_1, z_2)}{\partial \delta} = -2 \theta_i \delta \exp(-\theta_i \delta^2)$$