Utils
DropoutLayer
Bases: Module
Customized dropout layer where the nodes values are dropped with probability p.
Attributes:
Name | Type | Description |
---|---|---|
p |
float
|
The drop probability |
Source code in geospaNN/utils.py
54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 |
|
EarlyStopping
Early stopping to stop the training when the loss does not improve after certain epochs.
Attributes:
Name | Type | Description |
---|---|---|
patience |
int
|
How many epochs to wait before stopping when loss is not improving |
min_delta |
float
|
Minimum difference between new loss and old loss for new loss to be considered as an improvement |
Source code in geospaNN/utils.py
74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 |
|
LRScheduler
Learning rate scheduler. If the validation loss does not decrease for the
given number of patience
epochs, then the learning rate will decrease by
by given factor
.
Attributes:
Name | Type | Description |
---|---|---|
optimizer |
the optimizer we are using |
|
patience |
int
|
How many epochs to wait before updating the lr. Default being 5. |
min_lr |
float
|
Least lr value to reduce to while updating. |
factor |
float
|
Factor by which the lr should be updated, i.e. new_lr = old_lr * factor. |
Source code in geospaNN/utils.py
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 |
|
NNGP_cov
Bases: Sparse_B
A subclass of Sparse_B designed for the decorrelation using NNGP. The whole object is an NNGP approximation of the inverse square-root of a covariance matrix \Sigma. i.d. F^{-1/2}(I-B) ~ \Sigma^{-1/2} ...
Attributes:
Name | Type | Description |
---|---|---|
F_diag |
torch.Tensor A vector of length n representing the diagonal matrix F. |
Methods:
Name | Description |
---|---|
correlate |
Approximately correlate the matrix X to \Sigma^{1/2}X. |
decorrelate |
Approximately decorrelate the matrix X to \Sigma^{-1/2}X. |
Source code in geospaNN/utils.py
220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 |
|
correlate(x)
Approximately correlate the matrix X to \Sigma^{1/2}X by calculating (I_B)^{-1}F^{1/2}X.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x
|
Input tensor X |
required |
Returns:
Name | Type | Description |
---|---|---|
x_cor |
Tensor
|
Correlated X |
Source code in geospaNN/utils.py
242 243 244 245 246 247 248 249 250 251 252 253 254 |
|
decorrelate(x)
Approximately decorrelate the matrix X to \Sigma^{-1/2}X by calculating F^{-1/2}(I_B)X.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x
|
Input tensor X |
required |
Returns:
Name | Type | Description |
---|---|---|
x_decor |
Tensor
|
Decorrelated X |
Source code in geospaNN/utils.py
256 257 258 259 260 261 262 263 264 265 266 267 268 |
|
Sparse_B
A sparse representation of the lower-triangular neighboring nxn matrix B_dense, where each row contains at most p non-zero values.
Attributes:
Name | Type | Description |
---|---|---|
B |
nxp array contains all non-zero values in B_dense. |
|
n |
The number of samples. |
|
neighbor_size |
i.e. k in the documentation, the largest number of non-zero values in each row of B_dense. |
|
Ind_list |
The nxp index array indicating the location where values in B was in B_dense. For example, the [i,j]'s index is k means that B_dense[i,k] = B[i,j]. |
Methods:
Name | Description |
---|---|
to_numpy |
Transform B to np.array |
to_tensor |
Transform B to torch.Tensor |
matmul |
Calculate the matrix product of B_dense[idx,:] and X |
invmul |
Calculate the matrix-vector product of B_dense^{-1} and y. Only used when the diagonal of B_dense is constantly 1. |
Fmul |
Return a new Sparse_B object where B_dense is replaced by the matrix product of F * B_dense. F_diag is the vector representation of the nxn diagonal matrix F. |
to_dense |
Return the dense form of B_dense as an np.array object. |
Source code in geospaNN/utils.py
107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 |
|
Simulation(n, p, neighbor_size, fx, theta, coord=None, range=[0, 1], X_pattern='uniform', sparse=True)
Simulate spatial data
Simulate the spatial data based on the following model: Y(s) = f(X(s)) + w(s) + delta(s), where s are the spatial locations, X(s) is the spatial covariates, w(s) is the spatial effect (correlated noise), and delta(s) is the i.i.d random noise (nuggets).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
n
|
int
|
Sample size. |
required |
p
|
int
|
Dimension of covariates. |
required |
fx
|
Callable
|
Function for the covariates' effect. |
required |
theta
|
tuple[float, float, float]
|
theta[0], theta[1], theta[2] represent sigma^2, phi, tau in the exponential covariance family, used for generating spatial random effect. |
required |
coord
|
Optional[tensor]
|
A nxd tensor as the locations where to simulate the data, if not specified, randomly sample from [range[0], range[1]]^2 square. |
None
|
range
|
tuple[float, float]
|
A tuple [a, b] as the range of spatial locations. The spatial coordinates are sampled uniformly from [a, b]^2. |
[0, 1]
|
sparse
|
Optional[bool]
|
To be implemented. Mainly interact with the rmvn() function. |
True
|
Returns:
Name | Type | Description |
---|---|---|
X |
Tensor
|
torch.Tensor nxp array sampled uniformly from [0,1] as the covariates. |
Y |
Tensor
|
torch.Tensor Length n vector as the observations consists of fixed covariate effect, spatial random effect, and random noise. |
coord |
Tensor
|
torch.Tensor Simulated spatial locations. |
cov |
Tensor | NNGP_cov
|
Covariance matrix based on the simulated coordinates. |
corerr |
Tensor
|
Simulated spatial random effect. |
Source code in geospaNN/utils.py
662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 |
|
distance(coord1, coord2)
Distance matrix between two sets of points
Calculate the pairwise distance between two sets of locations.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
coord1
|
Tensor
|
The nxd coordinates array for the first set. |
required |
coord12
|
The nxd coordinates array for the second set. |
required |
Returns:
Name | Type | Description |
---|---|---|
dist |
Tensor
|
The distance matrix. |
Source code in geospaNN/utils.py
353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 |
|
distance_np(coord1, coord2)
The numpy version of distance()
Calculate the pairwise distance between two sets of locations.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
coord1
|
array
|
The nxd coordinates array for the first set. |
required |
coord2
|
array
|
The nxd coordinates array for the second set. |
required |
Returns:
Name | Type | Description |
---|---|---|
dists |
array
|
The distance matrix. |
See Also
distance : Distance matrix between two sets of points
Source code in geospaNN/utils.py
387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 |
|
krig_pred(w_train, coord_train, coord_test, theta, neighbor_size=20, q=0.95)
Kriging prediction (Gaussian process regression) with confidence interval.
Kriging prediction on testing locations based on the observations on the training locations. The kriging procedure assumes the observations are sampled from a Gaussian process, which is paramatrized here to have an exponential covariance structure using theta = [sigma^2, phi, tau]. NNGP appriximation is involved for efficient computation of matrix inverse. The conditional variance (kriging variance) is used to build the confidence interval using the quantiles (a/2, 1-a/2). (see https://arxiv.org/abs/2304.09157, section 4.3 for more details.)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
w_train
|
Tensor
|
Training observations of the spatial random effect without any fixed effect. |
required |
coord_train
|
Tensor
|
Spatial coordinates of the training observations. |
required |
coord_test
|
Tensor
|
Spatial coordinates of the locations for prediction |
required |
theta
|
tuple[float, float, float]
|
theta[0], theta[1], theta[2] represent sigma^2, phi, tau in the exponential covariance family. |
required |
neighbor_size
|
Optional[int]
|
The number of nearest neighbors used for NNGP approximation. Default being 20. |
20
|
q
|
Optional[float]
|
Confidence coverage for the prediction interval. Default being 0.95. |
0.95
|
Returns:
Name | Type | Description |
---|---|---|
w_test |
Tensor
|
torch.Tensor The kriging prediction. |
pred_U |
Tensor
|
torch.Tensor Confidence upper bound. |
pred_L |
Tensor
|
torch.Tensor Confidence lower bound. |
See Also
Zhan, Wentao, and Abhirup Datta. 2024. “Neural Networks for Geospatial Data.” Journal of the American Statistical Association, June, 1–21. doi:10.1080/01621459.2024.2356293.
Source code in geospaNN/utils.py
1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 |
|
make_bf(coord, theta, neighbor_size=20)
Obtain NNGP approximation with implicit covariance matrix
Find the upper triangular matrix B and diagonal matrix F such that (I-B)'F^{-1}(I-B) appriximate the precision matrix (inverse of the covariance matrix). (see https://arxiv.org/abs/2102.13299 for more details.) Here only coordinates and spatial parameters are needed to represent the exponential covariance implicitly, thus being more memory-efficient than make_bf_from_cov.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
coord
|
Tensor
|
The nxd covariate array. |
required |
neighbor_size
|
Optional[int]
|
The number of nearest neighbors used used for NNGP approximation. Default being 20. |
20
|
theta
|
tuple[float, float, float]
|
theta[0], theta[1], theta[2] represent sigma^2, phi, tau in the exponential covariance family. |
required |
Returns:
Name | Type | Description |
---|---|---|
I_B |
Sparse_B
|
The sparse representation of I-B. |
F |
Tensor
|
The vector representation of the diagonal matrix. |
See Also
make_bf_from_cov : Obtain NNGP approximation of a covariance matrix Datta, Abhirup. "Sparse nearest neighbor Cholesky matrices in spatial statistics." arXiv preprint arXiv:2102.13299 (2021).
Source code in geospaNN/utils.py
472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 |
|
make_bf_from_cov(cov, neighbor_size)
Obtain NNGP approximation of a covariance matrix
Find the upper triangular matrix B and diagonal matrix F such that (I-B)'F^{-1}(I-B) appriximate the precision matrix (inverse of the covariance matrix). The level of approximation increase with the neighbor size. When using the full neighbor, the NNGP appriximation degrade to the Cholesky decomposition. (see https://arxiv.org/abs/2102.13299 for more details.)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
cov
|
Tensor
|
The nxn covariance matrix. |
required |
neighbor_size
|
int
|
The number of nearest neighbors used for NNGP approximation. |
required |
Returns:
Name | Type | Description |
---|---|---|
I_B |
Sparse_B
|
Sparse_B The sparse representation of I-B. |
F |
Sparse_B
|
torch.Tensor The vector representation of the diagonal matrix. |
See Also
make_bf : Obtain NNGP approximation with implicit covariance matrix Datta, Abhirup, et al. "Hierarchical nearest-neighbor Gaussian process models for large geostatistical datasets." Journal of the American Statistical Association 111.514 (2016): 800-812.
Datta, Abhirup. "Sparse nearest neighbor Cholesky matrices in spatial statistics." arXiv preprint arXiv:2102.13299 (2021).
Source code in geospaNN/utils.py
417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 |
|
make_cov(coord, theta, NNGP=True, neighbor_size=20)
Compose covariance matrix.
Compose a covariance matrix in the exponential covariance family using the coordinates and spatial parameters. NNGP approximation is introduced for efficient representation. (see https://arxiv.org/abs/2102.13299 for more details.)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
coord
|
Tensor
|
The nxd covariate array. |
required |
theta
|
tuple[float, float, float]
|
theta[0], theta[1], theta[2] represent sigma^2, phi, tau in the exponential covariance family. |
required |
NNGP
|
Optional[bool]
|
Whether use NNGP approximation (recommended and used by default). |
True
|
neighbor_size
|
int
|
Number of nearest neighbors used for NNGP approximation, default value is 20. |
20
|
Returns:
Name | Type | Description |
---|---|---|
cov |
Tensor
|
A covariance matrix as torch.Tensor (dense representation) or NNGP_cov (sparse representation). |
See Also
Datta, Abhirup. "Sparse nearest neighbor Cholesky matrices in spatial statistics." arXiv preprint arXiv:2102.13299 (2021).
Source code in geospaNN/utils.py
625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 |
|
make_cov_full(dist, theta, nuggets=False)
Compose covariance matrix from the distance matrix with dense representation.
Compose a covariance matrix in the exponential covariance family (other options to be implemented) from the distance matrix. The returned object class depends on the input distance matrix.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dist
|
Tensor | ndarray
|
The nxn distance matrix |
required |
theta
|
tuple[float, float, float]
|
theta[0], theta[1], theta[2] represent sigma^2, phi, tau in the exponential covariance family. |
required |
nuggets
|
Optional[bool]
|
Whether to include nuggets term in the covariance matrix (added to the diagonal). |
False
|
Returns:
Name | Type | Description |
---|---|---|
cov |
Tensor | ndarray
|
A covariance matrix. |
Source code in geospaNN/utils.py
585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 |
|
make_graph(X, Y, coord, neighbor_size=20, Ind_list=None)
Compose the data with graph information to work on.
This function connects each node to its nearest neighbors and records the edge indexes in two forms for the downstream graph operations.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X
|
Tensor
|
nxp array of the covariates. |
required |
Y
|
Tensor
|
Length n vector as the response. |
required |
coord
|
Tensor
|
nxd array of the coordinates |
required |
neighbor_size
|
Optional[int]
|
The number of nearest neighbors used for NNGP approximation. Default being 20. |
20
|
Ind_list
|
Optional
|
An optional index list. If provided, ommit the make_rank() step in the function. |
None
|
Returns:
Name | Type | Description |
---|---|---|
data |
Data
|
torch_geometric.data.Data Data that can be processed by the torch_geometric framework. data.x contains the covariates array, data.y contains the response vector, data.pos contains the spatial coordinates, data.edge_index contains the indexes of form [i,j] where location j is in the nearest neighbor of location i. data.edge_attr contains the concatenated numbering of the neighbors. For each location, the numbering is of the form [0, 1, ... , k] where k is the number of the nearest neighbors. This attribute is mainly used in messaging passing. |
See Also: make_rank : Compose the nearest neighbor index list based on the coordinates. torch_geometric.data.Data : A data object describing a homogeneous graph.
Source code in geospaNN/utils.py
790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 |
|
make_rank(coord, neighbor_size, coord_ref=None)
Compose the nearest neighbor index list based on the coordinates.
Find the indexes of nearest neighbors in reference set for each location i in the main set. The index is based on the increasing order of the distances between ith location and the locations in the reference set.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
coord
|
Tensor
|
The nxd coordinates array of target locations. |
required |
neighbor_size
|
int
|
|
required |
coord_ref
|
Optional
|
The n_refxd coordinates array of reference locations. If None, use the target set itself as the reference. (Any location's neighbor does not include itself.) |
None
|
Returns:
Name | Type | Description |
---|---|---|
rank_list |
ndarray
|
A nxp array. The ith row is the indexes of the nearest neighbors for the ith location, ordered by the distance. |
Source code in geospaNN/utils.py
320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 |
|
rmvn(mu, cov, sparse=True)
Randomly generate sample from multivariate normal distribution
Generate random sample from a multivariate normal distribution with specified mean and covariance.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
mu
|
Tensor
|
The additional mean of multivariate normal of length n. |
required |
cov
|
Tensor | NNGP_cov
|
|
required |
sparse
|
Optional[bool]
|
Designed for sparse representation, not implemented yet. |
True
|
Returns:
Name | Type | Description |
---|---|---|
sample |
Tensor
|
A random sample from the multivariate normal distribution. |
Source code in geospaNN/utils.py
282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 |
|
spatial_order(X, Y, coord, method='max-min', numpropose=2, seed=2024)
Spatial ordering for data
Order the data according to their spatial locations. A spatial ordering is necessary for NNGP to represent a valid spatial process. (Datta et.al 2016) Method 'coord-sum' stands for a simple spatial ordering by the summation of coordinates. Method 'max-min' stands for the max-min ordering based on Euclidean distance. "Basically, this ordering starts at a point in the center, then adds points one at a time that maximize the minimum distance from all previous points in the ordering." (Katzfuss & Guinness 2021)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X
|
Tensor
|
nxp array of the covariates. |
required |
Y
|
Tensor
|
Length n vector as the response. |
required |
coord
|
Tensor
|
nxd array of the coordinates |
required |
method
|
str
|
Method 'coord-sum' stands for a simple spatial ordering by the summation of coordinates. Method 'max-min' stands for the max-min ordering based on Euclidean distance. (Katzfuss & Guinness 2021) |
'max-min'
|
Returns:
Type | Description |
---|---|
Tuple[Tensor, Tensor, Tensor, Tensor]
|
Ordered X, Ordered Y, Ordered coordinates, order |
See Also
Datta, Abhirup, et al. "Hierarchical nearest-neighbor Gaussian process models for large geostatistical datasets." Journal of the American Statistical Association 111.514 (2016): 800-812. Katzfuss, Matthias & Guinness, Joseph. "A General Framework for Vecchia Approximations of Gaussian Processes." Statist. Sci. 36 (1) 124 - 141, February 2021. https://doi.org/10.1214/19-STS755
Source code in geospaNN/utils.py
725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 |
|
split_data(X, Y, coord, neighbor_size=20, val_proportion=0.2, test_proportion=0.2)
Split the data into training, validation, and testing parts and add the graph information respectively.
This function split the data with a user specified proportions and build the graph information. The output are the data sets that can be directly processed within the torch_geomtric framework.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X
|
Tensor
|
nxp array of the covariates. |
required |
Y
|
Tensor
|
Length n vector as the response. |
required |
coord
|
Tensor
|
nxd array of the coordinates |
required |
neighbor_size
|
Optional[int]
|
The number of nearest neighbors used for NNGP approximation. Default being 20. |
20
|
val_proportion
|
float
|
The proportion of training set splitted as validation set. |
0.2
|
test_proportion
|
float
|
The proportion of whole data splitted as testing set. |
0.2
|
Returns:
Type | Description |
---|---|
tuple[Data, Data, Data]
|
data_train, data_val, data_test |
See Also
make_graph : Compose the data with graph information to work on. torch_geometric.data.Data : A data object describing a homogeneous graph.
Source code in geospaNN/utils.py
850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 |
|
split_loader(data, batch_size=None)
Create mini-batches for GNN training
This functions further split a data for mini-batch training of GNNs on large-scale graphs where full-batch training is not feasible. Note that only source nodes are splited into batches, each batch will contain edges originate from those source nodes. There might be interactions among the target nodes across batches.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data
|
Data
|
Data with graph information (output of split_data() or make_graph()). |
required |
batch_size
|
Optional[int]
|
Size of mini-batches, default value being n/20. |
None
|
Returns:
Name | Type | Description |
---|---|---|
loader |
DataLoaders
|
A dataloader that can be enumerated for mini-batch training. |
See Also
make_graph : Compose the data with graph information to work on. split_data : Split the data into training, validation, and testing parts and add the graph information respectively. torch_geometric.loader : A data loader that performs neighbor sampling as introduced in the
Hamilton, Will, Zhitao Ying, and Jure Leskovec. "Inductive representation learning on large graphs." Advances in neural information processing systems 30 (2017).
Source code in geospaNN/utils.py
897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 |
|
theta_update(w, coord, theta0=None, neighbor_size=20, BRISC=True, min_tau=0.001)
Update the spatial parameters using maximum likelihood.
This function updates the spatial parameters by assuming the observations are from a Gaussian Process with exponential covariance function. Spatial coordinates and initial values of theta are input for building the covariance. By default, L-BFGS-B algorithm is used to optimize the likelihood. Since the likelihood is computed repeatedly, NNGP approximation is used for efficient computation of the log-likelihood, with a default neighbor size being 20. Note that we currently use the Cpp implementation of L-BFGS-B from the R-package BRISC (Saha & Datta 2018) as default, which is faster than the python version. In the future, we will introduce a python wrapper for the Cpp implementation to get free of R component.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
theta0
|
Optional[Tensor]
|
Initial values of the spatial parameters. theta[0], theta[1], theta[2] represent sigma^2, phi, tau in the exponential covariance family. |
None
|
w
|
Tensor
|
Length n observations of the spatial random effect without any fixed effect. |
required |
coord
|
Tensor
|
nx2 spatial coordinates of the observations. |
required |
neighbor_size
|
Optional[int]
|
The number of nearest neighbors used for NNGP approximation. Default being 20. |
20
|
BRISC
|
Optional[bool]
|
Whether to use the optimization from BRISC. Default being True. Setting as False will largely increase the running time. |
True
|
min_tau
|
Optional[float]
|
A minimum value of tau to avoid sinularity issue in matrix inversion. Default being 0.001. |
0.001
|
Returns:
Name | Type | Description |
---|---|---|
theta_updated |
array
|
An updated tuple of the spatial paramaters. |
See Also
Datta, Abhirup, et al. "Hierarchical nearest-neighbor Gaussian process models for large geostatistical datasets." Journal of the American Statistical Association 111.514 (2016): 800-812.
Zhu, Ciyou, et al. "Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization." ACM Transactions on mathematical software (TOMS) 23.4 (1997): 550-560.
Saha, Arkajyoti, and Abhirup Datta. "BRISC: bootstrap for rapid inference on spatial covariances." Stat 7.1 (2018): e184.
Source code in geospaNN/utils.py
939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 |
|
plot_PDP(model, X, names=[], save_path='./', save=False, split=False)
Partial dependency plot for model on the data.
A Partial Dependence Plot (PDP) is a visualization tool used to illustrate the relationship between a selected feature and the predicted outcome of a machine learning model, while averaging out the effects of other features. This helps to understand the marginal influence of a single feature on the model's predictions in a more interpretable way.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model
|
Usually a model in nngls class. Can take model with .() method that take tensor X as input and predicted scalar value Y as output. (to implement for more models) |
required | |
X
|
tensor
|
nxp array of the covariates. |
required |
names
|
Optional[list]
|
List of names for variable, if not specified, use "variable 1" to "variable p". |
[]
|
save_path
|
Optional[str]
|
Directory to save the plot. Defaults to "./". |
'./'
|
save
|
bool
|
Whether to save the PDPs to the working directory. Default False. |
False
|
split
|
bool
|
Whether to return the PDPs as a list of PartialDependenceDisplay object for single variabls or a whole |
False
|
Returns:
Type | Description |
---|---|
A sklearn.inspection.PartialDependenceDisplay object contains PDPs for each variable. |
Source code in geospaNN/visualize.py
107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 |
|
plot_PDP_list(model_list, model_names, X, feature_names=[], save_path='./', save=False, split=True)
Partial dependency plot for model on the data.
A Partial Dependence Plot (PDP) is a visualization tool used to illustrate the relationship between a selected feature and the predicted outcome of a machine learning model, while averaging out the effects of other features. This helps to understand the marginal influence of a single feature on the model's predictions in a more interpretable way.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model_list
|
list
|
A list of models described in plot_PDP. |
required |
model_names
|
list
|
A list of model names for PDP visualization, expect the same length as model_list. |
required |
X
|
tensor
|
nxp array of the covariates for PDP integration. |
required |
feature_names
|
Optional[list]
|
List of names for variable, if not specified, use "variable 1" to "variable p". |
[]
|
save_path
|
Optional[str]
|
Directory to save the plot. Defaults to "./". |
'./'
|
save
|
bool
|
Whether to save the PDPs to the working directory. Default False. |
False
|
split
|
bool
|
Whether to save the PDP's for different features seperately or as one figure. The default is True. |
True
|
Returns:
Type | Description |
---|---|
A sklearn.inspection.PartialDependenceDisplay object contains PDPs for each variable. |
Source code in geospaNN/visualize.py
166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 |
|
plot_log(training_log, theta, save_path='./', save=False)
Output visualization for NN-GLS training.
This is a simple visualization of the training log from geospaNN.nngls_train.train()
Parameters:
Name | Type | Description | Default |
---|---|---|---|
training_log
|
list
|
|
required |
theta
|
tuple[float, float, float]
|
theta[0], theta[1], theta[2] represent sigma^2, phi, tau in the exponential covariance family. |
required |
save_path
|
Optional[str]
|
Directory to save the plot. Defaults to "./". |
'./'
|
save
|
bool
|
Whether to save the PDPs to the working directory. Default False. |
False
|
Source code in geospaNN/visualize.py
233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 |
|
spatial_plot_surface(variable, coord, title='Variable', save_path='./', file_name='spatial_surface.png', grid_resolution=1000, method='CloughTocher', cmap='viridis', show=False)
Plots a smooth surface for spatial data.
The function interpolates scattered data onto a regular grid and generates a smooth surface plot. The resulting plot is saved as a PNG file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
variable
|
array - like
|
Values to plot, corresponding to the coordinates. |
required |
coord
|
array - like
|
Coordinates of shape (n, 2) where each row represents a point (x, y). |
required |
title
|
str
|
Title of the plot. Defaults to "Variable". |
'Variable'
|
save_path
|
str
|
Directory to save the plot. Defaults to "./". |
'./'
|
file_name
|
str
|
Name of the saved plot file. Defaults to "spatial_surface.png". |
'spatial_surface.png'
|
grid_resolution
|
int
|
Resolution of the interpolation grid. Higher values produce finer surfaces. Defaults to 100. |
1000
|
cmap
|
str
|
Colormap to use for the plot. Defaults to 'viridis'. |
'viridis'
|
Raises:
Type | Description |
---|---|
ValueError
|
If the lengths of |
Returns:
Name | Type | Description |
---|---|---|
None |
None
|
The function saves the plot to a file and does not return any value. |
Example
import numpy as np coord = np.random.uniform(0, 10, (50, 2)) variable = np.sin(coord[:, 0]) + np.cos(coord[:, 1]) spatial_plot_surface(variable, coord, title="Example Plot", grid_resolution=200)
Source code in geospaNN/visualize.py
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 |
|