In this vignette, we introduce guided partial least squares (guided-PLS), a novel supervised dimensionality reduction method.
Test data are available from `toyModel`. The resulting list has the following structure.
## List of 8
## $ X1 : int [1:100, 1:300] 86 101 95 106 113 85 88 103 106 84 ...
## $ X2 : int [1:200, 1:150] 106 81 91 101 91 105 111 81 113 105 ...
## $ Y1 : int [1:100, 1:50] 101 77 77 87 101 89 111 113 101 112 ...
## $ Y1_dummy: num [1:100, 1:3] 1 1 1 1 1 1 1 1 1 1 ...
## $ Y2 : int [1:200, 1:50] 107 81 102 90 84 106 97 90 88 115 ...
## $ Y2_dummy: num [1:200, 1:3] 1 1 1 1 1 1 1 1 1 1 ...
## $ col1 : chr [1:100] "#66C2A5" "#66C2A5" "#66C2A5" "#66C2A5" ...
## $ col2 : chr [1:200] "#66C2A5" "#66C2A5" "#66C2A5" "#66C2A5" ...
As the following heatmaps show, the data matrices have a three-block structure.
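suppressMessages(library("fields"))  # image.plot() draws a heatmap with a color legend
# `data` is the list obtained from toyModel (structure shown above)
layout(c(1,2,3))  # three panels stacked vertically
image.plot(data$Y1_dummy, main="Y1 (Dummy)", legend.mar=8)  # group indicators
image.plot(data$Y1, main="Y1", legend.mar=8)  # guide (label) matrix for X1
image.plot(data$X1, main="X1", legend.mar=8)  # data matrix X1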
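The `Y1_dummy` matrix plotted above appears to be a one-hot (dummy) encoding of three groups, with one indicator column per group. As a hedged illustration using made-up labels (not the `toyModel` output), such a matrix can be built in base R with `model.matrix()`. Multiplying its transpose by a data matrix then sums the data rows within each group, which is the kind of label-based projection guided-PLS relies on.

```r
# Hypothetical group labels for six samples (illustration only, not toyModel data).
groups <- factor(c("A", "A", "B", "B", "B", "C"), levels = c("A", "B", "C"))

# "- 1" removes the intercept, so every level gets its own 0/1 indicator column.
Y_dummy <- model.matrix(~ groups - 1)
dim(Y_dummy)      # 6 rows (samples) x 3 columns (groups)
rowSums(Y_dummy)  # each sample belongs to exactly one group

# t(Y_dummy) %*% X sums the rows of X within each group (a 3 x ncol(X) result).
set.seed(1)
X <- matrix(rnorm(6 * 4), nrow = 6)
group_sums <- crossprod(Y_dummy, X)
```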
Suppose that we have two data matrices $X_1$ ($N \times M$) and $X_2$ ($S \times T$), whose row vectors are assumed to be centered. Because these two matrices share neither rows nor columns, integrating them is not trivial. Such a data structure is called "diagonal" and is known to be a barrier to omics data integration (Argelaguet 2021).
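To make the problem tractable, suppose that we also have matrices $Y_1$ ($N \times I$) and $Y_2$ ($S \times I$), which are the label (guide) matrices for $X_1$ and $X_2$, respectively. Because $Y_1$ and $Y_2$ share the same $I$ columns, they link the two otherwise unrelated datasets.
In guided-PLS, $X_1$ and $X_2$ are first projected into a lower-dimensional space via the label matrices, giving $Y_1^\top X_1$ ($I \times M$) and $Y_2^\top X_2$ ($I \times T$), and PLS-SVD is then performed on these two matrices:

$$
\max_{W_1, W_2} \operatorname{tr}\left(W_1^\top X_1^\top Y_1 Y_2^\top X_2 W_2\right) \quad \text{s.t.} \quad W_1^\top W_1 = W_2^\top W_2 = I_K
$$

where $W_1$ ($M \times K$) and $W_2$ ($T \times K$) are weight matrices and $I_K$ is the $K \times K$ identity matrix.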
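This trace maximization has a closed-form solution: writing $C = X_1^\top Y_1 Y_2^\top X_2$ ($M \times T$), the maximum over column-orthonormal $W_1$ and $W_2$ equals the sum of the $K$ largest singular values of $C$ and is attained by its top-$K$ left and right singular vectors. The base-R sketch below follows that reading. `guided_pls_sketch` is a hypothetical name for illustration, not the guidedPLS package's exported function, and it applies no centering, scaling, or other options the package may provide.

```r
# Hypothetical helper (NOT the guidedPLS package API): a base-R sketch of the
# guided-PLS problem solved with a truncated SVD. No centering or scaling is
# applied here; inputs are assumed to be preprocessed already.
guided_pls_sketch <- function(X1, X2, Y1, Y2, k = 2) {
  stopifnot(nrow(X1) == nrow(Y1), nrow(X2) == nrow(Y2), ncol(Y1) == ncol(Y2))
  P1 <- crossprod(Y1, X1)      # t(Y1) %*% X1: I x M
  P2 <- crossprod(Y2, X2)      # t(Y2) %*% X2: I x T
  C <- crossprod(P1, P2)       # t(X1) %*% Y1 %*% t(Y2) %*% X2: M x T
  s <- svd(C, nu = k, nv = k)  # top-k singular vectors solve the trace maximization
  list(W1 = s$u,               # M x K weights with orthonormal columns
       W2 = s$v,               # T x K weights with orthonormal columns
       d = s$d[seq_len(k)],    # the optimal objective value is sum(d)
       scores1 = X1 %*% s$u,   # N x K low-dimensional representation of X1
       scores2 = X2 %*% s$v)   # S x K low-dimensional representation of X2
}

# Usage on random matrices; n1, m, n2, t2, n_lab play the roles of N, M, S, T, I.
set.seed(123)
n1 <- 30; m <- 20; n2 <- 40; t2 <- 15; n_lab <- 5
X1 <- matrix(rnorm(n1 * m), n1)
X2 <- matrix(rnorm(n2 * t2), n2)
Y1 <- matrix(rnorm(n1 * n_lab), n1)
Y2 <- matrix(rnorm(n2 * n_lab), n2)
fit <- guided_pls_sketch(X1, X2, Y1, Y2, k = 3)
dim(fit$scores1)  # 30 x 3
dim(fit$scores2)  # 40 x 3
```

Using `svd()` on the full $M \times T$ cross-product is simple but can be costly for large feature counts; a truncated solver would be the natural substitute in that case.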
The session information used to build this vignette is shown below.
sessionInfo()
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] fields_16.3 viridisLite_0.4.2 spam_2.11-1 guidedPLS_0.99.0
## [5] rmarkdown_2.29
##
## loaded via a namespace (and not attached):
## [1] cli_3.6.3 knitr_1.49 rlang_1.1.5 xfun_0.50
## [5] dotCall64_1.2 jsonlite_1.8.9 buildtools_1.0.0 htmltools_0.5.8.1
## [9] maketools_1.3.1 sys_3.4.3 sass_0.4.9 grid_4.4.2
## [13] evaluate_1.0.3 jquerylib_0.1.4 fastmap_1.2.0 yaml_2.3.10
## [17] lifecycle_1.0.4 compiler_4.4.2 irlba_2.3.5.1 Rcpp_1.0.14
## [21] maps_3.4.2.1 lattice_0.22-6 digest_0.6.37 R6_2.5.1
## [25] bslib_0.9.0 Matrix_1.7-2 tools_4.4.2 cachem_1.1.0