--- title: "2. Guided Partial Least Squares (guided-PLS)" author: - name: Koki Tsuyuzaki affiliation: Laboratory for Bioinformatics Research, RIKEN Center for Biosystems Dynamics Research email: k.t.the-answer@hotmail.co.jp date: "`r Sys.Date()`" bibliography: bibliography.bib package: guidedPLS output: rmarkdown::html_vignette vignette: | %\VignetteIndexEntry{2. Guided Partial Least Squares (guided-PLS)} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- # Introduction In this vignette, we consider a novel supervised dimensional reduction method guided partial least squares (guided-PLS). Test data is available from `toyModel`. ```{r data, echo=TRUE} library("guidedPLS") data <- guidedPLS::toyModel("Easy") str(data, 2) ``` You will see that there are three blocks in the data matrix as follows. ```{r data2, echo=TRUE, fig.height=6, fig.width=6} suppressMessages(library("fields")) layout(c(1,2,3)) image.plot(data$Y1_dummy, main="Y1 (Dummy)", legend.mar=8) image.plot(data$Y1, main="Y1", legend.mar=8) image.plot(data$X1, main="X1", legend.mar=8) ``` # Guided Partial Least Squares (guided-PLS) Here, suppose that we have two data matrices $X_1$ ($N \times M$) and $X_2$ ($S \times T$), and the row vectors of them are assumed to be centered. Since these two matrices have no common row or column, integration of them is not trivial. Such a data structure is called "diagonal" and known as a barrier to omics data integration [@diagonal]. Here is a simpler way to set up the problem; suppose that we have another set of matrices $Y_1$ ($M \times I$) and $Y_2$ ($T \times I$), which are the label matrices for $X_1$ and $X_2$, respectively. In guided-PLS, the data matrices $X_1$ and $X_2$ are projected into lower dimension via $Y_1$ and $Y_2$, and then PLS-SVD are performed against the $Y_{1} X_{1}$ and $Y_{2} X_{2}$ as follows: $$ \max_{W_{1},W_{2}} \mathrm{tr} \left( W_{1}^{T} X_{1}^{T} Y_{1}^{T} Y_{2} X_{2} W_{2} \right)\ \mathrm{s.t.}\ W_{1}^{T}W_{1} = W_{2}^{T}W_{2} = I_{K} $$ ## Basic Usage `guidedPLS` is performed as follows. ```{r plssvd, echo=TRUE} out <- guidedPLS(X1=data$X1, X2=data$X2, Y1=data$Y1, Y2=data$Y2, k=2) ``` ```{r plotplssvd, echo=TRUE, fig.height=4, fig.width=4} plot(rbind(out$scoreX1, out$scoreX2), col=c(data$col1, data$col2), pch=c(rep(2, length=nrow(out$scoreX1)), rep(3, length=nrow(out$scoreX2)))) legend("bottomleft", legend=c("XY1", "XY2"), pch=c(2,3)) ``` # Session Information {.unnumbered} ```{r sessionInfo, echo=FALSE} sessionInfo() ``` # References