--- title: "1. Independent Component Analysis (ICA)" author: - name: Koki Tsuyuzaki affiliation: Department of Artificial Intelligence Medicine, Graduate School of Medicine, Chiba University email: k.t.the-answer@hotmail.co.jp date: "`r Sys.Date()`" bibliography: bibliography.bib package: iTensor output: rmarkdown::html_vignette vignette: | %\VignetteIndexEntry{1. Independent Component Analysis (ICA)} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- # Introduction In this vignette, we consider approximating a data matrix as a product of multiple low-rank matrices (a.k.a., factor matrices). Test data is available from `toyModel`. ```{r data, echo=TRUE} library("iTensor") data1 <- iTensor::toyModel("ICA_Type1") data2 <- iTensor::toyModel("ICA_Type2") data3 <- iTensor::toyModel("ICA_Type3") data4 <- iTensor::toyModel("ICA_Type4") data5 <- iTensor::toyModel("ICA_Type5") dim(data1$X_observed) dim(data2$X_observed) dim(data3$X_observed) dim(data4$X_observed) dim(data5$gene) # N < P systems ``` Summary of these data is as follows: 1. ICA with time-independent sub-Gaussian data 2. ICA with time-independent super-Gaussian data 3. ICA with data mixed with signals having no time dependence and different kurtosis 4. ICA with time-dependent data 5. IPCA in N < P systems You will see that the stuctures of these data as follows. ```{r data2, echo=TRUE, fig.height=4, fig.width=4} pairs(data1$X_observed, main="data1") pairs(data2$X_observed, main="data2") pairs(data3$X_observed, main="data3") pairs(data4$X_observed, main="data4") dim(data5$gene) ``` To perform and visualize all the ICA at once, here we write some functions as follows. ```{r setting, echo=TRUE} allICA <- function(X, J){ # Classic ICA Methods out.FastICA <- ICA(X=X, J=J, algorithm="FastICA") out.InfoMax <- ICA(X=X, J=J, algorithm="InfoMax") out.ExtInfoMax <- ICA(X=X, J=J, algorithm="ExtInfoMax") # Modern ICA Method out.JADE <- ICA2(X=X, J=J, algorithm="JADE") # Output list(out.FastICA=out.FastICA, out.InfoMax=out.InfoMax, out.ExtInfoMax=out.ExtInfoMax, out.JADE=out.JADE) } CorrIndexAllICA <- function(S, out){ tmp <- lapply(out, function(x, S){CorrIndex(cor(S, Re(x$S)))}, S=S) Name <- gsub("out.", "", names(tmp)) Value <- round(unlist(tmp), 3) names(Value) <- NULL cbind(Name, Value) tmp <- as.data.frame(cbind(Name, Value)) colnames(tmp) <- c("Method", "CorrIndex") knitr::kable(tmp) } ``` Note that we select only four ICA algorithms for the demonstration but in `iTensor` 12 ICA algorithms are available. For the details, check `?ICA` and `?ICA2`. # 1. ICA with time-independent sub-Gaussian data In this data, according to the independence between estimated source signal, an upright square means that the calculation is successful and other cases such as a rhombus rotated 45 degrees from a square means that the calculation is fail. ```{r ica1, echo=TRUE} set.seed(123456) out1 <- allICA(X=data1$X_observed, J=3) ``` Here we use CorrIndex, which is a performance index to evaluate ICA results [@corrindex]. CorrIndex imply that FastICA, ExtInfoMax, and JADE performed well (the closer to 0, the better the performance). ```{r corrindex1, echo=TRUE} CorrIndexAllICA(data1$S_true, out1) ``` We can see the details of source signals as follows. ```{r pairs_ica1, echo=TRUE, fig.height=4, fig.width=4} pairs(out1$out.FastICA$S, main="FastICA") pairs(out1$out.InfoMax$S, main="InfoMax") pairs(out1$out.ExtInfoMax$S, main="ExtInfoMax") pairs(Re(out1$out.JADE$S), main="JADE") ``` # 2. ICA with time-independent super-Gaussian data In this data, if the outliers are distributed along the lines of x = 0 and y = 0, the calculation is successful and if they are of the cross-shape, it is a failure. ```{r ica2, echo=TRUE} set.seed(123456) out2 <- allICA(X=data2$X_observed, J=3) ``` CorrIndex imply that all the algorithms performed well. ```{r corrindex2, echo=TRUE} CorrIndexAllICA(data2$S_true, out2) ``` We can see the details of source signals as follows. ```{r pairs_ica2, echo=TRUE, fig.height=4, fig.width=4} pairs(out2$out.FastICA$S, main="FastICA") pairs(out2$out.InfoMax$S, main="InfoMax") pairs(out2$out.ExtInfoMax$S, main="ExtInfoMax") pairs(Re(out2$out.JADE$S), main="JADE") ``` # 3. ICA with data mixed with signals having no time dependence and different kurtosis In this data, the uniform distribution should be extracted in such a way that it is not rhombic, and furthermore the normal distribution and the t-distribution with 1 degree of freedom should be extracted independently. Sometimes, it is difficult ot determine visually though. ```{r ica3, echo=TRUE} set.seed(123456) out3 <- allICA(X=data3$X_observed, J=3) ``` CorrIndex imply that FastICA, ExtInfoMax, and JADE performed well. ```{r corrindex3, echo=TRUE} CorrIndexAllICA(data3$S_true, out3) ``` We can see the details of source signals as follows. ```{r pairs_ica3, echo=TRUE, fig.height=4, fig.width=4} pairs(out3$out.FastICA$S, main="FastICA") pairs(out3$out.InfoMax$S, main="InfoMax") pairs(out3$out.ExtInfoMax$S, main="ExtInfoMax") pairs(Re(out3$out.JADE$S), main="JADE") ``` # 4. ICA with time-dependent data In this data, if the mesh pattern is reproduced, the calculation is successful. ```{r ica4, echo=TRUE} set.seed(123456) out4 <- allICA(X=data4$X_observed, J=3) ``` CorrIndex imply that FastICA, ExtInfoMax, and JADE performed well. ```{r corrindex4, echo=TRUE} CorrIndexAllICA(data4$S_true, out4) ``` We can see the details of source signals as follows. ```{r pairs_ica4, echo=TRUE, fig.height=4, fig.width=4} pairs(out4$out.FastICA$S, main="FastICA") pairs(out4$out.InfoMax$S, main="InfoMax") pairs(out4$out.ExtInfoMax$S, main="ExtInfoMax") pairs(Re(out4$out.JADE$S), main="JADE") ``` # 5. IPCA in N < P systems This kind of data is an thin and long matrix, which is a very common data structure in omics data. In `iTensor`, we re-implemented the IPCA function of `mixOmics` package. ```{r ica5, echo=TRUE} library("mixOmics") set.seed(123456) # mixOmics's IPCA res.ipca <- ipca(data5$gene, ncomp=3, mode="deflation") # iTensor's IPCA out5 <- ICA2(X=as.matrix(data5$gene), J=3, algorithm="IPCA") ``` We can see the details of source signals as follows. In this data, we can confirm that the IPCA of `mixOmics` and `iTensor` have the same results, except for the order and constant-fold relationships. ```{r pairs_ica5, echo=TRUE, fig.height=4, fig.width=4} pairs(res.ipca$x, main="IPCA (mixOmics)", col=data5$treatment[,"Treatment.Group"], pch=16) pairs(out5$S, main="IPCA (iTensor)", col=data5$treatment[,"Treatment.Group"], pch=16) ``` # Session Information {.unnumbered} ```{r sessionInfo, echo=FALSE} sessionInfo() ``` # References