---
title: "1. Independent Component Analysis (ICA)"
author:
- name: Koki Tsuyuzaki
  affiliation: Department of Artificial Intelligence Medicine,
    Graduate School of Medicine, Chiba University
  email: k.t.the-answer@hotmail.co.jp
date: "`r Sys.Date()`"
bibliography: bibliography.bib
package: iTensor
output: rmarkdown::html_vignette
vignette: |
  %\VignetteIndexEntry{1. Independent Component Analysis (ICA)}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

# Introduction

In this vignette, we consider approximating a data matrix as a product of multiple low-rank matrices (a.k.a., factor matrices).

Test data is available from `toyModel`.

```{r data, echo=TRUE}
library("iTensor")
data1 <- iTensor::toyModel("ICA_Type1")
data2 <- iTensor::toyModel("ICA_Type2")
data3 <- iTensor::toyModel("ICA_Type3")
data4 <- iTensor::toyModel("ICA_Type4")
data5 <- iTensor::toyModel("ICA_Type5")
dim(data1$X_observed)
dim(data2$X_observed)
dim(data3$X_observed)
dim(data4$X_observed)
dim(data5$gene) # N < P systems
```

Summary of these data is as follows:

1. ICA with time-independent sub-Gaussian data
2. ICA with time-independent super-Gaussian data
3. ICA with data mixed with signals having no time dependence and different kurtosis
4. ICA with time-dependent data
5. IPCA in N < P systems

You will see that the stuctures of these data as follows.

```{r data2, echo=TRUE, fig.height=4, fig.width=4}
pairs(data1$X_observed, main="data1")
pairs(data2$X_observed, main="data2")
pairs(data3$X_observed, main="data3")
pairs(data4$X_observed, main="data4")
dim(data5$gene)
```

To perform and visualize all the ICA at once, here we write some functions as follows.

```{r setting, echo=TRUE}
allICA <- function(X, J){
    # Classic ICA Methods
    out.FastICA <- ICA(X=X, J=J, algorithm="FastICA")
    out.InfoMax <- ICA(X=X, J=J, algorithm="InfoMax")
    out.ExtInfoMax <- ICA(X=X, J=J, algorithm="ExtInfoMax")
    # Modern ICA Method
    out.JADE <- ICA2(X=X, J=J, algorithm="JADE")
    # Output
    list(out.FastICA=out.FastICA, out.InfoMax=out.InfoMax,
      out.ExtInfoMax=out.ExtInfoMax, out.JADE=out.JADE)
}

CorrIndexAllICA <- function(S, out){
    tmp <- lapply(out, function(x, S){CorrIndex(cor(S, Re(x$S)))}, S=S)
    Name <- gsub("out.", "", names(tmp))
    Value <- round(unlist(tmp), 3)
    names(Value) <- NULL
    cbind(Name, Value)
    tmp <- as.data.frame(cbind(Name, Value))
    colnames(tmp) <- c("Method", "CorrIndex")
    knitr::kable(tmp)
}
```

Note that we select only four ICA algorithms for the demonstration but in `iTensor` 12 ICA algorithms are available. For the details, check `?ICA` and `?ICA2`.

# 1. ICA with time-independent sub-Gaussian data

In this data, according to the independence between estimated source signal, an upright square means that the calculation is successful and other cases such as a rhombus rotated 45 degrees from a square means that the calculation is fail.

```{r ica1, echo=TRUE}
set.seed(123456)
out1 <- allICA(X=data1$X_observed, J=3)
```

Here we use CorrIndex, which is a performance index to evaluate ICA results [@corrindex]. CorrIndex imply that FastICA, ExtInfoMax, and JADE performed well (the closer to 0, the better the performance).

```{r corrindex1, echo=TRUE}
CorrIndexAllICA(data1$S_true, out1)
```

We can see the details of source signals as follows.

```{r pairs_ica1, echo=TRUE, fig.height=4, fig.width=4}
pairs(out1$out.FastICA$S, main="FastICA")
pairs(out1$out.InfoMax$S, main="InfoMax")
pairs(out1$out.ExtInfoMax$S, main="ExtInfoMax")
pairs(Re(out1$out.JADE$S), main="JADE")
```

# 2. ICA with time-independent super-Gaussian data

In this data, if the outliers are distributed along the lines of x = 0 and y = 0, the calculation is successful and if they are of the cross-shape, it is a failure.

```{r ica2, echo=TRUE}
set.seed(123456)
out2 <- allICA(X=data2$X_observed, J=3)
```

CorrIndex imply that all the algorithms performed well.

```{r corrindex2, echo=TRUE}
CorrIndexAllICA(data2$S_true, out2)
```

We can see the details of source signals as follows.

```{r pairs_ica2, echo=TRUE, fig.height=4, fig.width=4}
pairs(out2$out.FastICA$S, main="FastICA")
pairs(out2$out.InfoMax$S, main="InfoMax")
pairs(out2$out.ExtInfoMax$S, main="ExtInfoMax")
pairs(Re(out2$out.JADE$S), main="JADE")
```

# 3. ICA with data mixed with signals having no time dependence and different kurtosis

In this data, the uniform distribution should be extracted in such a way that it is not rhombic, and furthermore the normal distribution and the t-distribution with 1 degree of freedom should be extracted independently. Sometimes, it is difficult ot determine visually though.

```{r ica3, echo=TRUE}
set.seed(123456)
out3 <- allICA(X=data3$X_observed, J=3)
```

CorrIndex imply that FastICA, ExtInfoMax, and JADE performed well.

```{r corrindex3, echo=TRUE}
CorrIndexAllICA(data3$S_true, out3)
```

We can see the details of source signals as follows.

```{r pairs_ica3, echo=TRUE, fig.height=4, fig.width=4}
pairs(out3$out.FastICA$S, main="FastICA")
pairs(out3$out.InfoMax$S, main="InfoMax")
pairs(out3$out.ExtInfoMax$S, main="ExtInfoMax")
pairs(Re(out3$out.JADE$S), main="JADE")
```

# 4. ICA with time-dependent data

In this data, if the mesh pattern is reproduced, the calculation is successful.

```{r ica4, echo=TRUE}
set.seed(123456)
out4 <- allICA(X=data4$X_observed, J=3)
```

CorrIndex imply that FastICA, ExtInfoMax, and JADE performed well.

```{r corrindex4, echo=TRUE}
CorrIndexAllICA(data4$S_true, out4)
```

We can see the details of source signals as follows.

```{r pairs_ica4, echo=TRUE, fig.height=4, fig.width=4}
pairs(out4$out.FastICA$S, main="FastICA")
pairs(out4$out.InfoMax$S, main="InfoMax")
pairs(out4$out.ExtInfoMax$S, main="ExtInfoMax")
pairs(Re(out4$out.JADE$S), main="JADE")
```

# 5. IPCA in N < P systems

This kind of data is an thin and long matrix, which is a very common data structure in omics data. In `iTensor`, we re-implemented the IPCA function of `mixOmics` package.

```{r ica5, echo=TRUE}
library("mixOmics")
set.seed(123456)

# mixOmics's IPCA
res.ipca <- ipca(data5$gene, ncomp=3, mode="deflation")

# iTensor's IPCA
out5 <- ICA2(X=as.matrix(data5$gene), J=3, algorithm="IPCA")
```

We can see the details of source signals as follows. In this data, we can confirm that the IPCA of `mixOmics` and `iTensor` have the same results, except for the order and constant-fold relationships.

```{r pairs_ica5, echo=TRUE, fig.height=4, fig.width=4}
pairs(res.ipca$x, main="IPCA (mixOmics)", col=data5$treatment[,"Treatment.Group"], pch=16)
pairs(out5$S, main="IPCA (iTensor)", col=data5$treatment[,"Treatment.Group"], pch=16)
```

# Session Information {.unnumbered}

```{r sessionInfo, echo=FALSE}
sessionInfo()
```

# References