Monthly archive: March 2020

Matlab error when running GISTIC

If you install the MCR (MATLAB Compiler Runtime) bundled with the GISTIC package, you may see the following error, which can cause GISTIC to fail:
libGL error: failed to load driver: swrast

If this happens, rename the file "$MATLAB_ROOT/sys/os/glnxa64/libstdc++.so.6" to "libstdc++.so.6.old". This forces MATLAB to use the operating system's libstdc++ instead of the copy bundled with MATLAB.

Works for me.
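If you would rather script the rename (for example when setting up GISTIC on several machines), the same step can be done from R. This is a minimal sketch: the helper name `disable_bundled_libstdcxx` is hypothetical, and it assumes `MATLAB_ROOT` (or the argument you pass) points at the MATLAB or MCR installation directory that contains `sys/os/glnxa64`.

```r
# Hypothetical helper: rename the libstdc++ bundled with MATLAB/MCR so that the
# operating system's copy is loaded instead. Assumes matlab_root is the
# MATLAB/MCR install directory (the $MATLAB_ROOT in the text).
disable_bundled_libstdcxx <- function(matlab_root = Sys.getenv("MATLAB_ROOT")) {
  if (!nzchar(matlab_root)) stop("Set MATLAB_ROOT or pass matlab_root explicitly")
  lib <- file.path(matlab_root, "sys", "os", "glnxa64", "libstdc++.so.6")
  if (!file.exists(lib)) {
    message("Not found (already renamed?): ", lib)
    return(invisible(FALSE))
  }
  # Same effect as: mv libstdc++.so.6 libstdc++.so.6.old
  invisible(file.rename(lib, paste0(lib, ".old")))
}
```

Usage: `disable_bundled_libstdcxx("/path/to/mcr_root")`. To undo it, rename `libstdc++.so.6.old` back to `libstdc++.so.6`.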

Ref:
https://ww2.mathworks.cn/matlabcentral/answers/296999-libgl-error-unable-to-load-driver-in-ubuntu-16-04-while-running-matlab-r2013b

GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers

Prepare a data frame for sample CNV data

To cluster samples based on CNV data, we need a single data frame in which every sample is measured over the same set of regions. However, the CNV segments differ from sample to sample: they may overlap or be entirely distinct. The CNTools package can help here by converting the per-sample segments into a reduced segment table defined over regions shared by all samples. An example is shown below.

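if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("CNTools")
library(CNTools)
data("sampleData")        # example segment table shipped with CNTools
seg <- CNSeg(sampleData)  # wrap the segments in a CNSeg object
# Reduce segments to common regions; imput = FALSE: no imputation of missing values,
# XY = FALSE: exclude sex chromosomes, what = "mean": summarise values by the mean
rdseg <- getRS(seg, by = "region", imput = FALSE, XY = FALSE, what = "mean")
View(rdseg@rs)            # inspect the reduced segment data frame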

The input data frame has six columns ("ID", "chrom", "loc.start", "loc.end", "num.mark", "seg.mean") and contains 277 samples and 54,825 segments.

The result is stored in rdseg@rs.
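To make the idea behind a reduced segment table concrete, here is a minimal base-R sketch under stated assumptions: pool every sample's segment boundaries per chromosome, cut the chromosome into the resulting non-overlapping regions, and give each sample the seg.mean of the segment covering each region (NA where a sample has no segment, roughly like `imput = FALSE`). It is an illustrative reimplementation, not CNTools' actual code; `reduce_segments` is a hypothetical name, and coordinates are assumed to be inclusive integers.

```r
# Hypothetical helper, not CNTools' code: builds a table with one row per
# region and one column per sample from a segment data frame with columns
# ID, chrom, loc.start, loc.end, seg.mean (inclusive integer coordinates).
reduce_segments <- function(seg) {
  seg$ID <- as.character(seg$ID)
  seg$chrom <- as.character(seg$chrom)
  samples <- unique(seg$ID)
  out <- list()
  for (chr in unique(seg$chrom)) {
    s <- seg[seg$chrom == chr, ]
    # Every segment boundary in any sample becomes a region boundary.
    bps <- sort(unique(c(s$loc.start, s$loc.end + 1)))
    starts <- head(bps, -1)
    ends <- tail(bps, -1) - 1
    m <- matrix(NA_real_, nrow = length(starts), ncol = length(samples),
                dimnames = list(NULL, samples))
    for (i in seq_len(nrow(s))) {
      # Regions lying inside this segment inherit its seg.mean.
      hit <- starts >= s$loc.start[i] & ends <= s$loc.end[i]
      m[hit, s$ID[i]] <- s$seg.mean[i]
    }
    # Drop gaps that no sample covers; other missing cells stay NA.
    keep <- rowSums(!is.na(m)) > 0
    out[[chr]] <- data.frame(chrom = chr, start = starts[keep], end = ends[keep],
                             m[keep, , drop = FALSE], check.names = FALSE)
  }
  res <- do.call(rbind, out)
  rownames(res) <- NULL
  res
}
```

For example, a sample A with segments 1-100 and 101-200 and a sample B with a segment 51-150 yield four regions (1-50, 51-100, 101-150, 151-200). The per-segment loop is fine for small tables, but on tens of thousands of segments `getRS` is the more practical choice.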

Cheers

Alternatively, the CNregions function from the iClusterPlus package can be used:
library(iClusterPlus)
CNregions(sampleData)   # sampleData as loaded from CNTools above

Ref: https://www.rdocumentation.org/packages/CNTools

https://rdrr.io/bioc/iClusterPlus/man/CNregions.html

#####################################################################
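#All rights reserved. Please notify the author before reposting. Copyright belongs to the author; any infringement, once discovered, will be pursued for legal liability.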
#Author: Jason
#####################################################################

Understanding Autoencoders

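Data often has so many dimensions that it is hard to visualize, and the high dimensionality also hampers model learning. A common step is to run PCA or t-SNE on the data to reduce it to 2 dimensions (or 3), which makes the relationships between data points easier to see. In machine learning, the autoencoder is another dimensionality-reduction method. An autoencoder's input layer and output layer must have the same number of neurons, and the network is trained so that its output reproduces its input as closely as possible.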
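For comparison, the PCA route mentioned above takes one call in base R. A minimal sketch on simulated data (the matrix `X` is made up for illustration):

```r
set.seed(1)
# Simulated data: 100 samples x 50 features (illustrative only)
X <- matrix(rnorm(100 * 50), nrow = 100)
pca <- prcomp(X, center = TRUE, scale. = TRUE)
X2 <- pca$x[, 1:2]   # each sample reduced to its first 2 principal components
```

`plot(X2, xlab = "PC1", ylab = "PC2")` then gives the 2D view. PCA is a linear projection, whereas an autoencoder with nonlinear activations can learn a nonlinear one.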

Figure: autoencoder structure (image from the internet)

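As the figure above shows, the part where the dimension shrinks from large to small is the encoding process, and the output is obtained from the middle layer by decoding. Because the output layer is reconstructed from the middle layer, the middle layer must retain the information in the input layer. The middle-layer values are therefore the dimension-reduced result and can be used for other tasks. The complexity of the network should be chosen according to the number of samples.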
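A minimal base-R sketch of the idea, under stated assumptions: one tanh middle layer of size k, a linear output layer with the same width as the input, mean-squared reconstruction error, and plain full-batch gradient descent. The names `train_autoencoder` and `encode`, the simulated data, and all hyperparameters are illustrative choices, not from the post; in practice a deep-learning library would normally be used.

```r
# Hypothetical minimal autoencoder: input (p) -> middle layer (k) -> output (p).
train_autoencoder <- function(X, k = 2, lr = 0.1, epochs = 2000, seed = 1) {
  set.seed(seed)
  n <- nrow(X); p <- ncol(X)
  W1 <- matrix(rnorm(p * k, sd = 0.1), p, k); b1 <- rep(0, k)   # encoder
  W2 <- matrix(rnorm(k * p, sd = 0.1), k, p); b2 <- rep(0, p)   # decoder
  loss <- numeric(epochs)
  for (e in seq_len(epochs)) {
    H <- tanh(sweep(X %*% W1, 2, b1, "+"))   # encode: middle layer
    Xhat <- sweep(H %*% W2, 2, b2, "+")      # decode: reconstruct the input
    R <- Xhat - X
    loss[e] <- mean(R^2)                     # reconstruction error
    dXhat <- 2 * R / (n * p)                 # gradient of the mean squared error
    dW2 <- t(H) %*% dXhat; db2 <- colSums(dXhat)
    dZ <- (dXhat %*% t(W2)) * (1 - H^2)      # back-propagate through tanh
    dW1 <- t(X) %*% dZ; db1 <- colSums(dZ)
    W1 <- W1 - lr * dW1; b1 <- b1 - lr * db1
    W2 <- W2 - lr * dW2; b2 <- b2 - lr * db2
  }
  list(W1 = W1, b1 = b1, W2 = W2, b2 = b2, loss = loss)
}

# The middle-layer values are the reduced representation of each sample.
encode <- function(model, X) tanh(sweep(X %*% model$W1, 2, model$b1, "+"))

set.seed(42)
# Simulated data: two groups of 30 samples, 20 features, standardized per feature
X <- scale(rbind(matrix(rnorm(30 * 20, mean = 0), 30),
                 matrix(rnorm(30 * 20, mean = 2), 30)))
model <- train_autoencoder(X, k = 2)
codes <- encode(model, X)                  # 60 x 2 middle-layer output
cl <- kmeans(codes, centers = 2)$cluster   # unsupervised clustering on the middle layer
```

`model$loss` records the reconstruction error per epoch, so `plot(model$loss, type = "l")` shows whether training has converged.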

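Unsupervised clustering can then start from the middle layer, and so can other downstream learning on the data. When the input layer consists of multi-omics data, the middle layer is the integrated (fused) representation.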
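One common way to feed multi-omics data into the input layer (an assumption here, not described in the post) is to align the omics matrices by sample, standardize each block separately so that no data type dominates by scale or feature count, and concatenate the features. A minimal sketch; `combine_omics` is a hypothetical helper that expects named arguments, each a samples x features matrix with row names (sample IDs) and column names (features).

```r
# Hypothetical helper: combine named omics matrices (samples x features) into
# one input matrix. Rows are matched by sample name; only shared samples are kept.
combine_omics <- function(...) {
  blocks <- list(...)
  stopifnot(!is.null(names(blocks)), all(nzchar(names(blocks))))
  samples <- Reduce(intersect, lapply(blocks, rownames))   # samples present in every block
  scaled <- lapply(names(blocks), function(nm) {
    b <- blocks[[nm]][samples, , drop = FALSE]
    b <- b[, apply(b, 2, sd) > 0, drop = FALSE]     # drop constant features (scale() would give NaN)
    b <- scale(b) / sqrt(ncol(b))                    # equal total weight per block
    colnames(b) <- paste(nm, colnames(b), sep = ".") # keep feature names unique across blocks
    b
  })
  do.call(cbind, scaled)
}
```

For example, `combine_omics(expr = expr, cnv = cnv)` returns one matrix that can serve as an autoencoder's input layer; the middle-layer output trained on it would then be the fused representation.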