Matlab error when running GISTIC

2020-03-27

If you install the MCR (MATLAB Compiler Runtime) bundled with the GISTIC package, you may hit the following error, which can stop GISTIC from running:

libGL error: failed to load driver: swrast

If this happens, rename the file at "$MATLAB_ROOT/sys/os/glnxa64/libstdc++.so.6" to "libstdc++.so.6.old". This forces MATLAB to use the OS library instead. It worked for me.

Ref: https://ww2.mathworks.cn/matlabcentral/answers/296999-libgl-error-unable-to-load-driver-in-ubuntu-16-04-while-running-matlab-r2013b

GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers

Prepare a data frame for sample CNV data

2020-03-26

Default Category

To cluster samples based on CNV data, we need a data frame. However, the CNV segments differ from sample to sample: they may overlap or be entirely distinct. I think the CNTools package might solve this problem. An example is shown below; the result is a reduced segment data frame.



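# Install once from Bioconductor, then load the package
BiocManager::install("CNTools")
library(CNTools)
data("sampleData")  # example segment data shipped with CNTools
seg <- CNSeg(sampleData)
# Reduce the per-sample segments to a common set of regions
rdseg <- getRS(seg, by = "region", imput = FALSE, XY = FALSE, what = "mean")
View(rdseg@rs)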
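Once every sample is described on the same set of regions, clustering becomes an ordinary matrix problem. The sketch below uses a made-up table rather than real CNTools output. It assumes a layout with three position columns (chrom, start, end) followed by one column per sample, which you should check against your own reduced segment table. The sample names, values, and the choice of average-linkage hierarchical clustering are all illustrative assumptions.

```r
# Made-up reduced-segment table: 3 position columns, then one column per sample.
# (Assumed layout for illustration; verify against your own table.)
toy <- data.frame(
  chrom = c("1", "1", "2", "2"),
  start = c(1, 5001, 1, 8001),
  end   = c(5000, 9000, 8000, 12000),
  S1    = c(0.1, 0.8, -0.5, 0.0),
  S2    = c(0.2, 0.7, -0.4, 0.1),
  S3    = c(-0.6, 0.0, 0.9, 0.5),
  stringsAsFactors = FALSE
)

# Drop the position columns and put samples in rows, regions in columns.
mat <- t(as.matrix(toy[, -(1:3)]))

# Hierarchical clustering on Euclidean distances between samples.
hc <- hclust(dist(mat), method = "average")

# Cut the tree into two groups of samples.
groups <- cutree(hc, k = 2)
print(groups)
```

In this toy table S1 and S2 have similar profiles, so they end up in the same group, while S3 is placed on its own.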

Understanding Autoencoders

2020-03-22

Default Category

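Data often has too many dimensions, which makes it hard to visualize and also hinders model learning. Sometimes, after getting a dataset, we run PCA or t-SNE to reduce it to 2 dimensions (3 is also possible) so that the relationships between data points are easier to see. In machine learning, an autoencoder is another way to reduce dimensionality. The number of neurons in its input layer must equal the number in its output layer, and the output should match the input as closely as possible.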
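The encode-then-reconstruct idea can be sketched in base R without a neural network. This is not a trained autoencoder: it swaps in PCA projection and back-projection as a stand-in for a linear encoder and decoder, purely to show the shapes involved (wide input, narrow code, output as wide as the input). The toy data and the bottleneck size k = 2 are assumptions for illustration.

```r
# Toy data: 100 samples x 5 features (random values, for illustration only).
set.seed(1)
x <- matrix(rnorm(100 * 5), nrow = 100, ncol = 5)

# "Encoder": project the centered data onto the first k principal components.
pca <- prcomp(x, center = TRUE, scale. = FALSE)
k <- 2
code <- pca$x[, 1:k, drop = FALSE]              # 100 x 2 bottleneck

# "Decoder": map the 2-D code back into the original 5-D space.
recon <- code %*% t(pca$rotation[, 1:k, drop = FALSE])
recon <- sweep(recon, 2, pca$center, "+")       # undo the centering

# The output has the same shape as the input; the reconstruction error
# measures how closely the output matches the input.
mse <- mean((x - recon)^2)
print(dim(x))
print(dim(recon))
print(mse)
```

A real autoencoder would learn the encoder and decoder weights by minimizing this kind of reconstruction error, usually with nonlinear layers.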

Converting FPKM to TPM

2020-02-29

Default Category

R code



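# Convert one sample's FPKM values to TPM (computed in log space)
fpkm2tpm = function(fpkm){
  exp(log(fpkm) - log(sum(fpkm)) + log(1e6))
}
# Apply per column; expMatrix has genes in rows and samples in columns
tpm = apply(expMatrix, 2, fpkm2tpm)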
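The log-space expression is algebraically the same as dividing each FPKM value by the sample's FPKM total and multiplying by 1e6. Here is a minimal check on made-up numbers; the toy matrix, the gene and sample names, and the helper name fpkm2tpmDirect are assumptions for illustration.

```r
# Toy FPKM matrix: genes in rows, samples in columns (made-up values).
toyFpkm <- matrix(c(10, 20, 70, 5, 5, 40), nrow = 3,
                  dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))

# Log-space version, as in the post.
fpkm2tpm <- function(fpkm) exp(log(fpkm) - log(sum(fpkm)) + log(1e6))

# Direct version: each gene's share of the sample total, scaled to one million.
fpkm2tpmDirect <- function(fpkm) fpkm / sum(fpkm) * 1e6

tpmLog <- apply(toyFpkm, 2, fpkm2tpm)
tpmDirect <- apply(toyFpkm, 2, fpkm2tpmDirect)

print(all.equal(tpmLog, tpmDirect))  # the two versions agree up to floating-point error
print(colSums(tpmDirect))            # every sample sums to 1e6
```

Because TPM is rescaled within each sample, every column of the result sums to one million, which is a quick sanity check on real data too.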

Detecting Alternative Polyadenylation (APA)

2020-02-03

Default Category

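In alternative polyadenylation (APA), as shown in the figure below (image taken from the references), the transcript is cleaved at different APA signal sites and a poly(A) tail is then added. This is a post-transcriptional regulatory mechanism. It may affect the protein sequence (when it occurs in the coding region), or it may affect protein stability (for example, through miRNA regulatory regions within non-coding regions). It is in fact also a form of alternative splicing.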

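The commonly used software is DaPars, which now also has an upgraded version, DaPars2. References: https://github.com/ZhengXia/dapars https://github.com/3UTR/DaPars2 The two analysis workflows are very similar; DaPars2 adds a step to normalize library sizes.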