Detecting Structural Variants (SVs) with BreakDancer: A Hands-On Guide

2016-01-09

Default Category

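1. Introduction

BreakDancer consists of two complementary programs: BreakDancerMax and BreakDancerMini.

BreakDancerMax uses anomalously mapped read pairs from next-generation sequencing alignments to predict five types of structural variants: insertions, deletions, inversions, and inter- and intra-chromosomal translocations.

BreakDancerMini focuses on detecting small indels. Newer releases of BreakDancer no longer include BreakDancerMini; the authors recommend Pindel for small indels (10-80 bp).

Project pages: https://github.com/genome/breakdancer http://breakdancer.sourceforge.net/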

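2. Installation

BreakDancer is built with CMake, a cross-platform build tool. If CMake is not installed, install it first:

$ sudo apt-get install cmake

Clone the BreakDancer repository. The --recursive flag is required: with it, the other modules that BreakDancer depends on are cloned as well. These modules are listed in the .gitmodules file.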

$ git clone --recursive https://github.com/genome/breakdancer.git

Create a build directory and change into it:

$ cd breakdancer 
$ mkdir build 
$ cd build

Run cmake, selecting a release build and specifying the install prefix:

$ cmake .. -DCMAKE_BUILD_TYPE=release -DCMAKE_INSTALL_PREFIX=/usr/local

Compile and install:

$ make 
$ sudo make install

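Some tutorials say to add the samtools directory to your PATH, that is, to put export PATH="${PATH}:/path/to/samtools" in ~/.profile or ~/.bashrc. On my server samtools was already on the PATH, so I did not need to set anything. I did notice later that breakdancer calls samtools at run time, so make sure samtools is on your PATH.

In /path/to/breakdancer/build/bin you should now see breakdancer-max. Run it and check that it prints the usage message: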
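Since breakdancer calls samtools at run time, it can help to confirm up front that the tool is reachable. The following is a minimal sketch, not part of BreakDancer itself: the function name `require_tool` is made up for illustration, and it relies only on the POSIX `command -v` builtin.

```shell
#!/bin/sh
# require_tool NAME: succeed if NAME is found on PATH, otherwise print
# a hint to stderr and fail. (Hypothetical helper, not part of BreakDancer.)
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "error: '$1' not found in PATH; add its directory to PATH first" >&2
    return 1
}

# Check samtools before launching breakdancer-max.
if require_tool samtools; then
    echo "samtools found at: $(command -v samtools)"
fi
```

If the check fails, add the samtools directory to PATH as described above and open a new shell before retrying.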

$ ./breakdancer-max

breakdancer-max version 1.4.5-unstable-66-4e44b43 (commit 4e44b43)

Usage: breakdancer-max

Options:
-o STRING operate on a single chromosome [all chromosome]
-s INT minimum length of a region [7]
-c INT cutoff in unit of standard deviation [3]
-m INT maximum SV size [1000000000]
-q INT minimum alternative mapping quality [35]
-r INT minimum number of read pairs required to establish a connection [2]
-x INT maximum threshold of haploid sequence coverage for regions to be ignored [1000]
-b INT buffer size for building connection [100]
-t only detect transchromosomal rearrangement, by default off
-d STRING prefix of fastq files that SV supporting reads will be saved by library
-g STRING dump SVs and supporting reads in BED format for GBrowse
-l analyze Illumina long insert (mate-pair) library
-a print out copy number and support reads per library rather than per bam, by default off
-h print out Allele Frequency column, by default off
-y INT output score filter [30]
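To show how the options above combine, here is a small dry-run sketch that only assembles and prints a breakdancer-max command line; it does not run the tool. The helper name `build_bdmax_cmd` and the file name `sample.cfg` are hypothetical, and the sketch assumes the analysis config file is passed as the last positional argument, after the options.

```shell
#!/bin/sh
# build_bdmax_cmd CONFIG [CHROM] [MIN_PAIRS] [MIN_MAPQ]
# Hypothetical helper: prints a breakdancer-max command line built from
# the -o, -r and -q options listed in the usage text. Dry run only.
build_bdmax_cmd() {
    config=$1            # analysis config file (assumed positional argument)
    chrom=$2             # value for -o; empty means all chromosomes
    min_pairs=${3:-2}    # -r, default 2 as in the usage text
    min_mapq=${4:-35}    # -q, default 35 as in the usage text
    cmd="breakdancer-max -r $min_pairs -q $min_mapq"
    if [ -n "$chrom" ]; then
        cmd="$cmd -o $chrom"
    fi
    printf '%s %s\n' "$cmd" "$config"
}

build_bdmax_cmd sample.cfg chr1 3 40
# prints: breakdancer-max -r 3 -q 40 -o chr1 sample.cfg
```

Printing the command first makes it easy to review the thresholds before committing to a long run.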

Upgrading R on Ubuntu

2016-01-07

Default Category

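I first heard of R in college, because everyone in bioinformatics seemed to know it. In truth, I still can't write R: I used Java for a long time and have since moved to Python. People doing transcriptomics seem to use R a lot. I plan to learn it too, since so many statistics and plotting packages are R packages.

Enough preamble. Why post about R when I don't know it? While doing quality control on FASTQ files I needed an R package. I can't write R, but I can follow a recipe. When I tried to install the package, though, I got the message "package not available for R". I figured the R version on the server was probably too old. A quick Baidu search turned up advice to uninstall R and then reinstall it. That didn't seem right, so I Googled it instead.

Here are the steps.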

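1. Add a CRAN mirror to the apt sources. Any other CRAN mirror can be used instead. Note that the redirection has to run as root, so the line is piped through sudo tee -a rather than written with sudo echo ... >>.

echo "deb http://mirrors.aliyun.com/CRAN/bin/linux/ubuntu/ trusty/" | sudo tee -a /etc/apt/sources.list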

2. Fetch the missing public key (the secure APT key) from a key server. A different key server can be used.

gpg --keyserver keyserver.ubuntu.com --recv-key E084DAB9
or
gpg --keyserver hkp://keyserver.ubuntu.com:80 --recv-key E084DAB9

3. Import the public key by feeding it to apt-key.

gpg -a --export E084DAB9 | sudo apt-key add -

4. Refresh the package lists and install R.

sudo apt-get update && sudo apt-get install r-base

That completes the R upgrade.
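The source line in step 1 hard-codes the trusty codename, which only matches Ubuntu 14.04. As a rough sketch, the line can be built from the codename of the running release instead. The function name `cran_source_line` is invented for illustration, and the sketch assumes the mirror uses the same bin/linux/ubuntu/ layout as in step 1; for some newer releases the directory name on the mirror may differ from the bare codename, so check the mirror before using the result.

```shell
#!/bin/sh
# cran_source_line MIRROR CODENAME: print an apt source entry for CRAN.
# (Hypothetical helper; the directory layout is assumed from step 1.)
cran_source_line() {
    mirror=${1%/}    # strip one trailing slash, if present
    codename=$2      # Ubuntu release codename, e.g. trusty
    printf 'deb %s/bin/linux/ubuntu/ %s/\n' "$mirror" "$codename"
}

# lsb_release -cs prints the codename of the running Ubuntu release.
if command -v lsb_release >/dev/null 2>&1; then
    cran_source_line "http://mirrors.aliyun.com/CRAN" "$(lsb_release -cs)"
fi
```

The printed line can then be appended to /etc/apt/sources.list in the same way as in step 1.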

An Introduction to SNPedia

2016-01-06

Default Category

SNPedia: http://www.snpedia.com/index.php/SNPedia

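SNPedia is a wiki-based SNP database
Founders: geneticist Greg Lennon and programmer Mike Cariaso
It focuses on the links between SNPs and medicine, genealogy, and phenotypes
Content is added by users or collected automatically by bots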

Two Figures Commonly Used in High-Throughput Sequencing Slide Decks: cost_per_genome_megabase

2015-12-25

Default Category

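When I was in senior high school, the Human Genome Project (HGP) was regarded as humanity's equivalent of the Apollo program.

Back then, our biology textbook had a section on the Human Genome Project. People believed that once the human genome sequence was worked out, the secrets of life could be decoded. The project took many years, yet we still have not mastered those secrets, and it was even less clear how much value the project still held.

But it was precisely this project that pushed technology forward, leading to the emergence, development, and spread of high-throughput sequencing. Biological research today cannot do without it, and the volume of information sequencing delivers is unprecedented. It is this unprecedented wealth of information and data that makes the field so fascinating.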

Detecting CNVs with Control-FREEC

2015-12-11

Default Category

1. Download Control-FREEC

URL: http://bioinfo-out.curie.fr/projects/freec/src/FREEC_Linux64.tar.gz

2. Build

tar -zxvf FREEC_Linux64.tar.gz
make

3. Control-FREEC is driven by a configuration file. Before running it, the following preparation is needed: