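Third-generation sequencing has been around for many years. When I first started working I did some third-generation assembly at a supercomputing center, but I never studied it properly and have not touched it since. Now that I have a third-generation sequencing project, I am learning it again from scratch. The Primary Analysis Data folder holds the output of primary analysis, and its layout looks like this: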
/path/to/secondary/storage/2420294/0011
├── Analysis_Results
│ ├── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.1.bax.h5
│ ├── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.1.log
│ ├── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.1.subreads.fasta
│ ├── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.1.subreads.fastq
│ ├── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.2.bax.h5
│ ├── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.2.log
│ ├── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.2.subreads.fasta
│ ├── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.2.subreads.fastq
│ ├── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.3.bax.h5
│ ├── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.3.log
│ ├── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.3.subreads.fasta
│ ├── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.3.subreads.fastq
│ ├── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.bas.h5
│ ├── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.sts.csv
│ └── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.sts.xml
├── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.1.xfer.xml
├── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.2.xfer.xml
├── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.3.xfer.xml
├── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.mcd.h5
└── m140415_143853_42175_c100635972550000001823121909121417_s1_p0.metadata.xml
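The file names in this listing follow a regular pattern: a movie name, an optional part number (1 to 3), and a type suffix. As a rough sketch of how that pattern could be taken apart, the Python below splits a name into those three pieces. The `classify` function and the `KNOWN_SUFFIXES` tuple are hypothetical helpers written for this post, and the suffix list only covers the types visible in the listing above.

```python
# Suffixes taken from the listing above (assumption: no other types appear).
KNOWN_SUFFIXES = (
    ".subreads.fasta", ".subreads.fastq", ".metadata.xml", ".xfer.xml",
    ".sts.csv", ".sts.xml", ".bax.h5", ".bas.h5", ".mcd.h5", ".log",
)


def classify(filename):
    """Split a file name into (movie_name, part, suffix).

    part is an int for per-part files such as "<movie>.1.bax.h5"
    and None for whole-movie files such as "<movie>.bas.h5".
    """
    for suffix in KNOWN_SUFFIXES:
        if filename.endswith(suffix):
            stem = filename[: -len(suffix)]
            # A trailing ".<digits>" on the stem is the part number.
            movie, dot, tail = stem.rpartition(".")
            if dot and tail.isdigit():
                return movie, int(tail), suffix
            return stem, None, suffix
    raise ValueError(f"unrecognized file type: {filename}")


if __name__ == "__main__":
    movie = "m140415_143853_42175_c100635972550000001823121909121417_s1_p0"
    for name in (movie + ".1.bax.h5", movie + ".bas.h5", movie + ".2.xfer.xml"):
        print(classify(name))
```

Grouping a directory listing by the returned movie name is then a quick way to see which parts and which file types are present for each run.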
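The main files are:
bas.h5 and bax.h5 files
The bas.h5 file and its associated bax.h5 files are the main output of primary analysis on the PacBio RS II. The instrument writes them to local storage, and they serve as input to the downstream SMRT Analysis software for alignment, consensus, and variant analysis.
Before the PacBio RS II, a single bas.h5 file held all of the sequencing data. As throughput and read length increased with the PacBio RS II upgrade, the output became one bas.h5 file plus three bax.h5 files (1-3.bax.h5). The bax.h5 files contain the base-call information, and the bas.h5 file is essentially a pointer to the three bax.h5 files.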
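Because one bas.h5 is paired with three bax.h5 files, a quick sanity check before starting an analysis is to confirm that all three parts were actually transferred. The sketch below assumes the naming pattern from the directory listing (`<movie>.bas.h5` alongside `<movie>.1.bax.h5` through `<movie>.3.bax.h5`); `expected_bax_names` and `missing_parts` are hypothetical helper names, not part of any PacBio tool.

```python
def expected_bax_names(bas_name, n_parts=3):
    """Derive the bax.h5 names that should sit next to a bas.h5 file.

    Assumes the "<movie>.<part>.bax.h5" naming seen in the listing.
    """
    suffix = ".bas.h5"
    if not bas_name.endswith(suffix):
        raise ValueError(f"not a bas.h5 file name: {bas_name}")
    movie = bas_name[: -len(suffix)]
    return [f"{movie}.{i}.bax.h5" for i in range(1, n_parts + 1)]


def missing_parts(bas_name, listing):
    """Return the expected bax.h5 names that are absent from a directory listing."""
    present = set(listing)
    return [name for name in expected_bax_names(bas_name) if name not in present]


if __name__ == "__main__":
    import os
    import sys

    # Usage: python check_parts.py /path/to/Analysis_Results
    folder = sys.argv[1] if len(sys.argv) > 1 else "."
    files = os.listdir(folder)
    for f in files:
        if f.endswith(".bas.h5"):
            print(f, "missing:", missing_parts(f, files))
```

This only looks at file names, so it says nothing about whether the files themselves are intact.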
Run h5dump -n [movie name].bas.h5 to look at the file:
FILE_CONTENTS {
group /
group /MultiPart
dataset /MultiPart/HoleLookup
dataset /MultiPart/Parts
}
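The same two datasets can be read from Python with the h5py library. The sketch below assumes that /MultiPart/Parts stores the bax.h5 file names as strings and that those files sit in the same directory as the bas.h5 file, which matches the listing above but is worth checking against your own data. `list_bax_parts` is a hypothetical helper name.

```python
import os

import h5py


def list_bax_parts(bas_path):
    """Return the paths of the bax.h5 files referenced by a bas.h5 file.

    Assumption: /MultiPart/Parts holds one file name per part, relative
    to the directory that contains the bas.h5 file.
    """
    with h5py.File(bas_path, "r") as f:
        raw = f["/MultiPart/Parts"][()]
    # HDF5 strings may come back as bytes depending on how they were stored.
    names = [p.decode("utf-8") if isinstance(p, bytes) else str(p) for p in raw]
    base = os.path.dirname(os.path.abspath(bas_path))
    return [os.path.join(base, name) for name in names]


if __name__ == "__main__":
    import sys

    # Usage: python list_parts.py <movie name>.bas.h5
    for path in list_bax_parts(sys.argv[1]):
        print(path, "(found)" if os.path.exists(path) else "(missing)")
```

The other dataset, /MultiPart/HoleLookup, can be opened the same way with `f["/MultiPart/HoleLookup"][()]`; its layout is not shown in the dump above, so inspect its shape and dtype before relying on it.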