BCL文件

测序产生的原始文件是BCL(binary base call)文件,测序仪在测序的时候,每个cycle都会测量编码不同颜色的通道强度,并确定最有可能的碱基类型。Real Time Analysis (RTA) 软件会将碱基类型和可信度(一个质量分数)。与FASTQ文件不同的是,BCL文件是实时产生,每个cycle的每个tile都会有一个对应文件,文件放在

<run directory>/Data/Intensities/BaseCalls/L<lane>/C<cycle>.1

文件的命名

s_<lane>_<tile>.bcl

bcl2fastq

该文件需要通过Illunima的软件或者第三方分析工具将BCL文件转成FASTQ文件。一般而言,数据下机之后,Illumina测序仪会自动将BCL转成FASTQ文件。有时候,根据实验需要,需要自己手工将BCL文件转成FASTQ文件,比如自己设计的index中含有简并碱基,或者需要调整一下转换的参数等。 Illumina提供bcl2fastq的程序包,共离线处理BCL文件,生成FASTQ文件。

1
2
3
bcl2fastq  -i /paht/to/run/Data/Intensities/BaseCalls/ \
       -o /output/dir --sample-sheet /paht/to/run/SampleSheet.csv \
       -R /paht/to/run/

bcl2fastq文件有很多参数可调,比如在FASTQ中记录read的index(fastq文件中会记录配对的index具体序列,以及会生成额外对应的index文件),可以添加–create-fastq-for-index-reads选项。 如果允许index有mismatch的话,可以通过–barcode-mismatches设置。 –fastq-compression-level可以设置生成的FASTQ.GZ文件的压缩比例 安装如下

1
2
3
4
5
unzip bcl2fastq2-v2.17.1.14.tar.zip
tar -xvzf bcl2fastq2-v2.17.1.14.tar.gz
./bcl2fastq/src/configure --prefix=/path/to/install/
make
make install

安装过程中,如果遇到一下问题,请更新gcc版本 问题1 cc1plus: error: unrecognized command line option “-std=c++11” 问题2 undefined reference to `boost::re_detail::perl_matcher collect2: error: ld returned 1 exit status 软件选项说明如下:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
BCL to FASTQ file converter
bcl2fastq v2.17.1.14
Copyright (c) 2007-2015 Illumina, Inc.

Command-line options:
  -h [ --help ]                                     produce help message and exit
  -v [ --version ]                                print program version information
  -l [ --min-log-level ] arg (=INFO)              minimum log level
                                                          recognized values: NONE, FATAL, ERROR, WARNING, INFO, DEBUG, TRACE
  -i [ --input-dir ] arg (=<runfolder-dir>/Data/Intensities/BaseCalls/)
                                                          path to input directory
  -R [ --runfolder-dir ] arg (=./)         path to runfolder directory
  --intensities-dir arg (=<input-dir>/../)        path to intensities directory
                                                          if intensities directory is specified, also input directory must be specified
  -o [ --output-dir ] arg (=<input-dir>)                path to demultiplexed output
  --interop-dir arg (=<runfolder-dir>/InterOp/)   path to demultiplexing statistics directory
  --stats-dir arg (=<output-dir>/Stats/)                path to human-readable demultiplexing statistics directory
  --reports-dir arg (=<output-dir>/Reports/)       path to reporting directory
  --sample-sheet arg (=<runfolder-dir>/SampleSheet.csv)
                                                                              path to the sample sheet
  --aggregated-tiles arg (=AUTO)                  tiles aggregation flag determining structure of input files
                                                   recognized values:
                                                      AUTO Try to detect correct setting
                                                      YES  Tiles are aggregated into single input file
                                                      NO   There are separate input files for individual tiles
                                                  
  -r [ --loading-threads ] arg (=4)               number of threads used for loading BCL data
  -d [ --demultiplexing-threads ] arg          number of threads used for demultiplexing
  -p [ --processing-threads ] arg                 number of threads used for processing demultiplexed data
  -w [ --writing-threads ] arg (=4)               number of threads used for writing FASTQ data
                                                                    this must not be higher than number of samples
  --tiles arg                                 Comma-separated list of regular expressions to select only a subset of the tiles 
                                                  available in the flow-cell. Multiple entries allowed, each applies to the corresponding base-calls.
                                                  For example:
                                                   * to select all the tiles ending with '5' in all lanes:
                                                       --tiles [0-9][0-9][0-9]5
                                                   * to select tile 2 in lane 1 and all the tiles in the other lanes:
                                                       --tiles s_1_0002,s_[2-8]
  --minimum-trimmed-read-length arg (=35)         minimum read length after adapter trimming
  --use-bases-mask arg                                             Specifies how to use each cycle.
  --mask-short-adapter-reads arg (=22)            smallest number of remaining bases (after masking bases below the minimum 
                                                                           trimmed read  length) below which whole read is masked
  --adapter-stringency arg (=0.9)                  adapter stringency
  --ignore-missing-bcls                                 assume 'N'/'#' for missing calls
  --ignore-missing-filter                                assume 'true' for missing filters
  --ignore-missing-positions                         assume \[0,i\] for missing positions, where i is incremented starting from 0
  --ignore-missing-controls                           assume 0 for missing controls
  --write-fastq-reverse-complement              Generate FASTQs containing reverse complements of actual data
  --with-failed-reads                                        include non-PF clusters
  --create-fastq-for-index-reads                     create FASTQ files also for index reads
  --find-adapters-with-sliding-window           find adapters with simple sliding window algorithm
  --no-bgzf-compression                                Turn off BGZF compression for FASTQ files
  --fastq-compression-level arg (=4)              Zlib compression level (1-9) used for FASTQ files
  --barcode-mismatches arg (=1)                   number of allowed mismatches per index
                                                                        multiple entries, comma delimited entries, allowed; each entry is applied to the 
                                                                        corresponding index;last entry applies to all remaining indices
  --no-lane-splitting                             Do not split fastq files by lane.

软件地址 http://support.illumina.com.cn/downloads/bcl2fastq-conversion-software-v217.html

文档手册 http://support.illumina.com/content/dam/illumina-support/documents/documentation/software_documentation/bcl2fastq/bcl2fastq_letterbooklet_15038058brpmi.pdf

##################################################################### #版权所有 转载请告知 版权归作者所有 如有侵权 一经发现 必将追究其法律责任 #Author: Jason

####################################################################