Sequencing methods

ChIP-Seq

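ChIP-Seq combines chromatin immunoprecipitation (ChIP) with next-generation sequencing. It is an epigenetics technique that efficiently detects DNA-protein interactions genome-wide, and it is typically used to study transcription factor binding sites or sites of specific histone modifications.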

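Chromatin immunoprecipitation (ChIP) is an antibody-based technique for selectively enriching a specific DNA-binding protein together with its DNA targets. ChIP can be used to study a single protein-DNA interaction, several protein-DNA interactions, or interactions across the whole genome or within a subset of genes.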

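ChIP uses antibodies that selectively recognize and bind proteins, including histones, histone modifications, transcription factors, and cofactors, to provide information about chromatin state and gene transcription. Combining proteomic analysis with molecular biology techniques in ChIP helps researchers understand gene expression and regulation in the cells or tissues of interest.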

Hi-C

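Hi-C derives from chromosome conformation capture (3C). Using high-throughput sequencing together with bioinformatic analysis, it studies the spatial relationships of chromatin DNA across the whole genome and yields high-resolution information on three-dimensional chromatin structure. Beyond studying interactions between chromosomal segments and building models of genome folding, Hi-C can be applied to genome assembly, haplotype map construction, and assisting metagenome assembly. It can also be analyzed jointly with RNA-Seq, ChIP-Seq, and other data to explain, in terms of gene regulatory and epigenetic networks, the mechanisms behind the formation of an organism's traits.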

PROGENy pathway signatures

PROGENy is a resource that leverages a large compendium of publicly available signaling perturbation experiments to yield a common core of pathway-responsive genes. For each pathway, a collection of genes is available along with their contribution and significance to it. Inside PROGENy, one can find gene signatures for 14 different pathways:

• Androgen: involved in the growth and development of the male reproductive organs.
• EGFR: regulates growth, survival, migration, apoptosis, proliferation, and differentiation in mammalian cells.
• Estrogen: promotes the growth and development of the female reproductive organs.
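As a rough illustration of how per-gene contributions can be turned into a pathway score, the sketch below computes a weighted sum of expression values for each pathway. The gene names, weights, and expression values are made up for the example, and `pathway_scores` is a hypothetical helper, not part of the PROGENy package. Treat it as a minimal sketch of the idea rather than the package's actual procedure.

```python
def pathway_scores(expression, weights):
    """Score each pathway as the weighted sum of its signature genes.

    expression: dict mapping gene name -> expression value for one sample
    weights:    dict mapping pathway -> {gene name: contribution weight}
    Genes missing from the sample are treated as 0 (an assumption of this sketch).
    """
    scores = {}
    for pathway, gene_weights in weights.items():
        scores[pathway] = sum(
            weight * expression.get(gene, 0.0)
            for gene, weight in gene_weights.items()
        )
    return scores


# Hypothetical weights and expression values, for illustration only.
weights = {
    "EGFR": {"GENE_A": 2.0, "GENE_B": -1.0},
    "Estrogen": {"GENE_C": 1.5},
}
expression = {"GENE_A": 1.0, "GENE_B": 0.5, "GENE_C": 2.0}

scores = pathway_scores(expression, weights)
print(scores)
```

A positive weight means that higher expression of the gene pushes the pathway score up, and a negative weight pushes it down.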

The MD tag in SAM/BAM files

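I aligned reads with bowtie and wanted to inspect their mismatches. The M operation in the CIGAR string covers both matching and mismatching aligned bases, so it cannot tell mismatches apart.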

According to the bowtie documentation, the XM tag gives the number of mismatches, and the MD tag shows exactly what the mismatches are.

XN:i:<N> The number of ambiguous bases in the reference covering this alignment. Only present if SAM record is for an aligned read.
XM:i:<N> The number of mismatches in the alignment. Only present if SAM record is for an aligned read.
MD:Z:<S> A string representation of the mismatched reference bases in the alignment. See SAM Tags format specification for details. Only present if SAM record is for an aligned read.
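To pull these tags out of an alignment, one option is to split a SAM record on tabs and read the optional fields from column 12 onward. The sketch below assumes plain-text SAM input. `parse_sam_tags` is a hypothetical helper and the record is invented for illustration.

```python
def parse_sam_tags(sam_line):
    """Return the optional TAG:TYPE:VALUE fields of one SAM record as a dict."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    # The first 11 columns are mandatory; optional tags start at column 12.
    for field in fields[11:]:
        tag, type_code, value = field.split(":", 2)
        # Type 'i' is an integer; every other type is kept as a string here.
        tags[tag] = int(value) if type_code == "i" else value
    return tags


# Hypothetical record, made up for illustration (17 aligned bases, one mismatch).
record = "\t".join([
    "read1", "0", "chr1", "100", "42", "17M", "*", "0", "0",
    "ACGTACGTACGTACGTA", "IIIIIIIIIIIIIIIII",
    "XN:i:0", "XM:i:1", "MD:Z:10A6",
])

tags = parse_sam_tags(record)
print(tags["XM"], tags["MD"])
```

For BAM files the record would first need to be decoded to text, for example with `samtools view`.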

How the SAM specification describes MD

String encoding mismatched and deleted reference bases, used in conjunction with the CIGAR and SEQ fields to reconstruct the bases of the reference sequence interval to which the alignment has been mapped. This can enable variant calling without requiring access to the entire original reference.
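In other words, MD is a string encoding mismatches and deletions. Used together with CIGAR and SEQ, it lets you reconstruct what the read was aligned against, so variants can be detected without the full reference sequence.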

The MD string consists of the following items, concatenated without additional delimiter characters:

• [0-9]+, indicating a run of reference bases that are identical to the corresponding SEQ bases ([0-9]+ is regular-expression notation for one or more digits);
• [A-Z], identifying a single reference base that differs from the SEQ base aligned at that position;
• \^[A-Z]+, identifying a run of reference bases that have been deleted in the alignment (note the leading ^ that marks a deletion).

Summary: a number denotes matches, a bare base denotes a mismatch, and ^ followed by bases denotes a deletion.
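The three item types can be split apart with a regular expression. The sketch below is one way to do it. `parse_md` is a hypothetical helper name, and the validation pattern follows the item definitions listed above.

```python
import re

# Whole-string pattern: numbers alternate with mismatches or deletions.
MD_FULL = re.compile(r"[0-9]+(([A-Z]|\^[A-Z]+)[0-9]+)*")
# One item at a time: a run of matches, a deletion, or a single mismatch.
MD_ITEM = re.compile(r"([0-9]+)|(\^[A-Z]+)|([A-Z])")


def parse_md(md):
    """Split an MD string into (kind, value) items."""
    if not MD_FULL.fullmatch(md):
        raise ValueError(f"not a valid MD string: {md!r}")
    items = []
    for m in MD_ITEM.finditer(md):
        number, deletion, mismatch = m.groups()
        if number is not None:
            items.append(("match", int(number)))       # run of identical bases
        elif deletion is not None:
            items.append(("deletion", deletion[1:]))   # drop the leading ^
        else:
            items.append(("mismatch", mismatch))       # reference base at the mismatch
    return items


for kind, value in parse_md("10A5^AC6"):
    print(kind, value)
```

Counting the `mismatch` items gives the same number that bowtie reports in XM, provided the alignment contains no ambiguous bases that the aligner treats differently.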

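As shown in the complete regular expression above, numbers alternate with the other items. Thus if two mismatches or deletions are adjacent without a run of identical bases between them, a ‘0’ (indicating a 0-length run) must be used to separate them in the MD string.
The complete regular expression given in the SAM tags specification is [0-9]+(([A-Z]|\^[A-Z]+)[0-9]+)*. Numbers and letters therefore alternate, and two adjacent mismatches must be separated by a 0, as in a hypothetical MD of 5A0C5.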

Clipping, padding, reference skips, and insertions (‘H’, ‘S’, ‘P’, ‘N’, and ‘I’ CIGAR operations) are not represented in the MD string. When reconstructing the reference sequence, inserted and soft-clipped SEQ bases are omitted as determined by tracking ‘I’ and ‘S’ operations in the CIGAR string. (If the CIGAR string contains ‘N’ operations, then the corresponding skipped parts of the reference sequence cannot be reconstructed.)
In short, clipping, padding, reference skips, and insertions (‘H’, ‘S’, ‘P’, ‘N’, and ‘I’ CIGAR operations) do not appear in the MD string, so MD has to be read together with CIGAR.
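The reconstruction described above can be sketched in two passes: first use CIGAR to drop inserted and soft-clipped SEQ bases, then walk the MD string over what remains. `reconstruct_reference` is a hypothetical helper and the read below is invented. The sketch refuses CIGAR strings containing ‘N’, since those skipped regions cannot be recovered.

```python
import re


def reconstruct_reference(seq, cigar, md):
    """Rebuild the reference bases covered by an alignment from SEQ, CIGAR and MD."""
    # Pass 1: keep only SEQ bases aligned to the reference (M, =, X).
    aligned = []
    pos = 0
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        length = int(length)
        if op in "M=X":
            aligned.append(seq[pos:pos + length])
            pos += length
        elif op in "IS":
            pos += length  # inserted / soft-clipped bases are skipped
        elif op == "N":
            raise ValueError("reference skips (N) cannot be reconstructed")
        # D, H and P consume no SEQ bases, so nothing to do here.
    aligned = "".join(aligned)

    # Pass 2: walk the MD items over the aligned bases.
    ref = []
    i = 0
    for number, deletion, mismatch in re.findall(r"([0-9]+)|\^([A-Z]+)|([A-Z])", md):
        if number:
            n = int(number)
            ref.append(aligned[i:i + n])  # identical bases: copy from SEQ
            i += n
        elif deletion:
            ref.append(deletion)          # deleted bases exist only in the reference
        else:
            ref.append(mismatch)          # reference base replaces the read base
            i += 1
    if i != len(aligned):
        raise ValueError("MD does not cover the aligned bases implied by CIGAR")
    return "".join(ref)


# Hypothetical read: 10 matches, 1 mismatch, 5 matches, 2 bp deletion, 6 matches.
seq = "TTTTTTTTTT" + "G" + "CCCCC" + "GGGGGG"
print(reconstruct_reference(seq, "16M2D6M", "10A5^AC6"))
```

The length check at the end is a cheap consistency test: if CIGAR and MD disagree about how many read bases are aligned, the record is probably malformed.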

For example, a string ‘10A5^AC6’ means from the leftmost reference base in the alignment, there are 10 matches followed by an A on the reference which is different from the aligned read base; the next 5 reference bases are matches followed by a 2bp deletion from the reference; the deleted sequence is AC; the last 6 bases are matches.
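So 10A5^AC6 means 10 matches (10), one position where the read differs from the reference base A (A), 5 matches (5), a 2 bp deletion (^AC), and 6 matches (6).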

JC: single-cell transcriptomic analysis reveals the origin and pathological process of human endometrial carcinoma

Single-cell transcriptomic analysis highlights origin and pathological process of human endometrioid endometrial carcinoma

https://www.nature.com/articles/s41467-022-33982-7

Background

Endometrial cancer (EC) is one of the most common gynecological malignancies, and endometrioid endometrial carcinoma (EEC) is its main pathological type.

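During estrogen-dependent EEC tumorigenesis, the endometrium is exposed to estrogen for a long time without the protection of progesterone and proliferates uncontrollably. It can progress from normal endometrium to atypical endometrial hyperplasia (AEH, the precancerous stage of EEC) and then step by step to EEC. As for the origin of EEC, earlier studies speculated that several lineages, including endometrial epithelial and stromal stem components, might be the source, but the evidence is insufficient to establish a definite origin.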

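The tumor microenvironment consists of immune cells, fibroblasts, pericytes, and other cells, and plays an important role in tumor initiation, prognosis, and metastasis. Although previous studies have pointed to a potential role of the tumor microenvironment in prognosis and therapy resistance, the process leading from normal endometrium to EEC remains unclear.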

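graph TB
    A[Normal endometrium, atypical endometrial hyperplasia AEH, and endometrioid endometrial carcinoma EEC] --> B[Cell clustering]
    B --> C[Epithelial cell proportion increases in AEH and expands further in EEC, with large CNV changes]
    B --> D[Stromal fibroblast proportion decreases]
    C --> F[RNA velocity, cell similarity, and MET marker gene analyses indicate that EEC does not originate from CAFs]
    D --> F
    F --> G[Epithelial clustering finds an AEH-specific unciliated epithelial subcluster that is also present in EEC and is presumed to derive from normal unciliated epithelium]
    G --> H[The subcluster unique to EEC epithelium is the tumorigenic subcluster, and RNA velocity suggests unciliated glandular epithelial cells as its likely source]
    H --> I[Signature genes of the tumorigenic subcluster suggest LCN2 and SAA1/2 as a feature of early endometrial tumorigenesis]
    I --> J[Organoid experiments show that fibroblasts are indispensable in both normal endometrium and EEC]
    J --> L[Macrophage and lymphocyte subcluster analysis indicates that dysregulation of the immune environment can lead to endometrial tumorigenesis]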

FASTA sequence download resources for small non-coding RNAs

snoRNA, snRNA
• https://ftp.ensembl.org/pub/release-110/fasta/homo_sapiens/ncrna/Homo_sapiens.GRCh38.ncrna.fa.gz

piRNA
• piRDB: https://www.pirnadb.org/download/archive
• piRBase: http://bigdata.ibp.ac.cn/piRBase/download/v3.0/fasta/hsa.v3.0.fa.gz
• piRNA Bank: http://pirnabank.ibab.ac.in/

tRNA
• GtRNAdb high-confidence mature tRNA sequences: http://gtrnadb.ucsc.edu/genomes/eukaryota/Hsapi38/hg38-mature-tRNAs.fa
• Mitochondrial tRNA sequences from mitotRNAdb: http://mttrna.bioinf.uni-leipzig.de/mtDataOutput/

miRNA
• https://mirbase.org/download/

yRNA
• 18S (NR_145820.1), 5S (NR_023363.1), 28S (NR_003287.4) and 5.8S (NR_145821.1); RNY1 (NR_004391.1), RNY3 (NR_004392.1), RNY4 (NR_004393.1) and RNY5 (NR_001571.2)

rRNA
• https://www.ncbi.nlm.nih.gov/nucleotide?term=txid9606[Organism] (select rRNA and download)