A Practical Guide to NGS Genomic Sequencing Data Processing and Genetic Analysis on Linux
1. Reference:
download_resource > build_reference_for_bwa (tophat, star) / bwa_index / annotation_gtf
The reference data consist of the genome sequence as a .fasta or .fa file and the annotation as a .gtf or .gff file.
- Ensembl
FTP sites to download .fasta and .gtf files:
ftp://ftp.ensembl.org/pub/current_fasta
ftp://ftp.ensembl.org/pub/current_gtf
Web site to download .fasta and .gtf files:
http://www.ensembl.org/info/data/ftp/index.html
- UCSC
Web site to download .fasta and .gtf files:
http://hgdownload.soe.ucsc.edu/downloads.html
FTP site to download .fasta and .gtf files:
ftp://hgdownload.soe.ucsc.edu/goldenPath/currentGenomes
- NCBI
FTP site for download:
ftp://ftp.ncbi.nlm.nih.gov/genomes
Web site for download:
http://www.ncbi.nlm.nih.gov/home/download.shtml
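The download_resource > bwa_index step above can be sketched as follows. This is a minimal illustration that only assembles the command lines and does not run them. The file name genome.fa and the helper names are hypothetical; `bwa index` and `samtools faidx` are the usual indexing subcommands of those tools.

```python
import shlex


def reference_index_commands(fasta):
    """Return indexing commands for a reference FASTA as argv lists.

    The FASTA path is whatever file was downloaded from Ensembl, UCSC or NCBI;
    nothing is executed here.
    """
    return [
        ["bwa", "index", fasta],       # BWA index files are written next to the FASTA
        ["samtools", "faidx", fasta],  # writes the .fai sequence index
    ]


def to_shell_line(argv):
    """Join an argv list into one safely quoted shell line."""
    return " ".join(shlex.quote(arg) for arg in argv)


if __name__ == "__main__":
    # "genome.fa" is a placeholder name, not a file shipped with this guide.
    for cmd in reference_index_commands("genome.fa"):
        print(to_shell_line(cmd))
```

Keeping the commands as lists makes it easy to pass them to `subprocess.run` later, or to write them into a job script.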
2. Software and tools:
download_resource > software_installation
- TopHat/Cufflinks/Cuffdiff/TopHat-Fusion
- BWA (Burrows-Wheeler Aligner)
- Bowtie
- SAMtools
- GATK (Genome Analysis Toolkit)
- FastQC
- QC3
- RNA-SeQC
- ANNOVAR
- R
- Cluster 3.0 (Open Source Clustering Software)
- TreeView
- Circos (flexible and automatable circular data visualization)
- BreakDancer
- IGV
- FusionHunter
- FusionMap
- BEDTools
https://bedtools.googlecode.com/
- bamUtil
http://genome.sph.umich.edu/wiki/BamUtil
- Picard tools
- MuTect
- VarScan
- CNVnator
- CoNIFER (Copy Number Inference From Exome Reads)
- CNV-seq
- CPAT
- cummeRbund
- VCFtools
http://vcftools.sourceforge.net/
- VirusFinder
- VirusSeq
- SRA Toolkit
- Pindel
- Dindel
- HOMER
- FASTX-Toolkit
- MapSplice
- DiffSplice
- SVDetect
- HTSeq
3. DNAseq:
fastq > bwa_align > GATK_realign > GATK_recalibration > GATK_markduplicate > GATK_call_SNPs/INDELs > GATK_best_practice_filter > pass_filter > reformat_data > ANNOVAR > annotated_exon_result
> samtools_mpileup > varscan_call_SNPs/INDELs
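The alignment step and the samtools_mpileup > varscan branch above might look like the sketch below. It only builds shell lines and does not run them. The sample and file names, the thread count, and the VarScan.jar location are assumptions for illustration. The GATK steps are left out on purpose because their invocation differs between GATK versions.

```python
import shlex


def dnaseq_varscan_commands(ref, fq1, fq2, sample, threads=4,
                            varscan_jar="VarScan.jar"):
    """Build shell lines for bwa alignment and VarScan SNP/INDEL calling.

    All paths are placeholders supplied by the caller; varscan_jar is an
    assumed location of the VarScan jar file.
    """
    q = shlex.quote
    bam = f"{sample}.sorted.bam"
    pileup = f"samtools mpileup -f {q(ref)} {q(bam)}"
    varscan = f"java -jar {q(varscan_jar)}"
    return [
        # Align paired-end reads and write a coordinate-sorted BAM.
        f"bwa mem -t {threads} {q(ref)} {q(fq1)} {q(fq2)} "
        f"| samtools sort -o {q(bam)} -",
        f"samtools index {q(bam)}",
        # VarScan reads the pileup from standard input.
        f"{pileup} | {varscan} mpileup2snp --output-vcf 1 > {q(sample + '.snp.vcf')}",
        f"{pileup} | {varscan} mpileup2indel --output-vcf 1 > {q(sample + '.indel.vcf')}",
    ]


if __name__ == "__main__":
    for line in dnaseq_varscan_commands("genome.fa", "s1_1.fq", "s1_2.fq", "s1"):
        print(line)
```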
4. RNAseq:
fastq > tophat-G_align > cufflinks-G > fpkm_table
> cuffdiff > differential_gene_expression / isoforms
> cufflinks-g > de_novo_individual > merge_by_location > re-call_fpkm
> cufflinks > cuffcompare_gtf > cuffdiff_de_novo_group > differential_gene_expression / isoforms / splicing / promoter / cds
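The first two branches above (annotation-guided alignment, FPKM estimation, and cuffdiff) can be sketched as follows. The snippet only assembles argument lists. Sample names, directory names, and the thread count are hypothetical; the options used (`-G`, `-p`, `-o`, `-L`) are the standard TopHat, Cufflinks and Cuffdiff options.

```python
def tophat_cufflinks_commands(bowtie_index, gtf, samples, threads=4):
    """Build tophat -G and cufflinks -G commands for each sample.

    samples maps a sample name to its (fastq_1, fastq_2) pair; the output
    directory names are an assumed naming scheme.
    """
    cmds = []
    for name, (fq1, fq2) in samples.items():
        align_dir = f"{name}_tophat"
        cmds.append(["tophat", "-G", gtf, "-p", str(threads),
                     "-o", align_dir, bowtie_index, fq1, fq2])
        # TopHat writes accepted_hits.bam into its output directory.
        cmds.append(["cufflinks", "-G", gtf, "-p", str(threads),
                     "-o", f"{name}_cufflinks",
                     f"{align_dir}/accepted_hits.bam"])
    return cmds


def cuffdiff_command(gtf, groups, threads=4, out_dir="cuffdiff_out"):
    """Build one cuffdiff command; groups maps a condition label to BAM paths."""
    labels = ",".join(groups)
    replicates = [",".join(bams) for bams in groups.values()]
    return ["cuffdiff", "-o", out_dir, "-p", str(threads),
            "-L", labels, gtf, *replicates]


if __name__ == "__main__":
    samples = {"ctrl1": ("ctrl1_1.fq", "ctrl1_2.fq")}
    for cmd in tophat_cufflinks_commands("genome", "genes.gtf", samples):
        print(" ".join(cmd))
    groups = {"ctrl": ["ctrl1_tophat/accepted_hits.bam"],
              "case": ["case1_tophat/accepted_hits.bam"]}
    print(" ".join(cuffdiff_command("genes.gtf", groups)))
```

For the de novo branches, the same pattern applies with `cufflinks -g` (or no annotation) per sample, followed by merging the assemblies before running cuffdiff on the merged GTF.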
5. Parallel Computing
computer_cluster > linux_OS > organization_of_computation_tasks > processing_script > jobs_submit
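The organization_of_computation_tasks > processing_script > jobs_submit flow can be sketched as one job script per sample. This is an illustration under assumptions: the guide does not name a scheduler, so the submit command (`qsub` here) is only a placeholder, and the command template and directory names are hypothetical.

```python
from pathlib import Path


def write_job_scripts(samples, command_template, out_dir):
    """Write one bash script per sample and return the script paths.

    command_template must contain "{sample}", which is replaced by the
    sample name.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    scripts = []
    for sample in samples:
        script = out / f"{sample}.sh"
        body = command_template.format(sample=sample)
        # Stop the job at the first failing command.
        script.write_text(f"#!/bin/bash\nset -euo pipefail\n{body}\n")
        scripts.append(script)
    return scripts


def submit_commands(scripts, submit="qsub"):
    """Pair each script with the submit command (assumed; adjust per cluster)."""
    return [[submit, str(path)] for path in scripts]


if __name__ == "__main__":
    template = "bwa mem genome.fa {sample}_1.fq {sample}_2.fq > {sample}.sam"
    paths = write_job_scripts(["s1", "s2"], template, "jobs")
    for cmd in submit_commands(paths):
        print(" ".join(cmd))
```

Generating the scripts separately from submitting them lets each script be inspected or rerun by hand before it goes to the queue.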
6. Linux Command-line Operation
7. Mutation Analysis
8. SNPs Analysis