
Bio-Express

An intuitive and open integrated analysis system for analyzing genomic big data,
based on a single enterprise-level server platform
  • CPU (cores): 1188
  • Memory (TB): 12.4
  • Storage (TB): 2000
  • Total Users: 6288
  • Total Analyses: 89366
  • Analysis Tools: 134
  • Analysis Pipelines: 20

Public Pipelines

Genomics
Variant-analysis
bio-workflow
Whole Genome Sequencing Somatic Variant Analysis Pipeline

The Bio-Express Somatic WGS Pipeline is a modular analysis pipeline that detects somatic mutations from whole-genome sequencing data. It takes raw FASTQ files as input and produces somatic mutation calls, quality assessment, and visualization based on tumor-normal pair analysis.

Sequencing quality is first assessed with FastQC. Adapters are removed and low-quality bases are trimmed with Cutadapt, and the reads are mapped to the reference genome with BWA-MEM2 to generate BAM alignment files. GATK is then used for duplicate removal, mapping quality assessment, and filtering of low-quality reads, ensuring that read-pair information is consistent. Alignments are coordinate-sorted with SAMtools, and PCR duplicates are removed with GATK MarkDuplicates. Base quality scores are then recalibrated with GATK BaseRecalibrator and ApplyBQSR, using known variant site information.

The recalibrated BAM files first go through quality control and sample validation. Sample relatedness is verified with Somalier, sample identity is confirmed through variant-SNP marker integration analysis with SNPmatch, sample contamination is assessed with VerifyBamID2, and coverage is analyzed with Mosdepth, so that the quality and reliability of the sequencing data can be evaluated.

Next, in the tumor-normal pair analysis stage, Conpair verifies that each normal-tumor pair is compatible and estimates cross-sample contamination levels. Single nucleotide variants and insertions/deletions are detected in parallel with Strelka2 and Mutect2 to maximize the sensitivity and specificity of somatic mutation detection.

Finally, tumor purity is analyzed with TINC, structural variants are called with Manta, and copy number variation is analyzed with Canvas. Together these steps quantify somatic genomic alterations for cancer genomics research and precision medicine.

Default genome: hg38

[IMPORTANT] Sample Type Identification Rule:
- Tumor tissue samples: FASTQ filenames must contain "_T"
- Normal tissue samples: FASTQ filenames must contain "_N"

(e.g.)
patient001_T_R1.fastq.gz  # Tumor sample, Read 1
patient001_T_R2.fastq.gz  # Tumor sample, Read 2
patient001_N_R1.fastq.gz  # Normal sample, Read 1
patient001_N_R2.fastq.gz  # Normal sample, Read 2
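Before uploading, the "_T" / "_N" naming rule can be checked locally. The sketch below is not the platform's own logic: the function names `sample_type` and `group_pairs` are hypothetical, and it assumes the tag is a separate underscore-delimited token (as in the example filenames), whereas the page only states that filenames must "contain" the tag.

```python
import re
from collections import defaultdict

# Assumption: the tag is followed by "_" or ".", as in patient001_T_R1.fastq.gz.
# The real platform may match "_T" / "_N" more loosely.
_TAG = re.compile(r"_(T|N)(?=[_.])")


def sample_type(filename):
    """Return 'tumor', 'normal', or None when no tag is found."""
    match = _TAG.search(filename)
    if match is None:
        return None
    return "tumor" if match.group(1) == "T" else "normal"


def group_pairs(filenames):
    """Group files by the text before the tag (a hypothetical patient ID)."""
    groups = defaultdict(lambda: {"tumor": [], "normal": []})
    for name in filenames:
        match = _TAG.search(name)
        if match is None:
            continue  # untagged files are skipped, not guessed at
        kind = "tumor" if match.group(1) == "T" else "normal"
        groups[name[:match.start()]][kind].append(name)
    return dict(groups)


if __name__ == "__main__":
    files = [
        "patient001_T_R1.fastq.gz",
        "patient001_T_R2.fastq.gz",
        "patient001_N_R1.fastq.gz",
        "patient001_N_R2.fastq.gz",
    ]
    for patient, reads in group_pairs(files).items():
        print(patient, len(reads["tumor"]), "tumor /", len(reads["normal"]), "normal")
```

A group with an empty "tumor" or "normal" list would indicate an incomplete pair, which is worth catching before a run is submitted.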

Version 1.0
Execution Count 0
Genomics
Variant-analysis
bio-workflow
Whole Genome Sequencing Germline Variant Analysis Pipeline

The Bio-Express Germline WGS Pipeline is a modular analysis pipeline that detects germline variants from whole-genome sequencing data. It takes raw FASTQ files as input and produces germline variant calls, quality assessment, and visualization.

Sequencing quality is first assessed with FastQC. Adapters are removed and low-quality bases are trimmed with Cutadapt, and the reads are mapped to the reference genome with BWA-MEM2 to generate BAM alignment files. GATK is then used for duplicate removal, mapping quality assessment, and filtering of low-quality reads, ensuring that read-pair information is consistent. Alignments are coordinate-sorted with SAMtools, and PCR duplicates are removed with GATK MarkDuplicates. Base quality scores are then recalibrated with GATK BaseRecalibrator and ApplyBQSR, using known variant site information.

The recalibrated BAM files first go through quality control and sample validation. Sample relatedness is verified with Somalier, sample contamination is assessed with VerifyBamID2, and coverage is analyzed with Mosdepth, so that the quality and reliability of the sequencing data can be evaluated.

Next, GATK HaplotypeCaller generates GVCF files, and GenotypeGVCFs calls germline SNVs and indels in standard VCF format. Variant statistics are then computed with BCFtools, and structural variants are detected with Manta.

Default genome: hg38

Version 1.0
Execution Count 0
Epigenomics
DNA-binding-protein-based-analysis
bio-workflow
ChIP-seq Analysis Pipeline

The Bio-Express ChIP-seq Analysis Pipeline is a modular analysis pipeline that detects protein-DNA binding sites from Chromatin Immunoprecipitation Sequencing (ChIP-seq) data. It takes raw FASTQ files as input and produces binding site calls, quality assessment, and visualization covering transcription factor binding sites, histone modification regions, and chromatin structure.

Sequencing quality is first evaluated with FastQC. Low-quality bases are filtered with FASTX-Toolkit, and the reads are mapped to the reference genome with Bowtie2 to generate SAM alignment files. The preprocessed alignment files then enter the epigenomic signal analysis stage, where MACS2 (Model-based Analysis of ChIP-Seq) calls statistically significant peaks to identify protein-DNA binding sites and reports high-resolution binding regions in narrowPeak format.

Finally, downstream analysis is performed with HOMER. The annotatePeaks function annotates the detected peaks with genomic location and nearby gene information, and the makeUCSCfile function generates bedGraph and bigWig visualization files compatible with the UCSC Genome Browser, so that the genome-wide distribution of ChIP signal can be inspected visually.

Default genome: hg38

[IMPORTANT] Sample Type Identification Rule:
- Control files: must start with "CONTROL_" (required prefix for automatic identification)
- Treatment/ChIP files: no specific filename requirements

(e.g.)
CONTROL_input_R1.fastq.gz  # Valid control, Read 1
CONTROL_input_R2.fastq.gz  # Valid control, Read 2
ChIP_H3K4me3_R1.fastq.gz   # Valid treatment, Read 1
ChIP_H3K4me3_R2.fastq.gz   # Valid treatment, Read 2
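The "CONTROL_" prefix rule can be sketched as a simple split of input files into control and treatment sets. This is an illustration only: `split_chip_inputs` is a hypothetical helper, and the case-sensitive match on the file's base name is an assumption, since the page does not say how the platform treats case or directory paths.

```python
from pathlib import PurePosixPath

CONTROL_PREFIX = "CONTROL_"  # prefix stated in the rule above


def split_chip_inputs(filenames):
    """Split FASTQ paths into (control, treatment) lists.

    Assumption: matching is case-sensitive and applies to the base name,
    so "data/CONTROL_input_R1.fastq.gz" still counts as a control file.
    """
    control, treatment = [], []
    for name in filenames:
        base = PurePosixPath(name).name
        if base.startswith(CONTROL_PREFIX):
            control.append(name)
        else:
            treatment.append(name)
    return control, treatment


if __name__ == "__main__":
    files = [
        "CONTROL_input_R1.fastq.gz",
        "CONTROL_input_R2.fastq.gz",
        "ChIP_H3K4me3_R1.fastq.gz",
        "ChIP_H3K4me3_R2.fastq.gz",
    ]
    control, treatment = split_chip_inputs(files)
    print("control:", control)
    print("treatment:", treatment)
```

Because every file without the prefix is treated as a ChIP sample, a misspelled prefix such as "Control_" would silently land in the treatment list under this sketch, so checking that the control list is non-empty is a useful guard.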

Version 1.0
Execution Count 0
Transcriptomics
Single-cell-transcriptomics
bio-workflow
Single-cell RNA Sequencing Pipeline

The single-cell RNA sequencing pipeline analyzes gene expression data at the single-cell level using 10x Genomics' Cell Ranger together with the Python-based Scanpy package. It takes raw sequencing data in FASTQ format as input, along with metadata describing the sample from which each library was derived. The pipeline performs count matrix generation, data preprocessing, normalization and scaling, dimensionality reduction and clustering, differential gene expression analysis, and functional analysis. As output, it produces a cell-by-gene expression matrix, result tables derived from this matrix, and a variety of visualization plots to support biological interpretation.

Version 2.0
Execution Count 0