- CPU (cores): 1188
- Memory (TB): 12.4
- Storage (TB): 2000
- Total Users: 6288
- Total Analyses: 89366
- Analysis Tools: 134
- Analysis Pipelines: 20
Analysis Tools
- Quality Control (11): Assessment of sequencing and alignment quality
- Assembly (10): Genome reconstruction from sequencing reads
- Mapping (9): Read alignment to reference genomes
- Variant (20): Variant detection and interpretation (SNPs, InDels, SVs)
- Genome Annotation (29): Gene and functional element annotation
- Structure (4): Genome and protein structure analysis
- Omics Data Analysis (22): Omics data integration and analysis
- Utility (4): Tools for data conversion and file processing
- Statistics and Visualization (3): Statistical analysis and visualization
- In-house Script (0): Customized scripts for Bio-Express pipelines
Information Sharing and Support
[GBox-CLI] M_Check socket error (isOpen: false) Log Output Issue
When using GBox-CLI, the log message "M_Check socket error (isOpen: false)" may appear after a transfer has completed. This error occurs in versions prior to the latest patch. To ensure smooth transfers, please download and use the latest version of GBox-CLI.
2024-07-29 16:47:22 SUPPORT
For inquiries regarding joint research, education support, fault reporting, etc., feel free to contact us anytime.
Public Pipelines

The Bio-Express Somatic WGS Pipeline is a modular analysis pipeline designed to detect somatic mutations from whole-genome sequencing data. It takes raw FASTQ files as input and provides comprehensive somatic mutation calling results, quality assessment, and visualization based on tumor-normal pair analysis.

After assessing sequencing quality with FastQC, adapter removal and quality trimming are performed using Cutadapt, followed by mapping to a reference genome sequence using the BWA-MEM2 alignment tool to generate BAM format alignment files. Subsequently, the GATK pipeline is employed for duplicate removal, mapping quality assessment, and filtering of low-quality reads, ensuring consistency of all pair information. Coordinate-based sorting is performed using SAMtools, and PCR duplicates are removed via GATK MarkDuplicates. Base quality score recalibration is then conducted using GATK BaseRecalibrator and ApplyBQSR, leveraging known variant site information as covariates.

For the recalibrated BAM files, a comprehensive quality control and sample validation step is performed first. Sample relatedness is verified using Somalier, sample identity is confirmed through variant-SNP marker integration analysis with SNPmatch, sample contamination is assessed with VerifyBamID2, and coverage analysis is conducted using Mosdepth to comprehensively evaluate the quality and reliability of the sequencing data.

Next, the pipeline proceeds to the tumor-normal pair analysis stage, where Normal-Tumor pair compatibility is verified and cross-sample contamination levels are estimated using Conpair. Single nucleotide variants and insertion/deletion mutations are detected in parallel using Strelka2 and Mutect2 to maximize the sensitivity and specificity of somatic mutation detection. Finally, tumor purity analysis is performed using TINC, structural variants are called with Manta, and copy number variation analysis is conducted with Canvas, quantifying comprehensive somatic genomic alterations to provide essential information for cancer genomics research and precision medicine.

> Default genome: hg38

[IMPORTANT] Sample Type Identification Rule:
- Tumor tissue samples: FASTQ filenames must contain "_T"
- Normal tissue samples: FASTQ filenames must contain "_N"

(e.g.)
patient001_T_R1.fastq.gz # Tumor sample, Read 1
patient001_T_R2.fastq.gz # Tumor sample, Read 2
patient001_N_R1.fastq.gz # Normal sample, Read 1
patient001_N_R2.fastq.gz # Normal sample, Read 2
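The sample type identification rule can be checked locally before uploading files. The following is a minimal sketch, not the pipeline's own code: the function names `classify_sample` and `group_by_type` are hypothetical, the match is assumed to be case-sensitive, and a filename containing both "_T" and "_N" is treated as ambiguous (the page does not say how the pipeline handles that case).

```python
from collections import defaultdict
from pathlib import Path


def classify_sample(filename: str) -> str:
    """Return 'tumor', 'normal', or 'unknown' from the "_T" / "_N" rule.

    Assumption: matching is case-sensitive, and a name that contains
    both markers (or neither) is reported as 'unknown'.
    """
    name = Path(filename).name  # ignore any directory components
    has_tumor = "_T" in name
    has_normal = "_N" in name
    if has_tumor and not has_normal:
        return "tumor"
    if has_normal and not has_tumor:
        return "normal"
    return "unknown"


def group_by_type(filenames):
    """Group filenames by their classified sample type."""
    groups = defaultdict(list)
    for filename in filenames:
        groups[classify_sample(filename)].append(filename)
    return dict(groups)


if __name__ == "__main__":
    files = [
        "patient001_T_R1.fastq.gz",
        "patient001_T_R2.fastq.gz",
        "patient001_N_R1.fastq.gz",
        "patient001_N_R2.fastq.gz",
    ]
    for sample_type, names in group_by_type(files).items():
        print(sample_type, names)
```

Any file reported as 'unknown' would need renaming before submission, since the rule requires every FASTQ file to carry exactly one of the two markers under the assumptions above.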

The Bio-Express germline WGS pipeline is a modular analysis pipeline designed to detect germline variants from whole-genome sequencing data. It takes raw FASTQ files as input and provides comprehensive germline variant calling results, quality assessment, and visualization.

After assessing sequencing quality with FastQC, adapter removal and quality trimming are performed using Cutadapt, followed by mapping to a reference genome sequence using the BWA-MEM2 alignment tool to generate BAM format alignment files. Subsequently, the GATK pipeline is employed for duplicate removal, mapping quality assessment, and filtering of low-quality reads, ensuring consistency of all pair information. Coordinate-based sorting is performed using SAMtools, and PCR duplicates are removed via GATK MarkDuplicates. Base quality score recalibration is then conducted using GATK BaseRecalibrator and ApplyBQSR, leveraging known variant site information as covariates.

For the recalibrated BAM files, a comprehensive quality control and sample validation step is performed first. Sample relatedness is verified using Somalier, sample contamination is assessed with VerifyBamID2, and coverage analysis is conducted using Mosdepth to comprehensively evaluate the quality and reliability of the sequencing data. Subsequently, GVCF files are generated using GATK HaplotypeCaller, and germline SNVs and indels are called in standard VCF format using GenotypeGVCFs. Following this, comprehensive variant statistics are computed using BCFtools, and structural variants are detected using Manta.

> Default genome: hg38

The Bio-Express ChIP-seq Analysis Pipeline is a modular analysis pipeline designed to detect protein-DNA binding sites from Chromatin Immunoprecipitation Sequencing (ChIP-Seq) data. It uses raw FASTQ files as input and provides comprehensive epigenomic binding site calling results, quality assessment, and visualization based on transcription factor binding sites, histone modification regions, and chromatin structure analysis.

After evaluating sequencing quality with FastQC, low-quality base filtering is performed using FASTX-Toolkit, followed by mapping to a reference genome sequence using the Bowtie2 alignment tool to generate SAM format alignment files. Subsequently, the preprocessed alignment files are used to enter the epigenomic signal analysis stage. Statistically significant peak calling is performed using MACS2 (Model-based Analysis of ChIP-Seq) to accurately identify protein-DNA binding sites, providing high-resolution binding regions in narrowPeak format.

Finally, comprehensive downstream analysis is conducted using Homer. The annotatePeaks function provides genomic location annotations and nearby gene information for the detected peaks, and the makeUCSCfile function generates bedGraph and bigWig format visualization files compatible with the UCSC Genome Browser, enabling intuitive visualization of the genome-wide distribution patterns of chromatin immunoprecipitation signals.

> Default genome: hg38

[IMPORTANT] Sample Type Identification Rule:
- Control Files: Must start with "CONTROL_" (Required prefix for automatic identification)
- Treatment/ChIP Files: No specific filename requirements

(e.g.)
CONTROL_input_R1.fastq.gz # Valid control, Read 1
CONTROL_input_R2.fastq.gz # Valid control, Read 2
ChIP_H3K4me3_R1.fastq.gz # Valid treatment, Read 1
ChIP_H3K4me3_R2.fastq.gz # Valid treatment, Read 2
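The "CONTROL_" prefix rule lends itself to a quick pre-upload check. The sketch below is illustrative only and is not taken from the pipeline: `is_control` and `split_control_treatment` are hypothetical helper names, and the prefix match is assumed to be case-sensitive.

```python
from pathlib import Path

CONTROL_PREFIX = "CONTROL_"  # required prefix stated in the rule above


def is_control(filename: str) -> bool:
    """True if the file's base name starts with the control prefix.

    Assumption: the comparison is case-sensitive.
    """
    return Path(filename).name.startswith(CONTROL_PREFIX)


def split_control_treatment(filenames):
    """Return (controls, treatments) as two sorted lists."""
    controls = sorted(f for f in filenames if is_control(f))
    treatments = sorted(f for f in filenames if not is_control(f))
    return controls, treatments


if __name__ == "__main__":
    files = [
        "CONTROL_input_R1.fastq.gz",
        "CONTROL_input_R2.fastq.gz",
        "ChIP_H3K4me3_R1.fastq.gz",
        "ChIP_H3K4me3_R2.fastq.gz",
    ]
    controls, treatments = split_control_treatment(files)
    print("controls:", controls)
    print("treatments:", treatments)
```

Because treatment files have no naming requirement, anything that lacks the prefix falls into the treatment group. Under the case-sensitive assumption, a file named control_input_R1.fastq.gz would therefore be classified as treatment, so it is worth checking the control list before submitting.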

The single-cell RNA sequencing pipeline is designed to analyze gene expression data at the single-cell level using 10X Genomics' Cell Ranger along with the Python-based Scanpy package. It takes raw sequencing data in FASTQ format as input, along with metadata describing the samples from which each library was derived. The pipeline performs a comprehensive set of analytical steps, including count matrix generation, data preprocessing, normalization and scaling, dimensionality reduction and clustering, differential gene expression analysis, and functional analysis. As output, it produces a cell-by-gene expression matrix, analytical result tables derived from this matrix, and a variety of visualization plots to support biological interpretation.
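To make the normalization step concrete, here is a small standard-library sketch of library-size normalization followed by a log transform on a cell-by-gene count matrix. It is an illustration of the general technique, not the pipeline's code: the page does not state which Scanpy calls or parameters are used, and the function name `normalize_log1p` and the target of 10,000 counts per cell are assumptions. In Scanpy, the comparable operations are typically done with `scanpy.pp.normalize_total` and `scanpy.pp.log1p`.

```python
import math


def normalize_log1p(counts, target_sum=10_000.0):
    """Scale each cell (row) to `target_sum` total counts, then apply log(1 + x).

    `counts` is a list of rows, one row of raw gene counts per cell.
    Cells with zero total counts are returned as all zeros.
    """
    normalized = []
    for cell in counts:
        total = sum(cell)
        if total == 0:
            normalized.append([0.0] * len(cell))
            continue
        factor = target_sum / total  # per-cell size factor
        normalized.append([math.log1p(c * factor) for c in cell])
    return normalized


if __name__ == "__main__":
    raw = [
        [10, 0, 90],   # cell 1: 100 total counts
        [1, 1, 2],     # cell 2: 4 total counts
    ]
    for row in normalize_log1p(raw):
        print([round(v, 3) for v in row])
```

Scaling each cell to a common total removes differences in sequencing depth between cells, and the log transform compresses the range of highly expressed genes before dimensionality reduction and clustering.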