Korea Bioinformation Center (KOBIC)

National center for biological research resources and information

KOBIC News

Dr. Yasukazu Nakamura from DDBJ in Japan visited KOBIC

Dr. Yasukazu Nakamura from DDBJ in Japan visited KOBIC on July 23, 2025, and discussed strengthening the collaboration between KOBIC and DDBJ.
2025-07-24

KOBIC participated in the 2025 INSDC Annual Meeting as a guest
2025-06-16
DDBJ curators visited Korea for a data curation workshop
2024-11-22

Recent Research Registrations

BioProject: Highly Sensitive Detection of Viral RNA using NIR Signal-based Luminescence Resonance Energy Transfer

Project ID: KAP241569

Public Analysis Pipelines

Whole-genome sequencing pipeline

The whole-genome sequencing (WGS) pipeline is a modular toolkit for processing WGS data. It takes a FASTQ file as input and produces haplotype-based variant calls, annotations, and visualizations following the GATK pipeline. First, raw reads in FASTQ format, with well-calibrated base error estimates, are mapped to the reference genome. The BWA mapping tool aligns reads to the human reference genome, allowing up to two mismatches in 30-base seeds, and writes the alignments in the technology-independent SAM/BAM format. Next, duplicate fragments are marked and removed using Picard (http://picard.sourceforge.net), mapping quality is assessed, low-quality mapped reads are filtered, and paired-read information is checked so that mate-pair information stays in sync between the reads of each pair. The initial alignments are then refined by local realignment, and suspicious regions are identified. Using this information as a covariate, along with other technical covariates and known sites of variation, GATK base quality score recalibration (BQSR) is performed. Germline SNPs and indels are called via local reassembly of haplotypes from the recalibrated and realigned BAM files. Finally, the pipeline includes Somalier, a tool for quickly assessing sample relatedness from sequencing data in BAM, CRAM, or VCF format.
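The order of the stages described above can be sketched as a list of command lines. This is only an illustration and not KOBIC's actual pipeline definition: the function name `build_wgs_commands`, the sample name, all file paths, the paired-end input, and the read-group string are hypothetical. Only commonly documented options of fastp, BWA-MEM, and GATK4 are used, and steps such as indexing, local realignment, read filtering, annotation, and Somalier are left out. The sketch builds the commands without running them.

```python
import shlex


def build_wgs_commands(sample, fastq_r1, fastq_r2, reference, known_sites, outdir="out"):
    """Return (step_name, argv) pairs for a simplified WGS run; nothing is executed."""
    p = f"{outdir}/{sample}"  # hypothetical output prefix
    # Read group passed to BWA; "\\t" stays a literal backslash-t for bwa to expand.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    return [
        # Adapter and quality trimming of paired-end reads.
        ("trim", ["fastp", "-i", fastq_r1, "-I", fastq_r2,
                  "-o", f"{p}.R1.trimmed.fastq.gz", "-O", f"{p}.R2.trimmed.fastq.gz"]),
        # Map reads to the reference genome and write SAM.
        ("align", ["bwa", "mem", "-R", read_group, "-o", f"{p}.sam",
                   reference, f"{p}.R1.trimmed.fastq.gz", f"{p}.R2.trimmed.fastq.gz"]),
        # Coordinate-sort the alignments and convert to BAM.
        ("sort", ["gatk", "SortSam", "-I", f"{p}.sam", "-O", f"{p}.sorted.bam",
                  "--SORT_ORDER", "coordinate"]),
        # Mark duplicate fragments.
        ("mark_duplicates", ["gatk", "MarkDuplicates", "-I", f"{p}.sorted.bam",
                             "-O", f"{p}.dedup.bam", "-M", f"{p}.dup_metrics.txt"]),
        # Build the recalibration table from known sites of variation.
        ("recalibrate", ["gatk", "BaseRecalibrator", "-R", reference,
                         "-I", f"{p}.dedup.bam", "--known-sites", known_sites,
                         "-O", f"{p}.recal.table"]),
        # Apply base quality score recalibration (BQSR).
        ("apply_bqsr", ["gatk", "ApplyBQSR", "-R", reference, "-I", f"{p}.dedup.bam",
                        "--bqsr-recal-file", f"{p}.recal.table", "-O", f"{p}.recal.bam"]),
        # Call germline SNPs and indels by local haplotype reassembly.
        ("call_variants", ["gatk", "HaplotypeCaller", "-R", reference,
                           "-I", f"{p}.recal.bam", "-O", f"{p}.vcf.gz"]),
    ]


# Example with made-up file names: print each stage as a shell command line.
COMMANDS = build_wgs_commands("sampleA", "a_R1.fastq.gz", "a_R2.fastq.gz",
                              "ref.fa", "known.vcf.gz")
for name, argv in COMMANDS:
    print(f"{name}: {shlex.join(argv)}")
```

Keeping each stage as an argument list makes the hand-off between stages explicit: the output path of one step is the input path of the next, which is the property the description above relies on.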

#Whole Genome Sequencing

#WGS

#Genomics

#Next Generation Sequencing

#Precision Medicine

#Clinical Genomics

#noncoding genome

#GATK

#fastp

#Cutadapt

#BWA

#SortSam

#MarkDuplicates

#CountBase

#BaseRecalibrator

#ApplyBQSR

#HaplotypeCaller

#somalier

Single-cell RNA sequencing pipeline

The single-cell RNA sequencing pipeline is an extensible toolkit for analyzing single-cell gene expression data using the Scanpy framework. It includes methods for preprocessing, visualization, clustering, and differential expression testing. Its Python-based implementation efficiently handles datasets containing more than one million cells. It builds on AnnData, a generic class for managing annotated data matrices. The pipeline features:
1. Regression of confounding variables, normalization, and identification of highly variable genes.
2. t-SNE and graph-based (Fruchterman–Reingold) visualizations that show cell-type annotations derived from comparisons with bulk expression data.
3. Clustering of cells with the Louvain algorithm and visualization of the clusters, with support for other clustering algorithms as well.
4. Ranking of differentially expressed genes in clusters to identify marker genes corresponding to bulk expression labels.
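To make the preprocessing feature more concrete, here is a minimal pure-Python sketch of per-cell normalization, log transformation, and selection of the most variable genes. It is not Scanpy's implementation and not the pipeline's code: the toy count matrix, the gene names, and the simple variance ranking are assumptions made for illustration. In Scanpy the corresponding steps are provided by `scanpy.pp.normalize_total`, `scanpy.pp.log1p`, and `scanpy.pp.highly_variable_genes`, which use more refined statistics.

```python
import math
from statistics import pvariance


def normalize_total(counts, target_sum):
    """Scale each cell (row) so that its counts sum to target_sum."""
    normalized = []
    for row in counts:
        total = sum(row)
        scale = target_sum / total if total > 0 else 0.0
        normalized.append([value * scale for value in row])
    return normalized


def log1p_transform(matrix):
    """Apply log(1 + x) to every entry to dampen large values."""
    return [[math.log1p(value) for value in row] for row in matrix]


def top_variable_genes(matrix, gene_names, n_top):
    """Rank genes (columns) by variance across cells and return the top n_top names."""
    scored = []
    for j, name in enumerate(gene_names):
        column = [row[j] for row in matrix]
        scored.append((pvariance(column), name))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:n_top]]


# Toy data (hypothetical): 3 cells x 3 genes with different sequencing depths.
GENES = ["geneA", "geneB", "geneC"]
COUNTS = [
    [1, 1, 8],
    [10, 2, 8],
    [40, 5, 5],
]

NORMALIZED = normalize_total(COUNTS, target_sum=10)
LOGGED = log1p_transform(NORMALIZED)
HVG = top_variable_genes(LOGGED, GENES, n_top=2)
print(HVG)
```

After normalization every cell has the same total, so geneB, whose share of each cell is constant, shows no variance and drops out of the selection, while the two genes whose share changes between cells are kept.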

#Single-cell RNA sequencing

#Next-generation sequencing

#Bioinformatics

#Single-cell genomics

#Human Cell Atlas

#Cell_Biology

#Genomics

#transcriptome

#Biotechnology

#heterogeneity

#Multiomics

#scRNA-seq

#scATAC-seq

#Epigenetics

K-BDS (Korea BioData Station)
BIO-EXPRESS
National Bio Big Data Project
Infectious Disease Data Portal

Bio-Express

The Bio-Express service is the only cloud-based integrated data analysis service in Korea. It supports big data analysis in scientific fields through a dynamic, container-based automated workflow analysis platform and a high-speed data transfer service.

Program Execution Statistics
Pipeline Execution Statistics

Download

Please download the workbench and high-speed transfer service for the OS that matches your environment.

Bio-Express CLOSHA Workbench

Bio-Express GBox

Bio-Express GBox-CLI

Users: 6,356
Workspaces: 1,120
Execution tasks: 90,713

Korea BioData Station

Bio research data refers to all types of data produced through national R&D projects in the life sciences. As research methods that make use of this data gain attention, bio data is emerging as a key driver of R&D innovation. To support this, the Korea BioData Station (K-BDS) was established to integrate and provide data scattered across ministries, projects, and researchers, with the aim of creating a data-driven bio research environment.

Registration Status by Data Type

Bio Project: 2,158 cases
Bio Sample: 110,983 cases
Registered Data: 2,371,418 cases

Bio Project Registration Status (chart: cumulative number of registrations, in cases)

National Genome Project

Bio big data, the foundation of precision medicine, is becoming increasingly important as the focus of healthcare shifts from post-treatment care to personalized treatment and prevention. The bioindustry rewards first movers, so proactive investment is necessary, and major countries are already building large-scale bio big data systems. This project was therefore launched to build national bio big data and lead future healthcare. Its goal is to establish a national foundation for collecting, storing, and utilizing bio big data, which is central to the precision medicine era, and to contribute to the promotion of new industries and to healthier lives.

Collection of Clinical Information

Sixteen rare disease collaboration institutions are designated and operated to recruit rare disease patients and collect clinical information.

Data Analysis

Samples collected from rare disease patients are transported to resource production institutions, where genomic data is produced and analyzed.

Data Sharing

The collected clinical information and genomic data are shared through a consortium formed by three institutions.

Data Utilization

The analyzed data is used for rare disease patient counseling, diagnosis, and research activities.

Data Status

Genomic Data: 25,000
Variant Analysis Data: 25,000
Clinical Information: 25,000
Cohort: 7

Infectious Disease Data Portal

The Infectious Disease Data Portal integrates and provides research data on infectious disease viruses from around the world. To help researchers understand infectious diseases and develop treatments and vaccines in a rapidly changing environment, KOBIC brings together global infectious disease research data so that data and results can be shared.

Sequence Dashboard

88,386 Domestic Genomic Sequences

1,354 Domestic Protein Sequences

19,685,177 Overseas Genomic Sequences

35,837,682 Overseas Protein Sequences

19,764,289 COVID-19 Genomic Sequences

35,333,179 COVID-19 Protein Sequences

Virus

Provides integrated information on viruses including disease overview, particle and genomic structure, life cycle, epidemiology, mutations, etc.

Data

Provides quality-analyzed genomic and protein sequences, and protein structures collected from around the world.

Statistics

Offers various statistical services on virus data, such as outbreak timing, region, mutations, etc.

Analysis Tools

Simple web-based BLAST service for infectious disease standard genomic sequences.