Chinese intracranial hemorrhage imaging database: constructing a structured multimodal intracranial hemorrhage data warehouse : Chinese Medical Journal





Chen, Yihao1; Chang, Jianbo1; Zhang, Qinghua2; Ye, Zeju3; Tian, Fengxuan4; Li, Zhaojian5; Li, Kaigu6; Chen, Jie7; Ma, Wenbin1; Wei, Junji1; Feng, Ming1; Wang, Renzhi1

Editor(s): Ji, Yuanyuan

Chinese Medical Journal, December 30, 2022. DOI: 10.1097/CM9.0000000000002292

To the Editor: China has the largest number of patients with intracerebral hemorrhage (ICH) of any single country,[1] and efforts to mine, utilize, and share the relevant data are warranted. Because a structured diagnosis and treatment pipeline for ICH has been shown to improve patient outcomes, there is an urgent need for a large medical database that stores the relevant data and facilitates data mining, analysis, and sharing. However, traditional stroke databases are dominated by text data and lack comprehensive imaging data, and few studies have focused on constructing ICH databases.[2,3] To integrate multimodal ICH data and standardize data processing and feedback, Peking Union Medical College Hospital established the Chinese Intracranial Hemorrhage Imaging Database (CICHID) in 2019, and the database remains under continuous refinement. Unlike traditional text-based databases, the CICHID is a large multimodal data warehouse that provides comprehensive clinical data storage and multidimensional imaging data analysis.

The CICHID uses a customized case report form (CRF) consisting of first-visit elements and follow-up elements. Data are collected from patients' medical records and entered into the CRF via secure web-based transmission. Detailed scanner images can be imported directly from the imaging management system and displayed. The CICHID supports customized real-time tracking of follow-up questionnaires, allowing users to set reminders for return visits. Original medical records in various formats are processed with artificial intelligence (AI) techniques to remove information that could identify the patient (e.g., name, residential address). Targeted data can be batch exported according to user-specified retrieval conditions. The data processing workflow is summarized in Figure 1.
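The CRF and batch-export steps can be pictured with a small sketch. It assumes a hypothetical record layout (the classes `CaseRecord` and `FollowUp`, and helpers `retrieve` and `export_csv`, are illustrative names, not the CICHID's actual schema or API): first-visit elements are plain fields, follow-up elements accumulate in a list, and a retrieval condition is a set of per-field predicates applied before export.

```python
from dataclasses import dataclass, field
from datetime import date
import csv
import io


@dataclass
class FollowUp:
    visit_date: date
    gos: int  # Glasgow Outcome Scale score at this visit (1-5)


@dataclass
class CaseRecord:
    # Hypothetical first-visit elements; not the real CICHID CRF
    case_id: str
    center: str
    age: int
    sex: str
    onset_to_ct_hours: float
    # Follow-up elements are appended as return visits are recorded
    follow_ups: list[FollowUp] = field(default_factory=list)


def retrieve(records, **conditions):
    """Keep records for which every attribute predicate returns True."""
    return [
        r for r in records
        if all(pred(getattr(r, name)) for name, pred in conditions.items())
    ]


def export_csv(records):
    """Batch-export the first-visit elements of the selected records as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["case_id", "center", "age", "sex",
                     "onset_to_ct_hours", "n_follow_ups"])
    for r in records:
        writer.writerow([r.case_id, r.center, r.age, r.sex,
                         r.onset_to_ct_hours, len(r.follow_ups)])
    return buf.getvalue()


cases = [
    CaseRecord("A001", "center-1", 58, "M", 2.5),
    CaseRecord("A002", "center-2", 71, "F", 4.0, [FollowUp(date(2020, 3, 1), 4)]),
]
# Example retrieval condition: age >= 60 and CT within 6 h of onset
selected = retrieve(cases, age=lambda a: a >= 60, onset_to_ct_hours=lambda h: h <= 6)
print(export_csv(selected))
```

Expressing retrieval conditions as predicates keeps the filter independent of any particular field, so new CRF elements need no change to `retrieve`.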

Figure 1:
The data processing workflow of the CICHID. Raw data and the CRF are transferred to the CICHID via a secure transmission mode. Medical records and imaging files are anonymized with AI assistance. Convolutional neural network-based deep learning models were developed to perform automated segmentation of the hematoma, perihematomal edema, and ventricles. Data are stored in a structured manner (see Supplementary Figure 1). The targeted data can be batch exported according to the user's specific retrieval conditions. Data analysis and reporting are conducted after investigation, and feedback is provided to the participating centers for clinical quality improvement and treatment guidance. AI: Artificial intelligence; CICHID: Chinese Intracranial Hemorrhage Imaging Database; CRF: Case report form.
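The letter states that anonymization is AI-assisted but does not describe the method. As a point of comparison only, the sketch below shows the simplest rule-based alternative: dropping direct-identifier fields from a structured record and masking known names and phone-number-like strings in free text. The field names in `DIRECT_IDENTIFIERS` and the helpers `deidentify_record` and `redact_free_text` are hypothetical. Rules like these miss many identifiers, which is one reason learned models are used instead.

```python
import re

# Hypothetical names for direct-identifier fields in a structured record
DIRECT_IDENTIFIERS = {"name", "address", "phone", "id_number"}

# 11-digit mainland China mobile numbers, not embedded in longer digit runs
PHONE = re.compile(r"(?<!\d)1\d{10}(?!\d)")


def deidentify_record(record: dict, pseudonym: str) -> dict:
    """Drop direct identifiers and attach a study pseudonym instead."""
    clean = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    clean["pseudonym"] = pseudonym
    return clean


def redact_free_text(text: str, known_names: list[str]) -> str:
    """Mask known patient names and phone-number-like strings in free text."""
    for name in known_names:
        text = text.replace(name, "[NAME]")
    return PHONE.sub("[PHONE]", text)


record = {"name": "Zhang San", "address": "Beijing", "age": 63, "gos_discharge": 3}
print(deidentify_record(record, "P0001"))
# {'age': 63, 'gos_discharge': 3, 'pseudonym': 'P0001'}
print(redact_free_text("Zhang San, tel 13800138000, admitted with ICH.", ["Zhang San"]))
# [NAME], tel [PHONE], admitted with ICH.
```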

The CICHID is not a routine database but a functional data warehouse that supports structured storage of all available medical records and image files of patients with ICH. Users can view and measure the targeted images within the imaging management system. Raw data are uploaded as attachments and stored in a structured hierarchy resembling a contents page (Supplementary Figure 1). Each attachment is labeled with multiple custom tags for easier storage and classification. This structured storage method facilitates the management of medical records and imaging files while avoiding missing and mismatched data.
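A minimal sketch of contents-page storage with multiple tags per attachment follows, assuming each attachment carries a hierarchical path (its "level") and a set of free-form tags. The classes `Attachment` and `AttachmentStore` are illustrative, not part of the CICHID. Indexing by case gives the contents page, and indexing by tag gives fast classification queries.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Attachment:
    case_id: str
    path: tuple[str, ...]  # hierarchical level, e.g. ("Imaging", "CT")
    filename: str
    tags: set[str] = field(default_factory=set)


class AttachmentStore:
    """Index attachments by case (for a contents page) and by tag (for lookup)."""

    def __init__(self):
        self._by_case = defaultdict(list)
        self._by_tag = defaultdict(list)

    def add(self, att: Attachment) -> None:
        self._by_case[att.case_id].append(att)
        for tag in att.tags:
            self._by_tag[tag].append(att)

    def contents(self, case_id: str) -> dict[str, list[str]]:
        """Group one case's files by hierarchical level, like a contents page."""
        toc = defaultdict(list)
        for att in self._by_case.get(case_id, []):
            toc[" / ".join(att.path)].append(att.filename)
        return dict(sorted(toc.items()))

    def find(self, *tags: str) -> list[Attachment]:
        """Return attachments that carry every requested tag."""
        if not tags:
            return []
        wanted = set(tags)
        return [a for a in self._by_tag.get(tags[0], []) if wanted <= a.tags]


store = AttachmentStore()
store.add(Attachment("A001", ("Imaging", "CT"), "ct_day0.nii.gz", {"CT", "baseline"}))
store.add(Attachment("A001", ("Imaging", "CT"), "ct_day1.nii.gz", {"CT", "follow-up"}))
store.add(Attachment("A001", ("Records", "Discharge"), "discharge.pdf", {"discharge"}))
print(store.contents("A001"))
print([a.filename for a in store.find("CT", "baseline")])
```

Because every file sits at exactly one level but may carry several tags, a missing or misfiled attachment shows up as a gap in the contents page rather than being silently lost.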

As of July 10, 2021, data from 6705 patients with ICH had been retrospectively collected into the CICHID, including 27,491 head computed tomography (CT) scans and 64,749 related imaging segmentation files (hematoma, perihematomal edema, and ventricles). These data were collected from eight medical centers between January 2016 and December 2020. Multimodal data, including magnetic resonance imaging, CT angiography, CT perfusion, electroencephalography, and transcranial Doppler ultrasonography, were also included in the CICHID, which should help in the search for novel biomarkers associated with therapeutic strategies and prognosis in patients with ICH. A total of 3644 ICH cases underwent initial data extraction, and 24 key clinical and imaging elements were extracted and recorded. The baseline characteristics and prognostic information of these patients are given in Supplementary Table 1. Among these 3644 cases, the median age at onset was 60 years, and 65.2% of patients were male and 34.8% female. The median time from ICH onset to the initial head CT examination was 3 h. A total of 2264 (62.13%) patients had poor neurological outcomes (Glasgow Outcome Scale 1–3) at discharge.
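The summary statistics above can be reproduced with a short sketch, assuming per-patient ages, onset-to-CT times, and discharge Glasgow Outcome Scale (GOS) scores are available as lists. The helpers `poor_outcome_pct` and `summarize` are hypothetical names, and the example inputs are synthetic, not CICHID data. The last line checks the reported proportion from the counts given in the text.

```python
import statistics


def poor_outcome_pct(gos_scores, threshold=3):
    """Percentage of patients with GOS <= threshold (GOS 1-3 = poor outcome)."""
    if not gos_scores:
        raise ValueError("no GOS scores provided")
    poor = sum(1 for s in gos_scores if s <= threshold)
    return 100.0 * poor / len(gos_scores)


def summarize(ages, onset_to_ct_hours, gos_at_discharge):
    """Median age, median onset-to-CT time, and poor-outcome percentage."""
    return {
        "median_age": statistics.median(ages),
        "median_onset_to_ct_h": statistics.median(onset_to_ct_hours),
        "poor_outcome_pct": round(poor_outcome_pct(gos_at_discharge), 2),
    }


# Reproducing the reported proportion from the counts in the text
print(round(100 * 2264 / 3644, 2))  # 62.13
```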

High-quality, accurate data are essential to an outstanding ICH database and require medical institutions to be equipped with advanced electronic medical record systems and imaging equipment. Medical records must also be written and preserved to a high standard. Training data entry clerks and auditors in the standardized use of the data, establishing a detailed database construction protocol and CRF, and regularly reviewing the data ensure data accuracy and completeness. Schwamm et al[3] reported that constructing an outstanding stroke registry requires the following conditions: (1) establishment of standardized data elements; (2) adequate program funding; (3) use of an electronic data capture system; (4) attention to the completeness of follow-up data; (5) fostering of a teamwork culture; and (6) full protection of patient privacy. We consider that proper setting of data permissions, anonymization of data elements, and a secure data transmission mode are also key to privacy protection.
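Regular data review can be partly automated. The sketch below assumes CRF records are dictionaries and checks each required field for missing values and out-of-range entries, then reports per-field completeness. The field list in `REQUIRED_RANGES` and the helpers `audit_record` and `completeness` are hypothetical. The Glasgow Coma Scale (3–15) and Glasgow Outcome Scale (1–5) bounds are the scales' defined ranges, while the age bound is only an illustrative plausibility limit.

```python
# Hypothetical required CRF fields with allowed value ranges
REQUIRED_RANGES = {
    "age": (0, 120),           # illustrative plausibility bound
    "gcs_admission": (3, 15),  # Glasgow Coma Scale range
    "gos_discharge": (1, 5),   # Glasgow Outcome Scale range
}


def audit_record(record: dict) -> list[str]:
    """Return a list of problems found in one CRF record."""
    problems = []
    for field_name, (lo, hi) in REQUIRED_RANGES.items():
        value = record.get(field_name)
        if value is None:
            problems.append(f"{field_name}: missing")
        elif not lo <= value <= hi:
            problems.append(f"{field_name}: {value} outside [{lo}, {hi}]")
    return problems


def completeness(records: list[dict]) -> dict[str, float]:
    """Fraction of records with a non-missing value for each required field."""
    n = len(records)
    return {
        f: sum(r.get(f) is not None for r in records) / n if n else 0.0
        for f in REQUIRED_RANGES
    }


print(audit_record({"age": 63, "gcs_admission": 17}))
print(completeness([{"age": 63, "gcs_admission": 14, "gos_discharge": 3}, {"age": 70}]))
```

Audit messages like these can be routed back to the entering center, which matches the feedback loop described for the participating centers.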

Intelligent data storage, processing, and transmission via AI techniques are becoming mainstream in novel database construction. For example, natural language processing[4] can be used to quickly extract key data elements, reducing the labor cost of creating labeled data. An AI-based imaging platform developed and continuously updated by our research team is scheduled to be integrated into the CICHID in the near future, enabling automatic segmentation of the hematoma, perihematomal edema, ventricles, midline, brainstem, and basal cisterns. Other essential tasks that can be conducted on the CICHID platform include prediction of onset time, hematoma expansion, and prognosis, which should promote precise and appropriate treatment for patients with ICH. Relevant results will be reported in due course.
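Once segmentation masks exist, simple quantitative features follow directly from them. The sketch below assumes binary 3-D masks (slices × rows × columns) and a known voxel spacing in millimetres. It computes a volume by voxel counting and the relative perihematomal edema (edema volume divided by hematoma volume). The helpers `mask_volume_ml` and `edema_to_hematoma_ratio` and the toy masks are hypothetical. The letter does not describe how the team's platform derives volumes.

```python
import numpy as np


def mask_volume_ml(mask: np.ndarray, spacing_mm) -> float:
    """Volume of a binary mask in millilitres, by counting nonzero voxels."""
    voxel_mm3 = float(np.prod(spacing_mm))
    return np.count_nonzero(mask) * voxel_mm3 / 1000.0  # 1 mL = 1000 mm^3


def edema_to_hematoma_ratio(hematoma: np.ndarray, edema: np.ndarray, spacing_mm) -> float:
    """Relative perihematomal edema: edema volume divided by hematoma volume."""
    h = mask_volume_ml(hematoma, spacing_mm)
    if h == 0:
        raise ValueError("empty hematoma mask")
    return mask_volume_ml(edema, spacing_mm) / h


# Toy masks: a box-shaped hematoma with a rim of edema around it
hematoma = np.zeros((10, 64, 64), dtype=bool)
hematoma[2:6, 20:40, 20:40] = True        # 4 x 20 x 20 = 1600 voxels
edema = np.zeros_like(hematoma)
edema[2:6, 15:45, 15:45] = True
edema &= ~hematoma                        # rim only: 4 x (900 - 400) = 2000 voxels
spacing = (5.0, 0.5, 0.5)                 # slice thickness, row, column (mm)
print(mask_volume_ml(hematoma, spacing))                  # 2.0
print(edema_to_hematoma_ratio(hematoma, edema, spacing))  # 1.25
```

Voxel counting is exact for a given mask, so its accuracy depends entirely on the segmentation quality and on reading the spacing correctly from the image header.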

The CICHID is a data warehouse that preserves the complete medical records and imaging files of patients with ICH. All raw data are classified according to custom labels and levels to facilitate structuring, curation, and storage. The CICHID will continue to adopt AI techniques to provide more valuable ICH data that support both quality control in treatment and scientific research, with the aims of reducing disability and promoting the health and wellness of patients with ICH in China.


This work was supported by the National Key R&D Program of China (No. 2018YFA0108603), the Non-profit Central Research Institute Fund of Chinese Academy of Medical Sciences (No. 2020-JKCS-026), the Beijing-Tianjin-Hebei Basic Research Cooperation Project (No. 19JCZDJC64600(Z)), and the CAMS Innovation Fund for Medical Sciences (No. 2020-I2M-C&T-B-028).

Conflicts of interest



1. Wang Y, Li Z, Wang Y, Zhao X, Liu L, Yang X, et al. Chinese stroke center alliance: a national effort to improve healthcare quality for acute stroke and transient ischaemic attack: rationale, design, and preliminary findings. Stroke Vasc Neurol 2018;3:256–262. doi: 10.1136/svn-2018-000154.
2. Han JX, See A, King N. Validation of prognostic models to predict early mortality in spontaneous intracerebral hemorrhage: a cross-sectional evaluation of a Singapore stroke database. World Neurosurg 2018;109:e601–e608. doi: 10.1016/j.wneu.2017.10.039.
3. Schwamm L, Reeves MJ, Frankel M. Designing a sustainable national registry for stroke quality improvement. Am J Prev Med 2006;31:S251–S257. doi: 10.1016/j.amepre.2006.08.013.
4. Juhn Y, Liu H. Artificial intelligence approaches using natural language processing to advance EHR-based clinical research. J Allergy Clin Immunol 2020;145:463–469. doi: 10.1016/j.jaci.2019.12.897.


Copyright © 2022 The Chinese Medical Association, produced by Wolters Kluwer, Inc. under the CC-BY-NC-ND license.