HiRID, a high time-resolution ICU dataset. Anonymization procedure

Published Version: 1.0


HiRID is a freely accessible critical care dataset containing data relating to almost 34 thousand patient admissions to the Department of Intensive Care Medicine of the Bern University Hospital, Switzerland (ICU), an interdisciplinary 60-bed unit admitting >6,500 patients per year. The ICU offers the full range of modern interdisciplinary intensive care medicine for adult patients. The dataset was developed in cooperation between the Swiss Federal Institute of Technology (ETH) Zürich, Switzerland and the ICU.

The dataset contains de-identified demographic information and a total of 681 routinely collected physiological variables, diagnostic test results and treatment parameters from almost 34 thousand admissions over the collection period. Data is stored with a uniquely high time resolution of one entry every two minutes.


Critical illness is characterized by the presence or risk of developing life-threatening organ dysfunction. Critically ill patients are typically cared for in intensive care units (ICUs), which specialize in providing continuous monitoring and advanced therapeutic and diagnostic technologies. This dataset was collected during routine care at the Department of Intensive Care Medicine of the Bern University Hospital, Switzerland (ICU), an interdisciplinary 60-bed unit admitting >6,500 patients per year. It was initially extracted to support a study on the early prediction of circulatory failure in the intensive care unit using machine learning [1]. The latest documentation for the dataset is available online [2].


The HiRID database contains a large selection of all routinely collected data relating to patient admissions to the Department of Intensive Care Medicine of the Bern University Hospital, Switzerland (ICU). The data was obtained from the ICU Patient Data Management System, which is used to prospectively register patient health information, measurements of organ function parameters, results of laboratory tests and treatment parameters from ICU admission to discharge. It covers the following categories:

Measurements from bedside monitoring

Measurements and settings of medical devices such as mechanical ventilation

Observations by health care providers, e.g. GCS, RASS, urine and other fluid output

Administered drugs, fluids and nutrition

HiRID has a higher time resolution than other published datasets, most importantly for bedside monitoring, where most parameters are recorded every two minutes.

To ensure the anonymization of individuals in the data set, we followed the procedures successfully applied to the MIMIC-III and Amsterdam UMC db datasets, which adopted the Health Insurance Portability and Accountability Act (HIPAA) Safe Harbor requirements and, in the case of Amsterdam UMC db, also the European Union's General Data Protection Regulation (GDPR) standards [3,4].

Removal of all eighteen identifying data elements listed in HIPAA

Dates were shifted by a random offset that moves the admission date. We ensured that the seasonality, time of day and day of the week were preserved.

Patient age, height and weight are binned into bins of size 5. For patient age, the maximum bin is 90 years and also contains all older patients.

Measurements and medications whose units changed over time were standardized to the latest unit used. This standardization was necessary to make it impossible to draw conclusions about approximate admission dates based on the units used for a specific patient.

Free text was removed from the database

k-anonymization was applied on patient age, weight, sex and height.
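The steps above can be illustrated with a minimal Python sketch. It is not the procedure actually used for HiRID: the function names, the range of the random year offset and the sample records are all hypothetical. The sketch assumes that one offset is drawn per admission, that weekday and time of day are preserved by making the offset a whole number of weeks, and that seasonality is kept by staying within three days of the same calendar date.

```python
import random
from collections import Counter
from datetime import datetime, timedelta


def weekday_preserving_offset(admission, years):
    """Return an offset of roughly `years` years that is a whole number of weeks.

    Assumption: moving to the same calendar date in another year keeps the
    season, and a correction of at most 3 days restores the weekday.
    """
    try:
        moved = admission.replace(year=admission.year + years)
    except ValueError:  # 29 February in a non-leap target year
        moved = admission.replace(year=admission.year + years, day=28)
    # Correction in [-3, 3] days so that the weekday matches the original.
    delta = (admission.weekday() - moved.weekday() + 3) % 7 - 3
    return moved + timedelta(days=delta) - admission


def bin_value(value, size=5, cap=None):
    """Return the lower edge of the bin of width `size` containing `value`.

    With `cap` set, every value at or above the cap falls into the cap bin.
    """
    lower = int(value // size) * size
    if cap is not None:
        lower = min(lower, cap)
    return lower


def smallest_group(records, keys):
    """Size of the smallest group of records sharing the same key values.

    A table is k-anonymous on `keys` if this number is at least k.
    """
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return min(counts.values())


# Hypothetical example values, not taken from the dataset.
rng = random.Random(0)
admission = datetime(2015, 3, 10, 14, 30)
offset = weekday_preserving_offset(admission, rng.randint(90, 180))
shifted = admission + offset  # apply the same offset to every event of the stay

records = [
    {"age": bin_value(47), "sex": "F"},
    {"age": bin_value(45), "sex": "F"},
    {"age": bin_value(93, cap=90), "sex": "M"},
]
k = smallest_group(records, ("age", "sex"))
```

Applying the same offset to every timestamp of an admission keeps the intervals between events intact. In this toy example `k` is 1, so the third record would need further generalization or suppression before release.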

Ethical approval and patient consent

The institutional review board (IRB) of the Canton of Bern approved the study. The need for obtaining informed patient consent was waived owing to the retrospective and observational nature of the study.

Data Description

The overall data is available in two states: as raw data and/or as pre-processed data. Additionally, there are three reference tables for variable lookup.

Reference tables

variable reference – reference table for variables (for raw stage)

ordinal variable reference – reference table for categorical/ordinal variables for string value lookup

pre-processed variable reference – reference table for variables (for merged and imputed stage)

Raw data

The raw data was only processed where this was required for patient de-identification and was otherwise left unchanged compared to the original source. The source data contains the complete set of available variables (685 variables). It consists of the following tables:

Pre-processed data

The pre-processed data consists of intermediary pipeline stages from the accompanying publication by Hyland et al. [1]. Source variables representing the same clinical concepts were merged into one meta-variable per concept. The data contains only the 18 most predictive meta-variables, as defined in our publication. Two different stages of the pipeline are available:

Merged stage: source variables are merged into meta-variables by clinical concept, e.g. non-opioid-analgesics. The time grid is left unchanged and is sparse.

Imputed stage: the data from the merged stage is down-sampled to a five-minute time grid. The time grid is filled with imputed values. The imputation strategy is complex and is discussed in the original publication.

The code used to generate these stages can be found in this GitHub repository under the folder preprocessing [5].
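To give a rough idea of what moving from the sparse merged grid to a filled five-minute grid involves, here is a simplified Python sketch. It is not the published pipeline: the function name and sample values are hypothetical, and plain averaging with forward filling stands in for the more complex imputation strategy described in the publication.

```python
from datetime import datetime, timedelta

GRID = timedelta(minutes=5)


def downsample_and_fill(samples, start, steps):
    """Place sparse (timestamp, value) samples on a five-minute grid.

    Each grid step covers [t, t + 5 min). Values inside a step are averaged;
    empty steps carry the last known value forward (None before the first
    observation). Forward filling is an assumption made for this sketch.
    """
    samples = sorted(samples)
    grid = []
    last = None
    i = 0
    for n in range(steps):
        lo = start + n * GRID
        hi = lo + GRID
        bucket = []
        # Consume all samples that fall before the end of this step.
        while i < len(samples) and samples[i][0] < hi:
            if samples[i][0] >= lo:
                bucket.append(samples[i][1])
            i += 1
        if bucket:
            last = sum(bucket) / len(bucket)
        grid.append((lo, last))
    return grid


# Hypothetical heart-rate readings on an irregular time grid.
start = datetime(2100, 1, 1, 8, 0)
sparse = [
    (start, 80.0),
    (start + timedelta(minutes=2), 84.0),
    (start + timedelta(minutes=11), 90.0),
]
filled = downsample_and_fill(sparse, start, steps=3)
values = [v for _, v in filled]  # [82.0, 82.0, 90.0]
```

The second grid step contains no measurement, so it repeats the mean of the first step. A real pipeline would usually choose the fill rule per variable.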

Which data to use?

The pre-processed data is intended mainly as a quick way to jump-start a project or for use in a proof of concept. We recommend using the source data whenever possible for regular projects. It is the most versatile form and contains the entire set of variables in the original time resolution.

Data formats

Data is available in two formats: CSV for wide compatibility and Apache Parquet for performance and convenience.

Because the data sets are fairly large, they are divided into partitions so that they can be processed in parallel in a straightforward way. The lookup table mapping patient id to partition id is provided in a file supplied along with the data. The partitions are aligned between the different data sets and tables, such that the data of a patient can always be found in the partition with the same id. Note, however, that a patient may not occur in all data sets, e.g. a patient may be missing from the pre-processed data because the patient did not meet the demographic criteria to be included in the study.
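The partition lookup described above could be used roughly as follows. This is a sketch under assumptions: the column names `patientid` and `part` and the inline sample content are hypothetical, so check the schemata for the actual names.

```python
import csv
import io
from collections import defaultdict

# Hypothetical lookup content; the real file and its column names may differ.
LOOKUP_CSV = """patientid,part
1,0
2,0
3,1
"""


def load_partition_index(handle):
    """Read the lookup table into a dict mapping patient id to partition id."""
    reader = csv.DictReader(handle)
    return {int(row["patientid"]): int(row["part"]) for row in reader}


def patients_by_partition(index):
    """Group patient ids by partition so each partition can be processed alone."""
    groups = defaultdict(list)
    for patient_id, part in index.items():
        groups[part].append(patient_id)
    return dict(groups)


index = load_partition_index(io.StringIO(LOOKUP_CSV))
groups = patients_by_partition(index)
# index[3] is the partition to open, in any data set, for patient 3.
```

Because the partitions are aligned, the same partition id can be used for every table. A patient may still be absent from a given data set, so code reading a partition should handle a missing patient id gracefully.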

Patient ID / ICU admission

The dataset treats each ICU admission as unique, and it is not possible to identify multiple ICU admissions as coming from the same patient. A unique "Patient ID" is generated for each ICU (re-)admission.

Data schemata

The schemata of each table can be found in the *schemata.pdf* file.

Usage Notes

As the database contains detailed information regarding the clinical care of patients, it must be treated with appropriate care and respect.

Researchers are required to formally request access via PhysioNet. To be granted access, the user has to be a credentialed PhysioNet user, digitally sign the Data Use Agreement and provide a specific research question.

Conflicts of Interest

The authors declare no conflicts of interest.


Access Policy: Only credentialed PhysioNet users who sign the specified DUA can access the files.