TY - JOUR AU1 - Watanabe, Hiroshi AU2 - Takenouchi, Kiyoteru AU3 - Kimura, Michio AB - Introduction A number of studies have attempted to utilize secondary data directly from EMR for clinical research, etc. A potential solution to this problem was first demonstrated by the Adverse Spontaneous Triggered Event Reporting (ASTER) Project [1]. ASTER enables adverse event reporting through current electronic health records (EHR) systems. When a doctor discontinues prescription of a drug for a patient, s/he is prompted with a series of short questions to determine if this is due to an adverse event from the drug. ASTER offers the opportunity to follow event details that are not recorded in an EHR, as it is not regulatory information. After about 20 minutes, a MedWatch report, derived from the EHR, is delivered to the Food and Drug Administration (FDA) to report on the adverse event. When ASTER was tested previously, 20% of these reports were deemed as "serious," 100% had height/weight and lab data, and 91% of participating physicians had not submitted any adverse drug event (ADE) reports the prior year. ASTER’s ease of use, and the opportunity to report adverse events at the point of care within 60 seconds (vs. 34 minutes for a fax report), makes this innovative approach an essential step in enabling safer, more effective drugs. A recent publication of Nordo et al. [2] demonstrated that the use of the Integrating the Healthcare Enterprise® (IHE) Retrieve Form for Data Capture (RFD) Profile as a part of the Epic EHR research model along with the Research Electronic Data Capture (REDCap) electronic data capture (EDC) system and middleware (RADaptor) developed by the Duke University Office of Research Informatics produced significant time and resource savings and improved quality. Specifically, this eSource pilot for a registry study produced a 37% time savings and required one less full-time employee while the error rate was reduced from 9% to 0. 
For the Duke study, data elements mapped with RADaptor included those contained within the EHR’s continuity of care document (CCD), a standard used to comply with initial U.S. Meaningful Use requirements. Data elements not in the CCD could be entered anew into the electronic case report form (eCRF). Each of the above studies was conducted at a single institution. A multi-center study, known as the Electronic Health Record for Clinical Research (EHR4CR) project, was conducted in Europe [3]. The collection of data for this project required substantial effort because the EMR data had not been standardized. In order to obtain high-quality EMR data at low cost from multiple institutions for the purpose of surveillance, it is essential to establish a system that can collect standardized and structured data from EMR. In Japan, medical claim data are output using receipt codes standardized by the computerized receipt processing system in almost all institutions. Standardized Structured Medical Information eXchange2 (SS-MIX2) Storages were used as the export data from EMR [4]. The SS-MIX project was promoted by Japan’s Ministry of Health, Labor and Welfare (MHLW) and was inherited from the Shizuoka Style EMR project in fiscal year (FY) 2006 [5]. According to investigations completed by MHLW in FY 2015 [6], EMR systems were operating in 2,542 (34%) of the 7,426 hospitals in Japan, and SS-MIX2 Standardized Storage was implemented in 865 hospitals (34% of the hospitals with operational EMR systems). Among the 710 hospitals with more than 400 beds, EMR systems were operating in 550 hospitals (78%) and SS-MIX2 Standardized Storage in 237 hospitals (43% of hospitals with EMR). SS-MIX2 Standardized Storage and Annex Storage are text-file stores that hold minimal medical information written in standard codes, organized in a multi-level folder hierarchy (Fig 1: Description of the schematic structure of SS-MIX2 Standardized Storage and Annex Storage). 
Fig 1. Description of the schematic structure of SS-MIX2 standardized storage and annex storage. https://doi.org/10.1371/journal.pone.0255863.g001 "SS-MIX2 Standardized Storage: explanation of the structure and guidelines for implementation Ver. 1.2" [7] was authorized as the standard specification of MHLW on 28 March 2016 [8]. Six hundred and thirty hospitals in Japan are storing patient demographics, diagnostic disease classification in ICD-10, prescription orders, and laboratory examination results in HL7 v2.5 format using SS-MIX2 Standardized Storage. Thus, the SS-MIX2 Specification is the de facto standard for export from EMR in Japan. Hori et al. reported the detection of fluoroquinolone-induced tendon disorders by the secondary use of SS-MIX2 Standardized Storage at a single institution [9]. A study on the early detection of drug-related adverse events using standardized patient data (i.e., medical claim data and the data from SS-MIX2 Standardized Storage) from multiple institutions was conducted by Japan’s Pharmaceuticals and Medical Devices Agency (PMDA). Here, we describe this Medical Information for Risk Assessment Initiative (MIHARI) project in detail. In this study, we investigated whether known adverse events could be detected from the data of several institutions that use SS-MIX2 standardized storage and compared the results with those from paper-based medical records and electronic data capture (EDC) systems. The database search engine D*D [10] was used in this study. It is built on CACHE, which uses a tree structure [11]. Based on the usefulness of this approach, the PMDA initiated a project named Medical Information Database Network (MID-NET) in 2018 for the detection of adverse events, as a substitute for post-marketing surveillance [12–14]. 
In this paper, we describe the MIHARI project, in which we investigated whether known adverse events could be detected from data directly collected from the EMR of multiple institutions; this formed the basis of the ongoing MID-NET project. Materials and methods The aim was to investigate whether known adverse events can be detected by database surveillance using hospital information systems, as a substitute for traditional large-scale surveys in which companies and other organizations collect data from each hospital individually at great expense of resources. In this study, medical claim data and SS-MIX2 standardized storage data were used to identify four diseases (diabetes, dyslipidemia, hyperthyroidism, and acute renal failure), and the validity of the outcome definitions was evaluated by calculating positive predictive values (PPV) as indices of validity (Fig 2: Methods for Surveillance). Fig 2. Methods for surveillance. https://doi.org/10.1371/journal.pone.0255863.g002 Fig 2 shows the surveillance flowchart, in which a combination of medical claim data and SS-MIX2 standardized storage data is used, after anonymization, to identify potential cases. In this study, we used two methods. Method 1 was used to identify diabetes, dyslipidemia, and hyperthyroidism. Method 2 was used to identify acute renal failure. Method 1: Evaluation of validity based only on laboratory examination results. Subject outcomes: New onset of diabetes, dyslipidemia, and hyperthyroidism. Method 2: Evaluation of validity based on data, including information from medical records (medical records were reviewed by physicians with the support of clinical research coordinators [CRC]). Subject outcome: New onset of acute renal failure. 
We adopted two different evaluation methods because the validation study of Method 2, which includes data from medical records, would require a large amount of time and cost for reviewing; therefore, we also used a simple method (Method 1) based only on laboratory examination results to evaluate the validity of the outcome definitions. {Data sources} SS-MIX2 standardized storage data were obtained from the hospitals listed below. Medical claim data and medical records were obtained from two of these hospitals. ・National University Corporation Kyushu University Hospital ・Social Welfare Organization Saiseikai Imperial Gift Foundation, Inc., Saiseikai Shizuoka General Hospital ・Shizuoka Prefectural Hospital Organization, Shizuoka General Hospital ・Numazu City Hospital ・National University Corporation Hamamatsu University Hospital ・Fukuroi City Hospital {Period of data surveyed} From April 1, 2007 to December 31, 2011 {Anonymization of data} A second PMDA identification (ID) was assigned to each patient whose data were extracted. In Method 1, irreversible anonymization was performed, before providing the data to PMDA, by discarding the correspondence table between the original hospital ID and the second ID. In Method 2, irreversible anonymization would have been desirable; however, because medical records were used, the correspondence table between the original hospital ID and the second ID was kept under lock and key in each hospital so that patients could not be identified by outsiders. {Case identification by outcome definitions} Codes were provided for each disease name, drug, and medical practice according to disease. Some of the codes are provided in Table 1 (Examples of Code Lists). International Statistical Classification of Diseases and Related Health Problems-10 (ICD-10) codes were used for disease names, YJ codes were used for drugs, and the codes of the computerized receipt processing system were used for the related medical practice. 
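The anonymization step described above (a second ID per patient, with the correspondence table either discarded or retained by the hospital) might be sketched as follows. This is only an illustration under stated assumptions: the record layout, the field names hospital_id and second_id, the function name assign_second_ids, and the use of random hexadecimal tokens are all hypothetical and are not taken from the MIHARI procedure itself.

```python
import secrets


def assign_second_ids(records, keep_table):
    """Replace each hospital patient ID with a random second ID.

    records    : list of dicts that each carry a 'hospital_id' key
                 (hypothetical layout, for illustration only).
    keep_table : False -> the correspondence table is discarded, so the
                          mapping cannot be reversed (Method 1 style).
                 True  -> the table is returned so that the hospital can
                          store it securely (Method 2 style).
    """
    table = {}          # original hospital ID -> second ID
    anonymized = []
    for record in records:
        hospital_id = record["hospital_id"]
        if hospital_id not in table:
            # Random token; the same patient always maps to the same second ID.
            table[hospital_id] = secrets.token_hex(8)
        copy = {k: v for k, v in record.items() if k != "hospital_id"}
        copy["second_id"] = table[hospital_id]
        anonymized.append(copy)
    return anonymized, (table if keep_table else None)
```

In this sketch the only difference between the two methods is whether the mapping survives the call: with keep_table=False nothing links a second ID back to a hospital ID once the function returns.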
Table 1. Examples of code lists. https://doi.org/10.1371/journal.pone.0255863.t001 Using these codes, automatically extracted groups (potential cases) were identified based on eight extraction conditions (outcome definitions) for Method 1 (Table 2: Outcome Definition in Method 1) and one extraction condition (outcome definition) for Method 2 (Table 3: Outcome Definition in Method 2). Table 2. Outcome definition in Method 1. https://doi.org/10.1371/journal.pone.0255863.t002 Table 3. Outcome definition in Method 2. https://doi.org/10.1371/journal.pone.0255863.t003 In Method 1, the outcome definitions differed according to data source (medical claim data alone or the use of SS-MIX2 standardized storage data) and the combination of disease name, drug, and medical practice. In Method 2, the outcome definition was based on the combination of disease name and laboratory examination data obtained only from SS-MIX2 standardized storage data. {True case identification} “True cases” were identified from the extracted potential cases. Method 1 (diabetes, dyslipidemia, and hyperthyroidism). The case identification method based on laboratory examination results was adopted, and the method was described with reference to the diagnostic guidelines and the opinions of clinicians (Table 4: The case identification method). Table 4. The case identification method. https://doi.org/10.1371/journal.pone.0255863.t004 Method 2 (acute renal failure). True cases were identified based on medical records, with reference to laboratory examination results after consultation with clinicians. 
The case identification method for acute renal failure. Cases of acute renal failure were identified by three nephrologists based on acute changes in serum creatinine (Cre) levels. We decided that patients on long-term dialysis and those who had undergone renal transplantation should be excluded from the analysis. Therefore, patients were divided into subgroups according to previous serum Cre levels, based on laboratory tests in the preceding 3 months, in order to identify those who previously had high serum Cre levels. The previous serum Cre levels were classified into three categories: ≤1.2 mg/dL, >1.2 mg/dL and ≤2.0 mg/dL, and >2.0 mg/dL. Cases were classified as “true cases” or “other cases” according to the identification method for each disease. Subsequently, PPV and 95% confidence intervals (95% CI) were calculated as indices for the evaluation of the validity of the outcome definitions. The Wald method was adopted to estimate the confidence interval [15]. The formula used for calculating PPV is as follows: PPV% = number of patients classified as true cases/number of potential cases × 100. In Method 2 (acute renal failure), true cases were identified from a random sample accounting for 30% of total potential cases. The following formula was used in this case for PPV calculation: PPV% = number of patients classified as true cases/number of sampled subjects × 100. {Ethical background} This trial survey used secondary data from electronic medical records stored in the hospital information system during daily medical care. The trial survey was conducted in accordance with the “Ethical Guidelines for Epidemiological Research” (June 17, 2002; revised December 28, 2004, June 29, 2005, and August 16, 2007; partially revised December 1, 2008). The implementation of this trial was reviewed and approved by the Ethics Review Committee of the International University of Health and Welfare (approval number: 11–169). 
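The PPV formula and the Wald confidence interval named in the methods above can be sketched as below. This is a minimal illustration, not the authors' analysis code: the function name ppv_wald, the z value of 1.96 for a 95% interval, and the clipping of the bounds to the range 0 to 1 are assumptions, and the counts in the usage note are hypothetical rather than taken from the study's tables.

```python
import math


def ppv_wald(true_cases, denominator, z=1.96):
    """Return (PPV%, lower%, upper%) using the Wald interval.

    true_cases  : number of patients classified as true cases.
    denominator : number of potential cases (Method 1) or number of
                  sampled subjects (Method 2).
    z           : normal quantile; 1.96 is assumed for a 95% CI.
    """
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    if not 0 <= true_cases <= denominator:
        raise ValueError("true_cases must lie between 0 and denominator")
    p = true_cases / denominator
    # Wald half-width: z * sqrt(p(1 - p) / n)
    half_width = z * math.sqrt(p * (1.0 - p) / denominator)
    lower = max(0.0, p - half_width)   # clipping is an assumption of this sketch
    upper = min(1.0, p + half_width)
    return 100.0 * p, 100.0 * lower, 100.0 * upper
```

For a hypothetical 50 true cases among 100 potential cases, ppv_wald(50, 100) gives a PPV of 50% with an interval of roughly 40.2% to 59.8%. The Wald interval is known to behave poorly when the proportion is close to 0 or 1 or when the denominator is small, which is worth keeping in mind for outcome definitions that yield few potential cases.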
Results With respect to the outcome definition for newly developed diabetes, the number of cases and PPV based on medical claim data and SS-MIX2 standardized storage data are shown in Table 5 (Number of cases and PPV according to the outcome definition for diabetes). None of the PPVs for diabetes reached 50% under any condition. The maximum PPV using the SS-MIX2 standardized storage data was 44.7%, for “Definition 3–2 Disease Name and Drug (No interval limitation)”, whereas the maximum PPV using medical claim data was 40.7%, for “Definition 6–2 Disease Name, Medical Treatment, and Drug (No interval limitation)”. The PPVs for diabetes were higher with SS-MIX2 standardized storage data than with medical claim data for every definition. Table 5. Number of cases and PPV according to the outcome definition for diabetes. https://doi.org/10.1371/journal.pone.0255863.t005 For the outcome definition for newly developed dyslipidemia, the number of cases and PPV calculated using medical claim data and SS-MIX2 standardized storage data are shown in Table 6 (Number of cases and PPV according to the outcome definition for dyslipidemia). The PPV for dyslipidemia was 50% or higher under all of the conditions. Table 6. Number of cases and PPV according to the outcome definition for dyslipidemia. https://doi.org/10.1371/journal.pone.0255863.t006 With regard to the outcome definition for newly developed hyperthyroidism, the number of cases and PPV calculated using SS-MIX2 standardized storage data and medical claim data are shown in Table 7 (Number of cases and PPV according to the outcome definition for hyperthyroidism). 
The PPV for hyperthyroidism, based on disease name definition alone, was 20–30%; however, the value exceeded 60% when the drug prescription was included in the conditions. Table 7. Number of cases and PPV according to the outcome definition for hyperthyroidism. https://doi.org/10.1371/journal.pone.0255863.t007 Method 2 was applied to two medical institutions and the subject outcome was defined as “newly developed acute renal failure.” A review of medical records was performed with the cooperation of specialists, including multiple nephrologists. To evaluate validity using the information in medical records as the gold standard, 30% of the 1,447 potential cases were selected by random sampling. Based on the medical records, each case was classified as a “true case” or “other case.” Evaluation of definition 1 (disease name & acute elevation in serum Cre levels) based on medical record information showed that PPV was as low as 53.7% (Table 8: Number of cases and PPV according to the outcome definition for acute renal failure); however, this value increased to approximately 80–90% when patients who previously had high serum Cre levels were excluded. Table 8. Number of cases and PPV according to the outcome definition for acute renal failure. https://doi.org/10.1371/journal.pone.0255863.t008 Discussion This study investigated whether known adverse events could be detected by database surveillance using a hospital information system, as a substitute for traditional large-scale surveys in which companies and other organizations collect data from each hospital individually. The study outcomes considered were newly developed diabetes, dyslipidemia, hyperthyroidism, and acute renal failure. 
The methods used in this study were relatively simple, because the review of laboratory examination results was performed by a single researcher in PMDA, although the review of medical records was done with the cooperation of multiple physicians. Detection of true cases by Method 2 took time and effort, as careful review of medical records by specialists was needed for validation. In contrast, Method 1 was more feasible and effective, detecting true cases and providing sufficient PPV only by reviewing the laboratory examination results; this suggests the validity of the method. We believe that simple validation using laboratory examination results alone is useful if guideline definitions are specified. In addition, the Mini-Sentinel surveillance described a method for identifying true cases based on test results alone and reported that the outcomes considered feasible for validation included hyperglycemia, dyslipidemia, hyperthyroidism, etc. [16]. The PPVs for hyperglycemia, dyslipidemia, and hyperthyroidism were calculated from laboratory examination results alone, without a review of medical records, which would have required the cooperation of specialists. This methodology was effective enough in detecting ADEs that large-scale evaluation of ADEs became possible for many drugs, and it was used in the subsequent MID-NET project initiated in 2018 in Japan. SS-MIX Standardized Storage data were used for the PPV evaluation. The reliability of SS-MIX Standardized Storage data is generally accepted in Japan, and the data are also used in clinical studies, for example on diabetes mellitus [17]. We plotted the PPV for each outcome definition in Method 1 (Fig 3: Positive Predictive Values (PPV), Method 1). Although there were few evaluation items and the statistical analysis was limited, the following observations were made. Fig 3. Positive predictive values (PPV), Method 1. 
https://doi.org/10.1371/journal.pone.0255863.g003 The PPV for diabetes was relatively low. This may be attributed to the fact that evaluation was made based mainly on laboratory examination results, despite the inclusion of a wide variety of diabetes patients (those with suspected diagnosis, untreated patients, those under treatment, and those with different treatment effects) in the analysis. The PPV for diabetes was higher in the results obtained from SS-MIX2 standardized storage data than in those obtained from medical claim data in any of the definitions. The PPV for dyslipidemia was 50% or higher in all conditions. This may be due to the fact that many cases could be identified by the disease name definition alone, because of its high specificity. The PPV for hyperthyroidism outcome was 20–30% based on disease name definition alone; however, the value exceeded 60% when drug prescription was included in the definition, suggesting that drug prescription has high specificity for this disease. The total PPV for acute renal failure outcome was 53.7%, but increased to approximately 80–90% when patients who previously had high Cre levels were excluded. The PPV for this disease was higher than that of other diseases. Given that Cre levels are higher in patients on dialysis, etc., exclusion of such patients in this study might have led to higher PPV, but the most probable reason may be that “the definition was based on test results.” In addition, PPV based on disease name definition alone was low in all outcomes. The reason for this may be as follows: when medical institutions submit medical care fee claims, they are required to input disease names that are consistent with the medical practice provided; therefore, there may have been cases where the disease name recorded by the medical institution did not accurately reflect the actual clinical conditions. 
Thus, outcome definitions based on disease name alone may be inappropriate in studies using medical information databases. This study shows that SS-MIX data extracted from multiple collaborating medical institutions can be used for quantitative studies on safety assessment, such as evaluating the impact of safety measures and examining the adequacy of assessments and outcome definitions. In the future, PMDA will promote the medical information database infrastructure development project in cooperation with the Ministry of Health, Labor and Welfare in order to build a medical information database for the purpose of improving safety measures for drugs, etc. [18]. In the system development of the medical information database, based on the experience gained in trial surveys using SS-MIX data, the same extraction program was sent to each medical institution and executed on the target data by each institution. It has thus become possible to implement a system covering everything from data search and extraction to simple aggregation. Moreover, the knowledge obtained from trial surveys using SS-MIX data, such as the present one, about the characteristics of hospital information system data and how to use them can be applied when using the medical information database in the future. Conclusion When defining a disease, it is important to include the condition specific to the disease; furthermore, it is very useful if laboratory examination results are also included. Therefore, the inclusion of laboratory examination results in the definitions, as in the present study, was considered very useful for the analysis of multi-center SS-MIX2 standardized storage data. In Japan, it is expected that the number of pharmacoepidemiological studies based on the secondary use of large-scale databases, such as the medical information database, will increase in the future. Accordingly, the importance of validation studies is expected to increase as well. 
However, it is difficult to evaluate the validity of all outcome definitions using medical records. Therefore, the evaluation method of validity should be selected according to the types of outcomes. For example, the evaluation method using laboratory examination results can be used for diseases that can be diagnosed based on changes in laboratory values. Effective evaluation of the validity of the outcome definitions and accumulation of findings regarding various outcome definitions are considered important. In Japan, the 2018 MID-NET project, involving 24 hospitals (10 medical groups), was initiated based on the promising results presented here (i.e., direct detection of adverse events using EMR data) in an attempt to substitute post-marketing surveillance. TI - MIHARI project, a preceding study of MID-NET, adverse event detection database of Ministry Health of Japan—Validation study of the signal detection of adverse events of drugs using export data from EMR and medical claim data JF - PLoS ONE DO - 10.1371/journal.pone.0255863 DA - 2021-09-08 UR - https://www.deepdyve.com/lp/public-library-of-science-plos-journal/mihari-project-a-preceding-study-of-mid-net-adverse-event-detection-sZ8hLOCPWu SP - e0255863 VL - 16 IS - 9 DP - DeepDyve ER -