When hospitals move from “digitized processes” to “data-driven healthcare”, the real problem is not a lack of data. It is how to refine the fragments scattered across dozens of systems (outpatient, inpatient, testing, pharmacy) into data assets that are both compliant and high-value, and how to let doctors, patients, and managers benefit without noticing the machinery behind it.
The hospital information SaaS platform is no longer a simple process digitization tool; it has evolved into an intelligent hub that supports the whole chain of medical services. For product teams, this transformation raises a core question: how do we forge data fragments scattered across dozens of business modules such as outpatient, inpatient, testing, and pharmacy into reusable, high-value data assets? More importantly, how do we use product design to carry the value of this data into every detail of diagnosis and treatment, management, and patient service, while balancing value release against risk control within the compliance red line?
Our team has taken part in depth in the information platform upgrade projects of 3 tertiary hospitals and 12 community health service centers, and has worked out in practice a data assetization path suited to medical scenarios. This article uses the full chain of data collection, cleaning, analysis, and application as its skeleton and draws on real project cases to break down how data assets are productized on a hospital information SaaS platform, in the hope of giving peers a reusable practical reference.
1. Data collection system
Data collection is the source of data assetization, and its quality directly determines the credibility of subsequent analysis and application. However, the particularities of the hospital setting make this step challenging: the data formats of different systems are like dialects, linking outpatient and inpatient data is like assembling a jigsaw puzzle, and patient privacy requirements are a constant, tight constraint. Based on practical experience, we have summarized three design principles for the collection system: prioritize scenario coverage, adapt flexibly, and embed compliance at the core.
1.1 Multi-source data integration
Data silos in hospitals are far more complex than one might imagine. In one tertiary hospital project, we found that basic patient information alone was scattered across four platforms (outpatient HIS, inpatient EMR, physical examination system, and medical insurance settlement system), and the name field even appeared in three forms: Zhang San, Zhang Xiaosan, and ZhangSan.
This made us realize that data collection should not focus only on system docking; it should start from business scenarios and cover the whole process.
1.1.1 Hierarchical docking strategy
Given that hospitals run a mix of old and new systems, we designed a three-layer docking architecture:
- Standardized direct connection layer: For new systems that follow the HL7 FHIR international standard (such as the electronic medical record system a hospital launched in 2022), data is synchronized in real time through standardized APIs. One detail: to keep API calls from degrading system performance, we agreed peak rate-limiting rules with the hospital information department. API calls are capped at 5 per second during outpatient peak hours (8:00-10:00), and the cap is relaxed to 10 per second during off-peak hours.
- Intermediate conversion layer: For old systems using private protocols (such as the 2008 HIS still in use at a community hospital), we develop lightweight conversion tools. For example, for the drug inventory data the system exports in DBF format, we wrote a format conversion engine as a Python script. It automatically converts the date field from yyyymmdd to yyyy-mm-dd, matches and completes drug names against the coding database of the State Food and Drug Administration, and outputs the platform’s unified JSON format.
- Offline supplementary layer: For manual records with no interface (such as the patient treatment response register of a hospital’s acupuncture department), we designed a dual-mode collection tool combining QR code scanning and OCR. The medical staff scan the QR code on the register (bound to the patient ID) with a PDA and photograph the record; the system recognizes the text through OCR and automatically links it to the patient file. Content recognized with less than 85% accuracy (for example, a scribbled “dizziness” misrecognized as “head meat”) triggers a manual proofreading reminder.
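The intermediate conversion layer can be illustrated with a minimal sketch of the per-record conversion step. It assumes the DBF rows have already been read into dictionaries; the field names (`date`, `drug_name`, `quantity`), the `DRUG_CODES` table, and the code value in it are hypothetical placeholders, not the project’s actual schema or real regulatory codes.

```python
import json
from datetime import datetime

# Hypothetical excerpt of a drug coding table; real codes would come from
# the regulator's coding database mentioned in the text.
DRUG_CODES = {"amoxicillin capsules": "DRUG-0001"}


def convert_record(raw: dict) -> str:
    """Convert one legacy inventory row into a unified JSON string."""
    # Reformat yyyymmdd to yyyy-mm-dd; strptime raises ValueError on bad dates.
    date = datetime.strptime(raw["date"], "%Y%m%d").strftime("%Y-%m-%d")
    name = raw["drug_name"].strip()
    record = {
        "date": date,
        "drug_name": name,
        # None marks a name that could not be matched and needs follow-up.
        "drug_code": DRUG_CODES.get(name.lower()),
        "quantity": int(raw["quantity"]),
    }
    return json.dumps(record, ensure_ascii=False)
```

Letting `strptime` reject malformed dates, rather than slicing the string, means bad legacy values surface as errors instead of being silently reformatted.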
1.1.2 Scenario-based data list
Following the whole patient visit process, we sorted out the core data collection items for three major scenarios, distinguishing basic items from extended items:
- Outpatient scenario: The basic items cover 12 categories such as registration type, department visited, and chief complaint (meeting medical insurance settlement, triage, and scheduling needs). Extended items are added according to the hospital’s characteristics, such as guardian relationship and allergy history urgency for pediatric outpatient clinics, and a mobility rating for geriatric outpatient clinics.
- Inpatient scenario: The basic items cover 23 categories such as admission diagnosis, surgical records, and nursing level (meeting medical record specification and quality control requirements). Extended items differ between tertiary and community hospitals: tertiary hospitals add research data labeling for clinical pathway execution nodes, and community hospitals add contracted doctor IDs associated with family sickbeds.
- Testing scenario: The basic items cover 8 categories such as test items, sample status, and abnormality flags (supporting report generation and result traceability). Extended items take the needs of medical technology departments into account, such as equipment calibration status for the laboratory department and film storage paths for the imaging department.
1.2 Optimization of acquisition frequency
The timeliness requirements of medical data vary greatly: if the outpatient waiting count is delayed by 10 minutes, triage may descend into confusion, whereas a patient’s blood type stays the same for life, so synchronizing it once a year is enough. In practice, we established a three-level frequency system plus an intelligent scheduling mechanism to keep data fresh while avoiding excessive consumption of system resources.
1.2.1 Three-level frequency system
- Real-time (seconds/minutes): Focuses on scenarios that affect real-time decision-making. For example, the outpatient waiting count is refreshed every 30 seconds. The data source is the call queue of the registration system, and the refresh logic combines an immediate trigger whenever a number is added or a call is completed with a 30-second timed refresh, so that the triage screen shows zero discrepancy with the actual queue. The vital signs (heart rate, blood oxygen) of emergency room patients are synchronized every second through the monitor interface, and an audible and visual alarm is triggered as soon as a threshold is crossed (such as blood oxygen < 90%).
- Near real-time (hourly): Suited to dynamic monitoring during the day. Taking the daily medication consumption of inpatients as an example, we collect it every 2 hours from the pharmacy’s dispensing records and the nurse station’s execution records. Comparing the two produces a list of executed, unexecuted, and abnormally executed orders, which helps the pharmacy adjust inventory dynamically. After one tertiary hospital adopted this, the number of ad hoc drug transfers by the pharmacy fell by 37%.
- Offline (daily/weekly): Used for analysis scenarios that are not time-sensitive. Basic patient information (such as name and gender) is synchronized at 2 a.m. every day (avoiding business peaks) in incremental mode, meaning only records changed that day are synchronized, which reduces data transmission volume by 80%. The monthly medical quality indicators are summarized at 20:00 every Sunday, leaving the information department enough time to prepare its report before the Monday morning meeting.
1.2.2 Intelligent scheduling mechanism
During the peak registration period (7:30-9:00) at one hospital’s outpatient clinic, we found that data collection requests were delaying HIS responses. To address this, we developed a load-aware module whose core logic is:
- Real-time monitoring of the CPU usage, memory usage, and interface response time of the hospital’s core system (metrics are collected every 5 seconds);
- When a system’s load exceeds a threshold (e.g., CPU usage > 80% for 1 minute), a frequency reduction strategy is triggered automatically: for example, hourly drug inventory collection is temporarily lowered to daily, and non-urgent historical data backfill is suspended;
- After the load drops below the threshold (e.g., CPU usage < 60% for 5 minutes), the original frequency is restored gradually, with high-priority data (such as emergency patient information) restored first.
After implementation, the peak-hour lag rate of the hospital’s core system fell from 15% to 2%.
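The two thresholds in the load-aware logic form a hysteresis band, which keeps the collector from flapping between modes. Below is a minimal sketch of that state machine using only the CPU figures quoted in the text. The class and constant names are hypothetical, and memory usage, interface response time, and the priority-ordered gradual restore are left out.

```python
from collections import deque

SAMPLE_SECONDS = 5               # metrics are collected every 5 seconds
HIGH, HIGH_SECONDS = 80, 60      # CPU > 80% for 1 minute -> reduce frequency
LOW, LOW_SECONDS = 60, 300       # CPU < 60% for 5 minutes -> restore


class LoadAwareThrottle:
    def __init__(self):
        self.throttled = False
        # Keep just enough samples to cover the longer (restore) window.
        self.samples = deque(maxlen=LOW_SECONDS // SAMPLE_SECONDS)

    def _sustained(self, predicate, seconds):
        """True if every sample in the last `seconds` satisfies predicate."""
        n = seconds // SAMPLE_SECONDS
        recent = list(self.samples)[-n:]
        return len(recent) == n and all(predicate(s) for s in recent)

    def observe(self, cpu_percent: float) -> bool:
        """Record one sample and return whether collection is throttled."""
        self.samples.append(cpu_percent)
        if not self.throttled and self._sustained(lambda s: s > HIGH, HIGH_SECONDS):
            self.throttled = True
        elif self.throttled and self._sustained(lambda s: s < LOW, LOW_SECONDS):
            self.throttled = False
        return self.throttled
```

A scheduler would call `observe()` once per sample and, while it returns `True`, skip or downgrade the low-priority collection jobs.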
1.3 Compliance and Security
Medical data compliance is a red line that must not be crossed. In one project, we received a rectification notice from the National Health Commission for collecting health records without first obtaining patient authorization. This made us realize that compliance cannot stop at after-the-fact inspection and must be embedded in the whole collection process.
1.3.1 Compliance verification before collection
We compiled a “Data Source Compliance List” that specifies the collection basis and authorization requirements for each type of data:
- Data that does not require authorization: such as outpatient volume and department names (hospital operation data), which is collected directly and marked with the compliance type “public information”;
- Data that requires patient authorization: such as medical record content and examination reports (sensitive information), for which a three-step authorization process is triggered before collection: (1) a system pop-up displays the “Data Use Authorization Letter” (stating the purpose, scope, and term); (2) the patient confirms by signature (electronic signatures and scanned uploads of paper signatures are supported); (3) a unique authorization number is generated (format: hospital ID + date + random 6 digits) and linked to the whole data flow, so that who granted the authorization, to whom it was granted, and where the data is used can all be traced.
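As an illustration of step (3), the authorization number format can be sketched as follows. The text does not specify the date layout, so `yyyymmdd` is an assumption here, and the function name is hypothetical.

```python
import secrets
from datetime import date


def make_authorization_number(hospital_id: str, on: date) -> str:
    """Build hospital ID + date + 6 random digits (date layout assumed)."""
    # secrets gives unpredictable digits; zero-pad so the suffix is always 6 long.
    suffix = f"{secrets.randbelow(10**6):06d}"
    return f"{hospital_id}{on:%Y%m%d}{suffix}"
```

Six random digits alone do not guarantee uniqueness, so a real system would presumably also check each new number against those already issued for the same hospital and day.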
1.3.2 Transport layer security reinforcement
The risk in medical data transmission cannot be ignored: there have been cases in which a hospital’s test reports were tampered with in transit, leading to errors in diagnosis and treatment. We use dual protection, combining an encrypted channel with data signing:
- Encrypted channel: Priority is given to the use of 3 protocols dedicated to the medical industry. To cope with unstable networks in primary hospitals, resumable transfer and data verification are added: after a transmission is interrupted, the next connection automatically continues from the breakpoint to avoid retransmitting data;
- Data signature: Each batch of data carries a timestamp and a device signature (the device signature is the hospital terminal’s hardware code plus the result of encryption with the platform private key). On verification, if the receiver finds that the timestamp differs from system time by more than 30 minutes (excluding time zone errors), or that the signature does not match the device whitelist, the batch is immediately marked as abnormal data and an alarm is triggered.
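The receiver-side checks can be sketched as below. This is not the platform’s actual scheme: HMAC-SHA256 with a per-device shared secret stands in for the private-key signature described in the text, and the device ID, key, and function names are hypothetical. Timestamps are taken as UTC epoch seconds, which sidesteps time zone differences.

```python
import hashlib
import hmac

MAX_SKEW_SECONDS = 30 * 60  # 30-minute tolerance from the text

# Hypothetical whitelist: device hardware code -> shared secret.
DEVICE_KEYS = {"TERM-001": b"demo-secret"}


def sign(device_id: str, timestamp: int, payload: bytes) -> str:
    """Sign device ID, timestamp, and payload together."""
    message = f"{device_id}|{timestamp}|".encode() + payload
    return hmac.new(DEVICE_KEYS[device_id], message, hashlib.sha256).hexdigest()


def verify(device_id: str, timestamp: int, payload: bytes,
           signature: str, now: int) -> bool:
    """Return False (abnormal batch) on unknown device, stale time, or bad signature."""
    if device_id not in DEVICE_KEYS:
        return False
    if abs(now - timestamp) > MAX_SKEW_SECONDS:
        return False
    expected = sign(device_id, timestamp, payload)
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(expected, signature)
```

Because the timestamp is part of the signed message, an attacker cannot refresh a captured batch by rewriting its timestamp without invalidating the signature.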
2. Data cleaning system
Dirty data is a far more complex problem in medicine than in the average industry. When processing one hospital’s historical data, we found that the same patient’s date of birth appeared with 3 different years across hospitalization records, and that the same diagnosis appeared under several wordings, such as “myocardial infarction” and “acute myocardial infarction” in both abbreviated and full forms. If these problems are not solved, data assets will only become data garbage. Based on the characteristics of medical scenarios, we designed a system of scenario-based cleaning plus automated closed-loop quality control.
2.1 Medical scenario-based cleaning tasks
2.1.1 Entity consistency check
In medical scenarios, entity matching is a core difficulty. In practice, we summarized multi-dimensional matching rules to solve the problem of one patient having multiple files and multiple expressions.
1) Patient identity matching: The ID number is the core key, but in practice patients often have no ID number (such as newborns) or a wrong one (such as manual entry errors). We therefore added weighted matching on auxiliary fields: name (weight 40%, with fuzzy matching of homophones written with different characters, so that Zhang San and Zhang Shan are linked through a pinyin similarity algorithm), date of birth (weight 30%), contact number (weight 20%), and medical insurance card number (weight 10%). When the combined score is ≥ 80 points, the records are merged automatically and a unique patient identifier (PID) is generated. At one maternal and child health hospital, this rule reduced the duplicate patient file rate from 22% to 3%.
2) Standardization of medical terminology: Establish a mapping table from non-standard terms to standard terms, based on standard libraries such as the “Disease Classification and Codes (ICD-10)” and the “National Medical Service Price Project Specification” issued by the National Health Commission. For example:
- Diagnostic terms: Abbreviated and full-form wordings of myocardial infarction are uniformly mapped to ICD-10 code 900, and the original text is retained as an alias remark;
- Test indicators: Abbreviations of blood glucose such as BG and GLU are unified as blood glucose (GLU) and linked to reference value ranges (distinguishing adults, children, and pregnant women);
- Surgical names: “Cholecystectomy” and “laparoscopic cholecystectomy” are mapped to the “Surgical Operation Classification Code (ICD-9-CM-3)”, and attributes such as surgical type and incision grade are added.
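The weighted matching rule in item 1) can be sketched as follows, using the weights and the 80-point threshold from the text. Two simplifications are assumptions of this sketch: `difflib.SequenceMatcher` on romanized names stands in for the pinyin similarity algorithm, and the non-name fields score only on exact equality. The dictionary field names are hypothetical.

```python
from difflib import SequenceMatcher

WEIGHTS = {"name": 40, "birth_date": 30, "phone": 20, "insurance_no": 10}
MERGE_THRESHOLD = 80


def name_similarity(a: str, b: str) -> float:
    """Stand-in for pinyin similarity: compare romanized names, ignoring spaces and case."""
    def normalize(s: str) -> str:
        return s.replace(" ", "").lower()
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_score(p: dict, q: dict) -> float:
    """Weighted score out of 100 for two patient records."""
    score = WEIGHTS["name"] * name_similarity(p["name"], q["name"])
    for field in ("birth_date", "phone", "insurance_no"):
        # Exact match only; a missing value on either side scores zero.
        if p.get(field) and p.get(field) == q.get(field):
            score += WEIGHTS[field]
    return score


def should_merge(p: dict, q: dict) -> bool:
    return match_score(p, q) >= MERGE_THRESHOLD
```

With these weights, a near-identical name plus matching birth date and phone number clears the threshold, while a name match plus birth date alone does not, which fits the intent of requiring several independent fields to agree before an automatic merge.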
2.1.2 Outlier Identification
Outliers in medical data may be genuine special cases (e.g., a premature infant weighing only 1.2 kg) or entry errors (e.g., a length of stay mistakenly entered as 300 days). We designed scenario-based verification rules to avoid one-size-fits-all misjudgments.
1) Verification based on business logic: We sorted out 28 core business rules, such as:
- Outpatient visit time ≤ test report time (if reversed, the record is judged a logical error and automatically pushed to the laboratory department for correction);
- Duration of an operation ≤ 5 times the average duration for the same disease (e.g., cholecystectomy takes 1 hour on average, so a record of 10 hours is marked for manual review with possible cause options: complex case / entry error / mid-operation pause). At one tertiary hospital, these rules identify an average of 120 logical errors per month, 85% of which are verified to be entry errors.
2) Statistics-based checks: Test indicators (such as white blood cell counts in routine blood tests) are checked with a stratified Z-score algorithm. Records are first grouped by age (newborn/child/adult/elderly) and gender, then the mean and standard deviation of each group are calculated, and a value that deviates from its group mean by more than 3 standard deviations is marked as abnormal. For example, the normal range of white blood cells is (15-20)×10⁹/L for newborns and (4-10)×10⁹/L for adults; after stratification, the accuracy of abnormality identification increased by 40%.
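A minimal sketch of the stratified Z-score check, assuming each record is a dictionary with hypothetical `age_group`, `sex`, and `value` keys. It uses the population standard deviation of each group; whether the production system uses population or sample deviation is not stated in the text.

```python
from collections import defaultdict
from statistics import mean, pstdev


def flag_outliers(records, z_limit=3.0):
    """Return records whose value lies more than z_limit SDs from their group mean."""
    groups = defaultdict(list)
    for r in records:
        groups[(r["age_group"], r["sex"])].append(r)

    flagged = []
    for members in groups.values():
        values = [m["value"] for m in members]
        mu, sigma = mean(values), pstdev(values)
        if sigma == 0:
            continue  # all values identical; nothing can deviate
        flagged.extend(m for m in members if abs(m["value"] - mu) / sigma > z_limit)
    return flagged
```

Grouping first is what lets a newborn value of 18×10⁹/L pass while the same value would stand out among adults; a single pooled mean would blur the two populations together.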
2.1.3 Missing value handling
Missing medical data is often a natural result of the business scenario (for example, outpatients who do not undergo CT have no CT data), so missing values should be handled case by case:
1) Compulsory verification of mandatory items: We sorted out 16 core mandatory items (such as the admission and discharge diagnoses of inpatients), and trigger a three-level reminder when the doctor submits: (1) when a field is blank, a pop-up prompts “please add XX information”; (2) if the form is submitted anyway, the system records a missing mark and pushes it to the department’s quality controller; (3) if the field is not completed within 24 hours, it counts toward the doctor’s performance appraisal (with a 5% weight). After implementation, the core field completeness rate of one hospital rose from 78% to 99%.
2) Intelligent filling of optional items: For non-core fields (such as occupation and ethnicity), use scenario-based probabilistic filling:
- When occupation is missing, the fill value is inferred from the type of medical insurance (employee medical insurance → 60% probability of being employed), the department visited (pediatrics → 80% probability of being a parent), and age (25-60 years old → 70% probability of being employed);
- Filled values are labeled “inferred by algorithm (confidence XX%)”, medical staff can correct them manually, and the corrections are synchronized to the rule optimization database (for example, if the proportion of retirees turns out to be high in the group with employee medical insurance and aged 55, the algorithm weights are adjusted automatically).
2.2 Automated cleaning process
2.2.1 Modular design of cleaning process
We break the cleaning task down into a four-step closed loop of verification, processing, review, and feedback, and each step has a clearly responsible party and operating specifications:
1) Verification module: Executes rules in the order entity consistency → terminology standardization → outlier identification → missing value handling, and outputs the “Dirty Data List” (including error type, source system, associated business, and scope of impact). For example, when a record is flagged for a non-standard term, the list shows the original term: myocardial infarction → standard term: 900, and the related business: cardiology quality control statistics.
2) Processing module: Distinguishes between automatic and manual processing:
- Automatic processing: For clear-cut errors (such as duplicate drug inventory records), perform preset operations (delete duplicates, keep the latest record);
- Manual processing: For errors that require judgment (such as abnormal test values), generate to-do tasks (pushed to the medical staff of the relevant department, with handling suggestions).
3) Review module: Medical staff can make corrections directly in the system (such as fixing a mistyped patient name). The modification screen automatically displays the original data and modification reason options (input error / business change / other), and on submission generates a “Modification Track Table” (including modifier, time, IP address, and approver) to meet quality control traceability requirements.
4) Feedback module: A “Data Quality Report” is generated every month, with each department’s performance marked by traffic lights. For the missing value rate of outpatient departments, < 5% is green, 5%-10% is yellow, and > 10% is red; for the non-standard terminology rate of the laboratory, < 3% is green, 3%-8% is yellow, and > 8% is red. The report is presented at the hospital quality control meeting to push the business side to improve its data entry habits.
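The traffic-light grading in the feedback module reduces to a small threshold table. The sketch below encodes the two indicators quoted in the text; the indicator keys and function name are hypothetical.

```python
# (green_below, red_above) per indicator, in percent, from the thresholds in the text.
THRESHOLDS = {
    "outpatient_missing_rate": (5.0, 10.0),
    "lab_nonstandard_term_rate": (3.0, 8.0),
}


def traffic_light(indicator: str, rate_percent: float) -> str:
    """Grade one monthly rate as green, yellow, or red."""
    green_below, red_above = THRESHOLDS[indicator]
    if rate_percent < green_below:
        return "green"
    if rate_percent > red_above:
        return "red"
    return "yellow"  # boundary values (e.g. exactly 5% or 10%) count as yellow
```

Keeping the thresholds in data rather than in code means a new indicator, or a per-hospital override, needs only a new table entry.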
2.2.2 Dynamic iteration of cleaning rules
Establish a linkage mechanism between a rule library and a scenario library to keep rules from becoming rigid:
- Rule library solidification: General rules (such as the outpatient number being 8 digits) are built into the system and updated quarterly by the product team (in line with new national regulations and changes in industry standards);
- Scenario library customization: Hospitals can add rules for their specialty services. For example, a traditional Chinese medicine hospital that needs to verify whether TCM syndrome differentiation types conform to the “Classification and Code of Traditional Chinese Medicine Syndrome” can upload the term base through the rule configuration interface, and a warning is triggered when an entry does not match;
- Threshold optimization: The volume of data hit by each rule is analyzed every quarter. If the misjudgment rate of a rule (such as marking an operation as abnormal when its duration exceeds 5 times the mean) is > 10%, the threshold is adjusted automatically (for example, relaxed to 6 times the mean) and the hospital quality control department is notified for confirmation.
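The quarterly threshold review can be sketched as below. The 10% misjudgment limit and the 5× to 6× example come from the text; the fixed step of 1.0 and all names are assumptions of this sketch, since the text does not say how the new threshold is chosen.

```python
from dataclasses import dataclass

MAX_MISJUDGMENT_RATE = 0.10   # from the text: adjust when the rate is > 10%
STEP = 1.0                    # assumed relaxation step (5x -> 6x in the example)


@dataclass
class ThresholdProposal:
    rule: str
    old_multiplier: float
    new_multiplier: float
    needs_confirmation: bool = True  # quality control department must confirm


def review_rule(rule, multiplier, flagged, false_positives):
    """Propose a relaxed threshold if too many flagged records were misjudged."""
    if flagged == 0:
        return None  # rule never fired this quarter; nothing to evaluate
    if false_positives / flagged <= MAX_MISJUDGMENT_RATE:
        return None
    return ThresholdProposal(rule, multiplier, multiplier + STEP)
```

Returning a proposal object instead of mutating the rule directly mirrors the confirmation step: the change only takes effect once the quality control department signs off.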
3. Data analysis system
Cleaned, high-quality data has to be turned into decision-making information through scenario-based analysis. Different users need very different things from data: the hospital president cares about the operational efficiency of the whole hospital, doctors care about the patient’s diagnosis and treatment plan, and patients care about managing their own health. In practice, we built a three-dimensional analysis system so that the value of data reaches each user precisely.
3.1 Operational data analysis
The core need of hospital management is to grasp the overall situation quickly and locate problems accurately. We designed a three-level visual cockpit that drills down layer by layer from macro to micro to serve different management levels.
3.1.1 Scene-based design of the visual cockpit
- Hospital-level overview layer: For the president and the vice presidents in charge, 12 core indicators (outpatient volume, bed utilization, average length of stay, etc.) are displayed on a dynamic dashboard, with compliance status marked in red, yellow, and green (for example, an average length of stay exceeding the benchmark by 15% is shown in red). An indicator drill-down function is included: clicking “low bed utilization” automatically displays bed usage details for each department (including the number of vacant beds and the number of patients awaiting discharge); clicking “internal medicine bed occupancy < 60%” then drills down to the occupancy status of specific wards and beds.
- Department detail layer: For department directors, department-level data is displayed by internal medicine/surgery/medical technology category. For example, the internal medicine module contains 15 indicators such as outpatient referral rate, proportion of difficult cases, and average prescription amount, and supports period comparison (this month vs last month vs the same period last year) and horizontal comparison (against the average of the same department in hospitals of the same level). Through this module, the director of internal medicine at one tertiary hospital noticed an abnormal rise in the outpatient referral rate; a review traced it to unclear triage standards, and the referral rate fell by 20% after a timely adjustment.
- Business process layer: For middle managers such as head nurses and pharmacy directors, the focus is efficiency analysis of key processes. For example, for the outpatient registration, treatment, and payment process, a duration heat map shows the time spent at each step (5 minutes for registration, 30 minutes for waiting, 10 minutes for treatment, and 20 minutes for payment) and visually marks the bottleneck (payment takes far longer than the 5-minute standard). Combined with staff scheduling and service window configuration data, optimization suggestions are generated automatically (for example, adding 2 self-service payment machines is estimated to shorten the time to 8 minutes).
3.1.2 In-depth dismantling of management indicators
Taking the core indicator of average length of stay as an example, we use multi-dimensional analysis to uncover root causes and help management take targeted measures:
- Disease dimension: The average length of stay is calculated for each disease by ICD-10 code (such as 7 days for myocardial infarction and 5 days for pneumonia) to locate abnormal diseases. One hospital found that the length of stay for knee replacement reached 15 days (industry average: 9 days). A review traced this to a shortage of postoperative rehabilitation beds, and after 3 rehabilitation beds were added, the length of stay fell to 11 days.
- Process dimension: The length of stay is broken down into its components (preoperative waiting time, postoperative recovery time, examination waiting time). If preoperative waiting accounts for more than 40% for a disease, surgical scheduling efficiency likely needs to be optimized. Through this analysis, one hospital found that preoperative waiting accounted for 55% for cataract surgery; after it optimized the scheduling rules (prioritizing by age and vision), the waiting time was shortened by 30%.
- Department dimension: Compare the length of stay for the same disease across departments (such as 8 days in orthopedic group A and 12 days in group B) to promote experience sharing. After group B adopted group A’s rapid rehabilitation process (preoperative training, getting out of bed within 24 hours after surgery), its length of stay fell to 9 days.
3.2 Clinical data analysis
The core need of doctors is to optimize diagnosis and treatment plans based on data. We designed analysis tools that fit clinical scenarios around two dimensions: patient condition tracking and comparison of diagnosis and treatment plans.
3.2.1 Dynamic tracking of the patient’s condition
Build time series curves from patient historical data so that doctors can see at a glance how a disease is progressing:
- Chronic disease management: For diabetic patients, past blood glucose readings (fasting/postprandial), medication records (insulin dose, type of oral medication), and dietary advice are automatically integrated to generate a blood glucose-medication correlation curve. For example, the curve annotates that after the insulin dose was adjusted on 2023-10-01, blood glucose fell to 7.3 mmol/L within 3 days, which helps doctors evaluate the effect of the medication. After the application in a community hospital, the blood sugar compliance rate of diabetic patients increased from 65% to 78%.
- Inpatient monitoring: The daily vital signs (body temperature, blood pressure) and test indicators (white blood cells, C-reactive protein) of inpatients are summarized in real time to generate a trend warning chart. When an indicator shows an abnormal trend (such as elevated white blood cells for 3 consecutive days), a reminder is automatically pushed to the doctor in charge with an analysis of possible causes (infection / drug reaction / test error). After the module was applied in the ICU of one tertiary hospital, infections were identified on average 2 days earlier.
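The trend rule in the inpatient monitoring example can be sketched as a simple window check. This sketch reads “elevated for 3 consecutive days” as “above the upper reference limit on each of the last 3 days”; a rule based on day-over-day increases would be an equally plausible reading. The function name is hypothetical.

```python
def sustained_elevation(daily_values, upper_limit, days=3):
    """True if the most recent `days` readings are all above upper_limit."""
    if len(daily_values) < days:
        return False  # not enough history to call it a trend
    return all(v > upper_limit for v in daily_values[-days:])
```

For an adult white blood cell count with an upper limit of 10×10⁹/L, `sustained_elevation([8.0, 11.2, 12.5, 13.1], upper_limit=10.0)` returns `True`, which would be the point at which the reminder is pushed.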
3.2.2 Comparison of the effects of diagnosis and treatment plans
Help physicians choose the best option with real-world data:
- Comparison of treatment options: The system automatically matches patient groups with the same disease, the same course of disease, and the same underlying conditions (such as over 60 years old, type 2 diabetes, and hypertension), and compares the outcome indicators of different treatment regimens (blood glucose control rate, complication rate, and treatment cost). For example, 3-month data from the insulin injection group and the oral hypoglycemic drug group are displayed to help doctors choose a plan according to the patient’s individual situation.
- Comparison of surgical plans: For surgical patients, postoperative recovery data (time to get out of bed, length of stay, cost) are analyzed. For example, for cholecystectomy, the system shows the difference between laparoscopic surgery (average time to get out of bed is 5 days, hospitalization day is 3 days) and laparotomy (average time to get out of bed is 3 days, hospitalization day is 5 days), and recommends the less traumatic laparoscopic surgery for elderly patients and patients with multiple underlying diseases. After the application in one hospital, the proportion of laparoscopic surgery increased from 45% to 68%, and patient satisfaction increased by 22%.
4. Data application system
The value of data analysis is only realized when product functions turn it into services that users can perceive. We focus on three major scenarios (improving diagnosis and treatment efficiency, optimizing patient experience, and improving medical quality) and designed a series of data-driven application functions.
4.1 Smart reminders
4.1.1 Accurate reminders at the diagnosis and treatment end
- Medical order execution reminders: Based on the patient’s medication plan (such as three times a day, 30 minutes after meals) and the patient’s historical meal times (obtained from canteen consumption records and patient app check-ins), the system pushes a pending-order reminder to the nurse station 25 minutes after the meal (including the patient’s bed number, drug name, and dosage). To avoid disturbing nurses’ work, reminders are pushed in tiers: text reminders for ordinary drugs, and text plus audible and visual reminders for special drugs (such as chemotherapy drugs). After one hospital adopted this, the rate of missed medical orders fell from 8% to 5%.
- Examination timeliness reminders: For tests that require dynamic monitoring (such as routine blood tests on days 1, 3, and 7 after surgery), the system automatically generates an examination request at the corresponding time point and pushes it to the doctor’s workstation (with a note on the purpose of the examination: assessing the risk of postoperative infection). If the doctor does not issue the order within 24 hours, the reminder is automatically escalated to the department’s head nurse. After the application in one hospital, the completion rate of key postoperative examinations increased from 72% to 96%.
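The examination timeliness rule combines a fixed schedule with an escalation deadline, as in this minimal sketch. It assumes “day N” means N × 24 hours after the surgery time, which the text does not state; the names and the string role labels are hypothetical.

```python
from datetime import datetime, timedelta

POST_OP_DAYS = (1, 3, 7)              # from the routine blood test example
ESCALATE_AFTER = timedelta(hours=24)  # unanswered reminders go to the head nurse


def exam_due_times(surgery_time: datetime):
    """Time points at which an examination request should be generated."""
    return [surgery_time + timedelta(days=d) for d in POST_OP_DAYS]


def reminder_target(due: datetime, now: datetime, ordered: bool):
    """Who should currently be reminded about one scheduled exam, if anyone."""
    if ordered or now < due:
        return None
    return "head_nurse" if now - due >= ESCALATE_AFTER else "doctor"
```

Computing the target from the current time, instead of storing an escalation flag, keeps the rule stateless: a periodic job can re-evaluate every open request and always get a consistent answer.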
4.1.2 Scenario-based reminders on the patient side
- Intelligent calculation of follow-up time: The follow-up date is calculated automatically from the characteristics of the disease (monthly follow-up for hypertension, every 3 months after cancer surgery) and the patient’s last visit time. Reminders are timed to the patient’s schedule, which is inferred from the occupation given in the APP registration information: for office workers, reminders avoid the weekday morning rush hour (8:00-9:00), and retirees are preferentially recommended morning sessions. Each reminder includes an online registration link and a pre-visit preparation list (such as fasting, bringing past reports). After one hospital adopted this, the patient follow-up rate increased from 60% to 75%.
- Drug management reminders: The patient scans the barcode on the drug packaging with the APP, and the system automatically records the expiration date and the dosage (from the linked drug instructions). When less than 3 days’ supply remains, a refill reminder is pushed (with an online pharmacy link and nearby pharmacy inventory); 7 days before expiration, a reminder to stop using the drug is pushed (with advice on disposing of expired drugs and on alternative drugs). After one community hospital adopted this, the number of complaints from patients who had mistakenly taken expired drugs fell to 0.
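The two thresholds in the drug management reminder (less than 3 days of supply, 7 days before expiration) can be expressed as a small check. This is a sketch only: `DrugRecord`, its fields, and the reminder labels are hypothetical names, and the remaining supply is assumed to be derivable from a unit count and a daily dose.

```python
from dataclasses import dataclass
from datetime import date
from typing import List

REFILL_THRESHOLD_DAYS = 3   # "less than 3 days" of supply, from the text
EXPIRY_NOTICE_DAYS = 7      # "7 days before expiration", from the text


@dataclass
class DrugRecord:
    name: str
    expires_on: date
    units_left: float      # e.g. tablets remaining (assumed to be tracked by the APP)
    units_per_day: float   # daily dose taken from the linked instructions


def drug_reminders(drug: DrugRecord, today: date) -> List[str]:
    """Return the reminder types that should be pushed today."""
    reminders = []
    if drug.units_per_day > 0:
        days_of_supply = drug.units_left / drug.units_per_day
        if days_of_supply < REFILL_THRESHOLD_DAYS:
            reminders.append("refill")
    if (drug.expires_on - today).days <= EXPIRY_NOTICE_DAYS:
        reminders.append("stop_expiring")
    return reminders


record = DrugRecord("drug A", date(2024, 6, 30), units_left=4, units_per_day=2)
print(drug_reminders(record, date(2024, 6, 25)))
```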
4.2 Personalized service
4.2.1 Construction of patient health profiles
Multi-dimensional data is integrated into a 360° health profile covering 6 dimensions:
- Basic information: age, gender, blood type, occupation;
- Health data: allergy history (including severity), chronic disease history (staging/grading), family history;
- Diagnosis and treatment records: department visited, attending doctor, medication reaction (such as rash after taking aspirin);
- Behavioral data: exercise frequency (based on APP steps), dietary preferences (based on ordering records), work and rest habits (based on consultation records);
- Service preferences: time tendency (morning/afternoon), communication method (phone/SMS/APP);
- Risk warning: risk of disease progression (e.g., diabetes mellitus → diabetic nephropathy), underlying health problems (e.g., sedentary + high blood pressure → higher risk of stroke).
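One way to picture the six dimensions is as a single record per patient, with the risk warnings derived from the other dimensions. The sketch below is illustrative only: the field names, the `chronic_diseases` key, and the rule table are assumptions, and the two rules simply restate the examples given in the list.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HealthProfile:
    patient_id: str
    basic_info: Dict[str, str] = field(default_factory=dict)           # age, gender, blood type, occupation
    health_data: Dict[str, List[str]] = field(default_factory=dict)    # allergies, chronic diseases, family history
    treatment_records: List[Dict[str, str]] = field(default_factory=list)
    behavior: Dict[str, str] = field(default_factory=dict)             # exercise, diet, work and rest habits
    service_preferences: Dict[str, str] = field(default_factory=dict)  # time tendency, communication method
    risk_warnings: List[str] = field(default_factory=list)


# Hypothetical rule table: (required factors, warning text)
RISK_RULES = [
    ({"diabetes mellitus"}, "risk of progression to diabetic nephropathy"),
    ({"sedentary", "high blood pressure"}, "higher risk of stroke"),
]


def derive_risk_warnings(profile: HealthProfile) -> List[str]:
    """Return every warning whose required factors are all present in the profile."""
    factors = set(profile.health_data.get("chronic_diseases", []))
    factors |= set(profile.behavior.values())
    return [warning for required, warning in RISK_RULES if required <= factors]


profile = HealthProfile(
    patient_id="P2023001",
    health_data={"chronic_diseases": ["high blood pressure"]},
    behavior={"activity": "sedentary"},
)
profile.risk_warnings = derive_risk_warnings(profile)
print(profile.risk_warnings)
```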
4.2.2 Implementation of service recommendation scenarios
Push personalized services based on the health profile instead of sending every patient the same content:
- Diagnosis and treatment service recommendation: For patients with diabetes and poor blood sugar control, recommend diabetes clinics (with the doctor’s expertise noted, such as insulin regimen adjustment) and dynamic blood glucose monitoring services (noting that blood sugar fluctuations can be recorded for 72 consecutive hours); for patients with repeated cough and a smoking history, recommend a special respiratory examination (with low-dose CT screening discounts).
- Health management recommendation: Generate plans from behavioral data. For sedentary office workers with hypertension, recommend getting up and moving for 5 minutes after every hour of sitting (with a diagram of simple office exercises), and also push walking routes in nearby parks (marked, for example, as having 80% shade coverage and being suitable for summer exercise); for diabetics who prefer to cook at home, recommend low-GI recipes (with ingredient purchase links and cooking videos), filtering out unsuitable dishes according to the patient’s taste preferences (such as a dislike of spicy food).
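The recommendation scenarios above amount to matching profile tags against rules and then filtering by preference. A minimal rule-based sketch follows; the tag names, `RULES` table, and recipe structure are hypothetical, and the text does not say whether the platform uses rules or a learned model.

```python
from typing import Dict, FrozenSet, List, NamedTuple, Set


class Rule(NamedTuple):
    required: FrozenSet[str]   # tags that must all be present in the profile
    recommendation: str


# Rules restating the examples in the text; tag names are illustrative
RULES = [
    Rule(frozenset({"diabetes", "poor_glucose_control"}), "diabetes clinic"),
    Rule(frozenset({"diabetes", "poor_glucose_control"}), "dynamic blood glucose monitoring"),
    Rule(frozenset({"repeated_cough", "smoking_history"}), "special respiratory examination"),
    Rule(frozenset({"hypertension", "sedentary_office_work"}), "move 5 minutes per hour of sitting"),
    Rule(frozenset({"diabetes", "cooks_at_home"}), "low-GI recipes"),
]


def recommend(profile_tags: Set[str]) -> List[str]:
    """Return the recommendations whose required tags are all in the profile."""
    return [rule.recommendation for rule in RULES if rule.required <= profile_tags]


def filter_recipes(recipes: List[Dict], disliked_flavours: Set[str]) -> List[str]:
    """Drop recipes that contain any flavour the patient dislikes."""
    return [r["name"] for r in recipes if not set(r["flavours"]) & disliked_flavours]


print(recommend({"diabetes", "poor_glucose_control"}))
print(filter_recipes(
    [{"name": "recipe A", "flavours": ["spicy"]}, {"name": "recipe B", "flavours": ["mild"]}],
    {"spicy"},
))
```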
5. Data security and privacy protection system
The sensitivity of medical data dictates that security is a prerequisite for any release of value. In practice, we have built a full-life-cycle protection system: security mechanisms are embedded in every link, from collection and storage to use and transmission, to keep data available while strictly preventing privacy leakage.
5.1 Data desensitization
At its core, data desensitization means masking on demand: protecting privacy without compromising normal use. We designed a hierarchical desensitization strategy plus a scenario-based adjustment mechanism.
5.1.1 Hierarchical desensitization strategy
- Core privacy data: For data such as ID numbers and complete medical records, use partial masking that preserves the format. An ID number is displayed as 110101********1234, retaining the first 6 digits (the administrative area code) and the last 4 digits (which include the check digit), so that the place of registration can still be identified; the patient’s name is displayed as Zhang* in the medical record, while auxiliary information such as gender and age is retained to help the doctor identify the patient.
- Semi-sensitive data: Data such as department and medication records is generalized. For unauthorized personnel, the oncology department is displayed as an internal medicine-related department and a chemotherapy drug as a special treatment drug; complete information is displayed only to authorized personnel (such as attending physicians and department directors).
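The two tiers can be sketched in a few lines. The function names and the `GENERALIZED` lookup table are hypothetical; the masking pattern (first 6 and last 4 characters kept) and the generalized labels are the ones given above, and the sample ID number is synthetic.

```python
def mask_id_number(id_number: str) -> str:
    """Keep the first 6 and last 4 characters of an 18-character ID number."""
    if len(id_number) != 18:
        raise ValueError("expected an 18-character ID number")
    return id_number[:6] + "*" * 8 + id_number[-4:]


def mask_name(name: str) -> str:
    """Keep the family name and replace the rest with a single asterisk."""
    family, sep, _given = name.partition(" ")
    if not sep:               # no space: treat the first character as the family name
        family = name[:1]
    return family + "*"


# Generalized display values for semi-sensitive data (examples from the text)
GENERALIZED = {
    "oncology department": "internal medicine-related department",
    "chemotherapy drug": "special treatment drug",
}


def display_value(value: str, authorized: bool) -> str:
    """Show the real value to authorized staff and a generalized one otherwise."""
    return value if authorized else GENERALIZED.get(value, value)


print(mask_id_number("110101199001011234"))
print(mask_name("Zhang San"))
print(display_value("oncology department", authorized=False))
```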
5.1.2 Desensitization scenario adaptation
Dynamically adjust the desensitization intensity according to different scenarios to avoid over-desensitization or under-desensitization:
- Outpatient reception scenario: The doctor can view the patient’s full name (to make identity verification easy), but the ID number remains desensitized (displayed as ****);
- Scientific research analysis scenario: All patient identifiers (name, ID number, medical record number) are replaced with random codes (such as P2023001), and only anonymized data such as disease and treatment are retained;
- Teaching scenario: Private information such as the patient’s name and address is desensitized in the medical record, while the description of the condition and the diagnosis and treatment process are retained (with a mark showing the patient’s authorization for teaching use).
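The three scenarios can be read as one function that takes a record and a scenario name. The sketch below is an assumption-laden illustration: the record keys, the `Pseudonymizer` class, and the `teaching_authorized` flag are hypothetical, and the code format (P followed by 7 digits) only imitates the P2023001 example.

```python
import secrets
from typing import Dict


class Pseudonymizer:
    """Maps each medical record number to a stable random code such as P0123456."""

    def __init__(self) -> None:
        self._codes: Dict[str, str] = {}

    def code_for(self, medical_record_no: str) -> str:
        if medical_record_no not in self._codes:
            used = set(self._codes.values())
            while True:
                candidate = f"P{secrets.randbelow(10**7):07d}"
                if candidate not in used:
                    break
            self._codes[medical_record_no] = candidate
        return self._codes[medical_record_no]


def desensitize(record: Dict, scenario: str, pseudonymizer: Pseudonymizer) -> Dict:
    """Return a copy of the record masked for the given scenario."""
    out = dict(record)
    if scenario == "outpatient":
        out["id_number"] = "****"              # full name stays visible
    elif scenario == "research":
        code = pseudonymizer.code_for(record["medical_record_no"])
        for key in ("name", "id_number", "medical_record_no", "address"):
            out.pop(key, None)
        out["subject_code"] = code
    elif scenario == "teaching":
        if not record.get("teaching_authorized"):
            raise PermissionError("patient has not authorized teaching use")
        out["name"] = "***"
        out["id_number"] = "****"
        out.pop("address", None)
    else:
        raise ValueError(f"unknown scenario: {scenario}")
    return out


record = {"name": "Zhang San", "id_number": "110101199001011234",
          "medical_record_no": "MR001", "address": "(omitted)",
          "disease": "hypertension", "teaching_authorized": True}
print(desensitize(record, "outpatient", Pseudonymizer()))
```

Keeping the mapping inside one `Pseudonymizer` instance means the same patient receives the same code across a study; whether that mapping should be persisted or discarded is a governance decision the text does not specify.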
5.2 Permission management
The core of permission management is to give data only to those who need it, and only the data they need. We designed a role-permission matrix plus a dynamic adjustment mechanism.
5.2.1 Role-Permission Matrix Design
Hospital users are divided into 6 core roles with clearly defined permission boundaries (Table 1):
5.2.2 Dynamic permission adjustment
A temporary authorization mechanism covers data access needs in emergency scenarios:
- When an emergency doctor receives a referred patient, the doctor can apply for temporary permission through the system (stating the reason, for example that the patient is confused and the past medical history must be checked); after online approval by the department director, the doctor obtains 4 hours of temporary permission to view the patient’s past medical records;
- When the permission expires it is revoked automatically, and all access behavior (viewing time, content viewed, and operation records) is synchronized to the audit log of the hospital information department, so that every use of a permission is recorded and traceable.
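The request, approval, expiry, and audit steps can be sketched as a small service. Everything named here (`TempGrant`, `TempAccessService`, the log fields) is hypothetical; only the 4-hour duration and the requirement that every access be logged come from the text. Expiry is enforced by checking the timestamp at access time, which is one simple way to get automatic revocation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional

TEMP_GRANT_DURATION = timedelta(hours=4)  # duration stated in the text


@dataclass
class TempGrant:
    doctor_id: str
    patient_id: str
    reason: str
    approved_by: Optional[str] = None
    expires_at: Optional[datetime] = None


class TempAccessService:
    """Illustrative in-memory service; a real system would persist grants and logs."""

    def __init__(self) -> None:
        self.audit_log: List[Dict[str, str]] = []

    def _log(self, action: str, grant: TempGrant, now: datetime, detail: str) -> None:
        self.audit_log.append({"time": now.isoformat(), "action": action,
                               "doctor": grant.doctor_id, "patient": grant.patient_id,
                               "detail": detail})

    def request(self, doctor_id: str, patient_id: str, reason: str, now: datetime) -> TempGrant:
        grant = TempGrant(doctor_id, patient_id, reason)
        self._log("request", grant, now, reason)
        return grant

    def approve(self, grant: TempGrant, director_id: str, now: datetime) -> None:
        grant.approved_by = director_id
        grant.expires_at = now + TEMP_GRANT_DURATION
        self._log("approve", grant, now, director_id)

    def view_records(self, grant: TempGrant, content: str, now: datetime) -> bool:
        """Allow access only for approved, unexpired grants; log every attempt."""
        allowed = grant.expires_at is not None and now < grant.expires_at
        self._log("view" if allowed else "view_denied", grant, now, content)
        return allowed


service = TempAccessService()
start = datetime(2024, 1, 1, 10, 0)
grant = service.request("dr_li", "P2023001", "patient confused; past history needed", start)
service.approve(grant, "director_wang", start)
print(service.view_records(grant, "past medical records", start + timedelta(hours=5)))
```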
5.3 Full-link encryption
5.3.1 Storage encryption
Adopt field-level encryption policies to distinguish between sensitive and non-sensitive fields:
- Sensitive fields (such as medical record content and inspection reports): Encrypted at rest with AES-256; the key is managed separately by the hospital information department (the platform vendor cannot obtain it) and is rotated automatically every 3 months;
- Non-sensitive fields (such as department name and device number): Protected by database-level transparent data encryption (TDE) to reduce performance loss.
After implementation at one hospital, the system met the Level 3 requirements of China’s classified protection of cybersecurity scheme while keeping system response time within 0.5 seconds.
5.3.2 Transmission Encryption
- Inter-system transmission: Data travels over the hospital’s private VPN channel, and a checksum (generated from the data content plus a timestamp) is attached to each batch; the receiver verifies the checksum before writing the batch to the database, to prevent tampering in transit.
- External access (such as a patient querying a report in the APP): HTTPS + dynamic token double encryption. Each time a patient logs in, the system generates a one-time token (valid for 15 minutes) that is verified together with the account password; when reports are queried, the patient’s device fingerprint (mobile phone IMEI code + APP installation ID) is encrypted twice before data transmission, to prevent data leakage after an account is stolen.
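The batch check code for inter-system transmission can be illustrated with a keyed hash over the timestamp and the serialized data. The text only says the code is generated from data content plus timestamp; using HMAC-SHA256 with a key shared by sender and receiver is an assumption made here so that an attacker who alters the data cannot simply recompute the code. Function names and the sample key are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Dict, List


def batch_checksum(records: List[Dict], timestamp: str, key: bytes) -> str:
    """Compute an HMAC-SHA256 over the timestamp and a canonical JSON form of the batch."""
    payload = json.dumps(records, sort_keys=True, ensure_ascii=False,
                         separators=(",", ":")).encode("utf-8")
    message = timestamp.encode("utf-8") + b"\n" + payload
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_batch(records: List[Dict], timestamp: str, checksum: str, key: bytes) -> bool:
    """Recompute the checksum on the receiving side and compare in constant time."""
    expected = batch_checksum(records, timestamp, key)
    return hmac.compare_digest(expected, checksum)


shared_key = b"demo-key-for-illustration-only"
batch = [{"patient": "P2023001", "test": "blood routine"}]
sent_at = "2024-01-01T10:00:00Z"
code = batch_checksum(batch, sent_at, shared_key)

# The receiver writes the batch to the database only if verification succeeds
print(verify_batch(batch, sent_at, code, shared_key))
```

Serializing with sorted keys and fixed separators ensures that sender and receiver hash identical bytes for identical data.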