Summary

The auto-population of data fields is a key benefit of electronic medical records over paper.

However, there are two issues that need to be addressed in order to minimize error and reduce the chance that garbage data is created.

First, a system is needed where users have an opportunity to confirm whether the auto-populated data remains ‘true’ or has changed.

Second, the auto-populated data must be stored in a way that tags it differently from new data.

But first, a comment on the benefit of auto-populated data fields.

1. Benefits of auto-populated data fields

1.1 - Benefit 1 - more form data

Forms that use auto-populated data can be built with more questions, and even more required fields, without burdening the user.

eg) Consider a cardiac electrophysiology department that would like to have a pre-operative pacemaker implantation questionnaire for the clinician to complete.

Paper: If this was a paper form, perhaps the ‘maximum tolerable length’ for the clinician to complete is 20 questions.

Digital: If the form is digital, the data fields can auto-populate from past data in the patient’s chart. This means that the clinic could, if it wants to, add more questions while keeping the effort to complete the form at the same ‘maximum tolerable length’ as the paper form.

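For instance, let’s pretend that approximately 80% of the questions auto-populate from past data 95% of the time. On a 100-question form, that is 100 × 0.80 × 0.95 = 76 questions filled in automatically on average, leaving about 24 to answer by hand. This means that a form that is roughly 80 to 100 questions long could be completed in about the same time as a form with 20 questions.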

Said another way, if a digital form has 100 questions, and 80% of them auto-populate the majority of the time, then most of the time the user only has to complete 20 questions. This has the benefit of collecting more standardized data, without burdening the clinician.
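The arithmetic behind this can be sketched in a few lines. This is only an illustration using the pretend figures above (80% of questions able to auto-populate, 95% of the time); the function names are my own, and a real clinic would need to measure its own rates.

```python
def expected_manual_questions(total_questions, share_auto, hit_rate):
    """Average number of questions left for the clinician to answer by hand.

    share_auto: fraction of questions able to auto-populate (pretend value: 0.80)
    hit_rate:   fraction of the time those questions actually fill in (pretend value: 0.95)
    """
    auto_filled = total_questions * share_auto * hit_rate
    return total_questions - auto_filled


def form_length_for_budget(manual_budget, share_auto, hit_rate):
    """Longest form whose expected manual workload stays within manual_budget."""
    return manual_budget / (1 - share_auto * hit_rate)


manual = expected_manual_questions(100, 0.80, 0.95)
longest = form_length_for_budget(20, 0.80, 0.95)
print(f"100-question form: about {manual:.0f} questions answered by hand")
print(f"20-question budget: form of about {longest:.0f} questions")
```

With these pretend numbers, a 100-question form leaves about 24 questions to answer by hand, and a strict 20-question budget supports a form of about 83 questions. The exact figures matter less than the point: the form can be several times longer than the paper version for the same effort.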

1.2 - Benefit 2 - faster workflow

A clinic may choose instead to keep the form at 20 questions, but use the benefit of data auto-population to speed up the time to complete the form.

1.3 Benefits - In summary

Experimentation is required by the clinic and forms team to understand the auto-population rates for their forms, which questions are more/less likely to auto-populate, and find the right balance.

Perhaps the clinic/hospital can also anticipate ahead of time which patients lack most of the data required to auto-populate a clinical encounter, and book a longer appointment for those patients.

2. Is the auto-populated data true?

There are two major approaches to data auto-population.

First - data is auto-populated by default.

Second - a button / icon beside the field is ‘activated’ when auto-populated data is available. The clinician presses that button to fill in the field.

The issue with the second option is that it is more time consuming. The clinician has to press the ‘auto-populate’ button beside the question to ‘insert’ the auto-populated data. They then have to read what was inserted, determine if it is correct, and delete it if it is incorrect.

A better option is the first, where form data is auto-populated by default.

Now. Here is the important part.

1) ‘Auto-populated data’ must be visually different on the form from ‘new’ or ‘confirmed’ data. The clinician must be easily able to tell what data they entered, versus what data comes from the past.

2) The clinician must be able to sort auto-populated data into three different buckets.

2a. Data that is incorrect. In this case, the clinician directly asked the patient the question, or knows the answer from another source, and chooses to change the answer in a field that was auto-populated.

2b. Data that is correct. In this case, the clinician directly asked the question, or knows the answer is correct, and they confirm that the auto-populated data field is still correct.

2c. Data that is not sorted. In this case, the clinician does not mark the auto-populated data as either correct or incorrect.

An example of this is below. In this case the slider shows ‘neutral’, and the data appears as a blue rectangular button. Together, these mean that the data is auto-populated and remains unverified.

If the user clicks on the auto-populated data to select the same answer, or clicks on the ‘plus’ to confirm it, the system changes to show that this data is ‘correct’ and was ‘verified’.

If the user had chosen an alternative answer, or entered a new answer, for a field that was auto-populated, the slider would move to (-) to show that an auto-populated field was incorrect. (The user can re-populate this field if they want by clicking on the neutral zone. In theory a user could also ‘delete’ a piece of auto-populated data by clicking on the (-) sign.)

(note: these examples are just to illustrate the idea. I don’t actually like the UI here at all.)
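The three buckets and the slider behaviour described above amount to a small state machine. Here is a minimal sketch of it, independent of any particular UI. The class and method names are hypothetical, made up for this illustration, and not taken from any real EHR system.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Verification(Enum):
    UNVERIFIED = "neutral"   # 2c: carried forward, not sorted
    CONFIRMED = "+"          # 2b: clinician confirmed it is still correct
    REJECTED = "-"           # 2a: clinician replaced or deleted it


@dataclass
class AutoPopulatedField:
    carried_value: str                          # value pulled from past data
    value: Optional[str] = field(init=False)    # value currently shown on the form
    state: Verification = field(init=False)

    def __post_init__(self):
        self.reset()

    def confirm(self):
        """Clicking the 'plus', or re-selecting the same answer."""
        self.value = self.carried_value
        self.state = Verification.CONFIRMED

    def replace(self, new_value: str):
        """Choosing or typing a different answer moves the slider to (-)."""
        if new_value == self.carried_value:
            self.confirm()
            return
        self.value = new_value
        self.state = Verification.REJECTED

    def delete(self):
        """Clicking the (-) sign removes the carried-forward value."""
        self.value = None
        self.state = Verification.REJECTED

    def reset(self):
        """Clicking the neutral zone re-populates the field, unverified."""
        self.value = self.carried_value
        self.state = Verification.UNVERIFIED


smoking = AutoPopulatedField("never smoked")
smoking.replace("ex-smoker")
print(smoking.state.name, smoking.value)
```

Keeping the carried-forward value separate from the displayed value is what lets the user undo a change and return to the neutral, unverified state.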

2.2 When may a clinician choose to not verify a piece of auto-populated data?

eg.1) Consider a clinician who completes the same form at every follow-up visit. They may choose to only address and update some of the questions on that form at each visit. The data continues to populate forward from past visits, in order to make it easier to use the form at future visits.

eg. 2) Consider a form with 50 questions, of which 10 are required fields.

For those 10 required fields: the clinician has to decide if the data is correct or incorrect before the form can be completed.

The purpose of the other 40 fields is to make things easier for the next user who receives the form. These fields aggregate past data from the patient’s chart.

Perhaps the clinic’s policy is that all the required fields must have their auto-populated data verified before the form can be completed. The clinician should then spend up to 10 minutes filling in gaps in the remaining 40 questions, some of which will have been auto-populated and some of which may not have had data available to auto-populate.

In this case, the clinician does not have the ability to personally verify the validity of each auto-populated piece of data on the 50-question form; however, they can verify the crucial data, and then spend the rest of their time filling in the gaps where auto-population did not have data available.
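The policy in this example (required fields must be verified, optional fields may stay unverified) can be expressed as a simple completion check. This is a sketch only; the `FormField` structure and its attribute names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class FormField:
    name: str
    required: bool
    auto_populated: bool
    verified: bool  # True once the clinician has confirmed or replaced the value


def unverified_required(fields):
    """Names of required fields still holding unverified auto-populated data."""
    return [f.name for f in fields
            if f.required and f.auto_populated and not f.verified]


def can_complete(fields):
    """The form can be completed only when no required field is left unverified."""
    return not unverified_required(fields)


form = [
    FormField("anticoagulant use", required=True, auto_populated=True, verified=True),
    FormField("known allergies", required=True, auto_populated=True, verified=False),
    FormField("family history", required=False, auto_populated=True, verified=False),
]
print(can_complete(form), unverified_required(form))
```

Here the optional ‘family history’ field does not block completion even though it is unverified, while the required ‘known allergies’ field does.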

In many ways this is how healthcare works on a practical level. Clinicians rely on historical patient data on a daily basis. This often involves flipping through old charts and reports to answer questions. Auto-population essentially automates this process, and makes it more transparent what data is carried forward, versus confirmed in that visit.

3. Store auto-populated data differently

3.1 Some ways to store auto-populated data

When the form is complete, it is saved. It is important that each data field has the ‘data’s origin’ saved with it. For instance, is the data:

1) New / entered in the encounter
2) New / entered in the encounter to replace a previous auto-populated value that is no longer true
3) Auto-populated, and confirmed in the encounter
4) Auto-populated, and not verified during the encounter

It is important to tag stored data this way because each of these 4 types of data has a different level of quality / evidence.
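One way to picture this tagging is as an origin label saved alongside every value. The sketch below is an assumption about how such a record might look, not a description of any existing EHR schema; the names `DataOrigin`, `StoredField`, and the ID formats are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class DataOrigin(Enum):
    NEW = "new"                                # entered in the encounter
    NEW_REPLACING_AUTO = "new_replacing_auto"  # replaced an auto-populated value
    AUTO_CONFIRMED = "auto_confirmed"          # auto-populated and confirmed
    AUTO_UNVERIFIED = "auto_unverified"        # auto-populated, not verified


@dataclass(frozen=True)
class StoredField:
    question_id: str
    value: str
    origin: DataOrigin
    encounter_id: str


def verified_in_encounter(record: StoredField) -> bool:
    """Everything except unverified carry-forward was checked in this encounter."""
    return record.origin is not DataOrigin.AUTO_UNVERIFIED


saved = [
    StoredField("smoking_history", "no", DataOrigin.AUTO_UNVERIFIED, "enc-014"),
    StoredField("pacemaker_model", "unknown", DataOrigin.NEW, "enc-014"),
]
print([r.question_id for r in saved if verified_in_encounter(r)])
```

Because the origin travels with the value, anyone reading or analyzing the record later can separate what was checked in the encounter from what was merely carried forward.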

Take the fourth type, auto-populated data that is not verified during the encounter:

A mentor of mine always reminded us that “The entire medical chart is full of lies. Trust no one.” Even on paper records, something incorrect can be written down, and then carried forward by future authors in other notes for years to come. This is actually exceptionally common (which is why the chart is full of lies).

For this reason it is important to differentiate information that was explicitly asked and verified during the encounter, versus carried over from past encounters.

The examples above of how to tag auto-populated data are just early examples. I’m interested to meet someone who has thought about this issue longer, and has tried to implement it in real life, to learn what the most effective ways to tag the data are.

3.2 Auto-Populated Data & Machine learning

The other reason it is important to tag data that is auto-populated and unverified vs auto-populated and verified is to assist in determining real from garbage data.

This is particularly important as more algorithms and machine learning techniques are used on the patient medical record. Techniques may comb a patient’s record looking at how frequently a fact is recorded. And if the fact is frequently recorded, the computer may assume it is true.

eg. A patient’s chart records on 2 occasions that the ‘patient has smoked in the past’, and then on 12 subsequent occasions records that the ‘patient has never smoked’. (The question has been arranged in this way to avoid having to deal with data temporality when assessing its validity.)

In this example, if we used only the frequency of that data being recorded, we would conclude that the correct answer is ‘the patient has never smoked’. The math says it is 12 to 2.

However, what if I told you that in fact the two ‘Yes’ responses were each independently verified data fields, and that of the 12 ‘No’ responses, only the first was a verified data field, and the other 11 were data carried forward via auto-population?

This is a very different situation. We have 2 verified data points vs 1 verified data point. This is a much different situation than having 12 verified data points in a row saying ‘no’.
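The difference is easy to see when the same records are counted twice, once by raw frequency and once using only verified entries. This sketch just reproduces the smoking example above; it is not a proposal for how the discrepancy should be resolved.

```python
from collections import Counter

# Each record: (answer to "has the patient smoked in the past?", verified in encounter?)
records = (
    [("yes", True), ("yes", True)]   # two independently verified 'yes' answers
    + [("no", True)]                 # the first 'no' was verified
    + [("no", False)] * 11           # eleven 'no' answers carried forward unverified
)

raw_counts = Counter(answer for answer, _ in records)
verified_counts = Counter(answer for answer, verified in records if verified)

print("All records:   ", dict(raw_counts))
print("Verified only: ", dict(verified_counts))
```

Counting every record gives 12 ‘no’ to 2 ‘yes’. Counting only verified records gives 2 ‘yes’ to 1 ‘no’. Without the origin tag on each stored value, the second count cannot be computed at all.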

In this article, I’m not proposing how to resolve this data discrepancy, or exactly how the computer should weigh the strength of the true answer to this question. But the conclusion we can draw is that, compared with 12 verified ‘no’ answers, there is a much lower level of certainty about whether the patient smoked in the past.

eg 3) In a similar way, if the clinician deliberately deletes a past data field and replaces it with a new piece of data, data scientists will have to determine how to handle this fact. Using frequency-based approaches will not work, because the strength of that single ‘replaced data fact’ may outweigh all past data points. (Of course a system will need to be in place to ensure that the auto-populated data that was changed is in fact correct, and not changed in error.)

4. What is next?

Levels of data quality - the end of this post begins the discussion around analyzing the levels of truthfulness of data in the clinical chart. We will continue this discussion in the next post.

Clinic Notes - This article focuses on the use of auto-populated data for questionnaires and forms. A concern that many have is how to address the issue of copy and paste data in clinical notes. That is a topic for another day, and the solution lies in changing the structure of clinical documentation…

Manage auto-populated EHR data separately