Swimming in the Data Lake

This is part of the series What Tech Bros Don’t Tell You about Healthcare. See the world through the eyes of The Contrarian.

Disclaimer: This does not represent the views of AWS or my past employers.

What is a data lake?

It is a place where the Loch Ness Monster goes to die. There is no actual water to swim in. Humans are curious creatures. We like to amass giant amounts of data, thinking that all of it can be used one day. Maybe there is a little bit of acorn woodpecker in all of us. A data lake only delivers real value when you know what problems you are trying to solve as a healthcare organization. What use cases are you trying to achieve? Any type of fancy tech is just a tool, after all. A surgical resident won’t confuse iris scissors with Metzenbaum scissors: they are both scissors, but they do very different things. Don’t buy a data lake because you think it is like a swimming pool to be installed in the backyard. Ah, how great would it be if I could swim in a data lake or pool! I am pretty sure no one actually thinks of it like this, but I like to imagine it this way. First, just like a real pool, the plumbing takes a lot of effort to put in. Second, a pool or lake is not maintenance-free. You still gotta pay a pool boy or pool robots (they do have those now) to clean the pool.

The weird boast of “we got data”

It is weird that tech bros feel a sense of accomplishment when they see the amount of data sitting in their database. “I got 5 years’ worth of data.” Newsflash… what are you going to do with it? Most of the clinical notes that house officers write are completely useless when it comes to training AI/ML models. I can guarantee you that emeritus professors (no matter how amazing they are as doctors) are not contributing to a high-quality EMR data pool either. Doctors literally type 3 words after an outpatient consult: “Patient well. Discharged.” The worst part was that those 5 years’ worth of data did not have a single identifier that would let us match records across time and paint a picture of what the patient is like. It was just transactional data! I bet you no tech bro can tell you that. If you talk to a data scientist, he or she will tell you: garbage in, garbage out. A better way to look at the mountains of data is to imagine yourself sitting in a room in a hoarder’s house, filled floor to ceiling with trash. Most of it is garbage with a few gems hidden inside, but it takes so much effort to get to the gems that by the time you are done… you wonder why you even bothered at all. This is the constant agony my data scientist friends experience when they churn through EMR data.

Noise noise everywhere

I know that most of the “data” in EMR notes is pure noise, because I once spent 30 minutes scrolling through copy-and-pasted clinical notes looking for an answer as to why a patient had already been admitted for more than 6 months by the time we took over her care. Once all the medical issues had been sorted out, a single line reading “family not keen to bring the mother home” gave no answers at all. After multiple phone calls to the medical social worker, who happened to have an excellent memory, I finally got the answer I wanted: a sibling feud plus lots of FUD (fear, uncertainty, doubt). Because of the crappy documentation in the patient’s notes, everyone had deemed her an immovable “stone” and simply waited for the next team to take over. I was determined to be the “stone” mover, and I did in fact manage to move that “stone”, who was a real human being waiting to go home. She thought her family had abandoned her. She was so happy, wearing her own clothes in a wheelchair in the discharge lounge, waiting for her kids to pick her up.

On a side note, people don’t go home for all sorts of reasons. I had a patient who lost his national identification card and was therefore deemed not dischargeable by the ward’s nursing sister. It is funny how you don’t need an ID card to come into the hospital, but you need one to go home. I called the Immigration & Checkpoints Authority hotline so many times that I knew the number by heart. After listening to the on-hold music day after day, I finally got an officer to show up with a camera and take a picture of my poor patient, who had been chilling in the hospital all along. The patient himself didn’t care that he had no ID card. He said he was Singaporean. What a funny conundrum to be in.

The extreme

OK, by now you are probably thinking, “If the issue is unstructured data typed into a text box, we should just make everyone fill out structured forms.” I cannot tell you the number of times we pulled out data for a “critical” field and found a single period inside. I have seen a clinical document entry with 30 form fields. I am pretty sure no one fills them all out, which is even worse, because something you need to know about the patient can go missing simply because people have no idea how to fill out the “structured” form you created. My medical officers (MOs) used to stay behind to enter cases into the cancer registry, which is, again, a horrible form the department uses to collect data. Sure, the data is useful from a population health perspective, but people literally stayed behind until 11 PM before tumor board to finish the data entry after a whole day of clinic or OT. We are slaves to unruly forms with never-ending next pages.

Technology cannot solve human behavior

Having a new EMR system will not change how people document in hospitals. It always comes back to the humans. Do you incentivize them to document properly? Is there enough time to document things? Do they see the bigger picture of the role they play in the patient’s long-term care? If every house officer sees his or her documentation only as a to-do task, rather than as a contribution to the longitudinal care of the patient through the acute inpatient episode, then no matter how fancy the tech we introduce, we will never be able to deliver better care for patients. We just won’t have the data to do more.

So the next time someone says, “We have so much unstructured data… there must be something we can do with it”, you know that statement needs to be unpacked. The journey to leveraging data well in any organization isn’t about building a data lake, warehouse, or pool and calling it a day. You should know what you want to do with the data first, then collect and curate it accordingly. You should know which use cases you want to tackle, not just buy the latest infinity pool.

© 2024 Petty Chen