The Origin
My path into research began on a community field trip in Yangon, not in a laboratory. As a dental house officer, I spent long days helping with oral health surveys, measuring pocket depths at six sites per tooth, across all 28 teeth, for every patient. By the end of each session, the team was physically exhausted and still short of the target sample size. It was clear that a smarter way to prioritise candidates for examination existed; we simply did not have the tools to find it. The research community in Myanmar at the time was small and under-resourced. Methodologies were rarely part of the standard curriculum, and most of what I learned came from grassroots projects and occasional training sessions led by local scholars with overseas experience. We were a tight-knit group, united by shared goals and shared constraints. Upon graduation, a professor recommended that I seek training abroad. I took that advice and moved to Thailand.
The Learning Curve
My first semester at Mahidol was disorienting in ways I had not anticipated. Most of my peers arrived with stronger quantitative backgrounds and considerably more research experience; I arrived having barely encountered formal statistics. The programme was heavily quantitative from the start, and those first months involved a great deal of catching up: long nights, frequent searches for mathematical notation I should have already known, and a recurring sense that I was several steps behind where I needed to be. In the pre-ChatGPT era, Stack Exchange and online forums were the closest thing to a reliable lifeline, and I used them constantly. The learning curve was steep, but it forced a shift in focus: from what I did not know to what I could figure out through sustained effort. Working through problems rather than around them, debugging errors, and tracing failures back to their source proved more instructive than any single course. I graduated ahead of my cohort and left with something harder to teach than any technique: a genuine tolerance for uncertainty.
The Realisation
My thesis introduced me to secondary data research through a prospective cardiovascular cohort. The aim was to predict the risk of severe periodontitis from existing clinical records, rather than conducting new and resource-intensive examinations. What stayed with me was not the specific models but the broader realisation: routinely collected clinical data, handled carefully, can answer questions that would otherwise require enormous resources to study from scratch. That idea has shaped my work ever since. The years following graduation have been spent building and maintaining large longitudinal cohorts from electronic medical records, engineering raw extracts into analysis-ready datasets, and thinking carefully about what it actually takes for real-world data to be trustworthy enough to draw conclusions from.
A fish is bound to the water; a bird is bound to the sky. The duck belongs to both, and the land between them. In the intersection of clinical logic and data science, being ‘between’ isn’t a compromise; it’s the only way to see the whole landscape.
The Intersection
One moment that crystallised this for me was serving as a mentor at the Thailand Health AI Datathon. I found myself working alongside clinicians who had pressing research questions but limited analytical skills, and data specialists who had the tools but needed help understanding the clinical logic behind the data. My background sat at exactly that intersection. Teams using the hypertension database I had curated went on to win the grand prize and the runner-up prize. It was a small moment, but it illustrated something important: the most useful position in this field is not always the one with the deepest specialisation, but the one that can hold the space where different kinds of expertise need to meet.
The Why
The deeper I go into real-world data research, the more I find to think carefully about. Electronic health records capture only what the system sees. They miss episodes at other providers, referrals outside the network, and patients who moved, changed provider, or simply could not make it back. In settings without health information exchanges, these temporal gaps distort exposure-outcome relationships, introduce censoring bias in survival analyses, and undermine the calibration of prediction models when applied beyond the narrow group of patients with complete records. My current research focuses on characterising and adjusting for this kind of continuity loss, with the goal of developing methods that remain valid even in fragmented data environments. The motivation is grounded in something specific: the settings where real-world evidence could make the greatest difference, including the one I am from, are precisely those where records are most fragmented and interoperability is least available.
Making these methods work there is not only a technical problem; it is an equity one.