Arseniy Yashkin
Research Scientist, Senior

I am primarily a health outcomes researcher who specializes in cancers and chronic age-related diseases, especially Alzheimer’s disease and type II diabetes mellitus.  However, I also write in epidemiology, demography, health economics and genetics.  I am a specialist in the analysis of big administrative health data.  My main contributions to scholarship can be summarized across three focus areas: health outcomes research, epidemiology and methodology, and health economics.  Some of my most important findings are described below.

Current Research Interests

1.1. Health Outcomes Research.  My interest in health outcomes began with the study of vision.  These early studies, done right after coming to Duke, focused on the safety concerns associated with anti-vascular endothelial growth factor (VEGF) eye injections (1,2), a highly effective treatment for age-related macular degeneration.  At the time (and to an extent still, as similar studies continue to appear) there was a persistent concern that anti-VEGF injections increase the probability of acute myocardial infarction and stroke, and that such injections may therefore be contraindicated in persons at high risk for these adverse outcomes.  Refusing anti-VEGF injections, however, is likely to result in low vision and eventual blindness in most affected individuals.  Fortunately, we found that these injections were not associated with increased risk of myocardial infarction or stroke (2).  I still consider this to be the most important work I have done to date, because we were able to take the pre-print and immediately show it to real ophthalmology patients at the Duke Medical Center to allay their fears about getting this highly important treatment.  This study has also been cited by follow-up research in The Lancet (3), JAMA Ophthalmology (4) and others (5).

This early work gave me a good sense of the opportunities made available by large health databases such as Medicare administrative claims (6).  Specifically, such data offer a much larger range of identifiable health conditions than even the largest survey, extremely long follow-up times often measured in decades, large sample sizes, national representativeness and health status updates that occur as often as a medical service is billed for.  Although an excellent source of data on individual and population health, such data are limited by their inability to account for the effects of health-related behaviors (e.g. smoking, alcohol consumption, exercise).  Nevertheless, we found that it was possible to identify the effects of certain discrete health-related behaviors in the data.  By design, such behavior patterns are both public-health-relevant and easily targetable by public-health interventions.  Specifically, we looked for evidence of adherence to disease management guidelines for chronic diseases, most notably diabetes mellitus type II (T2D).  Unsurprisingly, we found that adherence to T2D screening guidelines and regular prescribed medication use reduces the risk of death and many known complications of T2D (7,8).  This work was cited in a recent literature review on the subject of non-adherence in the British Journal of Clinical Pharmacology (9) (and in the lead author’s subsequent doctoral thesis) (10).  Another study from this research track found that low levels of adherence to T2D disease management guidelines were associated with increased risk of AD onset (11).  This article is cited in a literature review on AD (12) and in an article on the identification of AD from Medicare claims (13), both in Alzheimer’s & Dementia.  More importantly, this article led me to start to focus on Alzheimer’s disease and would eventually, in concert with other findings, result in several applications for external funding.

Finally, work on adherence led me to study the effects of polypharmacy – a problem increasingly relevant among older adults as levels of multimorbidity rise (14).  This article received a response in the same issue of the journal it was published in (15), has since been listed in a literature review (16), and was actively used in a doctoral thesis (17).   

1.2. Epidemiology and Methods.  Over the course of my career, I have had the privilege to collaborate on the development of new statistical methodologies.  This type of work, although time and effort intensive, has the potential to lead to qualitatively new findings without requiring costly new data collection.  The first of these articles was published in the International Economic Review, one of the leading journals for presenting econometric methodologies (18).  Although mathematically intensive, the model we introduced has already seen use (19,20).  There are three key features in this model which make it relevant to the study of health in older adults and especially for AD.

First is the partial observability of disease onset.  Historically, AD has been notoriously difficult to diagnose, especially at early stages when its symptoms may not be evident.  Although there is no validated treatment to reverse the neurodegeneration associated with AD, early diagnosis can allow better management of co-existing risk-related conditions such as type II diabetes mellitus (11), and therefore improve post-onset longevity.  Our model accounts for this by empirically modelling, through integration, all potential times of onset between the last time an individual was observed healthy and the first time the individual was diagnosed with the condition, and it produces estimates of the effects associated with the time an individual spends with AD prior to it being diagnosed.  This allows us to split our model of disease duration into unobserved and observed duration (each with its own possible specification).  Note that without accounting for partial observability the effect of early diagnosis cannot be distinguished from the effects of lead-time bias.
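The integration over unobserved onset times described above can be illustrated with a deliberately simplified sketch.  It is not the published model (18): it assumes constant (exponential) hazards for onset and for diagnosis, and the function and parameter names (`interval_onset_likelihood`, `onset_rate`, `dx_rate`) are hypothetical.  It computes the likelihood contribution of a person last observed healthy at time `a` and first diagnosed at time `b` by integrating over every possible onset time in between, and checks the numerical integral against the closed form available for this special case.

```python
import math


def interval_onset_likelihood(a, b, onset_rate, dx_rate, n_steps=2000):
    """Likelihood of a diagnosis at time b for a person last seen healthy at a.

    The latent onset time s is integrated out over [a, b] with the
    trapezoid rule. Exponential hazards are an illustrative assumption.
    """
    h = (b - a) / n_steps
    total = 0.0
    for i in range(n_steps + 1):
        s = a + i * h
        # Density of onset at s, given the person was healthy at a.
        onset_density = onset_rate * math.exp(-onset_rate * (s - a))
        # Density of being diagnosed at b after an undiagnosed spell of b - s.
        dx_density = dx_rate * math.exp(-dx_rate * (b - s))
        weight = 0.5 if i in (0, n_steps) else 1.0
        total += weight * onset_density * dx_density
    return total * h


def closed_form_likelihood(a, b, onset_rate, dx_rate):
    """Analytic value of the same integral (requires onset_rate != dx_rate)."""
    d = b - a
    return (onset_rate * dx_rate
            * (math.exp(-dx_rate * d) - math.exp(-onset_rate * d))
            / (onset_rate - dx_rate))


numeric = interval_onset_likelihood(2.0, 5.0, onset_rate=0.3, dx_rate=0.8)
exact = closed_form_likelihood(2.0, 5.0, onset_rate=0.3, dx_rate=0.8)
print(numeric, exact)
```

In a full model the two densities would depend on covariates and on the unobserved duration itself, which is what allows the undiagnosed and diagnosed spells to carry separate specifications.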

Second are the effects of endogeneity in the choice of seeking medical care (which leads to a possible observation of disease onset, progression, and/or an adverse health shock) and unobserved heterogeneity (disease onset and progression could differ by unobserved characteristics of the at-risk population or the disease itself).  These problems are addressed by the use of discrete factor models.  Briefly, for each model event (e.g. probability of progression through disease stages, screening/treatment episode, adverse health shock, etc.), we use polynomials to model the associated discrete heterogeneity distribution.  This allows for non-proportional relationships between the heterogeneity terms associated with each model component.  The optimal number of heterogeneity points of support (each corresponding to a population “type”) is identified through exploratory analysis and is limited by available computing power and model identification.  Note that modeling of the type we are undertaking is itself an aid in identification.  Since the model is dynamic, we can exploit additional exclusion restrictions not available in a static single index model.  Moreover, the predictive power of some instruments is likely to be greater.
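A minimal sketch of the discrete factor idea follows.  It is an illustration under stated assumptions, not the specification used in (18): each model event gets an exponential duration density, each latent population “type” shifts the log hazard of each event by its own mass point, and the individual likelihood is the probability-weighted sum over types.  The polynomial parameterization of the mass points is not reproduced here, and all names (`mixture_likelihood`, `mass_points`, `type_probs`) are hypothetical.

```python
import math


def exp_duration_density(t, log_rate):
    """Exponential density for a duration t with hazard exp(log_rate)."""
    rate = math.exp(log_rate)
    return rate * math.exp(-rate * t)


def mixture_likelihood(durations, base_log_rates, mass_points, type_probs):
    """Individual likelihood with the latent type integrated out.

    durations:      {event: observed duration}
    base_log_rates: {event: log hazard before heterogeneity}
    mass_points:    {event: [shift for type 0, shift for type 1, ...]}
    type_probs:     probabilities of each type (should sum to 1)
    """
    total = 0.0
    for k, prob in enumerate(type_probs):
        joint = 1.0
        for event, t in durations.items():
            # Event-specific mass points let types differ across equations
            # in a non-proportional way.
            log_rate = base_log_rates[event] + mass_points[event][k]
            joint *= exp_duration_density(t, log_rate)
        total += prob * joint
    return total


durations = {"onset": 2.0, "death": 1.0}
base = {"onset": 0.0, "death": 0.0}
points = {"onset": [0.0, math.log(2.0)], "death": [0.0, 0.0]}
print(mixture_likelihood(durations, base, points, [0.5, 0.5]))
```

Because the same latent type enters every equation, events become correlated through unobservables, which is how this class of models addresses endogenous care-seeking.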

Third is the issue of the simultaneous occurrence of adverse health shocks outside of the natural progression of the disease.  This could be the onset of a major risk-related event which alters the usual progression of the disease (e.g. a traumatic brain injury to a previously non-diagnosed individual could engender a diagnosis of AD; a diagnosis of diabetes mellitus introduces the need for the management of this chronic disease to avoid worse AD outcomes down the road, etc.) or a meaningful censoring event not modeled in the main progression (e.g. death, long-term-care nursing home admission, need for home health care, etc.).  Our method models such health shocks simultaneously with the main progression.

I have also worked together with Dr. Akushevich, one of my colleagues, on developing a new partitioning approach that quantifies the time trends and health disparities of disease prevalence and mortality in terms of the relative contributions of other epidemiological measures with clear interpretation, e.g., incidence, survival, disease severity at diagnosis, etc. (21).  This method has been used extensively by me and our research group to analyze the historic epidemiological causes of trends in prevalence, survival and stage-at-diagnosis of diseases such as AD, T2D and multiple cancers (21-25).  This method currently forms the methodological core of National Institute on Aging Grant No. R01AG066133: Racial and Geographic Disparities in Risk and Survival of Alzheimer's Disease and Related Dementias (PI: Akushevich).  Work on this method began after a discussion of new ways to identify the causes of epidemiological trends in T2D observed in Medicare data.  One option discussed was to use Oaxaca-Blinder decomposition (7).  This article was well received by the community (26-29) and would later be extended by us for application to time-series data with censoring; this manuscript has been published in the American Journal of Hypertension, with myself and Dr. Akushevich as co-first authors (Akushevich I., Kolpakov S., Yashkin A. P., Kravchenko J. (2022): Vulnerability to hypertension is a major determinant of racial disparities in Alzheimer’s disease risk. American Journal of Hypertension, 35(8), 745-751).

Overall, the idea that the utility of existing sources of data can be greatly expanded, either by the development of new methodologies or by creative applications of methodologies commonly used in other fields but not common to epidemiology, health outcomes research, etc., would later find its realization in a series of NIA-sponsored workshops held collaboratively by our research group (5R13AG069381-02, PI: Akushevich).  These workshops regularly see attendance of upwards of 200 people from a wide range of high-profile institutions and are associated with a thematic seminar at the yearly meeting of the Gerontological Society of America.

1.3. Health Economics.  I am highly interested in the long-term costs of common high-impact conditions such as cancer and Alzheimer’s disease, in terms of both direct expenditures by the Medicare program and patient burden as represented by the frequency of utilization of related health services.  I have published a series of such papers on breast (30) and bladder cancer (31).  I have also written an R03 application to extend such analysis to the case of AD.  The bladder cancer article, published in European Urology Oncology in 2020, already has 37 citations from as early as 2020.  Our use of SEER-Medicare data allowed us to stratify our estimates by cancer severity at time of diagnosis, and then monitor and compare the type and intensity of use of medical services and associated costs for over 20 years of follow-up while accounting for differences in baseline morbidity and loss to mortality over follow-up (31,32).  The breast cancer article was featured on release at JCO Oncology Practice.

I studied the differences in mortality between older women age 65+ diagnosed with ductal carcinoma in situ (a very early form of cancer, usually harmless but with the potential to progress) who were assigned guideline concordant care (an expensive and physically taxing regime of surgery, chemotherapy, and other active cancer treatments) and those assigned active surveillance (regular surveillance of a non-malignant tumor until such time as treatment becomes medically necessary).  The concern is that for such an elderly subset of the population, characterized by high pre-existing morbidity, aggressive cancer treatment for a tumor that may never actually progress to a malignant cancer could be unwarranted from the perspective of both personal utility and healthcare expenditures.  We found that guideline concordant care could be delayed for up to a year after diagnosis of ductal carcinoma in situ without any adverse effect on mortality (33).

During my work on the multiple stage duration model with partial observability (18), in addition to the methodologic benefits discussed in Section 1.2, we also made a practical contribution to health economics.  We investigated how early detection of T2D can delay the onset of lower extremity complications and death.  We allowed for partial observability of the disease stage, unmeasured heterogeneity, and endogenous timing of diabetes screening.  Later detection of T2D was found to be associated with significantly worse health outcomes and earlier death.  We then evaluated two potential policies on mandatory T2D screening, weighing the monetary costs of frequent screening against the associated loss of longevity.  Compared to the status quo, the more restrictive policy yields an implicit value for an additional year of life of about $50,000, whereas the less restrictive policy implies a value of about $120,000.
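One way to read an “implicit value for an additional year of life” is as the ratio of the screening costs a policy saves to the life-years it gives up, both measured against the status quo.  The sketch below shows only that arithmetic; it is an assumption about the form of the calculation rather than the procedure used in (18), and the function name and all input figures are hypothetical placeholders, not results from the paper.

```python
def implicit_value_per_life_year(cost_savings, life_years_lost):
    """Dollars saved per life-year forgone, relative to the status quo.

    Both arguments are per-person averages over the policy horizon.
    """
    if life_years_lost <= 0:
        raise ValueError("policy must give up some longevity for the ratio to be defined")
    return cost_savings / life_years_lost


# Hypothetical per-person inputs chosen only to illustrate the ratio.
restrictive = implicit_value_per_life_year(cost_savings=1000.0, life_years_lost=0.02)
less_restrictive = implicit_value_per_life_year(cost_savings=600.0, life_years_lost=0.005)
print(round(restrictive), round(less_restrictive))
```

A policy is attractive on these terms only if society values a year of life at less than the implied figure, which is why a lower implied value indicates a policy that gives up longevity cheaply.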

Finally, I designed a Time Tradeoff exercise to obtain a utility-based QALY measure for female breast cancer patients.  The survey was successfully administered, and the data processed as part of work on Patient-Centered Outcomes Research Institute Grant No. CER-1503-29572: Comparing the Effectiveness of Guideline-Concordant Care to Active Surveillance for DCIS: An Observational Study.  The resulting manuscript is still in progress.

1.4. Future Directions.  In the immediate future I plan to concentrate on further developing my study of Alzheimer’s Disease (AD) using big health data.  AD is a highly heterogeneous (34) and difficult to diagnose condition that occurs primarily in older adults.  From a researcher’s perspective this brings both benefits and costs.  The benefit is that older adults age 65+ in the U.S. become eligible for the Medicare health insurance system.  This both equalizes (to a very large extent) the population in terms of access to health insurance (though ability to pay is still a consideration due to co-payments, premiums and benefit limits) and provides a uniform way to track an individual’s health status through administrative records of health services paid for by the Medicare system.  This covers over 98% of the elderly population, and supplementary data in a similar format are available from state Medicaid programs, the Veterans Affairs health system, etc.  My lecture on the subject at a recent collaborative R13 NIA-sponsored workshop summarizes the pros and cons of such datasets (also see: Akushevich I., Kravchenko J., Yashkin A.P., Doraiswamy P. M., Hill C. V., Alzheimer's Disease and Related Dementia Health Disparities Collaborative Group (2023): Expanding the scope of health disparities research in Alzheimer's disease and related dementias: Recommendations from the “Leveraging Existing Data and Analytic Methods for Health Disparities Research Related to Aging and Alzheimer's Disease and Related Dementias” Workshop Series. Alzheimer's & Dementia: Diagnosis, Assessment & Disease Monitoring, 15(1), e12415).  Use of CMS data allows us to bring in other important sources of health information.  Most relevant here are the regular individual assessments mandatory for any long-term care facility eligible to receive Medicare/Medicaid funding.
Since many older adults with AD move to a nursing home as the disease progresses, this data linkage provides access to an expanse of new information for an important subset of older adults.  The digitized results of these assessments are known as the Minimum Dataset and are fully linkable to Medicare Administrative Claims records at an individual level.  I discussed these data at the second (and third) iterations of our R13 workshop.  In their totality, the range of administrative data currently available allows us to track an elderly individual from Medicare eligibility at age 65 through all medical care paid for by Medicare/Medicaid, into and out of both short-term and long-term stays in skilled nursing facilities, into and out of home health services (using OASIS, a dataset similar in concept to the Minimum Dataset) and on to death (with the official death certificate information also linkable to the base data).  This opens a range of opportunities in studying AD, cognition, health disparities and a range of other important health outcomes, especially since nursing home evaluation data are a highly understudied resource, with longitudinal studies exceptionally rare.


1.            Hahn P, Yashkin AP, Sloan FA. Effect of Prior Anti–VEGF Injections on the Risk of Retained Lens Fragments and Endophthalmitis after Cataract Surgery in the Elderly. Ophthalmology. 2016;123(2):309-315.

2.            Yashkin AP, Hahn P, Sloan FA. Introducing anti-vascular endothelial growth factor therapies for AMD did not raise risk of myocardial infarction, stroke, and death. Ophthalmology. 2016;123(10):2225-2231.

3.            Mitchell P, Liew G, Gopinath B, Wong TY. Age-related macular degeneration. The Lancet. 2018;392(10153):1147-1159.

4.            Dalvin LA, Starr MR, AbouChehade JE, et al. Association of intravitreal anti–vascular endothelial growth factor therapy with risk of stroke, myocardial infarction, and death in patients with exudative age-related macular degeneration. JAMA ophthalmology. 2019;137(5):483-490.

5.            Reibaldi M, Fallico M, Avitabile T, et al. Frequency of intravitreal anti-vascular endothelial growth factor injections and risk of death: a systematic review with meta-analysis. Ophthalmology Retina. 2021.

6.            Akushevich I, Kravchenko J, Yashkin AP, Yashin AI. Time trends in the prevalence of cancer and non-cancer diseases among older US adults: Medicare-based analysis. Experimental Gerontology. 2018.

7.            Yashkin AP, Picone G, Sloan F. Causes of the Change in the Rates of Mortality and Severe Complications of Diabetes Mellitus: 1992–2012. Medical care. 2015;53(3):268-275.

8.            Yashkin AP, Sloan F. Adherence to Guidelines for Screening and Medication Use: Mortality and Onset of Major Macrovascular Complications in Elderly Persons With Diabetes Mellitus. Journal of aging and health. 2018;30(4):503-520.

9.            Walsh CA, Cahir C, Tecklenborg S, Byrne C, Culbertson MA, Bennett KE. The association between medication non‐adherence and adverse health outcomes in ageing populations: A systematic review and meta‐analysis. British journal of clinical pharmacology. 2019;85(11):2464-2478.

10.         Walsh C. The association between medication adherence across multiple medications and health outcomes in ageing populations, Royal College of Surgeons in Ireland; 2021.

11.         Yashkin AP, Akushevich I, Ukraintseva S, Yashin A. The effect of adherence to screening guidelines on the risk of Alzheimer’s disease in elderly individuals newly diagnosed with type 2 diabetes mellitus. Gerontology and Geriatric Medicine. 2018;4:2333721418811201.

12.         Finch CE, Kulminski AM. The Alzheimer's disease exposome. Alzheimer's & Dementia. 2019;15(9):1123-1132.

13.         Jain S, Rosenbaum PR, Reiter JG, et al. Using Medicare claims in identifying Alzheimer's disease and related dementias. Alzheimer's & Dementia. 2021;17(3):515-524.

14.         Yashkin AP, Kravchenko J, Yashin AI, Sloan F. Mortality and macrovascular risk in elderly with hypertension and diabetes: effect of intensive drug therapy. American journal of hypertension. 2017;31(2):220-227.

15.         Safar ME, Slama G, Blacher J. Concomitant hypertension and diabetes: role of aortic stiffness and glycemic management. American Journal of Hypertension. 2018;31(2):169-171.

16.         Labib A-M, Martins AP, Raposo JF, Torre C. The association between polypharmacy and adverse health consequences in elderly type 2 diabetes mellitus patients; a systematic review and meta-analysis. Diabetes Research and Clinical Practice. 2019;155:107804.

17.         Huang Y-T. Polypharmacy in older adults–prevalence, risk factors, and associations with mortality–and the role of diabetes, UCL (University College London); 2022.

18.         Mroz TA, Picone G, Sloan F, Yashkin AP. Screening for a chronic disease: A multiple stage duration model with partial observability. International economic review. 2016;57(3):915-934.

19.         Cockx B, Picchio M, Baert S. Modeling the effects of grade retention in high school. Journal of Applied Econometrics. 2019;34(3):403-424.

20.         Ward S, Williams J, van Ours JC. Delinquency, arrest and early school leaving. Oxford Bulletin of Economics and Statistics. 2021;83(2):411-436.

21.         Akushevich I, Yashkin A, Kravchenko J, et al. Theory of partitioning of disease prevalence and mortality in observational data. Theoretical population biology. 2017;114:117-127.

22.         Akushevich I, Kravchenko J, Yashkin AP, Fang F, Yashin AI. Partitioning of time trends in prevalence and mortality of lung cancer. Statistics in medicine. 2019;38(17):3184-3203.

23.         Akushevich I, Yashkin AP, Inman BA, Sloan F. Partitioning of time trends in prevalence and mortality of bladder cancer in the United States. Annals of epidemiology. 2020;47:25-29.

24.         Akushevich I, Yashkin AP, Kravchenko J, et al. Identifying the causes of the changes in the prevalence patterns of diabetes in older US adults: A new trend partitioning approach. Journal of Diabetes and its Complications. 2018;32(4):362-367.

25.         Akushevich I, Yashkin AP, Kravchenko J, Yashin AI. Analysis of Time Trends in Alzheimer’s Disease and Related Dementias Using Partitioning Approach. Journal of Alzheimer's Disease. 2021;82(3):1277-1289.

26.         Harding JL, Pavkov ME, Magliano DJ, Shaw JE, Gregg EW. Global trends in diabetes complications: a review of current evidence. Diabetologia. 2019;62(1):3-16.

27.         Thomas MC, Cooper ME, Zimmet P. Changing epidemiology of type 2 diabetes mellitus and associated chronic kidney disease. Nature Reviews Nephrology. 2016;12(2):73-81.

28.         Gregg EW, Sattar N, Ali MK. The changing face of diabetes complications. The lancet Diabetes & endocrinology. 2016;4(6):537-547.

29.         Gregg EW. The changing tides of the type 2 diabetes epidemic—smooth sailing or troubled waters ahead? Kelly West Award Lecture 2016. Diabetes care. 2017;40(10):1289-1297.

30.         Yashkin AP, Greenup RA, Gorbunova G, Akushevich I, Oeffinger KC, Hwang ES. Outcomes and Costs for Women After Breast Cancer: Preparing for Improved Survivorship of Medicare Beneficiaries. JCO Oncology Practice. 2021;17(4):e469-e478.

31.         Sloan FA, Yashkin AP, Akushevich I, Inman BA. The Cost to Medicare of Bladder Cancer Care. European Urology Oncology. 2019.

32.         Sloan FA, Yashkin AP, Akushevich I, Inman BA. Longitudinal patterns of cost and utilization of Medicare beneficiaries with bladder cancer. Paper presented at: Urologic Oncology: Seminars and Original Investigations; 2020.

33.         Akushevich I, Yashkin AP, Greenup RA, Hwang ES. A medicare-based comparative mortality analysis of active surveillance in older women with DCIS. NPJ breast cancer. 2020;6(1):1-8.

34.         Yashin AI, Fang F, Kovtun M, et al. Hidden heterogeneity in Alzheimer's disease: insights from genetic association studies and other analyses. Experimental gerontology. 2018;107:148-160.

35.         Akushevich I, Yashkin AP, Kravchenko J, Kertai MD. Chemotherapy and the risk of Alzheimer's disease in colorectal cancer survivors: evidence from the Medicare system. JCO Oncology Practice. 2021:OP.20.00729.

36.         Akushevich I, Yashkin AP, Kravchenko J, Ukraintseva S, Stallard E, Yashin AI. Time Trends in the Prevalence of Neurocognitive Disorders and Cognitive Impairment in the United States: The Effects of Disease Severity and Improved Ascertainment. Journal of Alzheimer's Disease. 2018(Preprint):1-12.

37.         Ukraintseva S, Yashin A, Akushevich I, Arbeev K. Epidemiological trends may help clarify the role of infection in etiology of Alzheimer’s disease. Journal of Alzheimer's Disease; Letters to the Editor, on-line: 6/7/2016.

Office Hours

by appointment only.
