- Research
- Open access
Unveiling new insights into migraine risk stratification using machine learning models of adjustable risk factors
The Journal of Headache and Pain volume 26, Article number: 103 (2025)
Abstract
Background
Migraine ranks as the second-leading cause of global neurological disability, affecting approximately 1.1 billion individuals worldwide with severe quality-of-life impairments. Although adjustable risk factors (including environmental exposures, sleep disturbances, and dietary patterns) are increasingly implicated in the pathogenesis of migraine, their causal roles remain insufficiently characterized, and the integration of multimodal evidence lags behind epidemiological needs.
Methods
We developed a three-step analytical framework combining causal inference, predictive modeling, and burden projection to systematically evaluate modifiable factors associated with migraine. First, two-sample Mendelian randomization (MR) assessed causality between five domains (metabolic profiles, body composition, cardiovascular markers, behavioral traits, and psychological states) and the risk of migraine. Second, we trained ensemble machine learning (ML) algorithms that incorporated these factors, with SHapley Additive exPlanations (SHAP) value analysis quantifying predictor importance. Finally, spatiotemporal burden mapping synthesized global incidence, prevalence, and disability-adjusted life years (DALYs) data to project region-specific risk and burden trajectories through 2050.
Results
MR analyses identified significant causal associations between multiple adjustable factors (including overweight, obesity class 2, type 2 diabetes [T2DM], hip circumference [HC], body mass index [BMI], myocardial infarction, and feeling miserable) and the risk of migraine (P < 0.05, FDR-q < 0.05). The Random Forest (RF)-based model achieved excellent discrimination (area under the receiver operating characteristic curve [AUROC] = 0.927), identifying gender, age, HC, waist circumference [WC], BMI, and systolic blood pressure [SBP] as the key predictors. Burden mapping projected a global decline in migraine incidence by 2050, yet persistently high prevalence and DALY burdens underscored the urgency of timely interventions to maximize health gains.
Conclusions
Integrating causal inference, predictive modeling, and burden projection, this study establishes hierarchical evidence for adjustable migraine determinants and translates findings into scalable prevention frameworks. These findings bridge the gap between biological mechanisms, clinical practice, and public health policy, providing a tripartite framework that harmonizes causal inference, individualized risk prediction, and global burden mapping for migraine prevention.
Introduction
Migraine, one of the most disabling neurological disorders globally, has emerged as a critical public health challenge in the 21st century. Epidemiological data reveal that it afflicts approximately 14% of the global population, with a threefold higher prevalence in women than men, underscoring its sex-specific pathophysiological characteristics [1]. Clinically, migraine is classified by the International Classification of Headache Disorders 3rd edition (ICHD-3) into distinct subtypes (including migraine with aura, chronic migraine, and vestibular migraine), and their heterogeneity reflects diverse neurovascular mechanisms that differentially impact disease progression and treatment response [2, 3]. Beyond its clinical toll, migraine incurs annual productivity losses exceeding $11 billion (USD) in high-income countries alone, underscoring an urgent need for targeted prevention strategies [4]. Despite advances in identifying multidimensional pathogenic factors, such as metabolic dysregulation and autonomic dysfunction, a systematic evaluation of causal hierarchies among adjustable factors—particularly those amenable to population-level interventions—remains absent, with fewer than 15% of existing studies integrating genetic and machine learning (ML) approaches [5, 6].
The metabolic, body composition, cardiovascular, behavioral, and psychological domains collectively encompass adjustable determinants of health and disease, and have often been targeted in the past as key areas for intervention. Migraine pathophysiology has been robustly associated with adjustable factors spanning sleep architecture disturbances, anthropometric variation, psychological distress (including anxiety, depression, and post-traumatic stress), and cardiovascular dysregulation [7,8,9]. Emerging evidence further implicates intricate networks involving metabolic-cardiovascular-behavioral interactions in migraine progression. For instance, obesity and diabetes may exacerbate migraine susceptibility through inflammatory pathways, while migraine-associated neuropeptides (e.g., calcitonin gene-related peptide) could impair glucose metabolism [10, 11]. Cardiovascular risks, notably the dose-response relationship between elevated diastolic blood pressure and migraine susceptibility in women, highlight hemodynamic alterations as potential mechanistic drivers [12]. Notably, childhood adversity, including physical abuse and peer victimization, elevates migraine risk by 2.3-fold, emphasizing the long-term consequences of environmental stressors during neurodevelopmentally sensitive periods [13,14,15]. While emerging Mendelian randomization (MR) studies have begun untangling adjustable risk factors for migraine, persistent inconsistencies across domains raise concerns about the directionality and biological plausibility of reported associations, particularly in metabolic and psychological traits [16,17,18,19]. These discrepancies hinder the prioritization of intervention targets.
ML, a paradigm enabling autonomous pattern recognition from complex datasets, has demonstrated transformative potential in predictive modeling and therapeutic innovation [20, 21]. Unlike conventional statistical methods, ML supports personalized risk stratification, predictive analytics, and dynamic intervention modeling [22, 23]. This study introduced a tripartite framework integrating causal inference, risk prediction, and burden mapping. First, MR was employed to elucidate the causal effects of 23 candidate factors. Subsequently, ML algorithms quantified multidimensional risk contributions to develop individualized predictive models. Finally, we projected the migraine burden to evaluate the population-level impact of interventions, providing actionable evidence for global health policy.
Method
Study design
In MR studies, the validity of causal effect estimation relies on three core assumptions for single nucleotide polymorphisms (SNPs) serving as instrumental variables (IVs). First, selected IVs must exhibit robust genetic associations with the target exposure (P < 5 × 10⁻⁸) to mitigate bias from weak instruments. Second, IVs must adhere to the exclusion restriction criterion, ensuring that SNPs influence the outcome exclusively through the target exposure, thereby eliminating pleiotropic or confounding effects. Third, IVs must remain independent of known confounding factors to preserve causal inference validity. These assumptions collectively underpin MR methodology and require rigorous evaluation through sensitivity analyses and horizontal pleiotropy tests.
This study employed a tripartite analytical framework, as illustrated in Fig. 1. First, a two-sample MR analysis was conducted to evaluate causal associations between adjustable risk factors and migraine risk across five domains: metabolic traits (fasting blood glucose, high-density lipoprotein cholesterol [HDL-C], low-density lipoprotein cholesterol [LDL-C], type 2 diabetes mellitus [T2DM]), body composition (obesity, obesity class 2, overweight, body mass index [BMI], waist-hip ratio [WHR], waist circumference [WC], hip circumference [HC]), cardiovascular health (diastolic blood pressure [DBP], systolic blood pressure [SBP], myocardial infarction), behavioral factors (smoking initiation, snoring, moderate-to-vigorous physical activity levels, vigorous physical activity), and psychological states (irritable mood, feelings of loneliness, misery, guilt, and nervousness). Subsequently, factors demonstrating significant causal associations with migraine were prioritized for predictive modeling. Eight ML algorithms were implemented, each subjected to hyperparameter tuning and 10-fold cross-validation to optimize performance robustness. Model predictive accuracy was quantified using the area under the receiver operating characteristic curve (AUROC), while Shapley additive explanations (SHAP) were applied to interpret feature importance in top-performing models. Finally, a Bayesian age-period-cohort (BAPC) model was utilized to project the global burden of migraine from 2022 to 2050, integrating demographic and epidemiological trends to estimate incidence, prevalence, and disability-adjusted life years (DALYs), which combine years lived with disability (YLDs) and years of life lost (YLLs) due to premature mortality.
Fig. 1. The basic design of the study. A. The basic principles and framework of the MR study. This study follows the three core assumptions of MR design (relevance, independence, and exclusion restriction) and uses the IVW, WM, MR-Egger, and weighted mode methods as the main analysis methods for causal associations. B. Construction of the risk prediction model based on eight ML methods. Step 1: identify key features; Step 2: data preprocessing; Step 3: model construction; Step 4: model construction; Step 5: model interpretation. Abbreviations: IVW, Inverse Variance Weighted; WM, Weighted Median; MR-PRESSO, MR Pleiotropy RESidual Sum and Outlier test; MR, Mendelian Randomization; RF, Random Forest; GLM, Generalized Linear Model; KNN, K-Nearest Neighbor; SVM, Support Vector Machine; GBM, Gradient Boosting Machine; NNET, Neural Network; DT, Decision Tree; LASSO, Least Absolute Shrinkage And Selection Operator; SHAP, SHapley Additive exPlanations; AUROC, Area Under the Receiver Operating Characteristic Curve; ML, Machine Learning
Data sources and statistical analysis
MR analysis
Genetic instruments for 23 adjustable risk factors and self-reported migraine were obtained from the IEU Open Genome-Wide Association Study (GWAS) database (https://gwas.mrcieu.ac.uk/, accessed on 30 March 2025). The dataset was derived from the UK Biobank using Phenome-Wide Association Study (PheWAS)-derived variables through GWAS pipelines. All summary-level genetic data were sourced directly from IEU Open GWAS, and no additional ethical review was required for this publicly available aggregated data. Detailed data sources are provided in Table S1.
To estimate causal associations between adjustable risk factors and migraine, we selected IVs adhering to the three core assumptions of MR analysis (relevance, independence, and exclusion restriction). The following steps were implemented to minimize bias. First, only IVs identified in European-ancestry populations were included, in order to reduce confounding due to population stratification [24]. Second, based on the relevance assumption, a genome-wide significance threshold of P < 5 × 10⁻⁸ and linkage disequilibrium (LD) clumping (r² = 0.001, window = 10,000 kb) were used to screen IVs. A total of 14 SNPs for fasting blood glucose, 46 for HDL-C, 42 for LDL-C, 67 for T2DM, 6 for obesity, 11 for obesity class 2, 14 for overweight, 11 for BMI, 3 for WHR, 2 for WC, 52 for HC, 4 for DBP, 4 for SBP, 80 for myocardial infarction, 91 for smoking initiation, 3 for snoring, 19 for moderate to vigorous physical activity levels, 7 for vigorous physical activity, 38 for irritable mood, 7 for feeling lonely, 35 for feeling miserable, 13 for feeling guilty, and 35 for feeling nervous were identified from the genome-wide significant SNPs. The F-statistic was used to assess the strength of the included IVs, and the R² statistic was used to quantify the proportion of phenotypic variance explained, calculated as follows [25]:
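$$F = \frac{R^2 \times (n - k - 1)}{k \times (1 - R^2)}$$
$$R^2 = 2 \times (1 - \mathrm{MAF}) \times \mathrm{MAF} \times \beta^2$$
where n is the sample size, k denotes the number of IVs, MAF is the minor allele frequency, and β is the estimated effect of the SNP on the exposure. All selected SNPs met F-statistic > 10, indicating minimal weak instrument bias.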
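The instrument-strength calculation can be illustrated with a short sketch. It assumes the commonly used form of the per-SNP variance explained, with a squared effect size, and sums R² across SNPs before computing a joint F-statistic; the MAF and beta values and the sample size are hypothetical and are not taken from the study.

```python
def r2_per_snp(maf: float, beta: float) -> float:
    """Variance in the exposure explained by one SNP: 2 * MAF * (1 - MAF) * beta^2."""
    return 2.0 * maf * (1.0 - maf) * beta ** 2


def f_statistic(r2: float, n: int, k: int) -> float:
    """Instrument strength: F = R^2 (n - k - 1) / (k (1 - R^2))."""
    return r2 * (n - k - 1) / (k * (1.0 - r2))


# Hypothetical instruments as (MAF, beta) pairs; NOT values from the study.
snps = [(0.30, 0.05), (0.12, 0.08), (0.45, 0.04)]
n_samples = 100_000  # hypothetical GWAS sample size

r2_total = sum(r2_per_snp(maf, beta) for maf, beta in snps)
f_total = f_statistic(r2_total, n_samples, k=len(snps))
print(f"R^2 = {r2_total:.6f}, F = {f_total:.1f}")
```

An F-statistic above 10 is the conventional threshold for ruling out weak-instrument bias, which is the criterion applied to the selected SNPs.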
Inverse variance-weighted (IVW) and weighted median (WM) methods were jointly employed in the two-sample MR (TSMR) framework to evaluate causal associations, and an association was considered statistically significant only when both methods gave concordant results (P < 0.05) [26]. Under the assumption of no horizontal pleiotropy, IVW is the most statistically efficient estimator. To mitigate potential bias arising from invalid IVs, WM analysis was incorporated as a supplementary approach. The WM method demonstrates superior accuracy and reduced type I error rates when up to 50% of IVs are invalid, ensuring reliable causal inference [27]. For comprehensive validation, MR-Egger regression and weighted mode methods were implemented as sensitivity analyses, with directional consistency confirmed across all four approaches. Further sensitivity analyses were conducted to assess result stability. Steiger directionality tests were applied to confirm that the instrument SNPs explained significantly more variance in the exposure than in the outcome (all Steiger-P < 0.001), thus minimizing the possibility of reverse causation [28]. Horizontal pleiotropy was tested via Mendelian Randomization Pleiotropy Residual Sum and Outlier (MR-PRESSO), while heterogeneity among IVs was evaluated using Cochran’s Q test (P > 0.05 indicating absence of significant heterogeneity and pleiotropy) [29, 30]. Leave-one-out analysis was performed to identify potential bias from individual SNPs [31]. To address multiple testing concerns, false discovery rate (FDR) correction was applied to P-values, minimizing false-positive findings. Associations meeting both nominal significance (P < 0.05 in both IVW and WM analyses) and the FDR-adjusted threshold (FDR-q < 0.05) were classified as robust causal evidence.
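To make the IVW estimator and Cochran's Q concrete, the following is a minimal sketch of a fixed-effect IVW analysis on per-SNP summary statistics. It assumes first-order weights (outcome standard error divided by the SNP-exposure effect); the function name and the summary statistics are hypothetical, and the sketch is not the software pipeline used in the study.

```python
import math


def ivw(beta_exp, beta_out, se_out):
    """Fixed-effect IVW estimate, its standard error, Cochran's Q, and Q's degrees of freedom."""
    # Wald ratio per SNP and its first-order standard error.
    ratios = [bo / be for be, bo in zip(beta_exp, beta_out)]
    ratio_se = [so / abs(be) for be, so in zip(beta_exp, se_out)]
    weights = [1.0 / s ** 2 for s in ratio_se]
    est = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    # Cochran's Q: weighted squared deviations of each ratio from the pooled estimate.
    q = sum(w * (r - est) ** 2 for w, r in zip(weights, ratios))
    return est, se, q, len(ratios) - 1


# Hypothetical summary statistics for three SNPs; NOT values from the study.
beta_exposure = [0.10, 0.20, 0.05]
beta_outcome = [-0.0010, -0.0021, -0.0004]
se_outcome = [0.0005, 0.0006, 0.0005]

est, se, q, df = ivw(beta_exposure, beta_outcome, se_outcome)
odds_ratio = math.exp(est)
ci = (math.exp(est - 1.96 * se), math.exp(est + 1.96 * se))
print(f"OR = {odds_ratio:.4f}, 95% CI = ({ci[0]:.4f}, {ci[1]:.4f}), Q = {q:.3f} on {df} df")
```

A large Q relative to its degrees of freedom would indicate heterogeneity among the per-SNP ratio estimates, which is the situation in which the WM, MR-Egger, and weighted mode estimates become more informative than IVW.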
Construction of ML-based prediction models
The ML models were developed using data from the National Health and Nutrition Examination Survey (NHANES) spanning 1999–2004. The dataset encompassed baseline characteristics including gender, age, race, income, and marital status, along with five major health dimensions: metabolic indicators (glycohemoglobin [HbA1c], HDL-C, LDL-C, T2DM), body composition (obesity, overweight, BMI, WC, HC), cardiovascular health (hypertension, SBP, DBP, myocardial infarction), behavioral factors (smoking initiation, comparison of activities with peers), and psychological aspects (feeling guilty and feeling nervous).
The dataset was randomly partitioned into training (70%) and testing (30%) sets. Because complex high-dimensional data can degrade ML algorithm performance, we employed the Boruta algorithm for feature selection prior to model construction. This method identifies important predictors by iteratively comparing the importance of original features with that of randomly generated shadow variables, thereby enhancing model interpretability, preventing overfitting, and optimizing model efficiency [32]. Following feature selection, the Synthetic Minority Over-sampling Technique (SMOTE) was applied to achieve a balanced distribution between case and control groups in the training set.
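The core idea of SMOTE, generating synthetic minority-class rows by interpolating between a case and one of its nearest minority neighbours, can be sketched as follows. This is a simplified illustration with hypothetical two-feature rows; the function name, the value of k, and the data are assumptions, and the study's actual implementation is not described in the text. Oversampling is applied to the training rows only, so the test set keeps its original class balance.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority rows by interpolating toward nearby minority rows."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base row (squared Euclidean distance).
        others = [row for row in minority if row is not base]
        others.sort(key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical scaled training rows; migraine cases are the minority class.
train_cases = [[0.2, 0.9], [0.3, 0.8], [0.25, 0.95], [0.4, 0.7]]
train_controls = [[0.7, 0.2], [0.8, 0.3], [0.6, 0.1], [0.9, 0.25],
                  [0.75, 0.15], [0.65, 0.35], [0.85, 0.05], [0.7, 0.3]]

new_cases = smote(train_cases, n_new=len(train_controls) - len(train_cases))
balanced_cases = train_cases + new_cases
print(len(balanced_cases), len(train_controls))
```

Because every synthetic row lies on a segment between two real cases, the oversampled class stays within the range of the observed minority data.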
Eight distinct ML approaches were implemented: Random Forest (RF), Support Vector Machine (SVM), Generalized Linear Model (GLM), Gradient Boosting Machine (GBM), K-Nearest Neighbors (KNN), Neural Network (NNET), Least Absolute Shrinkage and Selection Operator (LASSO), and Decision Tree (DT). Each model was selected based on its unique advantages. RF demonstrated strong ensemble learning capabilities and resistance to overfitting; SVM excelled in high-dimensional space classification; GLM offered flexible response variable distribution assumptions; GBM achieved efficient prediction through iterative loss function optimization; KNN provided straightforward instance-based learning suitable for various tasks; NNET led in deep feature extraction and complex pattern recognition; LASSO implemented variable selection and model simplification through L1 regularization; and DT offered interpretable tree structures particularly effective in classification and regression tasks. Each ML model underwent hyperparameter tuning and ten-fold cross-validation to optimize performance and ensure model reliability. AUROC was employed to evaluate model predictive accuracy, with higher values indicating superior prediction capability (range: 0.5-1.0) [33]. Additionally, SHAP methodology was applied to elucidate the most effective prediction model. Based on game theory principles, SHAP allocated feature contributions to prediction outcomes, providing consistent and fair interpretations commonly used in explaining ML model predictions [32]. Finally, to increase the utility of the ML model, we developed a web platform embedding it.
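The game-theoretic allocation behind SHAP can be shown with an exact Shapley computation on a toy model. The sketch below averages each feature's marginal contribution over all feature orderings, replacing absent features with baseline values; practical SHAP implementations for tree models use faster algorithms, and the toy risk score, feature names, and values here are hypothetical and unrelated to the study's RF model.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley attribution: average marginal contribution over all feature orderings."""
    names = list(instance)
    phi = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)  # start from the baseline profile
        prev = predict(current)
        for name in order:
            current[name] = instance[name]  # switch this feature to the person's value
            value = predict(current)
            phi[name] += value - prev
            prev = value
    return {name: total / len(orderings) for name, total in phi.items()}


# Hypothetical toy risk score with one interaction term; NOT the study's model.
def toy_model(x):
    return 0.1 + 0.3 * x["female"] + 0.002 * (x["sbp"] - 120) + 0.01 * x["female"] * (x["wc"] - 90)


baseline = {"female": 0, "sbp": 120, "wc": 90}
person = {"female": 1, "sbp": 140, "wc": 100}
phi = shapley_values(toy_model, person, baseline)
print(phi)
```

The attributions sum to the difference between the person's prediction and the baseline prediction, and the interaction term is split evenly between the two features involved, which is the consistency and fairness property referred to above.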
Global mapping of migraine burden
We used the Global Health Data Exchange (GHDx) query tool (https://vizhub.healthdata.org/gbd-results/) to extract data related to the incidence, prevalence, and disease burden of migraine. The Global Burden of Disease (GBD) 2021 study provided estimates for 371 diseases and injuries across 204 countries and territories and 811 subnational locations from 1990 to 2021, using 95% uncertainty intervals (UIs) to reflect the uncertainty of each estimate.
The BAPC model combines the integrated nested Laplace approximation (INLA) for posterior estimation (with 10,000 iterations) and Monte Carlo simulation for quantified uncertainty assessment to evaluate the global migraine trend across three epidemiological dimensions: (1) age effects quantifying biological risk progression, (2) period effects monitoring population-level environmental influences, and (3) cohort effects tracking birth-generation specific exposures [34]. Monte Carlo simulations generated 95% credible intervals that accounted for parameter uncertainty and Poisson-distributed stochastic noise. The framework enabled the simulation of policy interventions by modulating period effect parameters while maintaining demographic consistency through WHO-standardized population structures. All statistical analyses were conducted in the R environment (version 4.3.1), and the R packages used are listed in Table S2.
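Because the projections are expressed as age-standardized rates, a brief sketch of direct age standardization may help: age-specific rates are averaged using the weights of a fixed standard population, so that rates remain comparable across regions and years with different age structures. The age groups, rates, and weights below are hypothetical and are not the WHO standard or GBD values.

```python
def age_standardized_rate(age_rates, std_weights):
    """Weighted average of age-specific rates using a fixed standard population."""
    total_weight = sum(std_weights.values())
    return sum(age_rates[group] * std_weights[group] for group in age_rates) / total_weight


# Hypothetical age-specific incidence per 100,000 and standard population shares.
age_rates = {"0-14": 400.0, "15-49": 1600.0, "50+": 700.0}
std_weights = {"0-14": 0.26, "15-49": 0.50, "50+": 0.24}

asr = age_standardized_rate(age_rates, std_weights)
print(f"Age-standardized rate: {asr:.1f} per 100,000")
```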
Results
Causal relationships between adjustable risk factors and migraine
Based on the specified selection criteria, we identified 502 adjustable risk factor-related SNPs for analysis, encompassing 13 fasting blood glucose, 39 HDL-C, 35 LDL-C, 59 T2DM, 4 obesity, 7 obesity class 2, 9 overweight, 7 BMI, 2 WHR, 1 WC, 52 HC, 3 DBP, 4 SBP, 64 myocardial infarction, 78 smoking initiation, 2 snoring, 18 moderate to vigorous physical activity levels, 5 vigorous physical activity, 35 irritable mood, 6 feelings of loneliness, 31 feelings of misery, 10 feelings of guilt, and 27 feelings of nervousness. All variants demonstrated F-statistics > 10, indicating robust instrumental variables resistant to weak instrument bias (Table S3). The phenotypic variance explained by each instrument is presented in Fig. 2A.
We investigated causal relationships between 23 adjustable risk factors and migraine using TSMR analysis. IVW results with FDR correction revealed significant associations with decreased migraine risk for T2DM (OR = 0.998, 95% CI: 0.996, 0.999, P = 0.005, FDR-q = 0.016), obesity class 2 (OR = 0.995, 95% CI: 0.993, 0.998, P = 1.00 × 10⁻⁴, FDR-q = 0.001), overweight (OR = 0.989, 95% CI: 0.985, 0.993, P = 2.11 × 10⁻⁸, FDR-q = 4.64 × 10⁻⁷), BMI (OR = 0.991, 95% CI: 0.985, 0.997, P = 3.13 × 10⁻³, FDR-q = 0.014), and myocardial infarction (OR = 0.995, 95% CI: 0.993, 0.998, P = 6.17 × 10⁻⁴, FDR-q = 0.003) (Fig. 2B & Table S4). Conversely, feeling miserable (OR = 1.017, 95% CI: 1.007, 1.028, P = 6.14 × 10⁻⁴, FDR-q = 0.003) was associated with an increased risk of migraine (Table S4). The WM method yielded consistent results, further supporting these associations (Table S4). No evidence of horizontal pleiotropy or heterogeneity was observed (P > 0.05) (Table S4). The Steiger directionality test showed no evidence of reverse causality (Table S5).
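The FDR-q values reported above come from a multiple-testing correction. The text does not name the procedure, so the following sketch assumes the widely used Benjamini-Hochberg method; the p-values in the example are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted q-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


# Hypothetical p-values for five exposures; NOT the study's results.
p_values = [0.001, 0.008, 0.039, 0.041, 0.60]
q_values = benjamini_hochberg(p_values)
print([round(v, 5) for v in q_values])
```

Under this procedure the smallest p-value is multiplied by the number of tests. The reported pair for overweight (P = 2.11 × 10⁻⁸, FDR-q = 4.64 × 10⁻⁷) is what Benjamini-Hochberg would give for the top-ranked of 22 tests, although the number of tests entering the correction is not stated in the text.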
Assessment of migraine prediction effect based on ML
In this study, the Boruta algorithm, which compares each candidate against shadow features, was employed to systematically identify significant predictors from an initial pool of candidate features (including gender, age, race, income, marital status, WC, HC, BMI, comparison of activity levels with peers, SBP, DBP, hypertension, glycohemoglobin, and cardiovascular health level). The standard definition of each predictor is presented in Table S6. Following 99 iterations, the analysis identified five robust predictors: BMI, HC, WC, gender, and SBP. Age was included a priori based on established epidemiological evidence. Figure 3A illustrates the iterative selection process, where green boxes denote confirmed predictors and red boxes indicate rejected variables. Subsequently, a correlation analysis was conducted to examine the interrelationships among the variables (Fig. 3B).
Fig. 3. Comprehensive ML framework integrating Boruta feature selection, correlation analysis, and model performance evaluation with AUROC and residual distribution. A. Important feature variable filtering based on the Boruta algorithm. B. Assessment of feature correlations between important variables. C. The ROC curves of the respective models. D. The residual distributions of the respective models. Abbreviations: DBP, Diastolic Blood Pressure; SBP, Systolic Blood Pressure; BMI, Body Mass Index; RF, Random Forest; GLM, Generalized Linear Model; KNN, K-Nearest Neighbor; SVM, Support Vector Machine; GBM, Gradient Boosting Machine; NNET, Neural Network; DT, Decision Tree; LASSO, Least Absolute Shrinkage and Selection Operator; SHAP, SHapley Additive exPlanations; AUROC, Area Under the Receiver Operating Characteristic Curve; ML, Machine Learning
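The shadow-feature comparison used in Boruta-style selection can be sketched in a few lines. Boruta itself scores features with Random Forest importance and applies a statistical test to the hit counts; to stay self-contained, this sketch substitutes absolute Pearson correlation as a stand-in importance score and simply counts hits. The function names and the synthetic data are hypothetical.

```python
import random


def abs_corr(xs, ys):
    """Absolute Pearson correlation, used here as a stand-in importance score."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return abs(cov) / (vx * vy) ** 0.5 if vx and vy else 0.0


def shadow_hits(features, labels, rounds=50, seed=0):
    """Count the rounds in which each real feature beats the best shuffled (shadow) copy."""
    rng = random.Random(seed)
    hits = {name: 0 for name in features}
    for _ in range(rounds):
        shadow_scores = []
        for column in features.values():
            shuffled = column[:]
            rng.shuffle(shuffled)  # shuffling breaks any link with the label
            shadow_scores.append(abs_corr(shuffled, labels))
        threshold = max(shadow_scores)
        for name, column in features.items():
            if abs_corr(column, labels) > threshold:
                hits[name] += 1
    return hits


# Synthetic data: one feature tracks the label, the other is unrelated to it.
n = 220
labels = [i % 2 for i in range(n)]
features = {
    "informative": [labels[i] + 0.3 * ((3 * i) % 7) / 7 for i in range(n)],
    "noise": [(4 * i) % 11 for i in range(n)],
}
hits = shadow_hits(features, labels)
print(hits)
```

A feature that beats the best shadow in nearly every round would be confirmed, while one that rarely does would be rejected, corresponding to the green and red boxes in panel A.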
Eight machine learning models (RF, SVM, GLM, GBM, KNN, NNET, LASSO, and DT) were developed and validated on the features retained by the shadow-feature selection, with all models utilizing the same six input variables. The performance of these models was assessed via ROC curves and residual distributions (Fig. 3C and D). Notably, the RF model exhibited superior predictive performance and model fit on the test set, achieving an AUROC of 0.927 and thus the strongest ability among the eight models to predict migraine risk. Comparative performance metrics for the eight models are summarized in Fig. 4.
Fig. 4. Performance of the eight ML models on the training and test sets. AUC is the area under the ROC curve and is used to evaluate model performance in binary classification problems. Precision is the proportion of predicted positive cases that are true positives; higher precision means fewer false positives among the model's positive predictions. Recall is the proportion of actual positive cases that the model correctly identifies; higher recall means more complete detection of positive samples. The F1 value is the harmonic mean of precision and recall and summarizes both in a single measure. Accuracy is the proportion of all samples, positive and negative, that are correctly classified. Abbreviations: RF, Random Forest; GLM, Generalized Linear Model; KNN, K-Nearest Neighbor; SVM, Support Vector Machine; GBM, Gradient Boosting Machine; NNET, Neural Network; DT, Decision Tree; LASSO, Least Absolute Shrinkage and Selection Operator; AUC, Area Under the Receiver Operating Characteristic Curve; ML, Machine Learning
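The metrics defined in the caption can be computed directly from labels and predictions. The sketch below uses the standard definitions and computes AUROC as the probability that a randomly chosen case receives a higher score than a randomly chosen control; the labels, scores, and the 0.5 classification threshold are hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Precision, recall, F1, and accuracy from binary labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


def auroc(y_true, scores):
    """Probability that a random case outscores a random control (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical test-set labels (1 = migraine) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc = auroc(y_true, scores)
print(metrics, round(auc, 4))
```

Unlike the other four metrics, AUROC does not depend on the classification threshold, which is why it is used as the headline measure of discrimination.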
The SHAP analysis of the optimal RF model quantified the specific contributions of the variables, revealing that gender was the most dominant risk factor among the predictors, as evidenced by the highest SHAP value. Additionally, SBP and WC were identified as key variables in cardiovascular health and body composition measures, respectively, while BMI and HC demonstrated progressively diminishing importance (Fig. 5A-D). We then embedded the best-performing RF model in a web-based platform that enables individualized risk stratification. This platform features an intuitive user interface and accepts all six model input variables (https://machinelearning1.shinyapps.io/migraine_prediction/). By inputting patient-specific information, the platform outputs the probability of the predicted outcome (Fig. 5E).
Fig. 5. The importance and contribution of each feature variable in the ML model, evaluated using SHAP analysis. A. SHAP beeswarm plot of the features; B. SHAP feature importance ranking; C. SHAP waterfall plot of the feature variables; D. SHAP contribution plots of individual features. E. Visual operational interface of the migraine prediction model. Abbreviations: WC, Waist Circumference; SBP, Systolic Blood Pressure; BMI, Body Mass Index; SHAP, SHapley Additive exPlanations; ML, Machine Learning
Global mapping of migraine burden
By 2050, global migraine incidence is projected to decrease substantially to 858.45 cases per 100,000 population (95% UI: 425.68, 1291.23) compared to 2021 levels (1,153.20, 95% UI: 1,151.53, 1,154.87). However, prevalence estimates indicate a slight increase from 14,246.55 (95% UI: 14,240.33, 14,252.77) in 2021 to 14,351.68 (95% UI: 7,323.68, 21,379.68) in 2050, while DALYs show a modest decline from 532.70 (95% UI: 531.51, 533.89) to 523.44 (95% UI: 261.39, 785.48) (Fig. 6 & Table S7).
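The size of these projected shifts is easier to compare as relative changes. The short calculation below uses only the point estimates quoted above; labelling the second series as prevalence is an assumption based on the three metrics the study projects (incidence, prevalence, and DALYs).

```python
def percent_change(start, end):
    """Relative change from start to end, in percent."""
    return (end - start) / start * 100.0


# Point estimates per 100,000 quoted in the text: (2021, 2050).
rates = {
    "incidence": (1153.20, 858.45),
    "prevalence": (14246.55, 14351.68),  # assumed label for the second series
    "dalys": (532.70, 523.44),
}

changes = {name: percent_change(start, end) for name, (start, end) in rates.items()}
for name, change in changes.items():
    print(f"{name}: {change:+.2f}%")
```

On these point estimates, incidence falls by roughly a quarter while the other two metrics change by less than 2%, although the wide 2050 intervals mean the smaller changes are well within the projection uncertainty.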
Fig. 6. Assessment of the potential health benefits of global mapping and of the burden of migraine based on the BAPC model by 2050. A. Age-standardized incidence rates of migraine from 204 countries. B. Age-standardized prevalence rates of migraine from 204 countries. C. Age-standardized DALY rates of migraine from 204 countries. DALYs combine YLDs and YLLs due to premature mortality. Abbreviations: BAPC, Bayesian Age-Period-Cohort Model; ASIR, Age-standardized Incidence Rate; ASPR, Age-standardized Prevalence Rate; ASDR, Age-standardized DALY Rate; DALY, Disability-Adjusted Life Year; YLDs, Years Lived with Disability; YLLs, Years of Life Lost
Notable gender disparities emerge in the analysis, with consistently lower rates observed among males compared to females across all metrics: incidence (858.45 vs. 1,351.44), prevalence (10,601.70 vs. 17,349.22), and DALYs (397.38 vs. 620.39) (Table S7). Despite the projected decrease in incidence, migraine continues to impose a substantial disease burden and shows marked gender-based differences.
At the national level, the Islamic Republic of Iran (Iran) exhibits the highest predicted values across all metrics for 2050, with an estimated incidence of 2,499.11 (95% UI: 113.87, 4,920.62), prevalence of 60,941.03 (95% UI: 0.00, 168,345.47), and DALY rate of 1,275.87 (95% UI: 64.92, 2,609.83). Belgium and Norway also maintain elevated predictions for migraine-related metrics. Detailed results are presented in Table S8.
Discussion
Migraine represents a significant global health threat and has been associated with psychiatric disorders, diet, and stroke. Although the DALYs associated with migraine are escalating, the causal roles of adjustable risk factors, particularly those amenable to population-level interventions, remain underexplored [35,36,37]. Our study examines the causal associations between various adjustable risk factors and migraine from a genetic perspective, progressing to individual-level risk prediction and population-level burden mapping. Through integration of biomarker profiling, causal inference methods, ML predictive modeling, and disease burden mapping, we established causal relationships linking cardiovascular health indicators, body composition markers, and psychological factors with migraine risk modification. We identified key biomarkers, established high-performance risk prediction models, and quantified substantial health benefits from migraine improvement. These findings bridge the gap between biological mechanisms, clinical practice, and public health policy, significantly enhancing the operational feasibility of early migraine identification and prevention.
Our MR analyses revealed paradoxical protective associations between body composition metrics (BMI, HC) and migraine risk—contrasting with prior observational evidence [38, 39]. This paradoxical finding may be explained by elevated BMI and HC levels reflecting substantial subcutaneous fat accumulation, which is associated with higher adiponectin secretion. Adiponectin may mitigate migraine risk by inhibiting neuro-inflammatory pathways involving Interleukin-6 (IL-6) and Tumor Necrosis Factor-alpha (TNF-α) [40, 41]. Notably, our study uncovers a complex and multifaceted relationship between obesity, the risks of T2DM, and migraine, challenging the conventional views that treat these conditions solely as independent risk factors. The underlying mechanism may involve metabolic adaptive regulation, wherein higher body fat percentages provide alternative energy reserves for neurons by increasing the availability of free fatty acids, which are subsequently converted into ketone bodies such as β-hydroxybutyric acid [42]. This process helps stabilize cortical excitability fluctuations induced by hypoglycemia. These findings are consistent with results from multicenter cohort studies, which indicate a lower prevalence of migraine among individuals with diabetes compared to those without [43, 44]. Additionally, consistent with clinical research, psychological health indicators including feelings of misery demonstrated detrimental effects on migraine risk, manifesting through pathogenic psychoneuro-inflammatory axes and increased disability risk [45,46,47].
The ML-derived risk stratification model prioritized six predictors: gender, age, BMI, WC, SBP, and HC. While gender itself is non-adjustable, its identification as the strongest predictive factor aligns with MR studies showing higher susceptibility to psychological health factors among females, suggesting that estrogen fluctuations may amplify psycho-inflammatory axis effects through regulation of trigeminal vascular reactivity [48,49,50]. These findings indicate that prioritizing psychological interventions in high-risk female populations may yield higher health benefits for reducing migraine risk. Furthermore, the interaction effect between WC and SBP was significant in SHAP analysis. One possible explanation is that visceral fat accumulation damages endothelium-dependent vasodilation through pro-inflammatory microenvironments, while elevated SBP maintains cerebral perfusion through compensatory vascular tension regulation, suggesting that combined weight reduction and blood pressure monitoring interventions may generate greater health benefits [51]. In addition, the clinical implications of these findings should be interpreted through the lens of precision medicine. While population-level studies support the modifiability of the identified metabolic factors, individual genetic architecture may substantially influence both baseline risk levels and intervention responsiveness. For instance, individuals with high polygenic risk scores (PRS) for dyslipidemia may derive limited benefit from dietary changes alone compared to those with lower genetic risk [52]. Therefore, risk factor management strategies should ideally integrate genetic profiling when available, particularly in cases where conventional approaches prove ineffective.
Despite a projected 25.5% reduction in global migraine incidence by 2050, persistently elevated DALYs highlight ongoing unmet needs in chronic disease management. Females are expected to bear a 1.6- to 1.8-fold greater burden compared to males. Epidemiological investigations demonstrate significant correlations between migraine prevalence variations and female reproductive status, potentially interacting with excitatory circuits, including serotonergic components [50, 53, 54]. Although adjustable risk factors have been identified for primary prevention, three critical gaps remain. First, current clinical practices predominantly focus on acute symptomatic treatment while neglecting long-term chronic disease management [55]. Second, while fasting glucose, lipid profiles, and diabetes status are generally considered adjustable cardiovascular risk factors, emerging evidence suggests significant genetic modulation of these traits [56]. Therefore, conventional lifestyle interventions may have limited efficacy for individuals with higher PRS, potentially requiring more aggressive pharmacological approaches. Third, migraine management strategies should prioritize gender-related pathophysiological differences, incorporate individualized therapeutic approaches, and enhance long-term disease management protocols to optimize population health outcomes. From a public health perspective, the risk stratification framework established in this study provides critical epidemiological evidence for optimizing resource allocation in migraine management. Future implementation research should extend these insights through subtype-specific interventions that account for heterogeneous therapeutic responses across migraine phenotypes.
This study has several notable strengths. First, our genetic instruments were selected based on GWAS of diagnosed migraine cases rather than on proxies for symptoms or transient clinical manifestations. Because MR leverages genetic variants that represent lifelong exposure levels, it is less prone to phenotypic fluctuations than observational measurements. Second, although MR suggests that the individual-level effect size may be limited, population-level interventions can still yield significant value in reducing the burden of migraine. Third, we performed rigorous pleiotropy tests, preserving the strength of the instruments while substantially mitigating potential pleiotropic confounding.
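As an illustration of how two-sample MR summary statistics are commonly combined, the sketch below implements a fixed-effect inverse-variance weighted (IVW) estimate and an MR-Egger-style intercept, a frequently used check for directional pleiotropy. It is a generic sketch and not the analysis pipeline of this study: the function names and the summary statistics in the usage section are hypothetical, and the Egger function assumes that SNP effects have already been oriented so that all SNP-exposure effects are positive.

```python
from math import exp, sqrt


def ivw_estimate(beta_x, beta_y, se_y):
    """Fixed-effect IVW estimate of the exposure-outcome effect.

    beta_x: SNP-exposure effects; beta_y: SNP-outcome effects (log-odds);
    se_y: standard errors of beta_y. Returns (estimate, standard error).
    """
    weights = [1.0 / s ** 2 for s in se_y]
    num = sum(w * bx * by for w, bx, by in zip(weights, beta_x, beta_y))
    den = sum(w * bx * bx for w, bx in zip(weights, beta_x))
    return num / den, sqrt(1.0 / den)


def egger_intercept(beta_x, beta_y, se_y):
    """Weighted regression of beta_y on beta_x with an intercept.

    An intercept far from zero suggests directional pleiotropy.
    Assumes all beta_x are oriented to be positive. Returns (intercept, slope).
    """
    w = [1.0 / s ** 2 for s in se_y]
    sw = sum(w)
    swx = sum(wi * x for wi, x in zip(w, beta_x))
    swy = sum(wi * y for wi, y in zip(w, beta_y))
    swxx = sum(wi * x * x for wi, x in zip(w, beta_x))
    swxy = sum(wi * x * y for wi, x, y in zip(w, beta_x, beta_y))
    slope = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
    intercept = (swy - slope * swx) / sw
    return intercept, slope


if __name__ == "__main__":
    # Hypothetical summary statistics for four SNPs (illustrative only).
    bx = [0.10, 0.20, 0.30, 0.40]
    by = [0.04, 0.11, 0.14, 0.21]
    se = [0.02, 0.02, 0.03, 0.03]
    beta, se_beta = ivw_estimate(bx, by, se)
    print("IVW log-OR:", beta, "SE:", se_beta, "OR:", exp(beta))
    print("Egger intercept, slope:", egger_intercept(bx, by, se))
```

The IVW estimate is a weighted average of the per-SNP Wald ratios and assumes that any pleiotropic effects balance out on average, whereas the Egger intercept relaxes that assumption at the cost of lower precision, which is why the two are usually reported together with other sensitivity analyses.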
Limitations and future research directions
Several limitations warrant consideration. First, the migraine risk and burden projections assume that current treatment efficacy and the intensity of healthcare policy implementation are maintained. While our risk prediction models characterize baseline epidemiological patterns, if novel therapeutic interventions achieve widespread clinical adoption, the actual preventable disease burden could substantially exceed our model’s conservative estimates. Second, the reliance on self-reported migraine diagnoses in the UK Biobank cohort introduces potential misclassification bias. Although this approach facilitates large-scale data collection, the lack of clinical verification may conflate heterogeneous headache disorders under a single diagnostic category. The observed associations should therefore be interpreted as reflecting biological pathways shared across self-reported migraine phenotypes rather than mechanisms specific to clinically defined migraine subtypes. Third, the absence of long-term follow-up data limits longitudinal analysis of disease progression. Fourth, our epidemiological approach, while appropriate for detecting population-level associations, could not account for clinical subtypes (episodic vs. chronic migraine) because standardized diagnostic information was unavailable. This phenotypic heterogeneity may obscure subtype-specific risk factors and attenuate observed effect sizes. Finally, the accuracy of GBD estimates is contingent on the quality and consistency of source data across regions, and methodological variations in data collection may compromise the reliability of health benefit assessments.
Future research should prioritize three directions: first, incorporating detailed clinical phenotyping to enable subtype-specific risk stratification; second, establishing longitudinal cohorts to elucidate long-term prognosis in migraine populations; and third, conducting multicenter external validations of the machine learning models using datasets from diverse populations.
Conclusion
This study applied an integrated framework combining causal inference, ML, and burden prediction to characterize modifiable risk factors for migraine. It identified causal relationships between metabolic, cardiovascular, and psychological factors and the onset of migraine, and developed a predictive model together with a visualization map of disease burden for personalized risk stratification and burden assessment. Future efforts should focus on developing targeted public health strategies, particularly by coordinating lifestyle modifications, biomarker surveillance, and gender-sensitive policies, to mitigate the global burden of migraine.
Data availability
No datasets were generated or analysed during the current study.
Abbreviations
- MR: Mendelian Randomization
- ML: Machine Learning
- WC: Waist Circumference
- HC: Hip Circumference
- DBP: Diastolic Blood Pressure
- SBP: Systolic Blood Pressure
- AUROC: Area Under Receiver Operating Characteristic Curve
- DALY: Disability-Adjusted Life Year
- SNP: Single Nucleotide Polymorphism
- IV: Instrumental Variable
- HDL-C: High-density Lipoprotein Cholesterol
- LDL-C: Low-density Lipoprotein Cholesterol
- T2DM: Type 2 Diabetes Mellitus
- BMI: Body Mass Index
- WHR: Waist-Hip Ratio
- SHAP: Shapley Additive exPlanations
- BAPC: Bayesian Age-Period-Cohort
- GWAS: Genome-Wide Association Study
- LD: Linkage Disequilibrium
- IVW: Inverse Variance Weighting
- WM: Weighted Median
- HbA1c: Glycohemoglobin
- SMOTE: Synthetic Minority Over-sampling Technique
- RF: Random Forest Model
- SVM: Support Vector Machine Model
- GLM: Generalized Linear Model
- GBM: Gradient Boosted Machine
- KNN: K-Nearest Neighbor
- NNET: Neural Network
- LASSO: Least Absolute Shrinkage and Selection Operator
- DT: Decision Tree
- LOO: Leave-One-Out
- OR: Odds Ratio
- CI: Confidence Interval
- PRS: Polygenic Risk Score
- YLDs: Years Lived with Disability
- YLLs: Years of Life Lost
References
Steiner TJ, Stovner LJ (2023) Global epidemiology of migraine and its implications for public health and health policy. Nat Rev Neurol 19:109–117. https://doiorg.publicaciones.saludcastillayleon.es/10.1038/s41582-022-00763-1
Silberstein SD (2004) Migraine. Lancet 363:381–391. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/S0140-6736(04)15440-8
Raggi A, Leonardi M, Arruda M et al (2024) Hallmarks of primary headache: part 1 - migraine. J Headache Pain 25:189. https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s10194-024-01889-x
Dodick DW (2018) Migraine. Lancet 391:1315–1330. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/S0140-6736(18)30478-1
Schwedt TJ (2014) Chronic migraine. BMJ 348:g1416. https://doiorg.publicaciones.saludcastillayleon.es/10.1136/bmj.g1416
Parker ED, Pereira MA, Virnig B, Folsom AR (2008) The association of hip circumference with incident hip fracture in a cohort of postmenopausal women: the Iowa women’s health study. Ann Epidemiol 18:836–841. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.annepidem.2008.07.007
Tiseo C, Vacca A, Felbush A et al (2020) Migraine and sleep disorders: a systematic review. J Headache Pain 21:126. https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s10194-020-01192-5
Sun X, Song J, Yan R et al (2025) The association between lipid-related obesity indicators and severe headache or migraine: a nationwide cross sectional study from NHANES 1999 to 2004. Lipids Health Dis 24:10. https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12944-025-02432-w
Sacco S, Cerone D, Carolei A (2008) Comorbid neuropathologies in migraine: an update on cerebrovascular and cardiovascular aspects. J Headache Pain 9:237–248. https://doiorg.publicaciones.saludcastillayleon.es/10.1007/s10194-008-0048-4
Rainero I, Govone F, Gai A et al (2018) Is migraine primarily a metaboloendocrine disorder? Curr Pain Headache Rep 22:36. https://doiorg.publicaciones.saludcastillayleon.es/10.1007/s11916-018-0691-7
Bernecker C, Pailer S, Kieslinger P et al (2010) GLP-2 and leptin are associated with hyperinsulinemia in non-obese female migraineurs. Cephalalgia 30:1366–1374. https://doiorg.publicaciones.saludcastillayleon.es/10.1177/0333102410364674
Al-Hassany L, Acarsoy C, Ikram MK et al (2024) Sex-Specific association of cardiovascular risk factors with migraine: the Population-Based Rotterdam study. Neurology 103:e209700. https://doiorg.publicaciones.saludcastillayleon.es/10.1212/WNL.0000000000209700
Russo A, Bruno A, Trojsi F et al (2016) Lifestyle factors and migraine in childhood. Curr Pain Headache Rep 20:9. https://doiorg.publicaciones.saludcastillayleon.es/10.1007/s11916-016-0539-y
Casucci G, Villani V, d’Onofrio F, Russo A (2015) Migraine and lifestyle in childhood. Neurol Sci 36(Suppl 1):97–100. https://doiorg.publicaciones.saludcastillayleon.es/10.1007/s10072-015-2168-3
Koller LS, Diesner SC, Voitl P (2019) Quality of life in children and adolescents with migraine: an Austrian monocentric, cross-sectional questionnaire study. BMC Pediatr 19:164. https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12887-019-1537-0
Wang Y-P, Wei H-X, Hu Y-Y et al (2025) Causal association between obstructive sleep apnea and migraine: A bidirectional Mendelian randomization study. Nat Sci Sleep 17:183–194. https://doiorg.publicaciones.saludcastillayleon.es/10.2147/NSS.S492630
Yuan S, Daghlas I, Larsson SC (2022) Alcohol, coffee consumption, and smoking in relation to migraine: a bidirectional Mendelian randomization study. Pain 163:e342–e348. https://doiorg.publicaciones.saludcastillayleon.es/10.1097/j.pain.0000000000002360
Hong P, Han L, Wan Y (2024) Mendelian randomization study of lipid metabolism characteristics and migraine risk. Eur J Pain 28:978–986. https://doiorg.publicaciones.saludcastillayleon.es/10.1002/ejp.2235
Li M, Qu K, Wang Y et al (2025) Associations between post-traumatic stress disorder and neurological disorders: A genetic correlation and Mendelian randomization study. J Affect Disord 370:547–556. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.jad.2024.11.044
Zhang L, Li Y, Xu Y et al (2025) Machine learning-driven identification of critical gene programs and key transcription factors in migraine. J Headache Pain 26:14. https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s10194-025-01950-3
Petrušić I, Ha W-S, Labastida-Ramirez A et al (2024) Influence of next-generation artificial intelligence on headache research, diagnosis and treatment: the junior editorial board members’ vision - part 1. J Headache Pain 25:151. https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s10194-024-01847-7
Liu Y, Yin S, Chen B et al (2022) Development and validation of an online nomogram for predicting the outcome of open tracheotomy decannulation: a two-center retrospective analysis. Am J Transl Res 14:8343–8360
Han Y, Ren Z, Liu Y, Liu Y (2024) Diagnostic and prognostic value of fibrinogen, fibrinogen degradation products, and lymphocyte/monocyte ratio in patients with laryngeal squamous cell carcinoma. Ear Nose Throat J 103:NP278–NP288. https://doiorg.publicaciones.saludcastillayleon.es/10.1177/01455613211048970
Burgess S, Thompson SG, CRP CHD Genetics Collaboration (2011) Avoiding bias from weak instruments in Mendelian randomization studies. Int J Epidemiol 40:755–764. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/ije/dyr036
Brion M-JA, Shakhbazov K, Visscher PM (2013) Calculating statistical power in Mendelian randomization studies. Int J Epidemiol 42:1497–1501. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/ije/dyt179
Burgess S, Scott RA, Timpson NJ et al (2015) Using published data in Mendelian randomization: a blueprint for efficient identification of causal risk factors. Eur J Epidemiol 30:543–552. https://doiorg.publicaciones.saludcastillayleon.es/10.1007/s10654-015-0011-z
Bowden J, Davey Smith G, Haycock PC, Burgess S (2016) Consistent Estimation in Mendelian randomization with some invalid instruments using a weighted median estimator. Genet Epidemiol 40:304–314. https://doiorg.publicaciones.saludcastillayleon.es/10.1002/gepi.21965
Davey Smith G, Hemani G (2014) Mendelian randomization: genetic anchors for causal inference in epidemiological studies. Hum Mol Genet 23:R89–98. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/hmg/ddu328
Holmes MV, Ala-Korpela M, Smith GD (2017) Mendelian randomization in cardiometabolic disease: challenges in evaluating causality. Nat Rev Cardiol 14:577–590. https://doiorg.publicaciones.saludcastillayleon.es/10.1038/nrcardio.2017.78
Burgess S, Bowden J, Fall T et al (2017) Sensitivity analyses for robust causal inference from Mendelian randomization analyses with multiple genetic variants. Epidemiology 28:30–42. https://doiorg.publicaciones.saludcastillayleon.es/10.1097/EDE.0000000000000559
Hemani G, Bowden J, Davey Smith G (2018) Evaluating the potential role of Pleiotropy in Mendelian randomization studies. Hum Mol Genet 27:R195–R208. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/hmg/ddy163
Bifarin OO (2023) Interpretable machine learning with tree-based Shapley additive explanations: application to metabolomics datasets for binary classification. PLoS ONE 18:e0284315. https://doiorg.publicaciones.saludcastillayleon.es/10.1371/journal.pone.0284315
Liu Y, Han Y, Chen B et al (2022) A new online dynamic nomogram: construction and validation of an assistant Decision-Making model for laryngeal squamous cell carcinoma. Front Oncol 12:829761. https://doiorg.publicaciones.saludcastillayleon.es/10.3389/fonc.2022.829761
Riebler A, Held L (2017) Projecting the future burden of cancer: bayesian age-period-cohort analysis with integrated nested Laplace approximations. Biom J 59:531–549. https://doiorg.publicaciones.saludcastillayleon.es/10.1002/bimj.201500263
Antonacci G, Vanna R, Ventura M et al (2024) Birefringence-induced phase delay enables Brillouin mechanical imaging in turbid media. Nat Commun 15:5202. https://doiorg.publicaciones.saludcastillayleon.es/10.1038/s41467-024-49419-2
Borończyk M, Zduńska A, Węgrzynek-Gallina J et al (2025) Migraine and stroke: correlation, coexistence, dependence - a modern perspective. J Headache Pain 26:39. https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s10194-025-01973-w
Thuraiaiyah J, Ashina H, Christensen RH et al (2024) Postdromal symptoms in migraine: a REFORM study. J Headache Pain 25:25. https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s10194-024-01716-3
Peterlin BL, Rosso AL, Williams MA et al (2013) Episodic migraine and obesity and the influence of age, race, and sex. Neurology 81:1314–1321. https://doiorg.publicaciones.saludcastillayleon.es/10.1212/WNL.0b013e3182a824f7
Gelaye B, Sacco S, Brown WJ et al (2017) Body composition status and the risk of migraine: A meta-analysis. Neurology 88:1795–1804. https://doiorg.publicaciones.saludcastillayleon.es/10.1212/WNL.0000000000003919
Arzani M, Jahromi SR, Ghorbani Z et al (2020) Gut-brain Axis and migraine headache: a comprehensive review. J Headache Pain 21:15. https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s10194-020-1078-9
Empl M, Sostak P, Riedel M et al (2003) Decreased sTNF-RI in migraine patients? Cephalalgia 23:55–58. https://doiorg.publicaciones.saludcastillayleon.es/10.1046/j.1468-2982.2003.00453.x
Wang W, Zhu C, Martelletti P (2024) Understanding headaches attributed to cranial and/or cervical vascular disorders: insights and challenges for neurologists. Pain Ther 13:1429–1445. https://doiorg.publicaciones.saludcastillayleon.es/10.1007/s40122-024-00668-5
Aamodt AH, Stovner LJ, Midthjell K et al (2007) Headache prevalence related to diabetes mellitus. The Head-HUNT study. Eur J Neurol 14:738–744. https://doiorg.publicaciones.saludcastillayleon.es/10.1111/j.1468-1331.2007.01765.x
Hagen K, Åsvold BO, Midthjell K et al (2018) Inverse relationship between type 1 diabetes mellitus and migraine. Data from the Nord-Trøndelag health surveys 1995–1997 and 2006–2008. Cephalalgia 38:417–426. https://doiorg.publicaciones.saludcastillayleon.es/10.1177/0333102417690488
Sullivan A, Cousins S, Ridsdale L (2016) Psychological interventions for migraine: a systematic review. J Neurol 263:2369–2377. https://doiorg.publicaciones.saludcastillayleon.es/10.1007/s00415-016-8126-z
Burrowes SAB, Goloubeva O, Stafford K et al (2022) Enhanced mindfulness-based stress reduction in episodic migraine-effects on sleep quality, anxiety, stress, and depression: a secondary analysis of a randomized clinical trial. Pain 163:436–444. https://doiorg.publicaciones.saludcastillayleon.es/10.1097/j.pain.0000000000002372
Castro Zamparella T, Carpinella M, Peres M et al (2024) Specific cognitive and psychological alterations are more strongly linked to increased migraine disability than chronic migraine diagnosis. J Headache Pain 25:37. https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s10194-024-01734-1
Mitter VR, Lupattelli A, Bjørk M-H, Nordeng HME (2024) Identification and characterization of migraine in pregnancy: A Norwegian registry-based cohort study. Cephalalgia 44:3331024241248846. https://doiorg.publicaciones.saludcastillayleon.es/10.1177/03331024241248846
Buzzi MG, Cologno D, Formisano R (2005) Migraine disease: evolution and progression. J Headache Pain 6:304–306. https://doiorg.publicaciones.saludcastillayleon.es/10.1007/s10194-005-0215-9
Brandes JL (2006) The influence of Estrogen on migraine: a systematic review. JAMA 295:1824–1830. https://doiorg.publicaciones.saludcastillayleon.es/10.1001/jama.295.15.1824
Zhou YB, Gu HG, Liu YX (1988) [Measurements of portal venous pressure in 100 normal Chinese]. Zhonghua Wai Ke Za Zhi 26:158–159
Park S-J, Kim M-S, Choi S-W, Lee H-J (2020) The relationship of dietary pattern and genetic risk score with the incidence dyslipidemia: 14-Year Follow-Up cohort study. Nutrients 12:3840. https://doiorg.publicaciones.saludcastillayleon.es/10.3390/nu12123840
Silberstein SD, Merriam GR (1991) Estrogens, progestins, and headache. Neurology 41:786–793. https://doiorg.publicaciones.saludcastillayleon.es/10.1212/wnl.41.6.786
Russell MB, Rasmussen BK, Thorvaldsen P, Olesen J (1995) Prevalence and sex-ratio of the subtypes of migraine. Int J Epidemiol 24:612–618. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/ije/24.3.612
Haghdoost F, Togha M (2022) Migraine management: Non-pharmacological points for patients and health care professionals. Open Med (Wars) 17:1869–1882. https://doiorg.publicaciones.saludcastillayleon.es/10.1515/med-2022-0598
Pang S, Yengo L, Nelson CP et al (2023) Genetic and modifiable risk factors combine multiplicatively in common disease. Clin Res Cardiol 112:247–257. https://doiorg.publicaciones.saludcastillayleon.es/10.1007/s00392-022-02081-4
Acknowledgements
This study was supported by The First Affiliated Hospital of Anhui Medical University and Sir Run Run Shaw Hospital, School of Medicine, Zhejiang University. We appreciate the National Natural Science Foundation of China (Grant No. 82171127) and Zhejiang University School of Medicine Affiliated Sir Run Run Shaw Hospital Cultivation Project (Grant No. YQNPY24238).
Author information
Contributions
WW and HFP supported the conception and design of this project. YCL and HFP acquired and analyzed the data. WW and YHL contributed to data quality control. YCL produced the first draft. All authors contributed intellectual content to the revised manuscript and have read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Liu, YC., Liu, YH., Pan, HF. et al. Unveiling new insights into migraine risk stratification using machine learning models of adjustable risk factors. J Headache Pain 26, 103 (2025). https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s10194-025-02049-5
DOI: https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s10194-025-02049-5