
Prediction models for treatment response in migraine: a systematic review and meta-analysis

Abstract

Background

Migraine is a complex neurological disorder with significant clinical variability, posing challenges for effective management. Multiple treatments are available for migraine, but individual responses vary widely, making accurate prediction crucial for personalized care. This study aims to examine the use of statistical and machine learning models to predict treatment response in migraine patients.

Methods

A systematic review and meta-analysis were conducted to assess the performance and quality of predictive models for migraine treatment response. Relevant studies were identified from databases including PubMed, the Cochrane Central Register of Controlled Trials, Embase, and Web of Science, up to 30 November 2024. The risk of bias was evaluated using the PROBAST tool, and adherence to reporting standards was assessed with the TRIPOD + AI checklist.

Results

After screening 1,927 documents, ten studies met the inclusion criteria, and six were included in a quantitative synthesis. Key data extracted included sample characteristics, intervention types, response outcomes, modeling methods, and predictive performance metrics. A pooled analysis of the area under the curve (AUC) yielded a value of 0.86 (95% CI: 0.67–0.95), indicating good predictive performance. However, the included studies generally had a high risk of bias, particularly in the analysis domain, as assessed by the PROBAST tool.

Conclusion

This review highlights the potential of statistical and machine learning models in predicting treatment response in migraine patients. However, the high risk of bias and significant heterogeneity emphasize the need for caution in interpretation. Future research should focus on developing models using high-quality, comprehensive, and multicenter datasets, rigorous external validation, and adherence to standardized guidelines like TRIPOD + AI. Incorporating multimodal magnetic resonance imaging (MRI) data, exploring migraine symptom-treatment interactions, and establishing uniform methodologies for outcome measures, sample size calculations, and missing data handling will enhance model reliability and clinical applicability, ultimately improving patient outcomes and reducing healthcare burdens.

Trial registration

PROSPERO, CRD42024621366.


Introduction

Migraine is one of the most prevalent neurological disorders, characterized by recurrent pulsating headaches that typically affect one side of the head with moderate to severe intensity [1]. The Global Burden of Disease (GBD) 2021 study estimates that migraine affects approximately 14% of the global population, translating to around 1.16 billion prevalent cases [2]. The World Health Organization (WHO) ranks migraine attacks among the highest disability categories, contributing to 43.4 million years lived with disability (YLDs) and making it the third leading cause of YLDs worldwide [2, 3]. This highlights the increasing societal and healthcare challenges associated with migraine.

Research over the past few decades has resulted in effective treatments for migraine, including pharmacotherapy, neuromodulatory devices, acupuncture, and more [4]. However, for each treatment method, a substantial proportion of patients experience no improvement in their condition after treatment (non-response) [5,6,7,8]. The high rate of non-response to various treatments suggests that there is no one-size-fits-all approach, as suitability varies among individuals or patient subgroups. Assigning migraine patients a priori to the most promising treatment for them could help reduce the rate of non-response. A prerequisite for this treatment allocation is having sufficiently accurate predictions of treatment outcomes at the individual subject level [9].

Individualized treatment is an important trend in future medical therapy [10]. The widespread use of medical imaging in the diagnosis and treatment of neurological and psychiatric disorders has facilitated the development of neuroimaging-based biomarkers for migraine. These predictive factors aid in disease diagnosis and classification, as well as in forecasting the outcomes and prognosis of individualized treatment [11,12,13,14,15]. Predictive models based on functional and structural changes in migraine patients have been established to assess treatment efficacy, yielding encouraging results [14,15,16]. However, few studies utilize multimodal magnetic resonance imaging (MRI) data to predict treatment efficacy across the range of migraine therapies.

From a methodological perspective, both statistical modeling and machine learning approaches are well-suited for this endeavor, as they can both explain existing data and accurately predict new data [17]. To explain the observed data clearly, statistical models typically posit explicit relationships between variables and rely on a limited number of variables. In contrast, machine learning methods encompass a wide range of algorithms that can handle large numbers of variables and capture nonlinear relationships. Some of the most common algorithms include logistic regression (LR) [18], support vector machines (SVM) [19], random forests (RF) [20], and neural networks [21]. Several metrics are available to assess a model’s classification performance on a test set, primarily based on the numbers of correct and incorrect predictions. One of the most widely used is accuracy, the ratio of correctly classified cases (both positive and negative) to the total number of cases. However, accuracy becomes misleading with imbalanced classes (e.g., 60% non-responders and 40% responders), a factor often overlooked in several studies: a model that simply labels every patient a non-responder achieves 60% accuracy while providing no discriminative information. In such cases, it is advisable to use alternative metrics, including the area under the curve (AUC) of the receiver operating characteristic (ROC) [22].
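The following R sketch makes this concrete on hypothetical data, assuming the pROC package for the ROC/AUC computation: a degenerate classifier that effectively calls every case a non-responder reaches 60% accuracy on a 60/40 split but an AUC near 0.5.

```r
# Hypothetical illustration of accuracy vs. AUC on an imbalanced test set
# (60% non-responders, 40% responders); assumes the pROC package.
library(pROC)

set.seed(42)
truth <- factor(c(rep("non-responder", 60), rep("responder", 40)),
                levels = c("non-responder", "responder"))

# A degenerate "model": near-constant low scores, so every case is
# labeled a non-responder at any sensible threshold.
score <- runif(100, min = 0.05, max = 0.15)
pred  <- factor(rep("non-responder", 100), levels = levels(truth))

accuracy <- mean(pred == truth)  # 0.60, driven purely by class imbalance
roc_obj  <- roc(truth, score, quiet = TRUE)

cat(sprintf("Accuracy: %.2f, AUC: %.2f\n",
            accuracy, as.numeric(auc(roc_obj))))  # AUC ~0.5: uninformative
```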

With respect to migraine, numerous models utilizing statistical methods and machine learning have been proposed to predict clinical outcomes based on patients’ pre-treatment characteristics [12, 23,24,25]. However, to our knowledge, no systematic review has been conducted to date to summarize and evaluate these migraine efficacy prediction models. Therefore, this systematic review aims to comprehensively analyze the application of these migraine models in predicting clinical treatment responses.

Methods

Search strategy

The electronic databases PubMed, the Cochrane Central Register of Controlled Trials, Embase, Web of Science, China National Knowledge Infrastructure (CNKI), China Biology Medicine disc (CBM), VIP, and Wanfang (WF) were searched for relevant studies from inception up to 30 November 2024. Search terms encompassed keywords for migraine, treatment, and machine learning (the detailed search strategies can be found in Supplementary file 1). Additionally, reference lists of eligible studies and review articles were screened.

For the systematic review, we adopted the population, intervention, comparator, outcome, timing, and setting (PICOTS) framework recommended by the Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies (CHARMS) checklist [26]. This framework helps frame the review’s aim, search strategy, and criteria for study inclusion and exclusion [27]. Below are the key components of our systematic review:

P (Population): Patients with migraine.

I (Intervention model): Treatment prediction models for migraine patients that were developed and published.

C (Comparator): No competing model.

O (Outcome): The outcome focused on response/remission to the treatment.

T (Timing): The outcome was predicted after evaluating basic information at admission, clinical scoring scale results, or multimodal magnetic resonance scans.

S (Setting): The intended use of the prediction model is to individualize the prediction of treatment outcome in patients with migraine, selecting a more beneficial and targeted intervention type for the patient.

Inclusion and exclusion criteria

The inclusion criteria for studies were: (1) studies involving patients with migraine; (2) randomized controlled trials, retrospective studies, or prospective studies; (3) reporting at least one prediction model; (4) predicting treatment response as a categorical outcome; (5) predicting response to any treatment aimed at improving the patients’ condition. The exclusion criteria were: (1) not written in English or Chinese; (2) the full text could not be retrieved despite contacting the authors via email.

Study selection and screening

The screening process of the studies was conducted independently by two authors (QYC and JRZ). The process commenced with the elimination of duplicate studies. Subsequently, the eligibility of the remaining studies was evaluated through a review of titles and abstracts. Following the application of the inclusion and exclusion criteria, full texts were reviewed. Additionally, reference lists of all eligible studies were examined to identify any potentially relevant studies. In case of disagreements regarding study selection, a discussion involving three authors (QYC, JRZ, and LL) was held to reach a consensus.

Data extraction

Two authors (QYC and JRZ) independently extracted data from included studies. Following completion, crosschecks were performed. In the event of any disputes, discrepancies were resolved through discussion or by consulting a third author (LL).

The information extracted from the selected studies was categorized into two groups: (1) basic information, including the author, year of publication, disorder, intervention, outcome measures, treatment duration, response/non-response rates, and sample size; (2) model information, covering the model development method, model validation type, modality, concordance index (c-index), specificity and sensitivity of the best model, and the final predictors used in the model.

Quality assessment

The best model of each study was assessed for risk of bias using the Prediction model Risk Of Bias ASsessment Tool (PROBAST) [28, 29], a tool developed for evaluating the quality of prediction models. This comprehensive tool, based on 20 signaling questions, categorizes the risk of bias into four critical domains: participants, predictors, outcome, and analysis. Each domain is evaluated independently, with the responses to the signaling questions classified as “yes”, “probably yes”, “no”, “probably no”, or “no information”. A domain is deemed to have a high risk of bias if any of its signaling questions is answered with “no” or “probably no”. Consequently, only when all four domains are judged to have a low risk of bias can the overall risk of bias for a model be considered low. Two authors (JRZ and BCC) independently evaluated the presence of bias and concerns regarding the applicability of the studies.

While assessing the risk of bias in the predictive models, we also applied the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis + Artificial Intelligence (TRIPOD + AI) checklist [30] to evaluate, in a standardized way, the completeness of reporting in the development and validation of predictive models across the included studies. The TRIPOD + AI statement comprises an expanded 27-item checklist, with each item accompanied by detailed explanations and guidance. The checklist not only covers traditional sections such as title, abstract, introduction, methods, results, and discussion but also specifically emphasizes open science practices, patient and public involvement, and equity issues. For each included study, two authors (JRZ and YHH) independently evaluated whether it adhered to each checklist item, using “Y” for “yes” and “N” for “no”, and ultimately calculated the TRIPOD + AI adherence rate for each item and publication.
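As a concrete illustration of how such adherence rates can be computed, the R sketch below derives per-item and per-publication percentages from a Y/N checklist matrix; the matrix entries are randomly generated placeholders, not our actual ratings.

```r
# Hypothetical TRIPOD + AI adherence computation: rows are checklist items,
# columns are included studies, entries are "Y"/"N" ratings.
set.seed(3)
items   <- paste0("item_", 1:27)
studies <- paste0("study_", 1:10)
checks  <- matrix(sample(c("Y", "N"), 27 * 10, replace = TRUE),
                  nrow = 27, dimnames = list(items, studies))

item_adherence  <- rowMeans(checks == "Y") * 100  # adherence per item (%)
study_adherence <- colMeans(checks == "Y") * 100  # adherence per publication (%)

median(study_adherence)  # summarized in the text as median (IQR)
IQR(study_adherence)
```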

Data analysis

Following data extraction, we performed a meta-analysis of the AUC values derived from the validated models. Anticipating substantial heterogeneity in AUC values across studies, we employed a random-effects model and applied the Knapp-Hartung adjustment [31] to estimate the confidence interval for the mean AUC. The between-study variance was estimated using the restricted maximum likelihood (REML) estimator [32]. Heterogeneity was quantified with the I² index, with thresholds of 25%, 50%, and 75% corresponding to low, moderate, and high heterogeneity, respectively [33]. Statistical analyses were conducted in R using the metamisc and metafor packages [34, 35] (R Foundation for Statistical Computing, version 4.4.2).
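For readers who wish to reproduce this type of analysis, the sketch below shows a minimal metafor version with hypothetical AUC values and standard errors (the actual extracted estimates appear in Fig. 3). AUCs are pooled on the logit scale, one common choice, with REML estimation and the Knapp-Hartung adjustment as described above.

```r
# Illustrative random-effects pooling of AUC values with metafor.
# The AUCs and standard errors below are hypothetical placeholders.
library(metafor)

auc    <- c(0.78, 0.91, 0.83, 0.88, 0.70, 0.93)
se_auc <- c(0.05, 0.04, 0.06, 0.05, 0.07, 0.03)

# Pool on the logit scale to respect the 0-1 bounds of the AUC
yi  <- qlogis(auc)                  # logit-transformed AUCs
sei <- se_auc / (auc * (1 - auc))   # delta-method SEs on the logit scale

fit <- rma(yi = yi, sei = sei,
           method = "REML",  # REML estimator for the between-study variance
           test   = "knha")  # Knapp-Hartung adjustment for the CI

# Back-transform the pooled estimate and its CI to the AUC scale
plogis(c(pooled = fit$beta, ci.lb = fit$ci.lb, ci.ub = fit$ci.ub))
fit$I2  # heterogeneity (%)
```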

Results

Study selection

Figure 1 shows the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) 2020 flowchart depicting the comprehensive search process and results [36]. The initial search identified 1,927 indexed records. After removing 487 duplicates across all databases, 1,440 titles and abstracts were screened for eligibility. Of these, 61 studies underwent full-text screening. During this process, four studies were excluded due to the unavailability of full texts. Additionally, nine studies were excluded because they did not align with the target population of the review, twenty-four studies were excluded for failing to develop prediction models, fifteen studies were excluded due to irrelevant outcomes, and one study was excluded because it reported duplicated data. A manual search of references and citations resulted in the inclusion of two additional articles. Ultimately, ten studies were included in this review.

Fig. 1 Flow diagram of the literature screening process and results

Study characteristics

Table 1 summarizes the characteristics of the ten studies, published between 2019 and 2024, which applied machine learning methods to predicting migraine treatment response. Six studies focused on patients with migraine without aura (MwoA) [12, 14, 23, 37,38,39], one study targeted both MwoA and migraine with aura (MwA) [40], and one study addressed chronic migraine (CM) as well as MwoA and MwA [24]. Additionally, two studies included unspecified migraine types [16, 25]. Sample sizes ranged from 41 to 712 participants, and study durations ranged from 4 to 48 weeks.

Table 1 Main characteristics and results of the included studies

The machine learning models employed in the studies were diverse, indicating a broad exploration of potential predictive algorithms. Four studies applied SVM [25, 37,38,39], while two used support vector regression (SVR) [14, 23]. RF was utilized in two studies [16, 24], and a linear SVM appeared in one study [12]. One study took a broader approach, additionally comparing LR, decision trees (DT), and a multilayer perceptron (MLP) [25].

Treatment interventions also varied across the included studies. Six studies implemented pharmacological treatments, including nonsteroidal anti-inflammatory drugs (NSAIDs), sumatriptan, and anti-calcitonin gene-related peptide (CGRP) monoclonal antibodies [16, 24, 25, 37, 38, 40]. Three studies investigated acupuncture as an intervention [12, 14, 39], while one study employed transcutaneous auricular vagus nerve stimulation (taVNS) [23]. Outcome measures typically assessed treatment efficacy in terms of symptom improvement, such as reductions in Visual Analog Scale (VAS) score [16, 23, 25, 37, 38] or decreases in the number of migraine days [12, 39].

Included predictors in prediction models

The details of the predictive factors and the rationale behind their selection in the included studies are summarized in Supplementary Table S1. Three studies explicitly utilized clinical data as predictive factors. One study used monthly migraine days (MMDs), monthly headache days (MHDs), and the Headache Impact Test (HIT-6) to predict treatment efficacy [24]. The second considered disease duration, VAS score, attack frequency, the Generalized Anxiety Disorder-7 item scale, the Patient Health Questionnaire-9 item scale, and the Pittsburgh Sleep Quality Index as predictive factors [25]. The third combined clinical data with resting-state functional MRI (rs-fMRI) features of amygdala-specific functional connectivity differences, but did not detail the specific clinical features employed [37].

Neuroimaging modalities, including rs-fMRI, structural MRI (sMRI), and diffusion tensor imaging (DTI), were also commonly used. Five studies applied rs-fMRI [14, 16, 23, 37, 38]. These studies identified differential brain regions by comparing rs-fMRI metrics, such as causal connectivity, amplitude of low frequency fluctuations (ALFF), and functional connectivity, between migraine patients and healthy controls, and used these features to construct predictive models. Notably, two studies used the ALFF or percent amplitude of fluctuation of the insula as a feature [14, 16]. Three studies employed sMRI, identifying brain regions with gray matter volume differences between migraine patients and healthy controls as features for predictive modeling [16, 39, 40]. Among these, two studies used gray matter volume differences in the left precuneus and right superior frontal gyrus as features [16, 39]. Additionally, one study applied DTI, using white matter microstructural features from the external capsule and fiber pathways of the anterior cingulate cortex (ACC)/medial prefrontal cortex (mPFC) to build the prediction model [12].

Model validation

The potential for a prediction model to generalize to individuals outside the development sample can be assessed by comparing prediction accuracy between internal cross-validation and external validation, or through independent replication in new datasets. Among the included studies, all models were developed and internally validated, and no study conducted external validation on an independent migraine dataset. The most common validation method was k-fold cross-validation (CV), employed in seven studies [14, 16, 23, 25, 37,38,39], while one study utilized leave-one-out cross-validation (LOOCV) [12].
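To make the distinction concrete, the sketch below implements a plain 5-fold CV loop on simulated data (all variable names and data are hypothetical); LOOCV is simply the special case in which the number of folds equals the sample size.

```r
# A minimal 5-fold cross-validation loop in base R on simulated data;
# everything here is hypothetical. LOOCV is the special case k = n.
set.seed(1)
n  <- 120
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
df$response <- rbinom(n, 1, plogis(0.8 * df$x1 - 0.5 * df$x2))

k     <- 5
folds <- sample(rep(1:k, length.out = n))  # random fold assignment
auc_per_fold <- numeric(k)

for (i in 1:k) {
  train <- df[folds != i, ]
  test  <- df[folds == i, ]
  fit   <- glm(response ~ x1 + x2, data = train, family = binomial)
  p     <- predict(fit, newdata = test, type = "response")
  # Rank-based AUC: the probability that a randomly chosen responder
  # scores higher than a randomly chosen non-responder
  auc_per_fold[i] <- mean(outer(p[test$response == 1],
                                p[test$response == 0], ">"))
}
mean(auc_per_fold)  # internally validated AUC estimate
```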

Results of quality assessment

The PROBAST assessment revealed that all included studies were generally considered to have a high risk of bias, with the primary issues concentrated in the analysis domain (see Fig. 2, Table 2, and Supplementary Table S2).

Fig. 2 Risk of bias assessment using the PROBAST based on four domains

Table 2 Tabular presentation for the PROBAST assessment of included studies

In the participant domain, one study was rated as high risk due to its retrospective design [40], and another study had an unclear risk of bias because of inadequate patient inclusion and exclusion criteria [25]. In the predictor domain, one study had an unclear risk of bias as it was a multicenter study that failed to report quality control measures to minimize bias [24]. Another study was rated as high risk because of its retrospective nature [40]. In the outcome domain, one study was deemed high risk due to the lack of definition of clinical outcomes [14], while another study was rated as unclear risk for providing vague outcome definitions [40].

In the analysis domain, all ten studies were identified as high risk. Regarding question 4.1, three studies met the recommended “events per variable” (EPV) threshold of greater than 20, indicating sufficient sample sizes [12, 25, 38]. The remaining seven studies were at high risk due to insufficient or unclear numbers of non-responders or prognostic factors, resulting in EPVs below the standard or an inability to accurately determine the EPV [14, 16, 23, 24, 37, 39, 40]. Regarding question 4.3, one study excluded participants due to poor image quality [37]; since image quality is beyond the researchers’ control, this exclusion was deemed low risk, whereas direct exclusion of participants for other reasons was considered high risk. Regarding question 4.4, only one study reported how missing data were handled, but the method used was not recommended by PROBAST, leading to a potential risk of bias [24]. Regarding question 4.5, one study failed to avoid univariate analysis in the best-performing model [25], posing a potential bias risk. Regarding question 4.6, no studies provided information on data complexity. Regarding question 4.7, only one study conducted a comprehensive assessment of its prediction model’s performance [16], and two studies did not specify their model evaluation methods [14, 23]. Regarding question 4.8, one study used an osteoarthritis dataset for validation, which does not qualify as strict external validation [16], while another study had unclear external validation due to conflicting participant groupings [40]. The remaining studies did not address model overfitting, underfitting, or optimism in performance metrics. Regarding question 4.9, just one study reported model parameters in its supplementary materials [39].
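As a worked example of the EPV criterion used for question 4.1 (the numbers below are hypothetical, not taken from the included studies), note that the event count is the size of the rarer outcome class:

```r
# Events-per-variable (EPV) check against the PROBAST-recommended
# threshold of 20; all numbers here are hypothetical.
epv <- function(n_events, n_candidate_predictors) {
  n_events / n_candidate_predictors
}

epv(n_events = 45, n_candidate_predictors = 6)  # 7.5  -> below 20, high risk
epv(n_events = 45, n_candidate_predictors = 2)  # 22.5 -> meets the threshold
```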

Among the ten included studies, four achieved an overall adherence of 50% or more to the TRIPOD + AI guidelines [16, 24, 39, 40], while six showed adherence between 40% and 50% (Table S3 and Fig. S1) [12, 14, 23, 25, 37, 38]. Overall adherence to the TRIPOD + AI statement ranged from 40.38 to 53.85%, with a median of 47.12% (interquartile range: 42.31–51.92%).

A total of 18 TRIPOD + AI items demonstrated at least 80% adherence (Table S3 and Fig. S1). These included items such as title, background (3a & 3b), objectives, participants (6b), data preparation, outcomes (8a), predictors (9a & 9c), analytical methods (12a & 12e), ethical approval, open science (18a, 18b, & 18e), results (20b), and model limitations and usability (27c). However, 18 TRIPOD + AI items showed adherence rates below 20%, primarily concentrated in the methods and results sections related to model performance, development, and predictors (Table S3 and Fig. S1).

Meta-analysis

Due to limited reporting on the detailed development and validation of models in the included studies, only six were eligible for quantitative synthesis. A pooled area under the curve (AUC) was calculated using a random-effects model, yielding a value of 0.86 (95% confidence interval: 0.67–0.95) (Fig. 3). However, a high degree of heterogeneity was observed among the studies, as indicated by an I² value of 90.84% (p < 0.001).

Fig. 3 Forest plot of a random-effects meta-analysis of the AUC values. AUC, area under the curve; CI, confidence interval

Discussion

Previous studies have conducted systematic reviews and meta-analyses on diagnostic tools for migraine and on predictive models for the classification and diagnosis of medication-overuse headache, revealing that classification and diagnostic models for migraine remain in the developmental and exploratory stages [41, 42]. Predicting treatment efficacy for migraine is equally critical, as this heterogeneous disorder is influenced by various factors, including treatment modalities, disease subtypes, and demographic characteristics [43]. Evaluating the likelihood of a therapy’s effectiveness before initiating treatment is crucial for optimizing patient outcomes. This study aimed to evaluate the application of statistical modeling and machine learning in predicting the clinical response to migraine treatment based on pretreatment characteristics. We identified ten studies that met the inclusion criteria, with a total of 1,260 patients. The machine learning methods used in these studies included SVM, RF, and LR. Treatment outcomes were assessed using various measures, such as the VAS and the number of migraine days.

In recent years, machine learning-based predictive models using MRI, including rs-fMRI, sMRI, and DTI, have been applied to the diagnostic classification and treatment efficacy prediction of various diseases [44,45,46]. Several studies have employed MRI to explore functional and structural brain changes in migraine, attempting to explain its pathophysiology and treatment mechanisms [47, 48]. Some studies have suggested that machine learning techniques often achieve greater accuracy than traditional logistic regression [49]. This review incorporated multiple brain imaging-based predictive models constructed using machine learning, and the results revealed that the imaging features included in each model were not entirely consistent. Notably, across the ten studies, researchers drew on largely distinct brain regions as predictive features. The insula, left precuneus, and right superior frontal gyrus were each used in two different studies [14, 38, 39], while other brain regions showed no significant overlap across studies (Supplementary Table S1). These brain regions may participate in central mechanisms that influence treatment efficacy in migraine. However, given the limited number of included studies, the evidence is insufficient to establish a robust association between these brain regions and treatment efficacy prediction. Future research should focus on these regions to better understand how structural and functional changes in these areas may affect treatment outcomes.

Currently, there is a lack of systematic reviews on imaging-based predictive models for therapeutic efficacy in migraine and other pain-related disorders. As in prediction model studies of other diseases, the accuracy of the migraine prediction models reviewed here is influenced by multiple factors, primarily sample size and validation methods. Insufficient sample size and the lack of external validation significantly affect the reliability and generalizability of prediction models, often leading to overfitting and overestimation of model performance. Small sample sizes tend to result in high bias and variance, as models may overfit the training data by capturing noise instead of true patterns [50]. Cross-validation has been shown to produce overly optimistic accuracy estimates in small datasets [51]. To mitigate these risks, an EPV ratio of at least 20:1 is recommended to reduce overfitting and improve generalization, thereby minimizing bias [29]. Moreover, insufficient sample size can undermine the robustness of machine learning algorithms and limit their applicability to unseen data, emphasizing the need for adequately powered studies. The lack of external validation further exacerbates these issues by preventing a realistic evaluation of model performance on independent datasets. Internal validation methods, such as cross-validation or bootstrapping, tend to underestimate the degradation in model performance, leading to a biased assessment of reliability [52]. Additionally, external validation with small sample sizes may fail to detect overfitting or performance declines, resulting in misleading conclusions about a model’s applicability in real-world settings [53]. To address these challenges, researchers should ensure external validation datasets are sufficiently large to capture performance variability and identify potential model failures [54]. Moreover, research has shown that smaller-scale studies and those employing insufficient validation methods tend to report higher prediction accuracy, which parallels the situation observed in this review [55]. Future research on migraine prediction models should aim to design studies with larger sample sizes and conduct external validation using independent datasets.

Meanwhile, different machine learning methods and feature selection approaches may also influence the accuracy of predictive models [56,57,58]. However, some studies have reached different conclusions. For instance, in a study on medication overuse headache, no significant differences in prediction accuracy were found across the various machine learning methods used [59]. Feature selection approaches likewise varied across the migraine studies included in this analysis, and one study found no significant association between feature selection methods and prediction accuracy [55]. The impact of different machine learning methods and feature selection techniques on model accuracy requires further investigation through larger-scale studies.

Our meta-analysis showed that the pooled AUC for predicting treatment response was 0.86 (95% CI: 0.67–0.95), indicating relatively high discriminatory ability of the models. However, the I² value was 90.84% (p < 0.001), suggesting significant heterogeneity among the studies. The observed heterogeneity can be attributed to several factors, including variations in interventions and outcome measures (clinical heterogeneity), as well as differences in study designs and potential biases (methodological heterogeneity) [60]. Among the ten studies included in this review, there are notable variations in migraine types (CM, MwoA, and MwA), interventions (pharmacological treatments, acupuncture, and taVNS), outcome measures (response in VAS score, migraine days, headache days, and headache intensity), and research methodologies (randomized controlled trials, retrospective and prospective studies, and blinded vs. open-label designs). These factors have undoubtedly contributed to the complexity of the observed heterogeneity. Additionally, the wide variation in intervention durations (from 4 to 48 weeks) has further amplified the diversity of the findings. However, given the small number of studies included in the meta-analysis, subgroup analyses and meta-regression were not feasible, preventing exploration of the sources of heterogeneity. Variability in the reporting of predictors, outcomes, and model performance further hindered more detailed stratified analyses, and the lack of standardized approaches across studies likely contributed to the observed heterogeneity. Future research should aim to ensure more consistent reporting of key methodological details to facilitate more comprehensive meta-analyses and subgroup comparisons.

The PROBAST assessment revealed that all included studies exhibited a high risk of bias, predominantly within the analysis domain. Common issues included insufficient sample sizes, unclear handling of missing data, and the absence of external validation. A robust sample size is critical for maintaining the events-per-variable ratio, which prevents overfitting and enhances model reliability [29]. For handling missing data, PROBAST recommends the use of multiple imputation [29], which is superior to other methods in reducing bias and improving precision, in both model development and validation studies [61, 62]. None of the ten included studies conducted external validation; yet external validation remains a critical step in assessing the robustness of prediction models. Additionally, most studies recruited participants from single centers, with just one study including multicenter data [24]. Incorporating multicenter data can help mitigate overfitting and improve generalizability, particularly in heterogeneous disorders such as migraine [58].
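As a brief illustration of the recommended approach, the sketch below runs multiple imputation with the mice package on simulated data (the data frame, variable names, and missingness pattern are hypothetical), fits a logistic model on each imputed dataset, and pools the estimates with Rubin’s rules.

```r
# Multiple imputation sketch with the mice package on hypothetical data,
# following the approach PROBAST recommends for missing predictor values.
library(mice)

set.seed(7)
df <- data.frame(
  response = rbinom(80, 1, 0.4),
  vas      = rnorm(80, mean = 6, sd = 1.5),
  duration = rexp(80, rate = 0.1)
)
df$vas[sample(80, 12)] <- NA  # introduce missingness in one predictor

imp  <- mice(df, m = 5, method = "pmm", printFlag = FALSE)  # 5 imputed sets
fits <- with(imp, glm(response ~ vas + duration, family = binomial))
summary(pool(fits))  # Rubin's rules combine estimates across imputations
```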

Furthermore, we employed the TRIPOD + AI checklist to evaluate the reporting quality of the included studies. TRIPOD + AI represents the latest guideline for standardized reporting in clinical predictive modeling [30]. Compared to the original TRIPOD statement [63], the TRIPOD + AI checklist encompasses every critical step in the development of AI-based predictive models, ensuring adherence to prescribed methodologies, and enhancing both the transparency and comprehensiveness of research reports. This checklist not only facilitates the evaluation of studies but also elevates their scientific credibility and recognition.

Among the ten included studies, adherence to the TRIPOD + AI checklist was notably low, with only four studies achieving an overall adherence rate of 50% or higher. A cross-sectional comparison revealed that TRIPOD + AI items with adherence rates below 20% were concentrated in the methods and results sections related to model performance, development, and predictors. Several key TRIPOD + AI items were consistently ignored in the ten included studies, which significantly impacts their reproducibility. For instance, the included studies provided no explanation of how the study size was determined and no sample size calculations, making it difficult to assess statistical power and generalizability. Additionally, the handling of predictors, including transformations, rescaling, or standardization, was not described, complicating the understanding and reproduction of the models. The lack of information on model updating, such as recalibration, limited the ability to understand how a model adapts to new data, reducing its applicability in real-world settings. Furthermore, fairness considerations and methods to address class imbalance were not addressed, which could lead to biased predictions and poor reproducibility. Differences between the development and evaluation datasets, such as healthcare settings or eligibility criteria, were also not identified, compromising the models’ generalizability. Moreover, the full prediction model (including code, formula, or application programming interface) was not provided, hindering replication and adaptation to new settings. Lastly, performance estimates were reported without confidence intervals, leaving uncertainty about the models’ reliability and stability. These omissions undermine reproducibility and limit the practical application of the models, highlighting the need for improved methodological rigor and reporting standards in future research.

The results of this meta-analysis have several potential implications for clinical practice and future research. Firstly, the relatively high AUC values suggest that machine learning models have the potential to assist clinicians in predicting treatment response for migraine patients. However, the high heterogeneity and risk of bias call for caution in interpreting and applying these results. Many of the included studies primarily conducted internal validation but faced the issue of overfitting, which limits the generalizability of the models [64]. Clinicians need to be aware that the models’ predictions may not be equally reliable across patients with migraine. Secondly, future studies should focus on improving the methodological quality of prediction models, including appropriate sample size calculations, handling of missing data, and external validation. Understanding how different migraine symptoms interact with various treatment modalities and how they can be better incorporated into the models will be crucial. After standardizing reporting and optimizing model construction, potential benefits could be seen in areas such as treatment decision-making, healthcare cost control, and reducing the time patients take to select the most appropriate treatment. Thirdly, the variability in outcome measures highlights the need for standardization in future research. We recommend focusing on developing clear, standard, and evidence-based outcome measures for treatment response that are applicable across diverse patient populations. Additionally, identifying clinical and demographic predictors of responsiveness could help classify patients more accurately as responsive or non-responsive, enabling more personalized treatment strategies. Such advancements would significantly enhance clinical decision-making and optimize treatment outcomes for migraine patients. Finally, the use of multimodal MRI information and a combination of different machine learning algorithms may enhance the predictive performance of the models. By integrating data related to the structural and functional changes in the brain associated with migraine symptoms, more accurate predictions might be achieved.

There are several limitations to this meta-analysis. Firstly, the relatively small number of included studies limits the generalizability and robustness of the findings. Given the complexity of migraine, with its diverse symptoms and subtypes, a larger body of research is needed to provide a more comprehensive understanding of the disorder across different patient populations. Secondly, significant heterogeneity among the studies—stemming from variations in study designs, populations, migraine characteristics, and treatment approaches—necessitates caution when drawing clinical conclusions. This inconsistency underscores the importance of standardized methodologies in future research to enhance comparability and reliability. Thirdly, the quality of the included studies was generally low, which may have introduced bias into the meta-analysis. The inconsistent reporting of migraine symptoms and their relationship to the prediction models could have affected the overall validity of the findings. Fourthly, none of the studies included external validation to assess the predictive models, limiting the generalizability and robustness of the findings. Without external validation, the models may be overfitted to specific datasets, reducing their reliability in real-world clinical settings. Future research should prioritize external validation across diverse populations and settings to confirm the robustness and clinical applicability of predictive models. Finally, a key limitation of this review is the limited number of studies on newer treatments, particularly anti-CGRP antibodies. Additionally, a significant proportion of the studies included in this review focused on predicting response to NSAIDs or sumatriptan, both of which are cost-effective and easy-to-use treatments. While valuable, predicting response for such treatments may be less critical than for higher-cost or more complex therapies, where outcome prediction is more clinically significant. This review emphasizes the need for further research into prediction models for newer and more expensive treatments to better guide individualized care.

Conclusion

This systematic review and meta-analysis provide a comprehensive evaluation of statistical and machine learning models for predicting treatment response in migraine patients. While the pooled AUC values suggest promising predictive potential, the high risk of bias and significant heterogeneity among studies underscore the need for caution in interpreting the findings. To advance this field, future research should focus on actionable strategies, including the development of models with high-quality, comprehensive, and multicenter datasets to improve generalizability, rigorous external validation to ensure reliability in diverse clinical settings, and adherence to standardized reporting guidelines like TRIPOD + AI. The integration of multimodal MRI data and systematic exploration of interactions between migraine symptoms and treatment modalities are critical for enhancing predictive performance. Moreover, efforts should be directed toward establishing uniform methodologies for defining outcome measures, sample size calculations, and handling missing data. These improvements will facilitate the creation of robust and clinically applicable prediction models that can guide personalized treatment strategies, ultimately optimizing patient outcomes and reducing healthcare burdens.

Data availability

No datasets were generated or analysed during the current study.

References

  1. Sutherland HG, Jenkins B, Griffiths LR (2024) Genetics of migraine: complexity, implications, and potential clinical applications. Lancet Neurol 23(4):429–446. https://doi.org/10.1016/S1474-4422(24)00026-7

  2. GBD 2021 Nervous System Disorders Collaborators (2024) Global, regional, and national burden of disorders affecting the nervous system, 1990–2021: a systematic analysis for the Global Burden of Disease Study 2021. Lancet Neurol 23(4):344–381. https://doi.org/10.1016/S1474-4422(24)00038-3

  3. Salomon JA, Vos T, Hogan DR, Gagnon M, Naghavi M, Mokdad A et al (2012) Common values in assessing health outcomes from disease and injury: disability weights measurement study for the Global Burden of Disease Study 2010. Lancet 380(9859):2129–2143. https://doi.org/10.1016/S0140-6736(12)61680-8

  4. Ashina M, Buse DC, Ashina H, Pozo-Rosich P, Peres MFP, Lee MJ et al (2021) Migraine: integrated approaches to clinical management and emerging treatments. Lancet 397(10283):1505–1518. https://doi.org/10.1016/S0140-6736(20)32342-4

  5. Pozo-Rosich P, Dolezil D, Paemeleire K, Stepien A, Stude P, Snellman J et al (2024) Early use of erenumab vs nonspecific oral migraine preventives: the APPRAISE randomized clinical trial. JAMA Neurol 81(5):461–470. https://doi.org/10.1001/jamaneurol.2024.0368

  6. Liu L, Chen Q, Zhao L, Lyu T, Nie L, Miao Q et al (2024) Acupuncture plus topiramate placebo versus topiramate plus sham acupuncture for the preventive treatment of chronic migraine: a single-blind, double-dummy, randomized controlled trial. Cephalalgia 44(6):3331024241261080. https://doi.org/10.1177/03331024241261080

  7. Hodaj H, Payen JF, Mick G, Vercueil L, Hodaj E, Dumolard A et al (2022) Long-term prophylactic efficacy of transcranial direct current stimulation in chronic migraine. A randomised, patient-assessor blinded, sham-controlled trial. Brain Stimul 15(2):441–453. https://doi.org/10.1016/j.brs.2022.02.012

  8. Karlsson WK, Ostinelli EG, Zhuang ZA, Kokoti L, Christensen RH, Al-Khazali HM et al (2024) Comparative effects of drug interventions for the acute management of migraine episodes in adults: systematic review and network meta-analysis. BMJ 386:e080107. https://doi.org/10.1136/bmj-2024-080107

  9. Chiang CC, Schwedt TJ, Dumkrieger G, Wang L, Chao CJ, Ouellette HA et al (2024) Advancing toward precision migraine treatment: predicting responses to preventive medications with machine learning models based on patient and migraine features. Headache 64(9):1094–1108. https://doi.org/10.1111/head.14806

  10. Malmartel A, Ravaud P, Tran VT (2023) A methodological framework allows the identification of personomic markers to consider when designing personalized interventions. J Clin Epidemiol 159:235–245. https://doi.org/10.1016/j.jclinepi.2023.06.003

  11. Russo A, Silvestro M, Tessitore A, Tedeschi G (2019) Functional neuroimaging biomarkers in migraine: diagnostic, prognostic and therapeutic implications. Curr Med Chem 26(34):6236–6252. https://doi.org/10.2174/0929867325666180406115427

  12. Liu J, Mu J, Chen T, Zhang M, Tian J (2019) White matter tract microstructure of the mPFC-amygdala predicts interindividual differences in placebo response related to treatment in migraine patients. Hum Brain Mapp 40(1):284–292. https://doi.org/10.1002/hbm.24372

  13. Zhu B, Coppola G, Shoaran M (2019) Migraine classification using somatosensory evoked potentials. Cephalalgia 39(9):1143–1155. https://doi.org/10.1177/0333102419839975

  14. Yin T, Sun G, Tian Z, Liu M, Gao Y, Dong M et al (2020) The spontaneous activity pattern of the middle occipital gyrus predicts the clinical efficacy of acupuncture treatment for migraine without aura. Front Neurol 11:588207. https://doi.org/10.3389/fneur.2020.588207

  15. Liu L, Lyu TL, Fu MY, Wang LP, Chen Y, Hong JH et al (2022) Changes in brain connectivity linked to multisensory processing of pain modulation in migraine with acupuncture treatment. Neuroimage Clin 36:103168. https://doi.org/10.1016/j.nicl.2022.103168

  16. Wei HL, Yu YS, Wang MY, Zhou GP, Li J, Zhang H et al (2024) Exploring potential neuroimaging biomarkers for the response to non-steroidal anti-inflammatory drugs in episodic migraine. J Headache Pain 25(1):104. https://doi.org/10.1186/s10194-024-01812-4

  17. Sidey-Gibbons JAM, Sidey-Gibbons CJ (2019) Machine learning in medicine: a practical introduction. BMC Med Res Methodol 19(1):64. https://doi.org/10.1186/s12874-019-0681-4

  18. Lee MJ, Park BY, Cho S, Park H, Chung CS (2019) Cerebrovascular reactivity as a determinant of deep white matter hyperintensities in migraine. Neurology 92(4):e342–e350. https://doi.org/10.1212/WNL.0000000000006822

  19. Ferroni P, Zanzotto FM, Scarpato N, Spila A, Fofi L, Egeo G et al (2020) Machine learning approach to predict medication overuse in migraine patients. Comput Struct Biotechnol J 18:1487–1496. https://doi.org/10.1016/j.csbj.2020.06.006

  20. Zhao Z, Zhao M, Yang T, Li J, Qin C, Wang B et al (2024) Identifying significant structural factors associated with knee pain severity in patients with osteoarthritis using machine learning. Sci Rep 14(1):14705. https://doi.org/10.1038/s41598-024-65613-0

  21. Chen W, Zhao H, Feng Q, Xiong X, Ke J, Dai L et al (2024) Disrupted gray matter connectome in vestibular migraine: a combined machine learning and individual-level morphological brain network analysis. J Headache Pain 25(1):177. https://doi.org/10.1186/s10194-024-01861-9

  22. Tholke P, Mantilla-Ramos YJ, Abdelhedi H, Maschke C, Dehgan A, Harel Y et al (2023) Class imbalance should not throw you off balance: choosing the right classifiers and performance metrics for brain decoding with imbalanced data. NeuroImage 277:120253. https://doi.org/10.1016/j.neuroimage.2023.120253

  23. Feng M, Zhang Y, Wen Z, Hou X, Ye Y, Fu C et al (2022) Early fractional amplitude of low frequency fluctuation can predict the efficacy of transcutaneous auricular vagus nerve stimulation treatment for migraine without aura. Front Mol Neurosci 15:778139. https://doi.org/10.3389/fnmol.2022.778139

  24. Gonzalez-Martinez A, Pagán J, Sanz-García A, García-Azorín D, Rodríguez-Vico JS, Jaimes A et al (2022) Machine-learning-based approach for predicting response to anti-calcitonin gene-related peptide (CGRP) receptor or ligand antibody treatment in patients with migraine: a multicenter Spanish study. Eur J Neurol 29(10):3102–3111. https://doi.org/10.1111/ene.15458

  25. Lu ZX, Dong BQ, Wei HL, Chen L (2022) Prediction and associated factors of non-steroidal anti-inflammatory drugs efficacy in migraine treatment. Front Pharmacol 13:1002080. https://doi.org/10.3389/fphar.2022.1002080

  26. Moons KG, de Groot JA, Bouwmeester W, Vergouwe Y, Mallett S, Altman DG et al (2014) Critical appraisal and data extraction for systematic reviews of prediction modelling studies: the CHARMS checklist. PLoS Med 11(10):e1001744. https://doi.org/10.1371/journal.pmed.1001744

  27. Debray TP, Damen JA, Snell KI, Ensor J, Hooft L, Reitsma JB et al (2017) A guide to systematic review and meta-analysis of prediction model performance. BMJ 356:i6460. https://doi.org/10.1136/bmj.i6460

  28. Wolff RF, Moons KGM, Riley RD, Whiting PF, Westwood M, Collins GS et al (2019) PROBAST: a tool to assess the risk of bias and applicability of prediction model studies. Ann Intern Med 170(1):51–58. https://doi.org/10.7326/M18-1376

  29. Moons KGM, Wolff RF, Riley RD, Whiting PF, Westwood M, Collins GS et al (2019) PROBAST: a tool to assess risk of bias and applicability of prediction model studies: explanation and elaboration. Ann Intern Med 170(1):W1–W33. https://doi.org/10.7326/M18-1377

  30. Collins GS, Moons KGM, Dhiman P, Riley RD, Beam AL, Van Calster B et al (2024) TRIPOD + AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ 385:e078378. https://doi.org/10.1136/bmj-2023-078378

  31. Knapp G, Hartung J (2003) Improved tests for a random effects meta-regression with a single covariate. Stat Med 22(17):2693–2710. https://doi.org/10.1002/sim.1482

  32. Langan D, Higgins JPT, Jackson D, Bowden J, Veroniki AA, Kontopantelis E et al (2019) A comparison of heterogeneity variance estimators in simulated random-effects meta-analyses. Res Synth Methods 10(1):83–98. https://doi.org/10.1002/jrsm.1316

  33. Higgins JP, Thompson SG, Deeks JJ, Altman DG (2003) Measuring inconsistency in meta-analyses. BMJ 327(7414):557–560. https://doi.org/10.1136/bmj.327.7414.557

  34. R Core Team (2019) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria

  35. de Jong VMT, Moons KGM, Eijkemans MJC, Riley RD, Debray TPA (2021) Developing more generalizable prediction models from pooled studies and large clustered data sets. Stat Med 40(15):3533–3559. https://doi.org/10.1002/sim.8981

  36. Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD et al (2021) The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ 372:n71. https://doi.org/10.1136/bmj.n71

  37. Wei HL, Xu CH, Wang JJ, Zhou GP, Guo X, Chen YC et al (2022) Disrupted functional connectivity of the amygdala predicts the efficacy of non-steroidal anti-inflammatory drugs in migraineurs without aura. Front Mol Neurosci 15:819507. https://doi.org/10.3389/fnmol.2022.819507

  38. Wei HL, Yang Q, Zhou GP, Chen YC, Yu YS, Yin X et al (2024) Abnormal causal connectivity of anterior cingulate cortex-visual cortex circuit related to nonsteroidal anti-inflammatory drug efficacy in migraine. Eur J Neurosci 59(3):446–456. https://doi.org/10.1111/ejn.16219

  39. Yang XJ, Liu L, Xu ZL, Zhang YJ, Liu DP, Fishers M et al (2020) Baseline brain gray matter volume as a predictor of acupuncture outcome in treating migraine. Front Neurol 11:111. https://doi.org/10.3389/fneur.2020.00111

  40. Wu JW, Lai PY, Chen YL, Wang YF, Lirng JF, Chen ST et al (2022) The use of neuroimaging for predicting sumatriptan treatment response in patients with migraine. Front Neurol 13:798695. https://doi.org/10.3389/fneur.2022.798695

  41. Aramruang T, Malhotra A, Numthavaj P, Looareesuwan P, Anothaisintawee T, Dejthevaporn C et al (2024) Prediction models for identifying medication overuse or medication overuse headache in migraine patients: a systematic review. J Headache Pain 25(1):165. https://doi.org/10.1186/s10194-024-01874-4

  42. Woldeamanuel YW, Cowan RP (2022) Computerized migraine diagnostic tools: a systematic review. Ther Adv Chronic Dis 13:20406223211065235. https://doi.org/10.1177/20406223211065235

  43. Leone M (2024) Globalisation of the pharmacological treatment of migraine. Lancet Neurol 23(12):1179–1180. https://doi.org/10.1016/S1474-4422(24)00427-7

  44. Cohen SE, Zantvoord JB, Wezenberg BN, Bockting CLH, van Wingen GA (2021) Magnetic resonance imaging for individual prediction of treatment response in major depressive disorder: a systematic review and meta-analysis. Transl Psychiatry 11(1):168. https://doi.org/10.1038/s41398-021-01286-x

  45. Vieira S, Liang X, Guiomar R, Mechelli A (2022) Can we predict who will benefit from cognitive-behavioural therapy? A systematic review and meta-analysis of machine learning studies. Clin Psychol Rev 97:102193. https://doi.org/10.1016/j.cpr.2022.102193

  46. Meinke C, Lueken U, Walter H, Hilbert K (2024) Predicting treatment outcome based on resting-state functional connectivity in internalizing mental disorders: a systematic review and meta-analysis. Neurosci Biobehav Rev 160:105640. https://doi.org/10.1016/j.neubiorev.2024.105640

  47. Zhang D, Huang X, Su W, Chen Y, Wang P, Mao C et al (2020) Altered lateral geniculate nucleus functional connectivity in migraine without aura: a resting-state functional MRI study. J Headache Pain 21(1):17. https://doi.org/10.1186/s10194-020-01086-6

  48. Karsan N, Goadsby PJ (2023) Neuroimaging in the pre-ictal or premonitory phase of migraine: a narrative review. J Headache Pain 24(1):106. https://doi.org/10.1186/s10194-023-01617-x

  49. Churpek MM, Yuen TC, Winslow C, Meltzer DO, Kattan MW, Edelson DP (2016) Multicenter comparison of machine learning methods and conventional regression for predicting clinical deterioration on the wards. Crit Care Med 44(2):368–374. https://doi.org/10.1097/CCM.0000000000001571

  50. Pavlou M, Ambler G, Qu C, Seaman SR, White IR, Omar RZ (2024) An evaluation of sample size requirements for developing risk prediction models with binary outcomes. BMC Med Res Methodol 24(1):146. https://doi.org/10.1186/s12874-024-02268-5

  51. Zantvoort K, Nacke B, Gorlich D, Hornstein S, Jacobi C, Funk B (2024) Estimation of minimal data sets sizes for machine learning predictions in digital mental health interventions. NPJ Digit Med 7(1):361. https://doi.org/10.1038/s41746-024-01360-w

  52. Steyerberg EW, Bleeker SE, Moll HA, Grobbee DE, Moons KG (2003) Internal and external validation of predictive models: a simulation study of bias and precision in small samples. J Clin Epidemiol 56(5):441–447. https://doi.org/10.1016/s0895-4356(03)00047-7

  53. Vergouwe Y, Steyerberg EW, Eijkemans MJ, Habbema JD (2005) Substantial effective sample sizes were required for external validation studies of predictive logistic regression models. J Clin Epidemiol 58(5):475–483. https://doi.org/10.1016/j.jclinepi.2004.06.017

  54. Riley RD, Debray TPA, Collins GS, Archer L, Ensor J, van Smeden M et al (2021) Minimum sample size for external validation of a clinical prediction model with a binary outcome. Stat Med 40(19):4230–4251. https://doi.org/10.1002/sim.9025

  55. Sajjadian M, Lam RW, Milev R, Rotzinger S, Frey BN, Soares CN et al (2021) Machine learning in the prediction of depression treatment outcomes: a systematic review and meta-analysis. Psychol Med 51(16):2742–2751. https://doi.org/10.1017/S0033291721003871

  56. Barboza F, Kimura H, Altman E (2017) Machine learning models and bankruptcy prediction. Expert Syst Appl 83:405–417. https://doi.org/10.1016/j.eswa.2017.04.006

  57. Pfob A, Lu SC, Sidey-Gibbons C (2022) Machine learning in medicine: a practical introduction to techniques for data pre-processing, hyperparameter tuning, and model comparison. BMC Med Res Methodol 22(1):282. https://doi.org/10.1186/s12874-022-01758-8

  58. Alanazi HO, Abdullah AH, Qureshi KN (2017) A critical review for developing accurate and dynamic predictive models using machine learning methods in medicine and health care. J Med Syst 41(4):69. https://doi.org/10.1007/s10916-017-0715-6

  59. Pecorelli F, Lujan S, Lenarduzzi V, Palomba F, De Lucia A (2022) On the adequacy of static analysis warnings with respect to code smell prediction. Empir Softw Eng 27(3):64. https://doi.org/10.1007/s10664-022-10126-5

  60. Boutron I, Page MJ, Higgins JP, Altman DG, Lundh A, Hróbjartsson A et al (2019) Considering bias and conflicts of interest among the included studies. In: Cochrane handbook for systematic reviews of interventions, pp 177–204

  61. Sterne JA, White IR, Carlin JB, Spratt M, Royston P, Kenward MG et al (2009) Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ 338:b2393. https://doi.org/10.1136/bmj.b2393

  62. Vergouwe Y, Royston P, Moons KG, Altman DG (2010) Development and validation of a prediction model with missing predictor data: a practical approach. J Clin Epidemiol 63(2):205–214. https://doi.org/10.1016/j.jclinepi.2009.03.017

  63. Collins GS, Reitsma JB, Altman DG, Moons KG (2015) Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. BMJ 350:g7594. https://doi.org/10.1136/bmj.g7594

  64. Steyerberg EW, Harrell FE Jr, Borsboom GJ, Eijkemans MJ, Vergouwe Y, Habbema JD (2001) Internal validation of predictive models: efficiency of some procedures for logistic regression analysis. J Clin Epidemiol 54(8):774–781. https://doi.org/10.1016/s0895-4356(01)00341-9


Acknowledgements

None.

Funding

This research is supported by the China National Natural Science Foundation (82374575), Beijing Natural Science Foundation (7232270), Capital’s Funds for Health Improvement and Research (CFH2024-2-2235), and the Outstanding Young Talents Program of Capital Medical University (B2207).

Author information


Contributions

Qiuyi Chen and Jiarun Zhang: Methodology, Validation, Investigation, Resources, Data curation, Writing – original draft, Writing – review & editing, Visualization, Supervision. Lu Liu: Conceptualization, Methodology, Validation, Investigation, Resources, Data curation, Writing – review & editing, Visualization. Bin Li: Conceptualization, Validation, Investigation, Visualization. Baicheng Cao and Yihan Hu: Validation, Investigation, Resources, Data curation. Yazhuo Kong: Methodology, Resources. All authors reviewed the manuscript.

Corresponding author

Correspondence to Lu Liu.

Ethics declarations

Human ethics and consent to participate

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Supplementary Material 2

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article


Cite this article

Chen, Q., Zhang, J., Cao, B. et al. Prediction models for treatment response in migraine: a systematic review and meta-analysis. J Headache Pain 26, 32 (2025). https://doi.org/10.1186/s10194-025-01972-x
