MOLECULAR PATHWAY PREDICTION USING REINFORCEMENT LEARNING FOR THERAPEUTIC TARGET DISCOVERY
Authors :
Yang Yi and Hossein Karami
Address :
School of Computing and Information Systems, Singapore Management University, 188065, Singapore
Sharif University of Technology Department of Electrical Engineering Tehran, Iran
Abstract :
A major hurdle for therapeutic target identification is the intricacy of biological pathways underpinning disease processes. Conventional methods often overlook the dynamic and interdependent character of these routes. The paper presents RL-MPTT, a new framework that uses reinforcement learning (RL) to forecast changes in molecular pathways (MP) and find important therapeutic targets (TT). Molecular networks derived from freely accessible pathway and interaction datasets are navigated and optimized by the RL-MPTT using RL agents. Nodes in the networks stand for molecules and edges for interactions; the networks were built using curated datasets such as Reactome. To achieve therapeutic goals like route stabilization or the suppression of disease-associated activity, the RL agent engages with these networks by mimicking interventions, such as stimulating or inhibiting particular nodes. Genomics data, such as gene expression profiles, were included to improve biological integrity and guarantee pathway relevance. The RL-MPTT also uses computational and experimental validation to verify the biological plausibility of anticipated targets. The results show that the RL model is predictively powerful since it reliably finds important routes corresponding to recognized treatment targets. Furthermore, this method finds new targets that could be the basis for future therapeutic development in neurological disorders and cancer. RL-MPTT shows how reinforcement learning may change the game for finding therapeutic targets. It can make predictions about molecular route dynamics much more accurate and scalable.
Keywords :
Molecular Pathways, Therapeutic Target Discovery, Reinforcement Learning, Pathway Modeling, Omics Data Integration, Drug Discovery.
1.Introduction
A complex network of molecular pathways orchestrates cellular functions and responses in both health and disease. The networks of interconnected molecules, comprising genes, proteins, and metabolites, drive metabolic processes and signal transduction [1]. Among the many diseases attributed to the dysregulation of these pathways are cancer, neurological disorders, and metabolic disorders [2]. Therefore, the development of new therapies depends on our ability to understand and target these molecular pathways that have emerged as a central focus of therapeutic research. The complexity of the pathways, marked by feedback loops and dynamic interactions, complicates finding therapeutic targets [3]. The dynamic and interdependent nature of molecular pathways is usually lost with the traditional methods of investigation that depend substantially on static models and experimental methodologies [4]. Knowledge of existing medications, their therapeutic targets, and comparative data on other bioactive substances and targets will facilitate drug discovery [5]. The therapeutic potential of therapies that target these pathways is huge. Still, in most real-life dynamic systems, these intervention points are not defined straightforwardly because the systems involve numerous interdependencies among themselves [6].
To save time and money, machine learning has been used to make predictions about interactions based on complicated, high-dimensional data. This could lead to discovering new experimental validation leads [7]. There is no one uniform subset of AI that ML algorithms and methods comprise. Both supervised and unsupervised learning are prevalent in ML algorithms. In supervised learning, the labels of new samples are learned from training samples that already have them. Typically, unlabeled samples are used by unsupervised learning to identify patterns [8]. Using biological targets to screen for big molecules with the potential to treat disease is known as reverse pharmacology. The screening process can provide numerous hits in the cell supply, and animals have been tested for adequacy [9]. To be effective, a medication target must be therapeutically modulable and relevant to the disease phenotype. The development of highthroughput biomedical data has been driven by ongoing biological and technical advancements, opening new possibilities for the early detection of possible therapeutic targets. Nevertheless, efficient methods that may generate precise forecasts for target identification are necessary for analyzing such extensive multidimensional biological data. To investigate the ever-increasing multi-omics data and find possible treatment targets, AI/ML has arisen as a potent tool [10].
To tackle the intricacy of dynamic chemical pathways, RL-MPTT provides a novel computational framework. Biomolecules (genes, proteins) are represented as nodes in the model of molecular pathways, while interactions (e.g., regulatory links) are represented as edges. Curated route databases such as Reactome and STRING are used to build these networks. Then, omics data, such as gene expression and proteomics, are used to refine them and ensure they are biologically relevant. An RL agent mimics interventions—such as activating, inhibiting, or altering nodes—in the RL framework, which views the molecular network as the environment. A reward system representing therapeutic goals, such as reducing disease activity or stabilising pathways, directs the agent's behavior. The agent picks up the ability to pinpoint therapeutically important nodes via repeated simulations. Computational and experimental approaches are employed to validate the predicted targets to ensure the biological relevance and plausibility of the identified therapeutic targets.
The main contribution of the paper is
The accuracy, biological plausibility, and usefulness of the targets chosen for intervention are ensured by a comprehensive validation pathway comprising both computational simulation and experimental approaches.
The RL-MPTT framework overcomes the shortcomings of conventional methods by using the power of reinforcement learning in making accurate and scalable predictions of molecular pathway dynamics. The RL framework explores the intervention strategies methodically to find the critical treatment targets, while incorporation of omics data ensures pathway models are biologically meaningful. With its computational and experimental validation, RL-MPTT is a game-changing strategy for discovering new treatments for difficult diseases by reducing the complexity of molecular pathways.
The paper's structure is based on research in therapeutic target identification and molecular pathway modelling, which is discussed in Section 2. Section 3 details the RL-MPTT framework, including its data sources and RL algorithms. Section 4 lays out the important results and new goals, assesses and compares the framework's performance, and Section 5 addresses its consequences, limitations, and future directions.
2. Literature review
Nayarisseri, A., et al. [11] proposed a combination of artificial intelligence, big data, and machine-learning approaches in precision medicine and drug discovery to enhance therapeutic development. It tried to handle the complexities of the drug discovery process to make personalized medicine more effective. The outcomes achieved indeed included the development of optimized therapies and advanced machine learning algorithms that demonstrated quite a lot of potential in drug design. However, the study pointed out limitations, including the risk of overfitting in QSAR models, which may impair the prediction of new compounds with diverse chemical structures.
Piazza I. et al. [12] presented a machine learning-based chemo-proteomic approach called LiP-Quant to improve drug target identification in complex proteomes. The method was proposed to face the challenges of unravelling the mechanisms of action for small molecules in drug development. Results: This work showed that LiP-Quant can identify drug targets with a positive predictive value of 30%, higher than the traditional methods. However, it was limited by the dependence on specific training datasets and possible variability in results under different experimental conditions, which may limit the generalizability of the findings to broader applications.
Pun, F. W. et al. [13] addressed the role of artificial intelligence in therapeutic target discovery and proposed to harness AI in making drug discovery more efficient in analyzing huge datasets and complex biological networks on which identifying drug targets conventionally relies—something that is inherently slow and costly. Implementation-wise, it led to AI-identified targets experimentally validated, while many in the field now head for clinical trials. However, it had limitations in data accuracy, depending on published data, ethical concerns, and a limited ability to shorten clinical trial timelines. AI is highly promising for addressing complex diseases, process streamlining, and cost reduction, but it must be validated and integrated into the existing frameworks with care.
Singh S. et al. [14] introduced an AI to revolutionize pharmacological research with machine learning, deep learning, and natural language processing. It aimed at overcoming the inefficiencies in drug discovery, data analysis complexities, and clinical trial delays. AI remarkably accelerated the discovery of new drugs, improved target validation, optimized clinical trial designs, and advanced personalized medicine. While many of these feats have been achieved, data privacy concerns, algorithmic biases, ethical dilemmas, and difficulties in integrating AI into clinical workflows persist. These limitations only underscore the need for strong frameworks, ethical guidelines, and careful implementation to tap the full potential of AI in pharmacology
Husnain A. et al. [15] proposed an application of machine learning in revolutionizing drug discovery through improved compound screening, biological activity prediction, and optimization of Absorption, Distribution, Metabolism, Excretion, and Toxicity (ADMET) profiling. ADMET targeted inefficiencies in traditional drug development, a time- and resourceconsuming process involving high failure rates. ML implementation improved accuracy in hit prediction, innovative de novo drug design, and increased efficiency in clinical trials. However, limitations still existed in dependence on high-quality datasets, challenges in model interpretability, and ethical concerns regarding data privacy and biases. Despite these challenges, ML showed great potential in accelerating drug development and enabling personalized treatments.
Gupta R. et al. [16] described integrating artificial intelligence and deep learning techniques in drug discovery. The motivation was to avoid some of the obstacles of traditional methods, such as ineffectiveness, huge expenses, and long-time use. Solutions entail AI approaches in virtual screening, peptide synthesis, and predictive modelling for drug design. Results show better accuracy in less time and lower costs in processes including target identification, toxicity prediction, and drug repositioning. Still, high computational demands, the requirement for large datasets, and the tendency of models to overfit all called for more work on algorithms and computational structures.
Setiya, A. et al. [17] introduced MolToxPred, a machine learning-based stacked model for predicting small molecule toxicity. It is designed to make up for the inefficiency, high cost, and ethical concerns in traditional ways of toxicity prediction. This model combined random forest, LightGBM, and multi-layer perceptron as base classifiers, with logistic regression as the meta-classifier, optimized via Bayesian methods. It achieved AUROC scores of 87.76% on the test set and 88.84% on external validation, which is better than base classifiers and outperforms the existing tools. Nevertheless, its limitations entailed the dependence on large, curated datasets and the difficulties with the interpretability of results; hence, further refinement in data diversity and model transparency was stressed.
Huang, D., et al. [18] proposed Deep-AVPiden, a deep learning-based approach using temporal convolutional networks (TCNs) to improve the classification and discovery of antiviral peptides (AVPs). The proposed method had overcome several challenges in traditional methods with poor scalability and low efficiency in processing biological sequence data. It outperformed other existing classifiers in terms of accuracy, precision, AUROC, and its optimized version, Deep-AVPiden-DS, reduced computational costs. However, it also faces some challenges, such as high volume data requirements and dependence on resource-intensive infrastructure, which demand further optimization to improve scalability and efficiency
3. Proposed Methodology
a. Dataset
Reactome can capture biological processes by cataloguing the molecules (small molecules, DNA, RNA, and proteins) and their interactions. According to this molecular perspective, incorporating microbially produced proteins, changing the functionality of human proteins, or changing the expression levels of functionally normal human proteins are the three mechanistic origins of human disease pathways. Infectious diseases are modeled in Reactome as interactions between microbes and humans and the events that follow from these interactions. Adding the altered protein to a new or different reaction, an expansion to the 'regular' route, represents the presence of variant proteins and their link to disease-specific biological processes. It is also possible to indirectly capture diseases caused by proteins carrying out their regular tasks at aberrant rates. Many mutant alleles code for proteins that still perform their intended functions but with altered stabilities or catalytic efficiencies. This causes normally occurring processes to progress to an abnormal degree. When alternative sources of expression or rate data are linked with pathway annotations, the phenotypes of these disorders can be elucidated [19].
b. The RL-MPTT workflow
The RL-MPTT framework uses reinforcement learning (RL) to identify therapeutic targets by optimizing molecular pathways. It builds molecular networks from datasets like Reactome, incorporating genomic data for biological relevance. RL agents navigate and manipulate these networks by simulating interventions, such as activating or inhibiting specific molecular nodes, to achieve therapeutic goals like stabilizing pathways or suppressing disease activity. The framework predicts key therapeutic targets, validating them through computational and experimental methods. RL-MPTT shows strong predictive power, identifying known and novel targets for diseases like cancer and neurological disorders, offering a scalable approach to therapeutic target discovery. It involves the following steps. Figure 1 shows the work process of the RL-MPTT method.

Figure 1 depicts the workflow of RL-MPTT, including major steps in data collection, molecular network construction, RL interaction, and validation. It integrates omics data, including gene expression and proteomics, with curated datasets such as Reactome and STRING, adding layers of biological relevance in pathway modeling. The RL agent interacts with the network under a simulation of interventions—virtual treatments—that mimic activation or inhibition of nodes, guided by feedback as a reward to maximize therapeutic outcomes. A dual validation of the identified targets is performed through computational and experimental approaches to ensure their plausibility. This workflow culminates in identifying known and novel therapeutic targets, demonstrating both the scalability and the applicability of such an approach to complex diseases. This workflow covers the steps.
1) Data collection
Molecular Network Construction: The datasets containing information on the biological pathways and molecular interactions, such as Reactome, can be used to construct molecular networks. Molecules in the network are represented as nodes; interactions among the molecules are edges. The pathway analysis can be done based on these networks. Figure 2 shows the molecular structure of mezlocillin.

Genomic Data Integration (omics): Gene expression profiles and other genomic data are integrated into the molecular networks so that biological relevance can be maximized. The integration makes sure that the pathways do, indeed, depict the molecular activities underlying an enhanced model in comprehending the diseases and the discovery of therapeutic targets. Omics data is retrieved from the dataset [20].
2) Reinforcement Learning (RL)
The RL agent is built to interact with the molecular network in a way that mimics therapeutic interventions. The goal of RL-MPTT is to modify the molecular network in a way that fosters therapeutic effects by either stabilizing disease-related pathways or repressing destructive activities. The agent can interact with nodes, representing molecules, in the network by stimulating, activating, inhibiting, or suppressing specific molecules. The RL agent takes an action based on its perception of the network's current state, which is influenced by molecular interactions and gene expression data. The actions taken by the agent elicit some changes in the network and are judged for their efficacy given therapeutic goals. Molecules represent nodes, while interactions between molecules form edges. The RL agent must maximize the interactions between molecules by exciting some nodes, i.e., molecules, or by inhibiting them to achieve therapeutic goals of stabilizing pathways or repressing disease-related functions.
State Representation
The state 𝑆𝑡 at time step 𝑡 represents the network's status at that time. It may contain various molecular features, including gene expression levels, protein activation states, or concentrations of molecules. In a reinforcement learning framework, this state is the input to the RL agent. This can be represented in Equation 1.

where 𝑥𝑖(𝑡) is the activity or expression level of the molecule 𝑖 at time 𝑡, and 𝑛 is the total number of molecules in the network.
Action Representation
Action Taken 𝐴𝑡 The RL agent's action corresponds to an intervention on a specific node or a set of nodes. The action can either activate or inhibit some molecular pathways, represented as changes in the state of nodes. The action could be represented as a vector that changes the state, as shown in equation 2.

where 𝑎𝑖 denotes the action on molecule 𝑖, such as activation (stimulating) or inhibition.
Reward Function
The reward 𝑅𝑡 is the measurement of the consequence of the action concerning the therapeutic goal. For example, if the goal is to prohibit an activity that brings on a disease, then a positive reward comes when this activity lowers, and negative if the disease proceeds. The reward function should guide the agent to beneficial interventions according to the therapeutic goals. It can be modelled as in equation 3.

where 𝑓(𝑆𝑡 , 𝐴𝑡) is a function that computes the reward based on the network's state and the agent's action. 𝑆𝑡 , 𝐴𝑡 are obtained from equations 1 and 2.
Policy and Q-Learning Algorithm
The RL agent seeks to learn a policy 𝜋 that maximises cumulative rewards over time. The policy specifies the optimal action 𝐴𝑡 to take, given the current state 𝑆𝑡 . This is obtained with the Q-learning algorithm in which the agent learns to estimate the value of taking a particular action in each state. The Q-value, 𝑄(𝑆𝑡 , 𝐴𝑡 ) represents the expected cumulative reward of acting 𝐴𝑡 in state 𝑆𝑡 and following the optimal policy. The update rule for Q-learning is shown in Equation 4.

where 𝛼 is the learning rate, 𝛾 is the discount factor, controlling the importance of future rewards, 𝐦𝐚𝐱 𝐴′ 𝑄(𝑆𝑡+1, 𝐴′) is the maximum 𝑄-value at the next state 𝑆𝑡+1, representing the optimal future reward. Over time, the agent updates its 𝑄-values based on the experiences it accumulates, refining its strategy to achieve the therapeutic goal.

Figure 3 shows the flow chart of the RL algorithm. RLMPTT shows how an RL agent can interact with molecular networks to simulate therapeutic interventions. The agent, with its actions updated continuously, taking rewards into account, learns to optimize molecular pathways for therapeutic purposes. The approach can be applied to multiple diseases, enabling the discovery of new therapeutic targets while enhancing the accuracy of drug discovery efforts.
Pathway Manipulation Goals: The pathway manipulation goals represent the specific therapeutic goals the RL agent should achieve by manipulating molecular pathways. In RLMPTT, the agent can take interventions to activate, inhibit, or change the activity of some nodes of the molecular pathways model. The main therapeutic goals are
i. Stabilizing Disease-Related Pathways: Certain diseases, such as cancer or neurodegenerative disorders, result from unstable or dysregulated molecular pathways. The RL agent then focuses on interventions that will restore normal functioning or balance of the path. This stabilization may involve the correction of aberrant signalling, re-establishing proper feedback loops, or diminishing the impact of dysfunctional genes or proteins to restore cellular homeostasis.
ii. Suppressing disease-associated activities: In other cases, disease pathways become overactive, such as cancer cells with uncontrolled growth signalling. Then, the RL agent zeros in on specific nodes or interactions in such pathways to suppress excessive activities and slow or arrest disease progression. This can comprise suppressing pro-tumorigenic signaling, reducing inflammation in autoimmune diseases, and dealing with neurodegenerative pathways.
3) Pathway interaction with RL agent
In RL-MPTT, the RL agent can interact with molecular networks to achieve therapeutic objectives, such as the stabilization of disease-related pathways or the suppression of diseaseassociated activities. This constitutes a very crucial part of the reinforcement learning process, where the agent learns by trial and error to optimize its actions. In molecular pathway manipulation, the RL agent operates within a dynamic environment represented by a state (𝑆) at time 𝑡. This state reflects the current configuration of the molecular network, such as the activity levels of genes, proteins, or other biomarkers, denoted as 𝑆𝑡 = {𝑠1, 𝑠2, . . . , 𝑠𝑛}, where each 𝑠_𝑖 corresponds to a specific biomolecule or pathway activity. The action (A) taken by the RL agent represents an intervention, such as activating a gene, inhibiting a protein, or modifying a biomolecule's expression. The set of possible interventions forms the action space (A). The agent receives a reward (R), a scalar value that indicates how well the action 𝐴𝑡 helps achieve therapeutic goals, such as stabilizing the pathway or suppressing disease-related activity. A higher reward signals a favourable outcome. After selecting an action based on the current state 𝑆𝑡 , the agent transitions to a new state 𝑆𝑡+1 and receives the corresponding reward 𝑅𝑡 . This process can be framed as a Markov Decision Process (MDP), where the goal is to maximize the expected cumulative reward over time, guiding the agent towards optimal therapeutic interventions
The agent selects the actions using an epsilon-greedy strategy, choosing most of the time the action associated with the highest Q-value to guarantee exploration over the entire state space by trying out some random action every now and then.

where 𝜖 is a small exploration factor.
Policy Gradient Methods (Policy-based Method): In policy gradient methods, the agent learns a policy directly as 𝜋𝜃(𝑠), where 𝜋𝜃(𝑠) is a probability distribution over actions parameterized by 𝜃. The agent selects actions according to this policy to optimize the parameters. 𝜃 to maximize expected cumulative rewards. The policy update is now followed by gradient ascent. The policy gradient theorem gives the update rule for 𝜃 is shown in equation 6.

where 𝐽(𝜃) is the objective function representing the expected cumulative reward and the gradient 𝛻𝜃 indicates how the policy parameters should be adjusted to maximize the expected reward.
Action Selection in Molecular Pathways: The RL agent's action selection process in molecular pathways relates to selecting interventions that can change specific network nodes (genes, proteins, etc.). Activating a gene may raise its expression level, potentially reestablishing a disrupted signalling pathway. This can inhibit a protein by reducing its overactive signalling, like blocking growth or promoting a pathway in cancer. The action is applied to the molecular pathway, and a new state is evaluated for therapeutic efficacy. If the intervention results in a desired outcome—for example, a reduction in disease activity or restoration of pathway stability—the agent is rewarded positively. Over time, the agent learns which interventions most effectively achieve these therapeutic goals.
4) Prediction of Therapeutic Targets using Reinforcement Learning
Therapeutic target prediction in RL-MPTT is a matter of identifying the key nodes, the biomolecules such as genes or proteins, that can be manipulated to achieve therapeutic goals. Thus, the RL framework goes through the molecular network to predict, for example, how changes in pathway dynamics lead toward therapeutic benefits in disease suppression or pathway stabilization.
i. Identification of Key Targets: The agent in RL-MPTT is mainly looking for therapeutic targets, nodes in the molecular pathway whose modulation will lead to therapeutic outcomes. The targets are usually ranked based on their ability to impact disease course or overall pathway stability. Key targets can correspond to already known therapeutic targets or appear as new targets not previously identified with traditional approaches in drug discovery. The RL agent chooses actions which either activate or inhibit some nodes in the pathway. Through repeated interactions with the network, it learns the effects of such interventions; the reward mechanism guides the agent toward interventions that produce desirable therapeutic outcomes. This can be achieved through equation 7.

where 𝐺𝑡 is the cumulative reward from the time step 𝑡 to the terminal state. 𝛾 is the discount factor that prioritizes future rewards. 𝑅𝑡 is the reward obtained at each time step based on the therapeutic outcome of the agent's intervention.
ii. Pathway Dynamics Prediction: Pathway dynamics can be predicted as a sequence of transitions over time in a Markov Decision Process (MDP). In the process, each state 𝑆𝑡 denotes the state of the pathway at the time 𝑡, and the action 𝐴𝑡 represents an intervention at that time. The agent's task is to predict the evolution of the pathway given different interventions and to choose interventions that will lead to therapeutic stabilization or suppression of disease activity. Formally, the transition from state 𝑆𝑡 to state 𝑆𝑡+1 based on action 𝐴𝑡 is governed by a transition function 𝑃(𝑆𝑡+1|𝑆𝑡 ,𝐴𝑡), which predicts the next state of the molecular pathway after applying the intervention as in equation 8.

where 𝑃(𝑆𝑡+1 |𝑆𝑡 , 𝐴𝑡 ) is the probability of transitioning from state 𝑆𝑡 to state 𝑆𝑡+1 after action 𝐴𝑡 is taken, and 𝑃 ′ (𝑆𝑡+1|𝑆𝑡 , 𝐴𝑡) is estimated using the RL agent based on the historical data from the molecular network. Figure 4 shows the pathway dynamic prediction.

As the agent intervenes in the pathway and observes effects on the dynamics of the pathway, it learns little by little the influence of some interventions throughout the disease or the re-establishment of the normal functioning of the pathway. An RL agent could predict what happens in cancers if the pro-oncogenic genes are inhibited or tumour suppressor genes are activated. It forecasts how the network will behave and possibly affords a therapeutic intervention. In neurological diseases, it may mimic the effects of manipulating neuronal signalling pathways toward neuroprotection or reduced neurodegeneration.
4. Results and Discussion
a) Performance Metrics
In this section, the RL-MPTT framework is compared with such methods as AI+ML [11], AI for Target ID [13], and MolToxPred [17] using metrics such as accuracy, scalability, and predictive power. AI+ML is moderately scalable and moderately accurate, AI for Target ID performs better on analyses on large datasets but becomes prone to data quality problems, while high-accuracy toxicity prediction is performed by MolToxPred using curated datasets. RLMPTT significantly outperforms these methods with respect to accuracy, near-linear scalability, and a strong predictive power for novel therapeutic targets.
Accuracy: Accuracy measures the proportion of correctly identified targets or predictions out of the total predictions made by a model. This can be calculated in equation 9.

where True Positives (TP) refer to correctly predicted therapeutic targets, True Negatives (TN) refer to correctly predicted non-targets or irrelevant predictions, and Total Predictions refer to the sum of all predictions, including true positives, true negatives, false positives (FP), and false negatives (FN). Thus, the equation can be rewritten as Equation 10


Table 1 compares the therapeutic target prediction methods: AI+ML, AI for Target ID, MolToxPred, and RL-MPTT. AI+ML achieves an accuracy of 80% but suffers from the risks of overfitting. AI for Target ID has an accuracy of 85% and performs well on large datasets but is quality-dependent. MolToxPred has high accuracy in predicting small molecule toxicity with an accuracy of 87.75%, which requires curated datasets. RL-MPTT outperforms other methods with an accuracy of 97%, thanks to reinforcement learning coupled with the integration of omics, increasing biological relevance and scalability. These findings further emphasize the superiority of RL-MPTT in correctly identifying therapeutic targets by combining dynamic pathway modelling with state-of-the-art machine-learning methods.
Scalability: Scalability can be defined as the ability of the methods to handle increases in data or complexity without experiencing a significant loss of performance. In molecular pathway identification, it can be quantified by how the time complexity 𝑇 can be expressed as a function of dataset size 𝑛, and network complexity 𝑚, denoted as in equation 11.

where 𝑇 is the total computational cost or time, 𝑘 Constant reflecting the hardware or implementation efficiency, 𝑛 Number of nodes (e.g., molecules, genes) in the network and 𝑚 Number of edges (interactions between nodes). 𝑓(𝑛, 𝑚) refers to the function representing the complexity of the algorithm (e.g., 𝑂(𝑛 2 ) for quadratic, 𝑂(𝑛 ⋅ 𝑚) for linear scaling with interactions). For scalable methods, 𝑓(𝑛, 𝑚) grows slower (𝑂(𝑛) or 𝑂(𝑙𝑜𝑔𝑛)), while less scalable methods have steeper growth (𝑂(𝑛 3 )). Figure 5 shows the scalability analysis of the RL-MPTT method with the conventional methods.

Figure 5 shows the Scalability of RL-MPTT in comparison to traditional methods via parameters like computational complexity (T), network nodes (n), and edges (e). RL-MPTT reflects a good scalability with nearly linear growth about 𝑛,𝑂(𝑛), considering its reinforcement learning algorithms and omics data integration. Traditional approaches, though, reflect either quadratic or greater growth rates—𝑂(𝑛 2 ) or worse—and do not scale well with larger-sized data. Adding multi-dimensional omics data makes RL-MPTT more biologically relevant, while hardware efficiency (𝐶) further optimizes performance. All these make RLMPTT a robust and adaptive framework for dynamic molecular pathway modelling.
Predictive Power: RL-MPTT's predictive power depends on network complexity and parameter integration, following the model of logarithmic growth. It balances computational challenges well and achieves high accuracy by dynamically adapting to increasing complexity and incorporating multi-omics features into the analysis. Thus, it suits large-scale biological network analysis and novel target identification. Predictive power can be obtained in equation 12.


Table 2 Compares of four methods—AI+ML, AI for Target ID, MolToxPred, and RLMPTT—in terms of the predictive power of identifying therapeutic targets. AI+ML demonstrates moderate predictive power with limited experimental validation and high computational complexity. AI for Target ID shows strong performances on large datasets but cannot be exempted from clinical validation challenges. MolToxPred does an excellent job in toxicity prediction with AUROC scores >87%, which demands highly curated datasets for accuracy. RL-MPTT has the most predictive power by incorporating omics data and using reinforcement learning; hence, it precisely identifies novel and already-known targets at the cost of high computational demand.
5. Conclusion
The RL-MPTT is a reinforcement learning-based framework for addressing dynamic molecular pathway complexities in the discovery of therapeutic targets. Representing the pathways as networks of biomolecules and their interactions, this framework integrated omics data to drive biological relevance and guide the RL agent toward imitating intervention strategies—node activations and inhibitions. By iterative learning, the RL agent succeeded in finding critical therapeutic targets, including known and novel nodes, and showed a strong predictive power in diseases such as cancer and neurological disorders. Computational and experimental validation confirmed the accuracy and plausibility of these predictions, establishing RL-MPTT as a scalable and precise tool for drug discovery. Compared to traditional approaches, RL-MPTT significantly outperformed in metrics such as accuracy (97%) and scalability while dynamically adapting to complex biological networks. However, the framework's dependency on curated datasets and high computational requirements poses challenges for large-scale and diverse applications. Future work will focus on improving computational efficiency, integrating diverse multi-omics datasets, and developing more robust validation techniques to improve the scalability and applicability of the methods to a wider spectrum of diseases. RL-MPTT represents a major advance in therapeutic target discovery and opens a way forward to quicker, more precise identification of actionable targets for drug development and tailored therapeutics.
References :
[1]. Zamanitajeddin, N., Jahanifar, M., Bilal, M., Eastwood, M., & Rajpoot, N. (2024). Social network analysis of cell networks improves deep learning for prediction of molecular pathways and key mutations in colorectal cancer. Medical Image Analysis, 93, 103071.
[2]. Liu, J., Hua, Z., Liao, S., Li, B., Tang, S., Huang, Q., ... & Ding, X. (2024). Prediction of the active compounds and mechanism of Biochanin A in the treatment of Legg-Calvé-Perthes disease based on network pharmacology and molecular docking. BMC Complementary Medicine and Therapies, 24(1), 26.
[3]. Srinivasan, M., Gangurde, A., Chandane, A. Y., Tagalpallewar, A., Pawar, A., & Baheti, A. M. (2024). Integrating network pharmacology and in silico analysis deciphers Withaferin-A’s anti-breast cancer potential via hedgehog pathway and target network interplay. Briefings in Bioinformatics, 25(2), bbae032.
[4]. Rajaram, Sampath. "A Model for Real-Time Heart Condition Prediction Based on Frequency Pattern Mining and Deep Neural Networks."
[5]. Zhou, Y., Zhang, Y., Lian, X., Li, F., Wang, C., Zhu, F., ... & Chen, Y. (2022). Therapeutic target database update 2022: facilitating drug discovery with enriched comparative data of targeted agents. Nucleic acids research, 50(D1), D1398-D1407.
[6]. Wang, Y., Zhang, S., Li, F., Zhou, Y., Zhang, Y., Wang, Z., ... & Zhu, F. (2020). Therapeutic target database 2020: enriched resource for facilitating research and early development of targeted therapeutics. Nucleic acids research, 48(D1), D1031-D1041.
[7]. D’Souza, S., Prema, K. V., & Balaji, S. (2020). Machine learning models for drug–target interactions: current knowledge and future directions. Drug Discovery Today, 25(4), 748-756.
[8]. Patel, L., Shukla, T., Huang, X., Ussery, D. W., & Wang, S. (2020). Machine learning methods in drug discovery. Molecules, 25(22), 5277.
[9]. Dara, S., Dhamercherla, S., Jadav, S. S., Babu, C. M., & Ahsan, M. J. (2022). Machine learning in drug discovery: a review. Artificial intelligence review, 55(3), 1947-1999.
[10]. Vatansever, S., Schlessinger, A., Wacker, D., Kaniskan, H. Ü., Jin, J., Zhou, M. M., & Zhang, B. (2021). Artificial intelligence and machine learning‐aided drug discovery in central nervous system diseases: State‐of‐the‐arts and future directions. Medicinal research reviews, 41(3), 1427-1473.
[11]. Nayarisseri, A., Khandelwal, R., Tanwar, P., Madhavi, M., Sharma, D., Thakur, G., ... & Singh, S. K. (2021). Artificial intelligence, big data and machine learning approaches in precision medicine & drug discovery. Current drug targets, 22(6), 631-655.
[12]. Piazza, I., Beaton, N., Bruderer, R., Knobloch, T., Barbisan, C., Chandat, L., ... & Reiter, L. (2020). A machine learning-based chemoproteomic approach to identify drug targets and binding sites in complex proteomes. Nature Communications, 11(1), 4200.
[13]. Pun, F. W., Ozerov, I. V., & Zhavoronkov, A. (2023). AI-powered therapeutic target discovery. Trends in pharmacological sciences
[14]. Singh, S., Kumar, R., Payra, S., & Singh, S. K. (2023). Artificial intelligence and machine learning in pharmacological research: bridging the gap between data and drug discovery. Cureus, 15(8).
[15]. Husnain, A., Rasool, S., Saeed, A., & Hussain, H. K. (2023). Revolutionizing pharmaceutical research: harnessing machine learning for a paradigm shift in drug discovery. International Journal of Multidisciplinary Sciences and Arts, 2(2), 149-157.
[16]. Gupta, R., Srivastava, D., Sahu, M., Tiwari, S., Ambasta, R. K., & Kumar, P. (2021). Artificial intelligence to deep learning: machine intelligence approach for drug discovery. Molecular diversity, 25, 1315-1360.
[17]. Setiya, A., Jani, V., Sonavane, U., & Joshi, R. (2024). MolToxPred: small molecule toxicity prediction using machine learning approach. RSC advances, 14(6), 4201-4220.
[18]. Huang, D., Yang, M., & Zheng, W. (2024). Integrating AI and deep learning for efficient drug discovery and target identification. Journal of AI-Powered Medical Innovations (International online ISSN 3078- 1930), 2(1), 44-63.
[19]. https://reactome.org/PathwayBrowser/#/R-HSA-1643685
[20]. https://www.proteinatlas.org/humanproteome/tissue/tissue+specific
[21]. https://ubi29.informatik.uni-siegen.de/usi/data_wesad.html