A mobile ad-hoc network (MANET) is a collection of autonomous mobile devices with distinct attributes that cooperate to establish communication. These nodes communicate over wireless links, which exposes the network to the injection of adversaries. Detecting and mitigating adversaries and anomalies is therefore necessary to retain network performance. To this end, this project introduces a novel secure neighbor selection technique using recurrent reward-based learning. The proposed technique combines the benefits of conventional routing with a machine learning paradigm to classify the states of nodes based on their communication behavior. Learning the behavior of the nodes consistently at every hop level of communication enables secure and consistent routing and transmission paths to the destination. The performance of the proposed technique is evaluated using four metrics: throughput, packet delivery ratio, delay, and detection ratio.
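As a rough illustration of the idea of classifying neighbors from rewards on their observed forwarding behavior, consider the following minimal Python sketch. It is not the proposed technique itself: the class `NeighborTable`, the learning rate `ALPHA`, the cut-off `TRUST_THRESHOLD`, the reward values, and the state labels are all assumptions chosen only for illustration.

```python
from dataclasses import dataclass, field

ALPHA = 0.3            # assumed learning rate for the running reward estimate
TRUST_THRESHOLD = 0.5  # assumed cut-off between "trusted" and "suspect"


@dataclass
class NeighborTable:
    """Hypothetical per-node table of reward estimates, one per neighbor."""
    scores: dict = field(default_factory=dict)

    def observe(self, neighbor: str, forwarded: bool) -> None:
        # Assumed rewards: 1.0 when the neighbor forwards a packet, 0.0 on a drop.
        reward = 1.0 if forwarded else 0.0
        # Unseen neighbors start at the threshold (a neutral prior).
        old = self.scores.get(neighbor, TRUST_THRESHOLD)
        # Move the estimate a fraction ALPHA toward the latest reward.
        self.scores[neighbor] = old + ALPHA * (reward - old)

    def state(self, neighbor: str) -> str:
        # Classify a neighbor from its current reward estimate.
        score = self.scores.get(neighbor, TRUST_THRESHOLD)
        return "trusted" if score >= TRUST_THRESHOLD else "suspect"

    def select_next_hop(self):
        # Pick the highest-scoring trusted neighbor, or None if there is none.
        trusted = [n for n in self.scores if self.state(n) == "trusted"]
        if not trusted:
            return None
        return max(trusted, key=lambda n: self.scores[n])


table = NeighborTable()
for _ in range(5):
    table.observe("A", forwarded=True)   # A keeps forwarding packets
    table.observe("B", forwarded=False)  # B keeps dropping packets
next_hop = table.select_next_hop()
```

In this sketch, repeated observations push the estimate for a consistently forwarding neighbor above the threshold and the estimate for a consistently dropping neighbor below it, so only the former remains a candidate next hop. Applying the same update at each hop would give every node its own table.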