Search for: q-learning

    Robust attitude control of an agile aircraft using improved Q-Learning

    Article ; Actuators ; Volume 11, Issue 12, 2022 ; 20760825 (ISSN) ; Zahmatkesh, M ; Emami, S. A ; Banazadeh, A ; Castaldi, P ; Sharif University of Technology
    MDPI  2022
    Abstract
    Attitude control of a novel regional truss-braced wing (TBW) aircraft with low stability characteristics is addressed in this paper using Reinforcement Learning (RL). In recent years, RL has been increasingly employed in challenging applications, particularly autonomous flight control. However, a significant problem confronting discrete RL algorithms is the dimension limitation of the state-action table and the difficulty of defining the elements of the RL environment. To address these issues, a detailed mathematical model of the aircraft is first developed in this paper to shape an RL environment. Subsequently, Q-learning, the most prevalent discrete RL algorithm, will be... 
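For orientation, the state-action table mentioned in the abstract above can be illustrated with a minimal sketch of standard tabular Q-learning. This is not the paper's improved algorithm or its aircraft model: the 1-D chain environment, the function names, and all hyperparameters below are hypothetical choices made only for illustration.

```python
import random


def q_learning_chain(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                     epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D chain (illustration only).

    States are 0..n_states-1, actions are 0 (left) and 1 (right).
    Reaching the last state ends the episode with reward 1.
    """
    rng = random.Random(seed)
    # The state-action table: one row per state, one column per action.
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # Epsilon-greedy action choice; ties are broken at random.
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s_next = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s_next == n_states - 1 else 0.0
            # Q-learning update; the terminal row stays at zero.
            target = r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q


def greedy_policy(q):
    """Greedy action for every non-terminal state."""
    return [0 if row[0] > row[1] else 1 for row in q[:-1]]
```

The table has one entry per state-action pair, so its size grows with the product of the discretized state and action dimensions, which is the limitation the abstract refers to.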

    Towards a bounded-rationality model of multi-agent social learning in games

    Article ; 2010 10th International Conference on Intelligent Systems Design and Applications, ISDA'10, Cairo, 29 November 2010 through 1 December 2010 ; 2010, Pages 142-148 ; 9781424481354 (ISBN) ; Hemmati, M ; Sadati, N ; Nili, M ; Sharif University of Technology
    2010
    Abstract
    This paper deals with the problem of multi-agent learning in a population of players engaged in a repeated normal-form game. Assuming boundedly-rational agents, we propose a model of social learning based on trial and error, called "social reinforcement learning". This extension of the well-known Q-learning algorithm allows players within a population to communicate and share their experiences with each other. To illustrate the effectiveness of the proposed learning algorithm, a number of simulations on the benchmark game "Battle of the Sexes" have been carried out. Results show that adding communication to the classical form of Q-learning significantly improves convergence speed towards... 

    Optimizing Replenishment and Pricing in a Vendor-managed Inventory Supply Chain When Customers Negotiate

    M.Sc. Thesis ; Sharif University of Technology ; Bagherirad, Sonia (Author) ; Modarres Yazdi, Mohammad (Supervisor)
    Abstract
    In this study, the vendor-managed inventory (VMI) policy in supply chains is investigated, and a formulation is developed to optimize replenishments from vendor to retailer as well as the price offered to negotiating customers. We consider a two-echelon supply chain containing a vendor and a retailer managed according to a VMI policy. The goal is to find the optimal replenishment from vendor to retailer at the beginning of the month, using a dynamic programming approach to maximize the supply chain profit. Demand is nondeterministic and is assumed to follow a Poisson distribution with an unknown parameter. We consider a Gamma distribution for this parameter, whose parameters are learned in dynamic... 
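The abstract above is truncated before it describes how the Gamma parameters are learned. One standard way to do this, shown here only as a hedged sketch and not necessarily the thesis's procedure, is the conjugate Gamma-Poisson update: with a Gamma(shape, rate) prior on the Poisson demand rate, observing a set of demand counts adds their sum to the shape and their number to the rate. The function names and the numbers in the usage note are hypothetical.

```python
def update_gamma_posterior(shape, rate, observed_demands):
    """Conjugate update of a Gamma(shape, rate) prior on a Poisson rate.

    Assumes the rate parameterization of the Gamma distribution, so the
    prior mean of the demand rate is shape / rate.
    """
    new_shape = shape + sum(observed_demands)
    new_rate = rate + len(observed_demands)
    return new_shape, new_rate


def posterior_mean(shape, rate):
    """Mean of a Gamma(shape, rate) distribution."""
    return shape / rate
```

For example, starting from an illustrative prior of shape 2.0 and rate 1.0 and observing demands of 3, 5, and 4 units, the update gives shape 14.0 and rate 4.0, so the estimated mean demand rate moves from 2.0 to 3.5.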

    Dynamic Pricing of Charter Flight Tickets with Learning

    M.Sc. Thesis ; Sharif University of Technology ; Mehrdar, Atabak (Author) ; Modarres, Mohammad (Supervisor)
    Abstract
    In this thesis, an approach is developed to obtain an optimal pricing policy for chartered flights. To do so, a model within the framework of dynamic programming is presented and its main structure is analyzed. Since in real-world cases the dimension of this model is very large, a solution method is developed using the “Q Learning” technique, which is an appropriate approach in approximate dynamic programming and reinforcement learning. Analysis is carried out under two different assumptions regarding demand, namely “linear-deterministic” and probabilistic demand for transition probabilities. An exact solution for the deterministic demand case is developed. Furthermore, for... 

    Optimal Control of Unknown Interconnected Systems via Distributed Learning

    M.Sc. Thesis ; Sharif University of Technology ; Farjadnasab, Milad (Author) ; Babazadeh, Maryam (Supervisor)
    Abstract
    This thesis addresses the problem of optimal distributed control of unknown interconnected systems. In order to deal with this problem, a data-driven learning framework for finding the optimal centralized and the suboptimal distributed controllers has been developed via convex optimization. First of all, the linear quadratic regulation (LQR) problem is formulated into a nonconvex optimization problem. Using Lagrangian duality theories, a semidefinite program is then developed that requires information about the system dynamics. It is shown that the optimal solution to this problem is independent of the initial conditions and represents the Q-function, an important concept in reinforcement... 

    Reinforcement Learning Approach in Self-Assembly Systems to Acquire Desired Structures

    M.Sc. Thesis ; Sharif University of Technology ; Ravari, Amir Hossein (Author) ; Bagheri Shouraki, Saeed (Supervisor)
    Abstract
    Self-Assembly (SA) plays a critical role in the formation of different phenomena in nature. It can be defined as the arrangement of meaningful patterns through the aggregate behavior of simpler structures. One example of Self-Assembly is the formation of ice crystals from water molecules. Previous works mainly focus on graph grammar and self-assembly in fully observable environments. These algorithms consist of two main stages: first, constructing simpler structures, and then joining these simpler structures to form a complex structure. The challenges of previous work include the necessity of a central controller in the formation of... 

    Gait analysis of a six-legged walking robot using fuzzy reward reinforcement learning

    Article ; 13th Iranian Conference on Fuzzy Systems, IFSC 2013 ; August, 2013, Page(s): 1 - 4 ; ISBN: 9781479912278 ; Shahriari, M ; Khayyat, A. A ; Sharif University of Technology
    IEEE Computer Society  2013
    Abstract
    Free gait becomes necessary in walking robots when they walk over discontinuous terrain or face difficulties in walking. A basic gait generation strategy is presented here using reinforcement learning and a fuzzy reward approach. A six-legged (hexapod) robot is implemented using the Q-learning algorithm. The ability of a hexapod robot to learn to walk is explored, considering only its ability to move its legs and using a fuzzy reward system that tells it whether and how it is moving forward. Results show that the hexapod robot learns to walk properly using the presented approach.

    A new method for discovering subgoals and constructing options in reinforcement learning

    Article ; Proceedings of the 5th Indian International Conference on Artificial Intelligence, IICAI 2011 ; 2011, Pages 441-450 ; 9780972741286 (ISBN) ; Davoodabadi, M ; Beigy, H ; SIT; Saint Mary's University; EKLaT Research; Infobright ; Sharif University of Technology
    Abstract
    In this paper, the problem of automatically discovering subtasks and hierarchies in reinforcement learning is considered. We present a novel method that allows an agent to autonomously discover subgoals and create a hierarchy from actions. Our method identifies subgoals by partitioning local state transition graphs. Options constructed for reaching these subgoals are added to action choices and used for accelerating the Q-Learning algorithm. Experimental results show significant performance improvements, especially in the initial learning phase.

    Traffic flow control using multi-agent reinforcement learning

    Article ; Journal of Network and Computer Applications ; Volume 207, 2022 ; 10848045 (ISSN) ; Zeynivand, A ; Javadpour, A ; Bolouki, S ; Sangaiah, A. K ; Ja'fari, F ; Pinto, P ; Zhang, W ; Sharif University of Technology
    Academic Press  2022
    Abstract
    The vehicular ad hoc network (VANET) is one of the information technologies used today for inter-road communication. Many developed countries use this technology to optimize travel times, queue lengths, the number of vehicle stops, and overall traffic network efficiency. In this research, we investigate the critical and necessary factors for increasing the quality of VANETs. This paper focuses on increasing the quality of service using multi-agent learning methods. The innovation of this study is the use of artificial intelligence to improve the network's quality of service, with a mechanism and algorithm to find the optimal behavior of agents in the VANET. The result... 

    A multi-agent deep reinforcement learning framework for algorithmic trading in financial markets

    Article ; Expert Systems with Applications ; Volume 208, 2022 ; 09574174 (ISSN) ; Shavandi, A ; Khedmati, M ; Sharif University of Technology
    Elsevier Ltd  2022
    Abstract
    Algorithmic trading based on machine learning is a developing and promising field of research. Financial markets have a complex, uncertain, and dynamic nature, making them challenging for trading. Some financial theories, such as the fractal market hypothesis, hold that markets behave according to the collective psychology of investors who trade with different investment horizons and interpretations of information. Accordingly, a multi-agent deep reinforcement learning framework is proposed in this paper to trade on the collective intelligence of multiple agents, each of which is an expert trader on a specific timeframe. The proposed framework works in a hierarchical structure in which... 

    Cancer Simulation with Markov Decision Process

    M.Sc. Thesis ; Sharif University of Technology ; Zarepour, Fariborz (Author) ; Habibi, Jafar (Supervisor)
    Abstract
    Cancer refers to a class of diseases that arise from the abnormal growth of cells and their invasion of normal cells of the human body, and that cause a considerable percentage of deaths worldwide each year. Because cancer can be considered a complex system, various models have been presented for modeling and simulating its behavior, using methods such as cellular automata, agent-based models, game theory, and others. Multi-agent simulation, a special kind of agent-based modeling, is a method used to simulate real-world phenomena that usually contain many different components interacting in different and complex ways. Since the cells are located in... 

    Optimal Design and Intelligent Control of Polymer Electrolyte Membrane Fuel Cell Stack

    M.Sc. Thesis ; Sharif University of Technology ; Ahmadi, Mohammad Reza (Author) ; Boroushaki, Mehrdad (Supervisor)
    Abstract
    We present here an analysis of controlling Polymer Electrolyte Membrane Fuel Cells (PEMFCs) using the Q-learning algorithm, the most widely known among reinforcement learning (RL) techniques. The method is to train the controller to guide and sustain the fuel cell power output at the 2.5 kW mark by manipulating elements of the reaction subsystem, including the fuel cell current, the relative humidity, and the anode/cathode pressures. As the Q-learning algorithm needs to be implemented within a fuel cell simulation environment, the mathematical model known as the Amphlett steady-state model of the PEM fuel cell was employed. The semi-empirical nature of this model necessitates the... 

    Path planning of modular robots on various terrains using Q-learning versus optimization algorithms

    Article ; Intelligent Service Robotics ; Volume 10, Issue 2, 2017, Pages 121-136 ; 18612776 (ISSN) ; Haghzad Klidbary, S ; Bagheri Shouraki, S ; Sheikhpour Kourabbaslou, S ; Sharif University of Technology
    Springer Verlag  2017
    Abstract
    Self-reconfigurable modular robots (SRMRs) have recently attracted considerable attention because of their numerous potential applications in the real world. In this paper, we draw a comprehensive comparison among five different algorithms for path planning of a novel SRMR system called ACMoD through an environment comprising various terrains in a static condition. The contribution of this work is that the reconfiguration ability of ACMoD has been taken into account. Although this consideration raises new algorithmic challenges, it equips the robot with a new capability to traverse difficult terrains rather than bypassing them, and consequently the robot can achieve better performance in terms of... 

    A constrained multi-item EOQ inventory model for reusable items: Reinforcement learning-based differential evolution and particle swarm optimization

    Article ; Expert Systems with Applications ; Volume 207, 2022 ; 09574174 (ISSN) ; Fallahi, A ; Amani Bani, E ; Akhavan Niaki, S. T ; Sharif University of Technology
    Elsevier Ltd  2022
    Abstract
    Growing environmental concerns, governmental regulations, and significant cost savings are the primary motivations for companies to consider the reuse and recovery of products in their inventory systems. Previous research has ignored several realistic features of reusable-item inventory systems, such as the presence of multiple products and operational constraints. For the first time, this paper presents a new multiproduct economic order quantity inventory model for an inventory system of reusable products. The goal of the model is to determine the optimal replenishment quantity and reuse quantity of each item so that the system's total cost is minimized. Several operational constraints... 

    Model-free LQR design by Q-function learning

    Article ; Automatica ; Volume 137, 2022 ; 00051098 (ISSN) ; Farjadnasab, M ; Babazadeh, M ; Sharif University of Technology
    Elsevier Ltd  2022
    Abstract
    Reinforcement learning methods such as Q-learning have shown promising results in the model-free design of linear quadratic regulator (LQR) controllers for linear time-invariant (LTI) systems. However, challenges such as sample-efficiency, sensitivity to hyper-parameters, and compatibility with classical control paradigms limit the integration of such algorithms in critical control applications. This paper aims to take some steps towards bridging the well-known classical control requirements and learning algorithms by using optimization frameworks and properties of conic constraints. Accordingly, a new off-policy model-free approach is proposed for learning the Q-function and designing the...
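As background for the Q-function named in the abstract above, the standard textbook form of the LQR Q-function can be written out as follows. This is a sketch under stated assumptions, not the paper's derivation: a discrete-time LTI system is assumed, and the symbols A, B, W, R, P, and H are illustrative notation that does not appear in the listing.

```latex
% Assumed discrete-time LTI system and quadratic stage cost (illustrative symbols).
x_{k+1} = A x_k + B u_k, \qquad
c(x_k, u_k) = x_k^\top W x_k + u_k^\top R u_k

% With P the optimal cost-to-go matrix, V^*(x) = x^\top P x, the Q-function is quadratic:
\mathcal{Q}(x, u) = c(x, u) + V^*(A x + B u)
= \begin{bmatrix} x \\ u \end{bmatrix}^\top
\underbrace{\begin{bmatrix}
W + A^\top P A & A^\top P B \\
B^\top P A & R + B^\top P B
\end{bmatrix}}_{H}
\begin{bmatrix} x \\ u \end{bmatrix}

% Minimizing over u gives the optimal feedback gain directly from the blocks of H:
u^* = -H_{uu}^{-1} H_{ux}\, x, \qquad
H_{uu} = R + B^\top P B, \quad H_{ux} = B^\top P A
```

Because the optimal gain depends only on the blocks of H, learning H from data yields a controller without identifying A and B separately, which is the sense in which such designs are called model-free.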