A - SOL: AVIONICS APPLICATIONS OF REINFORCEMENT LEARNING

COMMERCE BUSINESS DAILY ISSUE OF MARCH 20, 1995 PSA#1306

WL/AAKR, Bldg. 7 2530 C Street Wright Patterson AFB, OH 45433-7607

A -- AVIONICS APPLICATIONS OF REINFORCEMENT LEARNING SYSTEMS. PART 1 OF 3 SOL BAA 95-03-AAK DUE 050495 POC Dawn M. Ross, Contract Negotiator, 513-255-6908. 17. A--INTRODUCTION: AVIONICS APPLICATIONS OF REINFORCEMENT LEARNING SYSTEMS, BAA #95-03-AAK. This is a Broad Agency Announcement (BAA). Wright Laboratory (WL/AAAT) is interested in receiving proposals (Technical and Cost) on the research effort described below. Proposals in response to this BAA shall be submitted by 04 May 95, 1500 hours Eastern Daylight Savings Time, to Wright Laboratory, Directorate of R&D Contracting, Attn: Dawn M. Ross, WL/AAKR, Bldg 7, Area B, 2530 C Street, Wright-Patterson AFB OH 45433-7607. This is an unrestricted solicitation. Small businesses are encouraged to propose on all or any part of this solicitation. Teaming arrangements between private industry and universities will also be considered and are encouraged. Proposals submitted shall be in accordance with this announcement. Proposal receipt after the cutoff date and time specified herein shall be treated in accordance with restrictions of FAR 52.215-10. A copy of this provision may be obtained from the contracting point of contact. There will be no other solicitation issued in regard to this requirement. Offerors should be alert for any BAA amendments, including those that may permit subsequent proposal submission dates. Offerors should request a copy of the WL Guide entitled "PRDA and BAA Guide for Industry." This Guide was specifically designed to assist offerors in understanding the PRDA/BAA proposal process. Copies may be requested from the contracting officer cited in this announcement. B--REQUIREMENTS: (1) BACKGROUND: Wright Laboratory has a patent pending on a version of machine intelligence called advantage updating which is guaranteed mathematically to find the optimal solution to any Markovian control problem. 
Advantage updating is the most efficient form of reinforcement learning known: it is four times less sensitive to noise and five orders of magnitude faster than its predecessor, Q-learning. More importantly, advantage updating is the first reinforcement learning system to be applicable to high-dimensional, nonlinear (even discontinuous), nondeterministic plants with continuous state and action spaces. Solutions need not be known a priori, although knowledge of the problem or its solution can be incorporated before or during learning. Given sufficient sensory information about the environment, advantage updating requires only experience to derive directly the optimal control for each state. A model of the plant is not needed, and the learning system does not construct a model of the plant as it learns. The objective of this effort is to apply advantage updating or advantage learning to an avionic control problem. Transition of this technology to industry is also a consideration: this effort will bring together the Wright Laboratory machine intelligence specialists who developed and continue to investigate advantage updating and learning with scientists and engineers in industry who specialize in avionic applications. Primary avionic applications are synthetic aperture radar, sensor management, Kalman filtering, communication, sensor fusion, processor scheduling, and automatic target recognition. (2) SCOPE: The objective of this effort is to apply advantage updating to an avionics problem. (3) TECHNICAL REQUIREMENTS: Suitable avionic problems will be selected by the companies that respond to this Broad Agency Announcement. 
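For orientation, the model-free learning described in the background can be illustrated with tabular Q-learning, the predecessor method named above. This is a minimal sketch and not advantage updating itself, whose details are not given in this announcement. The five-state corridor plant, the `step` function, and all parameter values are hypothetical and chosen only for illustration.

```python
import random

# Hypothetical toy plant: a 5-state corridor. Action 0 moves left, action 1
# moves right; reaching the rightmost state pays reward 1 and ends the episode.
N_STATES = 5
ACTIONS = (0, 1)
GOAL = N_STATES - 1


def step(state, action):
    """Plant transition. The learner only sees its outputs, never its rules."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning: learn action values from sampled transitions only."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy exploration (an assumed, common choice).
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])
            next_state, reward, done = step(state, action)
            # One-step backup toward reward plus discounted best next value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best learned action for each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(GOAL)]


if __name__ == "__main__":
    print(greedy_policy(q_learning()))
```

The learner never inspects `step` and never builds a transition model; it improves its control choices purely from experienced state, action, reward, and next-state samples, which is the property the background attributes to this family of methods.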
Selection of a function approximation system (e.g., neural network, polynomial, table, cerebellar model articulation controller) for maintaining learned control information is a primary technical issue: some may be too slow either for learning or for response time for the avionic problem selected, others may require too much memory or processing, and still others may represent continuous or high-dimensional state and action spaces too coarsely to be effective. Key milestones include: (a) Requirements Study: implement advantage updating or learning in software and select a suitable function approximation system to maintain data (3QFY96), (b) Technology Assessment: test and evaluate the system on a simple problem related to the selected avionics problem (1QFY97), (c) develop a model of the selected avionics problem (4QFY97), (d) apply advantage updating to the model (2QFY98), (e) evaluate solution (3QFY98), (f) demonstration (4QFY98), (g) documentation: the contractor shall document the results of the Requirements Study and the Technology Assessment in an Interim Technical Report; the Final Report (1QFY99) shall document the entire program, including data from the Interim Report, (i) software: software developed under this program falls into four categories: learning algorithms, benchmarks, tools, and simulations. Learning algorithms shall be written in C or C++ for Macintosh workstations. End of Part 1. (0075)
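As an illustration of the function approximation trade-offs named in the technical requirements, the following is a minimal sketch of one listed candidate, a cerebellar model articulation controller (CMAC), in its common tile-coding form. The class name, the parameter values, and the sine-fitting task are assumptions made for illustration and are not part of this announcement. The sketch is in Python for brevity, whereas deliverable learning algorithms are required in C or C++.

```python
import math
import random


class CMAC:
    """Minimal 1-D CMAC (tile coding): several offset tilings, one weight per tile."""

    def __init__(self, lo, hi, n_tilings=8, tiles_per_tiling=8):
        self.lo = lo
        self.n_tilings = n_tilings
        self.tile_width = (hi - lo) / tiles_per_tiling
        # One extra tile per tiling absorbs the offset at the upper edge.
        self.weights = [[0.0] * (tiles_per_tiling + 1) for _ in range(n_tilings)]

    def _active_tiles(self, x):
        """Return the index of the single active tile in each tiling."""
        tiles = []
        for t in range(self.n_tilings):
            offset = t * self.tile_width / self.n_tilings
            tiles.append(int((x - self.lo + offset) / self.tile_width))
        return tiles

    def predict(self, x):
        """Output is the sum of the active tiles' weights."""
        return sum(self.weights[t][i] for t, i in enumerate(self._active_tiles(x)))

    def update(self, x, target, lr=0.2):
        """LMS step: spread the scaled error equally over the active tiles."""
        tiles = self._active_tiles(x)
        error = target - self.predict(x)
        for t, i in enumerate(tiles):
            self.weights[t][i] += lr * error / self.n_tilings


def fit_sine(n_samples=20000, seed=0):
    """Train the CMAC on sin(x) over [0, 2*pi] and report mean absolute error."""
    rng = random.Random(seed)
    cmac = CMAC(0.0, 2 * math.pi)
    for _ in range(n_samples):
        x = rng.uniform(0.0, 2 * math.pi)
        cmac.update(x, math.sin(x))
    grid = [2 * math.pi * k / 200 for k in range(201)]
    mean_abs_error = sum(abs(cmac.predict(x) - math.sin(x)) for x in grid) / len(grid)
    return cmac, mean_abs_error


if __name__ == "__main__":
    _, err = fit_sine()
    print(f"mean absolute error: {err:.3f}")
```

In this sketch, `n_tilings` and `tiles_per_tiling` set both the memory footprint and how coarsely the continuous state is represented, while each prediction touches only one weight per tiling, which bears on response time. These are the same three concerns (speed, memory, coarseness) the selection criteria above raise.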

Loren Data Corp. http://www.ld.com (SYN# 0002 19950317\A-0002.SOL)
