COMMERCE BUSINESS DAILY ISSUE OF MARCH 20, 1995 PSA#1306
WL/AAKR, Bldg. 7, 2530 C Street, Wright-Patterson AFB, OH 45433-7607
A -- AVIONICS APPLICATIONS OF REINFORCEMENT LEARNING SYSTEMS. PART 1
OF 3 SOL BAA 95-03-AAK DUE 050495 POC Dawn M. Ross, Contract
Negotiator, 513-255-6908. 17. A--INTRODUCTION: AVIONICS APPLICATIONS OF
REINFORCEMENT LEARNING SYSTEMS, BAA #95-03-AAK. This is a Broad Agency
Announcement (BAA). Wright Laboratory (WL/AAAT) is interested in
receiving proposals (Technical and Cost) on the research effort
described below. Proposals in response to this BAA shall be submitted
by 04 May 95, 1500 hours Eastern Daylight Time, to Wright
Laboratory, Directorate of R&D Contracting, Attn: Dawn M. Ross,
WL/AAKR, Bldg 7, Area B, 2530 C Street, Wright-Patterson AFB OH
45433-7607. This is an unrestricted solicitation. Small businesses are
encouraged to propose on all or any part of this solicitation. Teaming
arrangements between private industry and universities will also be
considered and are encouraged. Proposals submitted shall be in
accordance with this announcement. Proposals received after the cutoff
date and time specified herein shall be treated in accordance with
restrictions of FAR 52.215-10. A copy of this provision may be obtained
from the contracting point of contact. There will be no other
solicitation issued in regard to this requirement. Offerors should be
alert for any BAA amendments, including those that may permit
subsequent proposal submission dates. Offerors should request a copy
of the WL Guide entitled "PRDA and BAA Guide for Industry." This
Guide was specifically designed to assist offerors in understanding the
PRDA/BAA proposal process. Copies may be requested from the contracting
officer cited in this announcement. B--REQUIREMENTS: (1) BACKGROUND:
Wright Laboratory has a patent pending on a version of machine
intelligence called advantage updating which is guaranteed
mathematically to find the optimal solution to any Markovian control
problem. Advantage updating is the most efficient form of reinforcement
learning known: it is four times less sensitive to noise and five
orders of magnitude faster than its predecessor, Q-learning. More
importantly, advantage updating is the first reinforcement learning
system to be applicable to high-dimensional, nonlinear (even
discontinuous), nondeterministic plants with continuous state and
action spaces. Solutions need not be known a priori, although knowledge
of the problem or its solution can be incorporated before or during
learning. Given sufficient sensory information about the environment,
advantage updating requires only experience to derive directly the
optimal control for each state. A model of the plant is not needed and
the learning system does not construct a model of the plant as it
learns. The objective of this effort is to apply advantage updating or
advantage learning to an avionic control problem. Transition of this
technology to industry is also a consideration: this effort will bring
together the Wright Laboratory machine intelligence specialists who
developed and continue to investigate advantage updating and learning
and scientists and engineers in industry who specialize in avionic
applications. Primary avionic applications are synthetic aperture
radar, sensor management, Kalman filtering, communication, sensor
fusion, processor scheduling, and automatic target recognition. (2)
SCOPE: The objective of this effort is to apply advantage updating to
an avionics problem. (3) TECHNICAL REQUIREMENTS: Suitable avionic
problems will be selected by the companies that respond to this Broad
Agency Announcement. Selection of a function approximation system
(e.g., neural network, polynomial, table, cerebellar model articulation
controller) for maintaining learned control information is a primary
technical issue: some may be too slow either for learning or for
response time for the avionic problem selected; others may require too
much memory or processing; still others may represent continuous or
high-dimensional state and action spaces too coarsely to be effective.
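
The following is a minimal sketch, not part of the announcement, of
model-free learning with a table as the function approximation system.
It uses Q-learning, the predecessor named above, because this
announcement does not state the advantage updating rule. The chain
"plant", the function names, and all parameter values are hypothetical
and chosen only for illustration.

```python
import random

def step(state, action, n_states=5):
    """Hypothetical plant: a 1-D chain of states.
    Action 0 moves left (stopping at state 0), action 1 moves right.
    Reaching the right end gives reward 1 and ends the episode."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == n_states - 1
    return nxt, (1.0 if done else 0.0), done

def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning. The learner only samples step(); it never
    builds or consults a model of the plant."""
    rng = random.Random(seed)
    # The table is the function approximator: one value per (state, action).
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Q-learning is off-policy, so a uniformly random behavior
            # policy is enough to explore this small chain.
            action = rng.randrange(2)
            nxt, reward, done = step(state, action, n_states)
            # One-step target: reward plus discounted best next value.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q

def greedy_policy(q):
    """Read the learned control for each state directly from the table."""
    return [max((0, 1), key=lambda a: row[a]) for row in q]
```

Calling greedy_policy(q_learning()) returns one action per state. A
table suits this five-state chain, but it scales poorly to continuous
or high-dimensional state and action spaces, which is the coarseness
and memory trade-off described above.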
Key milestones include:
(a) Requirements Study: implement advantage updating or learning in
software and select a suitable function approximation system to
maintain data (3QFY96);
(b) Technology Assessment: test and evaluate the system on a simple
problem related to the selected avionics problem (1QFY97);
(c) develop a model of the selected avionics problem (4QFY97);
(d) apply advantage updating to the model (2QFY98);
(e) evaluate solution (3QFY98);
(f) demonstration (4QFY98);
(g) documentation: the contractor shall document the results of the
Requirements Study and the Technology Assessment in an Interim
Technical Report; the Final Report (1QFY99) shall document the entire
program, including data from the Interim Report;
(i) software: software developed under this program falls into four
categories: learning algorithms, benchmarks, tools, and simulations.
Learning algorithms shall be written in C or C++ for Macintosh
workstations.
End of Part 1.
(0075) Loren Data Corp. http://www.ld.com (SYN# 0002 19950317\A-0002.SOL)