Imitation learning

From Wikipedia, the free encyclopedia

Imitation learning is a paradigm in reinforcement learning in which an agent learns to perform a task through supervised learning on expert demonstrations. It is also called learning from demonstration or apprenticeship learning.[1][2][3]

It has been applied to underactuated robotics,[4] self-driving cars,[5][6][7] quadcopter navigation,[8] helicopter aerobatics,[9] and locomotion.[10][11]

Approaches


Expert demonstrations are recordings of an expert performing the desired task, often collected as observation–action pairs $(o_t^*, a_t^*)$.

Behavior Cloning


Behavior Cloning (BC) is the most basic form of imitation learning. It uses supervised learning to train a policy $\pi_\theta$ such that, given an observation $o_t$, the policy outputs an action distribution $\pi_\theta(\cdot \mid o_t)$ that approximately matches the action distribution of the experts.[12]
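A minimal sketch of behavior cloning, using a hypothetical linear expert and least-squares regression as the supervised learner (the expert controller, data shapes, and policy class here are illustrative assumptions, not part of any standard benchmark):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical expert: a linear controller a* = K_expert @ o, unknown to the learner.
K_expert = np.array([[0.5, -1.0],
                     [2.0, 0.3]])
observations = rng.normal(size=(500, 2))   # o_t collected from expert rollouts
actions = observations @ K_expert.T        # a_t*: the expert's demonstrated actions

# Behavior cloning: fit the policy to the expert's (observation, action) pairs.
# With a linear policy class and noise-free data, least squares recovers the expert.
K_learned, *_ = np.linalg.lstsq(observations, actions, rcond=None)
K_learned = K_learned.T

def policy(o):
    """Cloned policy: imitates the expert on in-distribution observations."""
    return K_learned @ o
```

Note that nothing in this training loop ever evaluates the policy on states the learner itself visits; that gap is exactly what the distribution-shift problem below describes.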

BC is susceptible to distribution shift. Specifically, if the trained policy differs from the expert policy, it may stray from the expert trajectories into observations that never occurred in any expert trajectory, where its behavior is untrained.[12]

This problem was already noted in ALVINN, where a neural network was trained to drive a van from human demonstrations. Because a human driver never strays far from the road, the network was never trained on what action to take if it found itself far off the path.[5]

DAgger


DAgger (Dataset Aggregation)[13] improves on behavior cloning by iteratively training on an aggregated dataset of expert demonstrations. In each iteration, the algorithm first collects data by rolling out the learned policy $\pi_\theta$. Then, it queries the expert for the optimal action $a_t^*$ on each observation $o_t$ encountered during the rollout. Finally, it aggregates the new data into the dataset, $D \leftarrow D \cup \{(o_1, a_1^*), (o_2, a_2^*), \dots, (o_T, a_T^*)\}$, and trains a new policy on the aggregated dataset.[12]
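The DAgger loop can be sketched in a toy one-dimensional control problem; the expert, dynamics, and linear policy class below are all illustrative assumptions chosen so the loop fits in a few lines:

```python
import numpy as np

rng = np.random.default_rng(1)

def expert_action(o):
    # Hypothetical queryable expert: a fixed linear controller.
    return -0.8 * o

def rollout(theta, o0, horizon=20):
    # Roll out the current linear policy a = theta * o in a toy
    # 1-D system with dynamics o' = o + a; return visited observations.
    obs, o = [], o0
    for _ in range(horizon):
        obs.append(o)
        o = o + theta * o
    return obs

# DAgger: label the learner's OWN visited states with expert actions,
# aggregate them, and refit the policy each iteration.
dataset_o, dataset_a = [], []
theta = 0.0  # initial (untrained) policy
for _ in range(5):
    for o in rollout(theta, o0=rng.normal()):
        dataset_o.append(o)
        dataset_a.append(expert_action(o))   # expert query on visited state
    O, A = np.array(dataset_o), np.array(dataset_a)
    theta = float(O @ A / (O @ O))           # least-squares refit on aggregate
```

Because the expert labels states that the learner actually visits, the aggregated dataset covers the learner's own state distribution, which is what mitigates the distribution shift that plain behavior cloning suffers from.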

Decision transformer

Architecture diagram of the decision transformer

The Decision Transformer approach models reinforcement learning as a sequence modelling problem.[14] Similar to Behavior Cloning, it trains a sequence model, such as a Transformer, on rollout sequences $(R_1, o_1, a_1), (R_2, o_2, a_2), \dots, (R_t, o_t, a_t)$, where $R_t = r_t + r_{t+1} + \dots + r_T$ is the sum of future rewards in the rollout (the "return-to-go"). During training, the sequence model learns to predict each action $a_t$ given the preceding rollout as context: $(R_1, o_1, a_1), (R_2, o_2, a_2), \dots, (R_t, o_t)$. At inference time, to use the sequence model as a controller, it is simply conditioned on a very high target return $R$, and it generalizes by predicting actions that would achieve that return. This approach was shown to scale predictably to a 1-billion-parameter Transformer that is superhuman on 41 Atari games.[15]
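The return-to-go computation and the interleaved token layout can be sketched as follows; the observation and action tokens are placeholders standing in for whatever embedding the actual model uses:

```python
import numpy as np

def returns_to_go(rewards):
    """R_t = r_t + r_{t+1} + ... + r_T for every step of a rollout."""
    return np.cumsum(rewards[::-1])[::-1]

# A toy 4-step rollout's rewards (illustrative values).
rewards = np.array([1.0, 0.0, 2.0, 1.0])
rtg = returns_to_go(rewards)   # R_1..R_4 = 4, 3, 3, 1

# Training sequence: interleave (R_t, o_t, a_t) triples; the model is
# trained to predict each action token a_t from everything before it.
observations = ["o1", "o2", "o3", "o4"]
actions = ["a1", "a2", "a3", "a4"]
tokens = []
for R, o, a in zip(rtg, observations, actions):
    tokens.extend([R, o, a])
```

At inference time the same layout is used, except the first return token is replaced by a high target return chosen by the user rather than one observed in data.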

Other approaches

Inverse Reinforcement Learning (IRL) learns a reward function that explains the expert's behavior and then uses reinforcement learning to find a policy that maximizes this reward.[18] Recent works have also explored multi-agent extensions of IRL in networked systems.[19]

Generative Adversarial Imitation Learning (GAIL) uses generative adversarial networks (GANs) to match the distribution of agent behavior to the distribution of expert demonstrations.[20] It extends a previous approach based on game theory.[21][16]

See [16][17] for further examples.
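The adversarial idea behind GAIL can be sketched with a logistic-regression discriminator over state–action features; the data, the gradient-ascent training, and the `-log(1 - D)` surrogate reward are illustrative assumptions (GAIL variants differ in the exact reward form), not the full algorithm:

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical state-action features from the expert and the current agent.
expert_sa = rng.normal(loc=1.0, size=(200, 3))
agent_sa = rng.normal(loc=0.0, size=(200, 3))

# Discriminator D(s, a): trained to output 1 on expert pairs, 0 on agent
# pairs, via gradient ascent on the logistic log-likelihood.
X = np.vstack([expert_sa, agent_sa])
y = np.concatenate([np.ones(200), np.zeros(200)])
w = np.zeros(3)
for _ in range(500):
    p = sigmoid(X @ w)
    w += 0.1 * X.T @ (y - p) / len(y)

def imitation_reward(sa):
    # Surrogate reward for the policy update: large where the discriminator
    # mistakes agent behavior for expert behavior.
    return -np.log(1.0 - sigmoid(sa @ w) + 1e-8)
```

In the full method, this learned reward is fed to a reinforcement-learning step that updates the policy, and the discriminator is then retrained on the new agent data, alternating until the two distributions match.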



References

  1. ^ (citation unavailable)
  2. ^ (citation unavailable)
  3. ^ (citation unavailable)
  4. ^ (citation unavailable)
  5. ^ a b (citation unavailable)
  6. ^ (citation unavailable)
  7. ^ (citation unavailable)
  8. ^ (citation unavailable)
  9. ^ (citation unavailable)
  10. ^ (citation unavailable)
  11. ^ (citation unavailable)
  12. ^ a b c CS 285 at UC Berkeley: Deep Reinforcement Learning. Lecture 2: Supervised Learning of Behaviors.
  13. ^ (citation unavailable)
  14. ^ (citation unavailable)
  15. ^ (citation unavailable)
  16. ^ a b (citation unavailable)
  17. ^ (citation unavailable)
  18. ^ (citation unavailable)
  19. ^ V. S. Donge, B. Lian, F. L. Lewis and A. Davoudi, "Multiagent Graphical Games With Inverse Reinforcement Learning," IEEE Transactions on Control of Network Systems, vol. 10, no. 2, pp. 841–852, June 2023, doi:10.1109/TCNS.2022.3210856.
  20. ^ (citation unavailable)
  21. ^ (citation unavailable)