Machine Learning methods in Modeling Human learning 

(Psy 5993-034)

University of Minnesota, Fall Semester, 2008
http://www.schrater.org

Instructors:
Paul Schrater (schrater@umn.edu)
Adam Johnson

Meeting time: Friday 2-3:30pm
Place: Elliott Hall 204

Recent advances in machine learning provide a powerful set of new tools for understanding human learning. The computational principles and fundamental problems encountered in building artificial agents provide a framework for developing models of human abilities. Because of its intrinsic importance to human behavior, learning is a central problem for researchers interested in development, neuroscience, cognition and behavior, and artificial intelligence. We will study three interrelated issues where cognitive scientists have begun using machine learning tools to study learning. In particular, we will look at the role of structure learning, causal analysis, and hierarchy in explaining difficult-to-model aspects of human learning.


Format: Discussion of journal articles led by seminar members. Students will prepare a term paper or term project on a related topic.


SEPT 12th

Planning and Acting in Partially Observable Stochastic Domains
http://www.eecs.harvard.edu/~avi/CS281r/F06/Papers/kaelbling-et-al-pomdp.ps

POMDP for dummies

and/or Chapter 3 of Sutton and Barto's book Reinforcement Learning: An Introduction
http://www.cs.ualberta.ca/%7Esutton/book/ebook/node27.html

Lecture slides

SEPT 19th

Daw et al. (2005) http://www.cns.nyu.edu/~daw/dnd05.pdf

Sutton and Barto's book - chapter 6 (http://www.cs.ualberta.ca/~sutton/book/ebook/node60.html)

Dearden et al. (1998) Bayesian Q-learning
(http://www.cs.berkeley.edu/~russell/papers/aaai98-exploration.ps ) pdf

SEPT 26th

Amy Kalia will lead
Niv Y, Joel D, Dayan P (2006) A normative perspective on motivation. Trends Cogn Sci
10(8):375–381. http://www.gatsby.ucl.ac.uk/~dayan/papers/njd2006.pdf

and Paul Schrater will lead
Niv Y, Daw ND, Dayan P (2005) How fast to work: Response vigor, motivation and tonic
dopamine. In: Advances in Neural Information Processing Systems 18. Cambridge, MA:
MIT Press. http://www.cns.nyu.edu/~daw/ndd05.pdf

OCT 3rd  REPRESENTATION OF REWARD


Paul Schrater will continue
Niv Y, Daw ND, Dayan P (2005) How fast to work: Response vigor, motivation and tonic
dopamine. In: Advances in Neural Information Processing Systems 18. Cambridge, MA:
MIT Press. http://www.cns.nyu.edu/~daw/ndd05.pdf

Adam will lead
Kakade S, Dayan P (2002) Dopamine: generalization and bonuses. Neural Networks 15:549–
599. http://ttic.uchicago.edu/~sham/papers/neuro/nn_da.pdf

OCT 10th  REPRESENTATION OF ACTION


Paul Schrater will lead
Hierarchical Reinforcement Learning with the MAXQ Value Function
http://www.jair.org/media/639/live-639-1834-jair.pdf

background
Recent Advances in Hierarchical Reinforcement Learning

and Daniel Acuña will lead
Mehta, N., Ray, S., Tadepalli, P., Dietterich, T. (2008). Automatic Discovery and Transfer of MAXQ Hierarchies.
International Conference on Machine Learning (ICML-2008) http://pages.cs.wisc.edu/~sray/papers/maxq.icml08.pdf

OCT 17th  GENERALIZATION OF VALUE

Steve Damer will lead
Mahadevan, S. and Maggioni, M. "Proto-Value Functions: A Laplacian Framework for Learning Representation and Control in Markov Decision Processes",
Journal of Machine Learning Research, pp. 2169-2231, vol. 8, 2007, MIT Press. http://www.cs.umass.edu/~mahadeva/papers/06-35.pdf

Arsen Bagyan will lead
Mannor, S., Menache, I., Hoze, A., and Klein, U. 2004. Dynamic abstraction in reinforcement learning via clustering. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.1.5975&rep=rep1&type=pdf

OCT 23rd  REPRESENTATION OF ENVIRONMENT


Brett Hemes will lead

Boutilier, C., Dearden, R., Goldszmidt, M. (1995). Exploiting Structure in Policy Construction.
In IJCAI 1104-1113 http://www.isi.edu/~blythe/cs541/Readings/spi.pdf

Guestrin, C., Koller, D., Parr, R., and Venkataraman, S. (2003) Efficient Solution Algorithms for Factored MDPs,
Journal of Artificial Intelligence Research 19, 399-468. http://www.cs.cmu.edu/afs/cs/project/jair/pub/volume19/guestrin03a.pdf


OCT 31st

Susan Park Anderson will lead
Talmi D, Seymour B, Dayan P, Dolan RJ (2008) Human pavlovian-instrumental transfer. J
Neurosci 28(2):360–368. URL http://dx.doi.org/10.1523/JNEUROSCI.4028-07.2008.


NOV 7th



Jeffrey Stott will lead
Tanaka SC, Balleine BW, O’Doherty JP (2008) Calculating consequences: brain systems
that encode the causal effects of actions. J Neurosci 28(26):6750–6755. URL
http://dx.doi.org/10.1523/JNEUROSCI.1808-08.2008.


NOV 14th

Paul Schrater will lecture on Causal models

J. Pearl, "Graphs, Causality, and Structural Equation Models"
UCLA Cognitive Systems Laboratory, Technical Report (R-253), June 1998.
Sociological Methods and Research, Vol. 27, No. 2, 226-284, November 1998.
http://ftp.cs.ucla.edu/pub/stat_ser/R253.pdf

Glymour, C., Learning, prediction and causal Bayes nets, Trends Cogn. Sci. 7 (2003), pp. 43–48. pdf

NOV 21st

Paul Schrater lectured on Causal models (lecture slides)

NOV 28th

THANKSGIVING!


DEC 5th

Adam Steiner will lead
Model uncertainty in classical conditioning
A. Courville, N.D. Daw, G. Gordon, and D.S. Touretzky
Advances in Neural Information Processing Systems 16, MIT Press, Cambridge, MA, 2005.
http://www.cns.nyu.edu/~daw/cdgt03.pdf

Adam Johnson will lead
Tse D, Langston RF, Kakeyama M, Bethus I, Spooner PA, Wood ER, Witter MP, Morris RGM
(2007) Schemas and memory consolidation. Science 316(5821):76–82. http://www.sciencemag.org/cgi/content/abstract/316/5821/76

DEC 12th

Arsen Bagyan & Paul will lead
Kemp, C., & Tenenbaum, J. B. (2008). The discovery of structural form.
Proceedings of the National Academy of Sciences. 105(31), 10687-10692. http://www.psy.cmu.edu/~ckemp/papers/kempt08.pdf

Chris Kallie will lead
If time: Lu, H., Rojas, R., Beckers, T., & Yuille, A. (2008). Sequential causal learning in humans and rats.
Proceedings of the Twenty-ninth Annual Conference of the Cognitive Science Society. [PDF]

Background reading

Human learning - concepts and key results

Causal learning - theoretical framework

J. Pearl, "Graphs, Causality, and Structural Equation Models" 
UCLA Cognitive Systems Laboratory, Technical Report (R-253), June 1998.
Sociological Methods and Research, Vol. 27, No. 2, 226-284, November 1998.
http://ftp.cs.ucla.edu/pub/stat_ser/R253.pdf

JUDEA PEARL - CAUSALITY http://bayes.cs.ucla.edu/BOOK-2K/

Sequential decision making - theoretical framework


Planning and Acting in Partially Observable Stochastic Domains
http://www.eecs.harvard.edu/~avi/CS281r/F06/Papers/kaelbling-et-al-pomdp.ps

POMDP for dummies

Recent Advances in Hierarchical Reinforcement Learning

Hierarchical Reinforcement Learning with the MAXQ Value Function
http://www.jair.org/media/639/live-639-1834-jair.pdf


Tentative Reading List


Bray S, Rangel A, Shimojo S, Balleine B, O’Doherty JP (2008) The neural mechanisms underlying
the influence of pavlovian cues on human decision making. J Neurosci 28(22):5861–5866.
URL http://dx.doi.org/10.1523/JNEUROSCI.0897-08.2008.

Boutilier, C., Dearden, R., Goldszmidt, M. (1995). Exploiting Structure in Policy Construction.
In IJCAI 1104-1113 http://www.isi.edu/~blythe/cs541/Readings/spi.pdf

Cohen JD, McClure SM, Yu AJ (2007) Should I stay or should I go? How the human brain
manages the trade-off between exploitation and exploration. Philos Trans R Soc Lond B Biol
Sci 362(1481):933–942. URL http://dx.doi.org/10.1098/rstb.2007.2098.

Colwill RM, Rescorla RA (1990) Evidence for the hierarchical structure of instrumental learning.
Animal Learning and Behavior 18(1):71–82.  pdf

Daw ND, Niv Y, Dayan P (2005) Uncertainty-based competition between prefrontal and dorsolateral
striatal systems for behavioral control. Nature Neuroscience 8(12):1704–1711.  pdf

Daw ND, O’Doherty JP, Dayan P, Seymour B, Dolan RJ (2006) Cortical substrates for exploratory decisions in humans.
Nature 441(7095):876–879.  http://www.nature.com/nature/journal/v441/n7095/abs/nature04766.html

Glymour, C., Learning, prediction and causal Bayes nets, Trends Cogn. Sci. 7 (2003), pp. 43–48. pdf

Guestrin, C., Koller, D., Parr, R., and Venkataraman, S. (2003) Efficient Solution Algorithms for Factored MDPs,
Journal of Artificial Intelligence Research 19, 399-468. http://www.cs.cmu.edu/afs/cs/project/jair/pub/volume19/guestrin03a.pdf

Hagmayer, Y. et al. Causal reasoning through intervention.  http://else.econ.ucl.ac.uk/papers/uploaded/199.pdf
In Causal Learning: Psychology, Philosophy, and Computation (Gopnik, A. and Schulz, L., eds), Oxford University Press.

Kakade S, Dayan P (2002) Dopamine: generalization and bonuses. Neural Networks 15:549–
599. http://ttic.uchicago.edu/~sham/papers/neuro/nn_da.pdf

Kemp, C., & Tenenbaum, J. B. (2008). The discovery of structural form.
Proceedings of the National Academy of Sciences. 105(31), 10687-10692. http://www.psy.cmu.edu/~ckemp/papers/kempt08.pdf

Kemp C, Perfors A, Tenenbaum JB (2007) Learning overhypotheses with hierarchical bayesian
models. Dev Sci 10(3):307–321. http://web.mit.edu/cocosci/Papers/devsci07_kempetal.pdf

Lu H, Yuille AL, Liljeholm M, Cheng PW, Holyoak KJ (2008) Bayesian generic priors for causal learning.
Psychol Rev, in press. http://www.stat.ucla.edu/~yuille/pubs/ucla/C10_hjlu_PsychRev2007.pdf

Lu, H., Rojas, R., Beckers, T., & Yuille, A. (2008). Sequential causal learning in humans and rats. 
Proceedings of the Twenty-ninth Annual Conference of the Cognitive Science Society. [PDF]

Mahadevan, S. and Maggioni, M. "Proto-Value Functions: A Laplacian Framework for Learning Representation and Control in Markov Decision Processes",
Journal of Machine Learning Research, pp. 2169-2231, vol. 8, 2007, MIT Press. http://www.cs.umass.edu/~mahadeva/papers/06-35.pdf

Mannor, S., Menache, I., Hoze, A., and Klein, U. 2004. Dynamic abstraction in reinforcement learning via clustering.
In Proceedings of the Twenty-First international Conference on Machine Learning (Banff, Alberta, Canada, July 04 - 08, 2004). ICML '04, vol. 69.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.1.5975&rep=rep1&type=pdf

Mehta, N., Ray, S., Tadepalli, P., Dietterich, T. (2008). Automatic Discovery and Transfer of MAXQ Hierarchies.
International Conference on Machine Learning (ICML-2008) http://pages.cs.wisc.edu/~sray/papers/maxq.icml08.pdf

Niv Y., Joel D., Meilijson I. and Ruppin E. (2002) -- Evolution of Reinforcement Learning in Uncertain Environments:
A Simple Explanation for Complex Foraging Behaviors pdf

Niv Y, Daw ND, Dayan P (2005) How fast to work: Response vigor, motivation and tonic
dopamine. In: Advances in Neural Information Processing Systems 18. Cambridge, MA:
MIT Press. http://www.cns.nyu.edu/~daw/ndd05.pdf

Niv Y, Joel D, Dayan P (2006) A normative perspective on motivation. Trends Cogn Sci
10(8):375–381. http://www.gatsby.ucl.ac.uk/~dayan/papers/njd2006.pdf

Sloman, S., Hagmayer, Y. (2006) The causal psycho-logic of choice, Trends in Cognitive Sciences, Volume 10, Issue 9, Pages 407-412.
(http://www.sciencedirect.com/science/article/B6VH9-4KKNNHN-2/2/2e28a448e7044c8e93c064f7d9908c5e)

Steyvers et al., Inferring causal networks from observations and interventions, Cogn. Sci. 27 (2003), pp. 453–489.
http://web.mit.edu/cocosci/Papers/steyvers-etal-2003.pdf

Talmi D, Seymour B, Dayan P, Dolan RJ (2008) Human pavlovian-instrumental transfer. J
Neurosci 28(2):360–368. URL http://dx.doi.org/10.1523/JNEUROSCI.4028-07.2008.

Tanaka SC, Balleine BW, O’Doherty JP (2008) Calculating consequences: brain systems
that encode the causal effects of actions. J Neurosci 28(26):6750–6755. URL
http://dx.doi.org/10.1523/JNEUROSCI.1808-08.2008.

Tolman EC (1939) Prediction of vicarious trial and error by means of the schematic sowbug.
Psychological Review 46:318–336.

http://www.cc.gatech.edu/ai/robot-lab/research/eBug/

Tse D, Langston RF, Kakeyama M, Bethus I, Spooner PA, Wood ER, Witter MP, Morris RGM
(2007) Schemas and memory consolidation. Science 316(5821):76–82. http://www.sciencemag.org/cgi/content/abstract/316/5821/76

Wang, G., Mahadevan, S. "Hierarchical Optimization of Policy-Coupled Semi-Markov Decision Processes",
Proceedings of the 16th International Conference on Machine Learning (ICML '99), Bled, Slovenia, June 27-30, 1999.
http://www.cs.umass.edu/~mahadeva/papers/icml99.ps.gz

Yu AJ, Dayan P (2005) Uncertainty, neuromodulation, and attention. Neuron 46(4):681–692.
URL http://dx.doi.org/10.1016/j.neuron.2005.04.026.