Causal models, learning, & video games

University of Minnesota, Spring Semester, 2009

Recent evidence suggests that theories of Bayesian agents may provide good first-order models of how humans learn to make optimal decisions in dynamic, complex tasks, such as those that require learning the underlying causal structure of the world and the effects of our actions on it. "Optimal" in this context is defined in terms of attaining the task goal while minimizing losses and maximizing gains. It has also recently been shown that experience with certain types of video games can produce generalized transfer of learning. Although this theoretical and empirical research is promising, it is important to test its predictions under conditions of increasing realism if we are to understand how optimal learning of complex dynamic tasks can be induced. Video game technology provides an unprecedented opportunity to experimentally manipulate and control the task factors involved in skill acquisition, including the underlying causal models, task constraints, and reward models. In this course, we will discuss literature on theories of perception, cognition, and action from the point of view of Bayesian agents, together with recent behavioral research on learning and skill acquisition.

Registration:
With Dan Kersten: Computational Vision (Psy8036) or Psy 5993-77760 - 016 (3 credits)
With Paul Schrater: Psy 5993-79022 - 034 (3 credits)

Instructors:
Shawn Green (csgreen@umn.edu)
Dan Kersten (kersten@umn.edu)
Paul Schrater (schrater@umn.edu)

Meeting time: 3:00 to 4:30 pm Tuesdays (First meeting: Tuesday, January 20th, 2009)
Place (Note room change): N227 Elliott Hall

Format: Discussion of journal articles led by seminar members, term paper or term project on a related topic.

Schedule

Date	Reading	Leader
Jan 20	Introduction	Dan Kersten (pdf), Paul Schrater (pdf) & Shawn Green
Jan 27	Ahissar, M., Nahum, M., Nelken, I., & Hochstein, S. (2008). Reverse hierarchies and sensory learning. Philos Trans R Soc Lond B Biol Sci. (pdf)	Shawn Green (pdf)
Feb 3	Kaelbling LP, Littman ML, Moore AW. (1996) Reinforcement Learning: A Survey. Journal of Artificial Intelligence Research, Vol 4, 237-285 http://arxiv.org/abs/cs/9605103 (pdf)	Paul Schrater (pdf)
Feb 10	Green, C. S., & Bavelier, D. (2008). Exercising your brain: A review of human brain plasticity and training-induced learning. Psychol Aging, 23(4), 692-701. (pdf)
Feb 17	Xiao, L. Q., Zhang, J. Y., Wang, R., Klein, S. A., Levi, D. M., & Yu, C. (2008). Complete transfer of perceptual learning across retinal locations enabled by double training. Curr Biol, 18(24), 1922-1926. (pdf) Dosher, B. A., & Lu, Z. L. (2007). The functional form of performance improvements in perceptual learning: learning rates and transfer. Psychol Sci, 18(6), 531-539. (pdf)
Feb 24	Kilgard, M. P., & Merzenich, M. M. (1998). Cortical map reorganization enabled by nucleus basalis activity. Science, 279(5357), 1714-1718. (pdf) Bao, S., Chan, V. T., & Merzenich, M. M. (2001). Cortical remodelling induced by activity of ventral tegmental dopamine neurons. Nature, 412(6842), 79-83. (pdf) Koepp, M. J., Gunn, R. N., Lawrence, A. D., Cunningham, V. J., Dagher, A., Jones, T., et al. (1998). Evidence for striatal dopamine release during a video game. Nature, 393(6682), 266-268. (pdf)
Mar 3	Kemp, C. and Tenenbaum, J. B. (2008). The discovery of structural form. Proceedings of the National Academy of Sciences. 105(31), 10687-10692. (pdf) (suppl pdf)
Mar 10	Steyvers et al. (2003) (pdf) (Supplementary links: Pearl's site, including a review.)
Mar 24	Dearden et al. 1998 (pdf), Niv et al. 2006 (pdf)
Mar 31	Poupart et al. 2007 (pdf) (ICML 07 video) Strens 2000 (pdf)
Apr 7	Dayan & Daw 2008 (pdf) Cohen, McClure & Yu 2007 (pdf)
Apr 14	Kording et al. 2008 (pdf) Sloman et al. 2006 (pdf)
Apr 21	Boutilier et al. 1995 (pdf)
Apr 28	Cutumisu et al. 2008 (pdf) GamesforClass
May 5	Discussion of Final Project goals -- Guidelines
May 16	Final Project Due -- See Guidelines
	Final Project Results -- Selected Bibliographies: CD, JS, MAA, ME, SJ

Reading List

Ahissar, M., Nahum, M., Nelken, I., & Hochstein, S. (2008). Reverse hierarchies and sensory learning. Philos Trans R Soc Lond B Biol Sci.

Borenstein, E., & Ullman, S. (2008). Combined top-down/bottom-up segmentation. IEEE Trans Pattern Anal Mach Intell, 30(12), 2109-2125.

Boutilier, C., Dearden, R., & Goldszmidt, M. (1995). Exploiting Structure in Policy Construction. In IJCAI, 1104-1113.

Cohen JD, McClure SM, Yu AJ (2007) Should I stay or should I go? How the human brain manages the trade-off between exploitation and exploration. Philos Trans R Soc Lond B Biol Sci 362(1481):933–942. URL http://dx.doi.org/10.1098/rstb.2007.2098.

Colwill RM, Rescorla RA (1990) Evidence for the hierarchical structure of instrumental learning. Animal Learning and Behavior 18(1):71–82.

Cheng, P.W. (1997). From covariation to causation: a causal power theory. Psychological Review, 104, 367-405.

Courville, A. C., Daw, N. D., & Touretzky, D. S. (2006). Bayesian theories of conditioning in a changing world. Trends Cogn Sci, 10(7), 294-300.

Cutumisu, M., Szafron, D., Bowling, M., & Sutton, R. (2008). Agent Learning using Action-Dependent Learning Rates in Computer Role-Playing Games.

Dayan, P., & Daw, N. D. (2008). Decision theory, reinforcement learning, and the brain. Cogn Affect Behav Neurosci, 8(4), 429-453.

Dearden, R., Friedman, N., & Russell, S. (1998). Bayesian Q-learning.

Eckstein, M.P., Abbey, C.K., Pham, B.T., & Shimozaki, S.S. (2004). Perceptual learning through optimization of attentional weighting: human versus optimal Bayesian learner. Journal of Vision, 4, 1006-1019.

Epshtein, B., Lifshitz, I., & Ullman, S. (2008). Image interpretation by a single bottom-up top-down cycle. Proc Natl Acad Sci U S A, 105(38), 14298-14303.

Fahle, M. (2005). Perceptual learning: specificity versus generalization. Current Opinion in Neurobiology, 15(2), 154-160.

Feldman, J. (2000). Minimization of Boolean complexity in human concept learning. Nature, 407, 630-633.

Friston, K. (2003). Learning and inference in the brain. Neural Networks, 16(9), 1325-1352.

Friston, K. (2005). A theory of cortical responses. Philos Trans R Soc Lond B Biol Sci, 360(1456), 815-836.

Green, C. S., & Bavelier, D. (2008). Exercising your brain: A review of human brain plasticity and training-induced learning. Psychol Aging, 23(4), 692-701.

Green, C. S., & Bavelier, D. (2003). Action video game modifies visual selective attention. Nature, 423(6939), 534-537.

Glymour, C. (2003). Learning, prediction and causal Bayes nets. Trends Cogn Sci, 7, 43–48.

Gopnik, A., Glymour, C., Sobel, D. M., Schulz, L. E., Kushnir, T., & Danks, D. (2004). A theory of causal learning in children: causal maps and Bayes nets. Psychol Rev, 111(1), 3-32.

Hagmayer, Y. et al. Causal reasoning through intervention.
In Causal Learning: Psychology, Philosophy, and Computation (Gopnik, A. and Schulz, L., eds), Oxford University Press.

Heckerman D, Meek C, and Koller D (2004) Probabilistic Models for Relational Data. MSR-TR-2004-30 (pdf)

Jin Y. and S. Geman. (2006) Context and hierarchy in a probabilistic image model. CVPR (2), 2006, 2145-2152.

Kemp, C. and Tenenbaum, J. B. (2008). The discovery of structural form. Proceedings of the National Academy of Sciences. 105(31), 10687-10692.

Kemp C, Perfors A, Tenenbaum JB (2007) Learning overhypotheses with hierarchical Bayesian models. Dev Sci 10(3):307–321.

Konidaris, G., & Barto, A. (2006). Building Portable Options: Skill Transfer in Reinforcement Learning.

Kording, K. P., Beierholm, U., Ma, W. J., Quartz, S., Tenenbaum, J. B., & Shams, L. (2007). Causal inference in multisensory perception. PLoS ONE, September 2007, Issue 9, e943.

Pearl, J. (2000). Causality : models, reasoning, and inference. Cambridge, U.K. ; New York: Cambridge University Press.

Lee, M.D. (2008). Three case studies in the Bayesian analysis of cognitive models. Psychonomic Bulletin and Review, 15, 1-15.

Lombrozo, T. (2006). The structure and function of explanations. Trends in Cognitive Sciences, 10(10), 464-470.

Lu H, Yuille AL, Liljeholm M, Cheng PW, Holyoak KJ (2008) Bayesian generic priors for causal learning. Psychol Rev, in press.

Ma, W. J., Beck, J. M., & Pouget, A. (2008). Spiking networks for Bayesian inference and choice. Curr Opin Neurobiol, 18(2), 217-222.

Mahadevan, S. and Maggioni, M. (2007). Proto-Value Functions: A Laplacian Framework for Learning Representation and Control in Markov Decision Processes. Journal of Machine Learning Research, 8, 2169-2231. MIT Press.

Mehta, N., Ray, S., Tadepalli, P., Dietterich, T. (2008). Automatic Discovery and Transfer of MAXQ Hierarchies.
International Conference on Machine Learning (ICML-2008)

Michel, M.M. & Jacobs, R.A. (2007). Parameter learning but not structure learning: a Bayesian network model of constraints on early perceptual learning. Journal of Vision, 7(1), 4, 1-18.

Poupart, P., Ghavamzadeh, M., & Engel, Y. (2007). ICML-07 Tutorial on Bayesian Methods for Reinforcement Learning, Part 1-4.

Niv Y, Joel D, Dayan P (2006) A normative perspective on motivation. Trends Cogn Sci 10(8):375–381.

Schulz, L. E., & Gopnik, A. (2004). Causal learning across domains. Dev Psychol, 40(2), 162-176.

Seitz, A. & Watanabe, T. (2005). A unified model for perceptual learning. Trends in Cognitive Sciences, 9(7), 329-334.

Seitz, A.R. & Dinse, H.R. (2007). A common framework for perceptual learning. Current Opinion in Neurobiology, 17(2), 148-153.

Sloman, S., & Hagmayer, Y. (2006). The causal psycho-logic of choice. Trends in Cognitive Sciences, 10(9), 407-412.
(http://www.sciencedirect.com/science/article/B6VH9-4KKNNHN-2/2/2e28a448e7044c8e93c064f7d9908c5e)

Sridharan, M., Wyatt, J., & Dearden, R. (2008). HiPPo: Hierarchical POMDPs for Planning Information Processing and Sensing Actions on a Robot.

Strens, M. (2000). A Bayesian Framework for Reinforcement Learning.

Steyvers et al., Inferring causal networks from observations and interventions, Cogn. Sci. 27 (2003), pp. 453–489.

Sundareswara, R., & Schrater, P. R. (2008). Perceptual multistability predicted by search model for Bayesian decisions. J Vis, 8(5), 12, 1-19.

Talmi D, Seymour B, Dayan P, Dolan RJ (2008) Human pavlovian-instrumental transfer. J Neurosci 28(2):360–368. URL http://dx.doi.org/10.1523/JNEUROSCI.4028-07.2008.

Tanaka SC, Balleine BW, O’Doherty JP (2008) Calculating consequences: brain systems that encode the causal effects of actions. J Neurosci 28(26):6750–6755. URL http://dx.doi.org/10.1523/JNEUROSCI.1808-08.2008.

Tenenbaum, J. B., Griffiths, T. L., and Kemp, C. (2006). Theory-based Bayesian models of inductive learning and reasoning. Trends in Cognitive Sciences, 10(7), 309-318.

Tsodyks, M. & Gilbert, C. (2004). Neural networks and perceptual learning. Nature, 431(7010), 775-781.

Tu, Z., Chen, X., Yuille, A., & Zhu, S. (2005). Image Parsing: Unifying Segmentation, Detection and Recognition. International Journal of Computer Vision 63(2), 113–140.

Yuille, A., & Kersten, D. (2006). Vision as Bayesian inference: analysis by synthesis? Trends Cogn Sci, 10(7), 301-308.

Xu Z, Tresp V, Yu K, & Kriegel HP (2006) Learning Infinite Hidden Relational Models. Proc. 22nd Conf. on Uncertainty in Artificial Intelligence (UAI'06) Cambridge, MA. (pdf)

Final Assignment

What Should Transfer? How the credit assignment problem is solved should affect
what is transferable and how well it generalizes:

1. Learned policy transfer

Requires similar state-action mapping to work well in new problem
Example: target shooting in relative coordinates
Example: Resource allocation
Example: category boundaries in feature space

2. Perceptual model transfer

Requires same relationship between sensory information and states
Example: cue reliability/combination

3. World model transfer

Requires same environment/dynamics
Predictability across time (e.g. language)

4. Reward model transfer

Requires same outcome/reward relationships and dynamics
Example: Rewarded outcomes are independent/dependent across trials

5. World metamodel / Reward metamodel
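
As a rough illustration of how items 1, 3, and 4 differ, the sketch below separates an agent into a world model, a reward model, and a policy obtained by planning. The five-state corridor task and all function names are hypothetical and are not taken from the course materials. When only the reward model changes, the old policy transfers poorly, while the world model can be reused unchanged to re-plan.

```python
# Hypothetical 1-D corridor: states 0..4, actions -1 (left) and +1 (right).
STATES = range(5)
ACTIONS = (-1, +1)

def world_model(state, action):
    """Deterministic dynamics: move one step, clipped to the corridor ends."""
    return min(max(state + action, 0), 4)

def reward_right(state):
    """Original task: reward for being at the right end."""
    return 1.0 if state == 4 else 0.0

def reward_left(state):
    """New task: same dynamics, but reward is at the left end."""
    return 1.0 if state == 0 else 0.0

def plan(world, reward, gamma=0.9, sweeps=50):
    """Value iteration, then a greedy policy as a dict state -> action."""
    def q(values, s, a):
        nxt = world(s, a)
        return reward(nxt) + gamma * values[nxt]
    values = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        values = {s: max(q(values, s, a) for a in ACTIONS) for s in STATES}
    return {s: max(ACTIONS, key=lambda a: q(values, s, a)) for s in STATES}

def rollout_return(policy, world, reward, start=2, steps=10):
    """Total undiscounted reward collected by following the policy."""
    state, total = start, 0.0
    for _ in range(steps):
        state = world(state, policy[state])
        total += reward(state)
    return total

# Policy learned on the original task.
old_policy = plan(world_model, reward_right)

# Policy transfer: reuse the old state-action mapping on the new task.
policy_transfer = rollout_return(old_policy, world_model, reward_left)

# World model transfer: keep the dynamics, swap in the new reward, re-plan.
new_policy = plan(world_model, reward_left)
model_transfer = rollout_return(new_policy, world_model, reward_left)

print(policy_transfer, model_transfer)
```

In this toy case the transferred policy keeps heading for the old goal and collects no reward, whereas re-planning with the retained world model recovers good performance. Which component is worth transferring therefore depends on which part of the task has actually changed.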