Understanding the Tolman-Eichenbaum Machine (TEM)

Mr. For Example
4 min read · Nov 6, 2023


What does TEM do?

The Tolman-Eichenbaum Machine (TEM) is a form of artificial hippocampus: it learns a model of the world/environment.

Formally, TEM performs generalized graph learning: after experiencing many graphs with different sensory observations and learning their common relational structure, the model should maximize its ability to predict the next sensory observation after each transition on a new graph.

If all transitions have been experienced, the graph can be stored in memory and perfect predictions made without any structural abstraction. However, if structural properties of the graph (e.g. all the transition types) are known beforehand, perfect prediction becomes possible as soon as each node has been experienced, i.e. long before all transitions have been experienced. For example, if we know the graph is a family tree, we can infer all the relations between family members after visiting each member just once, starting from any node, instead of having to observe every transition/relation in that family tree.
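To make this intuition concrete, here is a minimal sketch (my own toy example, not the paper's code) where the known structure is a 2D grid rather than a family tree: once every cell has been visited once, the agent can predict across transitions it has never actually taken.

```
import random

# A toy 4x4 grid world. The structure (how actions move you around) is the
# part that generalizes; the observation at each cell is environment-specific.
SIZE = 4
ACTIONS = {"N": (0, -1), "S": (0, 1), "E": (1, 0), "W": (-1, 0)}

# Hypothetical environment: a random sensory observation at each location.
obs = {(x, y): random.randrange(10) for x in range(SIZE) for y in range(SIZE)}

# An agent that already knows the structure path-integrates its (x, y)
# "abstract location" and stores one memory per location, not per transition.
memory = {}
for y in range(SIZE):  # visit every cell exactly once (a boustrophedon sweep)
    xs = range(SIZE) if y % 2 == 0 else range(SIZE - 1, -1, -1)
    for x in xs:
        memory[(x, y)] = obs[(x, y)]

# Predict across a transition the agent has NEVER taken: from (2, 2) going
# north. Structural knowledge reduces the prediction to a memory lookup.
dx, dy = ACTIONS["N"]
predicted = memory[(2 + dx, 2 + dy)]
assert predicted == obs[(2, 1)]  # perfect prediction despite the unseen transition
```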

How does TEM work at a high level?

Model processing description

TEM's model structure is a composition of two sub-models: a generative model (red) and an inference model (green, corresponding to the hippocampal formation). To predict the next sensory input, the generative model uses path integration from the current location variable to deduce sensory inputs in future time frames, while the inference model uses the sensory input at the current frame to calibrate the current location variable, thus mitigating path-integration error.
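A minimal sketch of this predict-then-correct loop follows. Everything here (the single transition matrix W_a, the retrieve function, the 0.5/0.5 blend used as the correction) is a made-up stand-in for TEM's learned networks, chosen only to show the flow of information:

```
import numpy as np

rng = np.random.default_rng(0)
D = 16  # dimensionality of g, p, x (kept equal for simplicity)

W_a = rng.normal(size=(D, D)) / np.sqrt(D)  # transition weights for one action
M = np.zeros((D, D))                        # Hebbian memory, per environment

def path_integrate(g, W):
    """Generative step: move the abstract location with the action's weights."""
    return np.tanh(W @ g)

def retrieve(M, query):
    """Memory lookup: project a query (from g or x) through M."""
    return np.tanh(M @ query)

def step(g_prev, x_t, W):
    # Generative model: predict where we are, then what we should see.
    g_pred = path_integrate(g_prev, W)
    p_pred = retrieve(M, g_pred)  # grounded location from abstract location
    x_pred = p_pred               # decoder omitted; p doubles as the prediction
    # Inference model: fold the actual observation back in to correct g,
    # mitigating accumulated path-integration error.
    p_t = retrieve(M, x_t)
    g_t = 0.5 * g_pred + 0.5 * p_t
    return g_t, x_pred
```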

Latent variables g and p represent the abstract location (where I am) and the conjunction of the abstract location with the sensory data (i.e. the grounded location: what is where), respectively. By separating latent variables of location that generalize across graphs, g, from those that are grounded in sensory experience and therefore specific to a particular graph, p, the model enables generalization of knowledge across domains.

M is a matrix that represents the memories. It is used to retrieve the grounded location p from either the sensory input x or the abstract location g (technically, it retrieves a memory using a matrix-projected x or g).
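A minimal sketch of such a memory, assuming a simple Hebbian outer-product write and an iterative attractor-style read; the real model's projections and dynamics are learned, so treat this as an illustration of the mechanism only:

```
import numpy as np

rng = np.random.default_rng(1)
D = 32
M = np.zeros((D, D))

def store(M, p, eta=1.0, lam=0.999):
    """Hebbian write: decay old memories slightly, add an outer product of p."""
    return lam * M + eta * np.outer(p, p)

def retrieve(M, cue, iters=5):
    """Attractor retrieval: iterate the cue through M to clean it up."""
    h = cue
    for _ in range(iters):
        h = np.tanh(M @ h)
    return h

# Store a grounded location, then recall it from a noisy cue
# (standing in for a projection of x or g).
p = np.sign(rng.normal(size=D))
M = store(M, p)
cue = p + 0.5 * rng.normal(size=D)
recalled = retrieve(M, cue)
print(np.corrcoef(recalled, p)[0, 1])  # close to 1: the memory is recovered
```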

Model training description

The objective of training is for the generative model to predict the sensory input x, and for the inference model to infer the generative model's latent variables, [p, g], from the sensory input.

The model is trained in multiple different environments, differing in size and sensory experience. The training data is a continuous stream of sensory observations and actions/relations. Different environments share the same network weights but use different Hebbian weights for the memory M. Hebbian learning is used for memory not only for its biological plausibility, but also to allow rapid learning when entering a new environment. At the end of a sequence, both the inference and generative models update their parameters along the gradient that matches each other's variables and also matches the data, via back-propagation through time.
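A skeleton of this two-timescale setup, assuming PyTorch and toy stand-ins for TEM's networks (the real model has action-specific transitions and a learned inference network); the point is the split between fast, local Hebbian writes to M and slow, gradient-based updates to the shared weights:

```
import torch

D = 16
# Slow weights: shared across environments, trained by BPTT.
slow = torch.nn.ModuleDict({
    "transition": torch.nn.Linear(D, D, bias=False),  # g -> g (per action in the real model)
    "decoder": torch.nn.Linear(D, D),                 # p -> predicted x
})
opt = torch.optim.Adam(slow.parameters(), lr=1e-3)

def run_environment(seq_of_x):
    M = torch.zeros(D, D)  # fresh Hebbian memory for a new environment
    g = torch.zeros(D)
    loss = torch.tensor(0.0)
    for x in seq_of_x:
        g = torch.tanh(slow["transition"](g))  # path integration
        p = torch.tanh(M @ g)                  # retrieve grounded location
        loss = loss + ((slow["decoder"](p) - x) ** 2).mean()
        # Hebbian write: fast and local, no gradient descent involved.
        M = 0.999 * M + torch.outer(p.detach(), p.detach())
        g = 0.5 * g + 0.5 * p                  # crude inference correction
    return loss

# One BPTT update over a toy sequence.
xs = [torch.randn(D) for _ in range(20)]
loss = run_environment(xs)
opt.zero_grad()
loss.backward()
opt.step()
```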

The most important weights are those that transition g, as they encode the structure of the graph. They must satisfy two constraints that are fundamental to TEM representations (sketched in code after the list):

  1. Each location in the graph has a different g representation (so a unique memory can be built)
  2. Arriving at the same location after different actions causes the same g representation (so the same memory can be retrieved) — a form of path integration for arbitrary graph structures. E.g. the relation uncle must cause the same change in g as father followed by brother, but different from brother followed by father.
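If transitions on g were plain linear maps (a simplification; TEM's transitions are learned and nonlinear), constraint 2 would read as a composition rule on matrices, and the order-sensitivity of the uncle example falls out of the fact that matrix products do not commute:

```
import numpy as np

rng = np.random.default_rng(2)
D = 8

# Hypothetical transition matrices for two primitive relations.
W_father = rng.normal(size=(D, D))
W_brother = rng.normal(size=(D, D))

# Constraint 2 says "uncle" must act on g exactly like "father then brother",
# so a consistent model would need W_uncle == W_brother @ W_father
# (the later action is applied last, hence multiplied on the left).
W_uncle = W_brother @ W_father

g = rng.normal(size=D)
via_composition = W_brother @ (W_father @ g)  # father, then brother
via_uncle = W_uncle @ g
assert np.allclose(via_uncle, via_composition)

# "Brother then father" lands somewhere else, as required:
other_order = W_father @ (W_brother @ g)
print(np.allclose(via_uncle, other_order))  # False (almost surely)
```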

Parallel Streams

TEM runs multiple models in parallel and only combines them when retrieving memories/grounded locations; each model takes inputs at a different frequency. The separation into hierarchical scales helps provide a unique code for each position, even if the same stimulus appears in several locations of one environment, since the surrounding stimuli, and therefore the lower-frequency hippocampal cells, are likely to differ.
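A minimal sketch of frequency-separated streams, assuming simple exponential smoothing at rates chosen purely for illustration; the real model learns its temporal filtering:

```
import numpy as np

rng = np.random.default_rng(3)
D = 8
ALPHAS = [1.0, 0.3, 0.1]  # update rates: fast, medium, slow streams

def update_streams(streams, x):
    """Each stream filters the input at its own timescale; slow streams
    change little per step, so they summarize the surrounding context."""
    return [(1 - a) * s + a * x for s, a in zip(streams, ALPHAS)]

streams = [np.zeros(D) for _ in ALPHAS]
for _ in range(50):
    x = rng.normal(size=D)  # stand-in sensory input
    streams = update_streams(streams, x)

# The streams are only combined when querying memory: a concatenated code
# disambiguates two places that share the same immediate stimulus, because
# their slow-stream context differs.
query = np.concatenate(streams)
print(query.shape)  # (24,) = one query assembled from all frequencies
```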

Relation to the brain

Each variable maps onto a corresponding neural substrate: the abstract location g onto the medial entorhinal cortex (MEC, grid cells), the grounded location p onto the hippocampus (HPC, place cells), and the sensory input x onto the lateral entorhinal cortex (LEC).

