A sequential Bayesian model for
learning and memory in
multi-context environments

Dave F. Kleinschmidt Rutgers University/Princeton Neuroscience Institute
Pernille Hemmer Rutgers University

24 July 2018 // MathPsych // osf.io/dqz73

in the real world judgements are made in

context

which provides useful information

In [2]:
arena()
Out[2]:
In [3]:
x, y = -0.3, 0.4
arena([x], [y], color="black")
Out[3]:
In [5]:
plot(Gray.(rand(Bool, 100,100)), axis=false, lims=(0,100), aspect_ratio=:equal)
Out[5]:
In [6]:
arena([x], [y], color=:white, markerstrokecolor=:black, markersize=5)
annotate!(0,0, text("?", 32))
Out[6]:
In [7]:
arena(randn(200).*0.2, randn(200).*0.2, color=:black, markeralpha=0.25)
scatter!([x], [y], color=:white, markerstrokecolor=:black, markersize=5)
Out[7]:
In [8]:
quiver!([x], [y], quiver=([-x*0.3], [-y*0.3]), color=:black)
Out[8]:
In [9]:
srand(2)
θ = rand(200) * 2π
ρ = randn(200) * .05 + 0.85

arena(cos.(θ).*ρ, sin.(θ).*ρ, markeralpha=0.25, color=:black)
scatter!([x], [y], color=:white, markerstrokecolor=:black, markersize=5)


quiver!([x], [y], quiver=([x*.3], [y*.3]), color=:black)
Out[9]:

but what is a

context?

and how do you know?

history provides context

In [50]:
r1_shuffled = view(recall, randperm(180) .+ 20, :)

p = arena([], [], markeralpha=0.25, color=:black, lims=(-1.1,1.1))
anim = @animate for (x,y) in @_ zip(r1_shuffled[:x], r1_shuffled[:y])
    push!(p, x,y)
end

gif(anim, "figures/shuffled.gif", fps=5)
Out[50]:

history provides context

In [49]:
p = arena([], [], markeralpha=0.25, color=:black, lims=(-1.1,1.1))
anim = @animate for (x,y) in @_ zip(recall[:x], recall[:y]) |> It.drop(_, 20) |> It.take(_, 180)
    push!(p, x,y)
end

gif(anim, "figures/clustered.gif", fps=5)
Out[49]:

Behavior

In a structured environment recall is biased towards clusters [Robbins, Hemmer, and Tang, CogSci2014]

In [14]:
arena(lims=(-1,1))
@_ known_recalled1 |>
    @by(_, :block, x_clus = mean(:x), y_clus = mean(:y)) |>
    @df(_, scatter!(:x_clus, :y_clus, color=:red, seriestype=:scatter, markerstrokecolor=:white))
@df recall1 quiver!(:x, :y, quiver=(:x_resp.-:x, :y_resp.-:y), color=:black, seriestype=:quiver, lims=(-1,1))
Out[14]:

Behavior + cluster bias

In a structured environment recall is biased towards clusters [Robbins, Hemmer, and Tang, CogSci2014]

In [16]:
arena(lims=(-1,1))
@_ known_recalled1 |>
    @by(_, :block, x_clus = mean(:x), y_clus = mean(:y)) |>
    @df(_, scatter!(:x_clus, :y_clus, color=:red, seriestype=:scatter, markerstrokecolor=:white))
@df known_recalled1 quiver!(:x, :y, quiver=(:x_resp.-:x, :y_resp.-:y), color=GrayA(0.0, 0.5))
@df known_recalled1 quiver!(:x, :y, quiver=(:x_mod.-:x, :y_mod.-:y), color=RGBA(1, 0, 0, 0.5))
Out[16]:

approach: bounded rationality

Computational-level: Dirichlet Process mixture model

  • Infer how points $x_t$ are assigned $z_t$
    • $p(z_1, \ldots, z_T | x_1, \ldots, x_T) \propto p(x_1, \ldots, x_T | z_1, \ldots, z_T) p(z_1, \ldots, z_T)$
  • Prior: "sticky" CRP $p(z_t = j \mid z_{1\ldots t-1}) \propto N_j$ ($\times \frac{\rho}{1-\rho}$ if $z_{t-1} = j$)
    • $N_j = \alpha$ for all new $j$.
    • Prefer small number of contexts
    • Allow for up to $T$ (one per point)
  • Likelihood: $p(x_t | z_t, z_{1:t-1}, x_{1:t-1}) = p(x_t | \{x_i \ \mathrm{if}\ z_i = z_t\})$
    • Prefer compact clusters
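The sticky CRP prior above can be sketched in a few lines. The helper below (`sticky_crp_weights`, an illustrative name, written in Python rather than the talk's Julia) computes the normalized probability of each possible assignment for the next point: existing clusters in proportion to their counts $N_j$, the most recent cluster boosted by $\frac{\rho}{1-\rho}$, and a brand-new cluster with pseudo-count $\alpha$.

```python
def sticky_crp_weights(z_history, alpha, rho):
    """Sticky-CRP probabilities for the next assignment z_t.

    z_history: past cluster labels z_1..z_{t-1} (integers)
    alpha:     DP concentration (eagerness to create new clusters)
    rho:       stickiness; the cluster of z_{t-1} gets an extra
               factor rho / (1 - rho)
    """
    counts = {}
    for z in z_history:
        counts[z] = counts.get(z, 0) + 1

    weights = {}
    for j, n in counts.items():
        w = float(n)
        if z_history and z_history[-1] == j:
            w *= rho / (1 - rho)  # sticky boost for the current cluster
        weights[j] = w

    # one brand-new cluster with pseudo-count alpha
    new_label = max(counts, default=-1) + 1
    weights[new_label] = alpha

    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}
```

With high stickiness the most recent cluster dominates even when another cluster has more points: after `[0, 0, 1]` with `rho=0.9`, cluster 1's single point is boosted by a factor of 9.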

Algorithmic-level

approach: bounded rationality

Computational-level

Algorithmic-level: Sequential Monte Carlo

  • online (not batch)
  • finite uncertainty
  • particle filter:
    • Each particle is one hypothetical clustering $z_{1\ldots t}$
    • Update particles in parallel following new data point
    • Re-sample when particles become too homogeneous
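One SMC step along these lines can be sketched as follows. This is a deliberately simplified, 1-D, non-sticky version (plain CRP prior, Gaussian clusters with an assumed known spread); `update_particles` and the constants `SIGMA`/`SIGMA_NEW` are illustrative, not from the talk's code.

```python
import math
import random

SIGMA = 0.1      # assumed within-cluster sd (illustrative)
SIGMA_NEW = 1.0  # assumed sd for a brand-new cluster

def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def update_particles(particles, weights, x, alpha=1.0):
    """One particle-filter step: each particle (assignments, data) samples an
    assignment for the new point x in proportion to CRP prior x likelihood;
    weights accumulate the incremental evidence, with resampling on low ESS."""
    new_particles, new_weights = [], []
    for (assignments, data), w in zip(particles, weights):
        labels = sorted(set(assignments))
        opts, scores = [], []
        for j in labels:
            pts = [d for d, z in zip(data, assignments) if z == j]
            mu = sum(pts) / len(pts)  # cluster mean from its points
            opts.append(j)
            scores.append(len(pts) * normal_pdf(x, mu, SIGMA))
        # option of opening a new cluster, pseudo-count alpha
        opts.append(labels[-1] + 1 if labels else 0)
        scores.append(alpha * normal_pdf(x, 0.0, SIGMA_NEW))

        z = random.choices(opts, weights=scores)[0]
        new_particles.append((assignments + [z], data + [x]))
        new_weights.append(w * sum(scores))
    total = sum(new_weights)
    new_weights = [w / total for w in new_weights]

    # resample when the effective sample size collapses
    ess = 1.0 / sum(w * w for w in new_weights)
    if ess < len(new_particles) / 2:
        new_particles = random.choices(new_particles, weights=new_weights,
                                       k=len(new_particles))
        new_weights = [1.0 / len(new_particles)] * len(new_particles)
    return new_particles, new_weights
```

Each particle carries one hypothetical clustering forward in time; the population of particles is the model's finite representation of uncertainty.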

Does it work?

Learning clusters

Recall

Prediction

Learning clusters

In [51]:
@df recalled1 plot(arena(:x, :y, group=assignments(first(particles(rf))), title="Inferred clusters"),
                   arena(:x, :y, group=:block, title="True blocks"),
                   layout=(1,2), size=(800, 400))
Out[51]:
[Figure: arena points grouped by inferred clusters (left) and true blocks (right)]

Learning clusters

In [52]:
plot(plot(show_assignment_similarity(rf), title="Inferred clusters"),
     plot(Gray.(@with(recall1, :block .== :block')), title="True blocks"),
     axis=false, aspect_ratio=:equal, layout=(1,2), size=(800,400))
Out[52]:
[Figure: assignment similarity matrices, inferred clusters (left) vs. true blocks (right)]

Recall

Task

  • Immediate recall with mask

Model

  • Bayesian cue combination (after, e.g., Huttenlocher)
  • two cues: the location you saw, and the inferred context
  • weighted average (by inverse variance)
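For Gaussian cues the inverse-variance weighted average has a closed form; `combine_cues` below is an illustrative one-dimensional sketch (apply it to x and y separately), not the talk's implementation.

```python
def combine_cues(x_obs, var_obs, x_context, var_context):
    """Precision-weighted combination of two Gaussian cues:
    the studied location and the inferred cluster (context) mean.
    Returns the posterior mean and variance."""
    w_obs = 1.0 / var_obs        # precision of the observed location
    w_ctx = 1.0 / var_context    # precision of the context cue
    x_hat = (w_obs * x_obs + w_ctx * x_context) / (w_obs + w_ctx)
    var_hat = 1.0 / (w_obs + w_ctx)
    return x_hat, var_hat
```

The less reliable the memory trace (larger `var_obs`), the more recall is pulled toward the cluster mean, which produces exactly the cluster-ward bias in the behavioral plots.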

Recall: model

In [28]:
size2 = (900,400)

p1 = @df recalled1 arena(:x, :y, quiver=(:x_mod.-:x, :y_mod.-:y), seriestype=:quiver, label="Model",
                         layout=@layout([a{0.5w} _]), size=size2)
Out[28]:

Recall: model + behavior

In [29]:
@df recalled1 quiver!(:x, :y, quiver=(:x_resp.-:x, :y_resp.-:y), color=GrayA(0.0, 0.3), label="Behavior", subplot=1)
Out[29]:

Recall: model + behavior

In [30]:
@_ baseline_good |>
    @where(_, :subjid1 .== 7) |>
    @df _ begin
        plot(arena(:x, :y, quiver=(:x_mod.-:x, :y_mod.-:y), seriestype=:quiver),
             histogram(:cos_mod, bins=20, normalized=true, yaxis=false, label="", 
                       legend=false,
                       aspect_ratio=:equal, 
                       alpha=0.5),
             layout=@layout([a{0.5w} b]), size=size2)
        quiver!(:x, :y, quiver=(:x_resp.-:x, :y_resp.-:y), color=GrayA(0.0, 0.3), label="Behavior", subplot=1)
        vline!([cosinesim(baseline_good)], subplot=2)
        vline!([0], subplot=2, color=:black)
        annotate!(cosinesim(baseline_good)*1.2, 1, text("Mean=$(round(cosinesim(baseline_good), 2))", 10, :left),
                  subplot=2)
        title!("Cosine similarity", subplot=2)
    end
Out[30]:
[Figure: histogram of cosine similarity between model and behavior; Mean = 0.11]

Recall: overall fit

depends on stickiness (low < medium < high) and eagerness to create new clusters

In [31]:
@df(@where(recalled_summaries, :Sσ .≈ 0.01),
    plot(:α, :cos, group=:ρ, xscale=:log10, 
         xlims=(10^-2.5, 10^1.5), ylims=(0, 0.115), seriestype=:line,
         xlabel="Eagerness to create new clusters (DP concentration)",
         ylabel=("Cosine sim. with behavior"),
        legend=:bottomright, legend_title="Stickiness", line=2))
Out[31]:
[Figure: cosine sim. with behavior vs. DP concentration (log scale), one line per stickiness ρ ∈ {0.1, 0.5, 0.9}]

Recall: baselines

consistently ≈ known clusters and > center/mean radius

In [32]:
# @df(@where(recalled_summaries, :Sσ .≈ 0.01),
#     plot(:α, :cos, group=:ρ, xscale=:log10, 
#          xlims=(10^-2.5, 10^1.5), ylims=(0, 0.115), seriestype=:line,
#          xlabel="Eagerness to create new clusters (DP concentration)",
#          ylabel=("Cosine sim. with behavior"),
#         legend=:bottomright, legend_title="Stickiness", line=2))

baseline_x = [10^-2.1, 10^1.1]

function plot_baseline!(y, label)
    plot!(baseline_x, ones(2)*y, color=Gray(0.7), label="")
    annotate!(baseline_x[end]*1.05, y, label)
end

plot_baseline!(cosinesim(known_recalled), text("Known\nclusters", 10, RGB(Gray(0.7)), :left, :bottom))

plot_baseline!(bs_center, text("Center", 10, RGB(Gray(0.7)), :left, :bottom))
plot_baseline!(bs_avgrho, text("Mean rad.", 10, RGB(Gray(0.7)), :left, :top))
Out[32]:
[Figure: cosine sim. with behavior vs. DP concentration, with baseline lines for Known clusters, Center, and Mean rad.]

Prediction

Task

  • Interleaved with recall task (every three trials)
  • Predict location 1, 5, or 10 trials in future

Model

  • Sample one particle
  • Sample $n$ steps from particle's sticky Chinese Restaurant Process prior on states
  • Draw one point from posterior predictive of sampled cluster
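These three steps can be sketched as below. The sketch is simplified in one respect: it returns the sampled cluster's mean as a stand-in for a draw from its posterior predictive, and `predict_location` and its arguments are illustrative names, not the talk's API.

```python
import random

def predict_location(assignments, cluster_means, n_steps, alpha, rho,
                     prior_mean=(0.0, 0.0)):
    """Forward-sample n_steps assignments from the sticky CRP, then report
    the final cluster's mean (stand-in for a posterior-predictive draw).
    A newly opened cluster falls back to the prior mean."""
    z = list(assignments)
    means = dict(cluster_means)
    for _ in range(n_steps):
        counts = {}
        for zj in z:
            counts[zj] = counts.get(zj, 0) + 1
        labels = list(counts)
        weights = []
        for j in labels:
            w = counts[j]
            if z and z[-1] == j:
                w *= rho / (1 - rho)  # sticky boost
            weights.append(w)
        labels.append(max(labels) + 1)  # possible new cluster
        weights.append(alpha)
        zt = random.choices(labels, weights=weights)[0]
        if zt not in means:
            means[zt] = prior_mean
        z.append(zt)
    return means[z[-1]]
```

Because stickiness decays over sampled steps only stochastically, predictions further in the future (+5, +10) wander to other clusters more often than +1 predictions, matching the increasing deviation with horizon shown below.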

Prediction: model + behavior

In [43]:
let anim = Animation(), rx = recalled1[:x], ry = recalled1[:y], rnr = recalled1[:respnr]
    @byrow! predicted1 begin
        p1 = plot_prediction_task(:xys_mod, :x, :y, :x_resp, :y_resp)
        title!("Trial $(:respnr) (+$(:pred))")
        p2 = arena(rx[rnr.<=:respnr], ry[rnr.<=:respnr], lims=(-1.2, 1.2), 
                   markeralpha=0.25, color=Gray(0), title="Studied")
        plot(p1, p2, size=(800, 400))
        frame(anim)
    end
    gif(anim, "figures/pred2.gif", fps=1)
end
Out[43]:

Prediction: model + behavior

In [40]:
@_ predicted1 |>
    @where(_, :respnr .== 23) |>
    plot_prediction_task(_, title=true, size=(800,400))
Out[40]:
[Figure: prediction task panels, Trial 23 (+1) and Trial 23 (+10)]

Prediction: model + behavior

In [41]:
plot_prediction_task(predicted1[end-1:end,:], size=(800, 400))
Out[41]:
[Figure: prediction task panels, Trial 200 (+5) and Trial 200 (+10)]

Prediction: average distance

model ≈ behavior: distance of predictions at 1 trial < 5 trials < 10 trials

In [44]:
prediction_deviations = @_ predicted_good |>
    @transform(_, resp_dev = rho.(:x.-:x_resp, :y.-:y_resp), 
                  mod_dev = mean.(pairwise.(Euclidean(), 
                                            transpose.(hcat.(:x, :y)),
                                            transpose.(:xys_mod))))

@_ prediction_deviations |>
    @by(_, [:pred], resp_dev = mean(:resp_dev), mod_dev = mean(:mod_dev)) |>
    @df(_, begin 
        scatter(:resp_dev, :mod_dev, markersize=8, xlabel="Response deviation (from last recall)", ylabel="Model predicted", 
                group=:pred, legend=:bottomright, aspect_ratio=:equal, 
                legend_title="Prediction for next")
        #scatter!(:resp_dev, :mod_dev, markeralpha=0, color=:black, label="", smooth=true)
        end)


plot!(x->x, color=GrayA(0.2, 0.2), label="")
Out[44]:
[Figure: mean model-predicted vs. response deviation, grouped by prediction horizon (1, 5, 10)]

Prediction: single trial distance

captures variation within prediction horizons

In [45]:
@_ prediction_deviations |>
    @by(_, [:subjid1, :block, :respnr, :pred], resp_dev = mean(:resp_dev), mod_dev = mean(:mod_dev)) |>
    @df(_, scatter(:resp_dev, :mod_dev, xlabel="Response deviation", ylabel="Model predicted", 
                   legend_title = "Predictions for next", legend=:bottomright,
                   group=:pred, smooth=true, markerstrokecolor=:white, markeralpha=.3, line=2, aspect_ratio = :equal,
                   size=(800,400)))

plot!(linspace(0,1,100), x->x, color=GrayA(0.2, 0.2), label="")

# @_ prediction_deviations |>
#     @by(_, [:pred], resp_dev = mean(:resp_dev), mod_dev = mean(:mod_dev)) |>
#     @df(_, scatter!(:resp_dev, :mod_dev))
Out[45]:
[Figure: per-trial model-predicted vs. response deviation, grouped by prediction horizon (1, 5, 10)]

what have we learned

  • people pick up and use structure in recall and prediction
  • sequential Bayesian model learns clusters online
  • learned clusters capture behavior:
    • recall: better than simple baselines
    • prediction: distance from last studied location

what's left

  • budget for uncertainty (number of particles)
  • online learning of stickiness/clustering/prior on cluster mean/variance
  • apply to new data (clearer clusters and no clusters)