Perceiver Paper
#work/patientsim 2024-02-05
On the surface, it seems like it’s pretty much just: a transformer for images, video, audio, and other modalities.
- It’s flexible: the same architecture works across modalities without domain-specific changes.
- It scales to many more inputs than the traditional NLP transformer: cross-attention into the latent array is linear in input length, so the quadratic self-attention only ever runs over the small latent array.
- Cross-attention distills the inputs into a smaller latent array (see the sketch after this list). Sounds like Stable Diffusion doing its work in latent space.
- It claims to be a generalization of the transformer. Bold claim.
- Is it useful for text? If it’s so much better than transformers, how does it do on their home turf? It seems to match them.
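The core trick, as I understand it: a small learned latent array queries the full input via cross-attention, so the expensive quadratic self-attention only ever sees the latents. Below is a minimal PyTorch sketch of that bottleneck; the class name, dimensions, and hyperparameters are my own placeholders, not taken from the official implementation.

```python
# Minimal sketch of the Perceiver-style cross-attention bottleneck.
# All names and dims are illustrative placeholders, not the paper's code.
import torch
import torch.nn as nn

class LatentBottleneck(nn.Module):
    def __init__(self, num_latents=256, latent_dim=512, input_dim=128, num_heads=8):
        super().__init__()
        # Learned latent array, shared across all examples in the batch.
        self.latents = nn.Parameter(torch.randn(num_latents, latent_dim))
        # Latents are the queries; the (large) input array supplies keys/values,
        # so attention cost is O(M * N) in input length M, not O(M^2).
        self.cross_attn = nn.MultiheadAttention(
            embed_dim=latent_dim, num_heads=num_heads,
            kdim=input_dim, vdim=input_dim, batch_first=True,
        )

    def forward(self, inputs):
        # inputs: (batch, M, input_dim); M can be huge (e.g. ~50k ImageNet pixels).
        q = self.latents.unsqueeze(0).expand(inputs.shape[0], -1, -1)
        out, _ = self.cross_attn(q, inputs, inputs)  # (batch, N, latent_dim)
        return out  # downstream self-attention runs over N << M latents

x = torch.randn(2, 4096, 128)        # 4096 input tokens per example
print(LatentBottleneck()(x).shape)   # torch.Size([2, 256, 512])
```

The full model repeats this cross-attend step interleaved with latent self-attention blocks (the “iterative attention” of the title), but the bottleneck above is what buys the scaling.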
[2103.03206] Perceiver: General Perception with Iterative Attention