Perceiver Paper
#work/patientsim 2024-02-05
On the surface, it seems like it’s pretty much just: a transformer for images, video, audio, and other modalities.
- It’s flexible: the same architecture works across modalities without domain-specific changes.
- It scales to many more inputs than the traditional NLP transformer: cross-attention into the latent array is linear in input length, so the quadratic self-attention only ever runs over the small latent array.
- Cross-attention distills the inputs into a smaller latent array (see the sketch after this list). Sounds like Stable Diffusion doing its work in latent space.
- It claims to be a generalization of the transformer. Bold claim.
- Is it useful for text? If it’s so much better than transformers, how does it do on their home turf? It seems to match them.
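The core trick, as I understand it: a small learned latent array queries the full input via cross-attention, so the expensive quadratic self-attention only ever sees the latents. Below is a minimal PyTorch sketch of that bottleneck; the class name, dimensions, and hyperparameters are my own placeholders, not taken from the official implementation.

```python
# Minimal sketch of the Perceiver-style cross-attention bottleneck.
# All names and dims are illustrative placeholders, not the paper's code.
import torch
import torch.nn as nn

class LatentBottleneck(nn.Module):
    def __init__(self, num_latents=256, latent_dim=512, input_dim=128, num_heads=8):
        super().__init__()
        # Learned latent array, shared across all examples in the batch.
        self.latents = nn.Parameter(torch.randn(num_latents, latent_dim))
        # Latents are the queries; the (large) input array supplies keys/values,
        # so attention cost is O(M * N) in input length M, not O(M^2).
        self.cross_attn = nn.MultiheadAttention(
            embed_dim=latent_dim, num_heads=num_heads,
            kdim=input_dim, vdim=input_dim, batch_first=True,
        )

    def forward(self, inputs):
        # inputs: (batch, M, input_dim); M can be huge (e.g. ~50k ImageNet pixels).
        q = self.latents.unsqueeze(0).expand(inputs.shape[0], -1, -1)
        out, _ = self.cross_attn(q, inputs, inputs)  # (batch, N, latent_dim)
        return out  # downstream self-attention runs over N << M latents

x = torch.randn(2, 4096, 128)        # 4096 input tokens per example
print(LatentBottleneck()(x).shape)   # torch.Size([2, 256, 512])
```

The full model repeats this cross-attend step interleaved with latent self-attention blocks (the “iterative attention” of the title), but the bottleneck above is what buys the scaling.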
[2103.03206] Perceiver: General Perception with Iterative Attention