• It doesn’t actually process the text as text tokens, which is why it’s more efficient: it can essentially take in ten times as much text by reading it as images, without incurring the penalties associated with having that many tokens.
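
To get a rough sense of what that tenfold reduction could buy, here is a small back-of-the-envelope sketch. It assumes the main "penalty" is self-attention cost growing with the square of the token count, which holds for a standard transformer but is an assumption here, not something stated above. The 10,000-token document length is made up for illustration.

```python
def attention_cost(num_tokens: int) -> int:
    """Relative self-attention cost, ASSUMING it scales with the square of sequence length."""
    return num_tokens ** 2


text_tokens = 10_000      # hypothetical document length in text tokens (illustrative only)
compression_ratio = 10    # the "ten times" figure from the note above
vision_tokens = text_tokens // compression_ratio  # tokens needed when the text is read as images

# How much cheaper attention becomes under the quadratic assumption
savings = attention_cost(text_tokens) / attention_cost(vision_tokens)
print(f"{vision_tokens} vision tokens, ~{savings:.0f}x less attention compute")
```

Under that assumption, a 10x cut in tokens works out to roughly a 100x cut in attention compute. Real savings would depend on the model's architecture and on the cost of encoding the images in the first place.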