LLM Quantization | Simon Barnes

In this follow-up Python post I reproduce, in an open-source model running locally, the ‘prompt sensitivity’ issue I identified last year in an OpenAI model. I also discover that the quantization process, which shrinks models and can make them easier to run locally, is apparently responsible for this quirky behaviour. Because the closed-source model I used in my earlier post, text-davinci-003, is no longer available, this post opens up a path for further, reproducible exploration of the issue. ...

Prompt sensitivity revisited: quantization and open source models