Mini's Forum
Poro 2 - Printable Version




Poro 2 - minipasila - 28.08.2025

So Poro 2 is another Finnish LLM made by AMD Silo AI, the TurkuNLP group of the University of Turku, and HPLT.

It's a decent model for Finnish, but since some of the training data was generated using Llama 3.3 70B it's not going to be quite as good as it could be, because that model isn't great at Finnish in the first place. They should have used a better model for that task. Gemma 3 27B would have been a decent choice, though maybe it wasn't available at the time they were working on it.

The base model might still be fine; it's really the SFT training data that has the terrible Llama-generated data, and that could be fixed by regenerating it with better models. I'd probably try doing it myself, but I don't really have much I can spend atm. Plus I'd have to at least use Gemma 3 27B to generate potentially thousands of examples, which might cost quite a bit. Even generating with the Mistral Small 3.2 24B model it cost me about a dollar to make just one thousand examples to clean up a dataset. And since Gemma 3 27B is a bit larger and uses more resources/memory, it will probably cost a bit more than that, so if I wanted like 10k examples that'd probably be closer to 20 dollars (though checking OpenRouter it appears to be priced very similarly to Mistral Small, except throughput is worse). So another thousand examples would make more sense first, to see if that even improves the model before I spend more. And I'd have to somehow make sure it doesn't just generate garbage. Something like the sketch below is roughly what I mean.
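For anyone curious, here's a minimal sketch of regenerating SFT responses with a stronger model through OpenRouter's OpenAI-compatible API. This is just an illustration, not how the Poro 2 team did it: the model ID "google/gemma-3-27b-it", the prompts.jsonl file, and its "prompt" field are my own assumptions.

# Sketch: regenerate Finnish SFT responses with a stronger model via OpenRouter.
# Assumptions (not from the Poro 2 release): model ID "google/gemma-3-27b-it",
# an input file prompts.jsonl with a "prompt" field, and OPENROUTER_API_KEY set.
import json, os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

def regenerate(prompt: str) -> str:
    # Ask the stronger model to answer in Finnish; low temperature to reduce
    # the chance of it drifting into garbage output.
    resp = client.chat.completions.create(
        model="google/gemma-3-27b-it",
        messages=[
            {"role": "system", "content": "Vastaa suomeksi selkeästi ja oikeakielisesti."},
            {"role": "user", "content": prompt},
        ],
        temperature=0.3,
        max_tokens=1024,
    )
    return resp.choices[0].message.content

with open("prompts.jsonl") as fin, open("regenerated.jsonl", "w") as fout:
    for line in fin:
        prompt = json.loads(line)["prompt"]
        fout.write(json.dumps(
            {"prompt": prompt, "response": regenerate(prompt)},
            ensure_ascii=False) + "\n")

At roughly a dollar per thousand examples (what the Mistral Small run cost me), a 1k test batch like this is cheap enough to sanity-check whether it actually helps before committing to 10k or more.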

Anyway, the Poro 2 models are still kinda decent, since they gave us an 8B model, which is probably the smallest Finnish model that's actually decent. Gemma 3 only gave us 4B and 12B and nothing in between: 4B is just too small to be very useful, and 12B uses too much memory to run on 8GB VRAM (and not just because it's 12B, but because it uses more memory in general compared to Mistral models like NeMo; rough numbers below). So 8B hits just the right spot of being small enough and still actually useful. The 70B I didn't test much since no one is offering it via an API. And I think that's all I have to say about these models.
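Quick back-of-the-envelope on the VRAM point (my own rough numbers, not benchmarks): weight memory is roughly parameter count times bytes per weight, and you still need headroom for the KV cache and activations, which is where Gemma's big embedding table hurts.

# Rough VRAM estimate for weights only (illustrative, assumes ~4.5 bits/weight
# for a Q4_K_M-style quant; KV cache and activations come on top of this).
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    return params_billion * 1e9 * bits_per_weight / 8 / 1024**3

for name, params in [("Poro 2 8B", 8), ("Gemma 3 12B", 12)]:
    print(f"{name}: ~{weight_gb(params, 4.5):.1f} GB weights")
# Poro 2 8B: ~4.2 GB, Gemma 3 12B: ~6.3 GB -- on an 8 GB card the 12B leaves
# very little room once context and the larger embedding table are loaded.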

Links:
https://huggingface.co/collections/LumiOpen/poro-2-6835bec8186e98712b061f02