Poro 2
Posted by: minipasila - 28.08.2025, 19:16 - Forum: Large Language Models - No Replies

So Poro 2 is another Finnish LLM made by AMD Silo AI, the TurkuNLP group of the University of Turku, and HPLT.

It's a decent model for Finnish, but since some of the training data was generated with Llama 3.3 70B, it's not going to be as good as it could be: that model isn't great at Finnish in the first place. They should have used a better model for that task. Gemma 3 27B would have been a decent choice, though maybe it wasn't available at the time they were working on it.

The base model might still be fine; it's the SFT training data that's the problem, and that could be fixed by regenerating the data with better models. I'd probably try doing it myself, but I don't have much to spend at the moment. I'd also have to use at least Gemma 3 27B to generate potentially thousands of examples, which might cost quite a bit. Even with the Mistral Small 3.2 24B model it cost me about a dollar to generate roughly a thousand examples when cleaning up a dataset. Gemma 3 27B is a bit larger and uses more resources/memory, so it would probably cost a bit more; 10k examples would likely land closer to 20 dollars (though checking OpenRouter, it's priced very similarly to Mistral Small, just with worse throughput). So generating another thousand examples first would make more sense, to see whether that even improves the model before I spend more. And I'd have to somehow make sure it doesn't just generate garbage.
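For the arithmetic, here's a rough sketch of the estimate I'm doing in my head. The per-example token count and the prices are assumptions for illustration, not actual OpenRouter pricing:

Code:
# Back-of-the-envelope cost estimate for regenerating SFT data.
# All numbers are assumptions for illustration, not real prices.

def generation_cost(n_examples, tokens_per_example, usd_per_million_tokens):
    """Total cost in USD for generating n_examples."""
    total_tokens = n_examples * tokens_per_example
    return total_tokens / 1_000_000 * usd_per_million_tokens

# Assuming ~1000 output tokens per example at ~$1 per 1M tokens,
# which lines up with "about a dollar for a thousand examples":
print(generation_cost(1_000, 1_000, 1.0))   # -> 1.0
# A model that's ~2x as costly to run pushes 10k examples to ~$20:
print(generation_cost(10_000, 1_000, 2.0))  # -> 20.0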

Anyway, the Poro 2 models are still kinda decent, since they gave us an 8B model, which is probably the smallest Finnish model that's actually decent. Gemma 3 only gave us 4B and 12B and nothing in between: 4B is just too small to be very useful, and 12B uses too much memory to run on 8GB of VRAM (not just because it's 12B, but because it uses more memory in general compared to Mistral models like NeMo). So 8B hits just the right spot: small enough and actually useful. I didn't test the 70B much since no one is offering it via an API. And I think that's all I have to say about these models.

Links:
https://huggingface.co/collections/LumiO...712b061f02


DeepSeek V3.1
Posted by: minipasila - 28.08.2025, 19:02 - Forum: Large Language Models - No Replies

A new model from DeepSeek. It's still pretty similar to V3, but now it has a hybrid thinking mode, so it can supposedly do both non-thinking and thinking in the same model. The APIs don't seem to know how to enable it yet, though (at least Chutes AI).
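For what it's worth, the switch is supposedly exposed through the chat template if you build prompts yourself. A minimal sketch, assuming the `thinking` flag works the way the Hugging Face model card describes (I haven't verified it against the hosted APIs):

Code:
# Sketch: toggling thinking vs. non-thinking mode locally.
# Assumes DeepSeek-V3.1's chat template accepts a `thinking` flag,
# per the model card; hosted APIs may or may not expose this.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V3.1")
messages = [{"role": "user", "content": "Mikä on Suomen pääkaupunki?"}]

# Non-thinking mode:
plain = tok.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, thinking=False)

# Thinking mode:
thinking = tok.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, thinking=True)

print(thinking)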

I'm not sure it's that much better than the previous model; in terms of multilinguality it seems more or less the same, and it performs about as well as the previous model in Finnish. For RP it seems to use shorter messages overall. The thinking part also appears to cause issues when you use prefills containing <tag>-like tags: if one of those tags is in the prefill, it will spew out random nonsense.
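To make the prefill issue concrete, this is roughly what I mean: a trailing assistant message the model is asked to continue. The endpoint and payload here are generic placeholders (providers differ in how they support prefills), but the point is the <tag> at the start of the prefill text:

Code:
# Sketch of an assistant prefill in an OpenAI-style chat request.
# Endpoint, key, and prefill support are illustrative placeholders;
# providers differ in whether they continue a partial assistant turn.
import requests

payload = {
    "model": "deepseek-ai/DeepSeek-V3.1",
    "messages": [
        {"role": "user", "content": "Continue the scene."},
        # The prefill the model should continue from. Starting it with
        # a <tag>-style token is what seemed to trigger the nonsense.
        {"role": "assistant", "content": "<narration> The door creaks"},
    ],
}
resp = requests.post(
    "https://example-provider.invalid/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json=payload,
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])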

Also, considering that GPT-5 is way less censored, there's less reason to use open-weight models if they're more censored. It's still cheaper to use than GPT-5, though, so price will be a big factor for a while. Even GPT-5 kinda sucks at RP in Finnish, so I wonder if I'll ever get a decent Finnish LLM. SiloAI has done some stuff, but it's like several months behind what we've already had. I'm not even sure whether Gemma 3 is worse than those Poro 2 models. They used, I think, Llama 3.3 70B to generate Finnish data, even though that model sucks ass in Finnish. They should have used Gemma 3 27B at the very least, because that's the smallest model that's actually good at Finnish at the moment.

But enough of that rant. DeepSeek V3.1 is an okay model: not a huge improvement over the previous one, but it's something. Maybe they should add vision to their next model, or audio, and give GPT-4o some competition.

Links:
https://api-docs.deepseek.com/news/news250821
https://huggingface.co/deepseek-ai/DeepSeek-V3.1
https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Base


Use correct Prefixes
Posted by: minipasila - 09.07.2025, 00:04 - Forum: Media - No Replies

Remember to use the correct Prefix in your thread/post.

If your content was created using AI then use the AI Generated Prefix.
If your content is NSFW then use the NSFW Prefix.

Failure to do so will result in a warning.
Repeat offenses will result in a ban.


SmolLM3-3B - Released by HuggingFace
Posted by: minipasila - 08.07.2025, 23:52 - Forum: Large Language Models - No Replies

A new model released by HuggingFace.
It seems to do pretty well in their benchmarks.
One disappointment is the multilinguality: compared to Gemma 3, it's lacking a lot.
The 64K context window is pretty good, though.

Links:
https://huggingface.co/blog/smollm3
https://huggingface.co/collections/Huggi...e635317e23


Forums are open now!
Posted by: minipasila - 08.07.2025, 23:22 - Forum: Announcements - No Replies

I've decided to create some forums, because why not.

It's been ages since I last had one (PuuCraft). I've gotten sick of Discord and Reddit: they always start out great, but then comes the shittification. And since they're all centralized platforms, there's nothing you can do about it. With a forum you can host it yourself and not have to deal with their shit. There are alternatives to both of those services, yet almost no one wants to switch.

So here's another forum, for some reason. It probably won't do that well, but at least I have an alternative to Discord and Reddit.

Edit: Well, it was going to be open, but the email server was blocked by Outlook, so no emails could be sent... That now appears to have been fixed, so it'll stay open for now.


What is this for?
Posted by: minipasila - 08.07.2025, 23:08 - Forum: Suggestions - No Replies

Suggestions Forum
You can, for instance, suggest new forums that might be missing, or complain about the theme or whatever else comes to mind.
Any suggestions are welcome.
