Why not experiment?

#1
by Dampfinchen - opened

Why does it always have to be 3B activated parameters? That's too little for good performance. My theory is that upping that to 6B would massively improve quality while still being very fast on mainstream systems.

You are arguing with people who know what they are doing.

It's a valid and obvious question to raise, though. The fact that people "know what they're doing" doesn't mean their goal is to maximize quality; these small MoE models are primarily built for speed. As a hint to @Dampfinchen: you can manually set the number of activated experts in your backend when running locally, so try increasing it and see for yourself whether it makes a difference.
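
Below is a minimal sketch of that suggestion using Hugging Face transformers, assuming a Mixtral/Qwen-MoE-style config that exposes `num_experts_per_tok`; the model id is a placeholder and the exact field name can differ per architecture, so check the model's config.json first.

```python
# Minimal sketch (not any model card's official recipe): override the number
# of routed experts activated per token when loading an MoE checkpoint.
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/your-moe-model"  # placeholder MoE checkpoint

config = AutoConfig.from_pretrained(model_id)
print("default experts per token:", getattr(config, "num_experts_per_tok", None))

# Activate more experts per token (e.g. 12 instead of a default of 8).
# More active experts means more FLOPs per token, so expect lower throughput.
config.num_experts_per_tok = 12

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, config=config)

inputs = tokenizer("Explain mixture-of-experts routing briefly.", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

If you run through llama.cpp or another local backend instead, look for that backend's own expert-count override option in its documentation.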

Couple of things:

  • Model size and the number of experts are decided experimentally from a range of candidate values.
  • A higher number of experts/active parameters does not correlate directly with performance (diminishing returns); the sweet spot tends to be empirical, depending on architecture, data, etc.
  • The MoE optimization goal also weighs the ability to run within certain compute budgets against quality (a 48B-A3B model should perform similarly to a 32B-40B-class dense model of similar architecture/data).
  • You can learn more here: https://www.cerebras.ai/blog/moe-guide-scale and also look into other MoE guides for the expert-activation / active-parameter math (a back-of-the-envelope version of that math is sketched after this list).
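
As a rough illustration of the active-parameter math mentioned above, here is a back-of-the-envelope sketch; all layer sizes below are made-up placeholders, not the configuration of any released 48B-A3B model.

```python
# Back-of-the-envelope MoE parameter counting (illustrative numbers only).
hidden_size = 2048
num_layers = 32
num_experts = 64        # routed experts per MoE layer
experts_per_token = 8   # top-k experts activated for each token
expert_ffn_size = 1024  # intermediate size of each expert MLP

# Attention (and other non-expert weights) are always active; rough estimate.
attn_params_per_layer = 4 * hidden_size * hidden_size

# Each expert is a gated MLP with up, gate, and down projections.
params_per_expert = 3 * hidden_size * expert_ffn_size

total_expert_params = num_layers * num_experts * params_per_expert
active_expert_params = num_layers * experts_per_token * params_per_expert
always_active_params = num_layers * attn_params_per_layer

total_params = always_active_params + total_expert_params
active_params = always_active_params + active_expert_params
print(f"total params:  ~{total_params / 1e9:.1f}B")
print(f"active params: ~{active_params / 1e9:.1f}B")
```

The takeaway: total parameters grow with the full expert pool, while per-token compute scales only with the top-k activated experts, which is why a large-total/small-active MoE can approach a mid-size dense model's quality at a fraction of the FLOPs per token.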
