Ask HN: What does OpenAI's O3 Arc result mean for the field?

12 points by ottaborra 3 days ago | 6 comments

Given (1) criticism of how inadequate the transformer architecture is, (2) depletion of training data, (3) improvements to the attention mechanism, and (4) AGI supposedly being miles away:

where do the new results put the field? Which avenues need more focus, and which need less, to the point of complete cutoff?

null_investor 3 days ago | next |

It isn't as impressive as they'd like you to believe.

They fine-tuned it for that test.

What we are seeing is a marketing trick to keep markets and investors excited about AI. It's a trillion dollar industry for NVidia and other players. Fake it until you make it.

If you look deeper, there's very little change since GPT-3.5, and Anthropic has caught up with everything OpenAI has built so far.

Sora was a huge flop, with other companies clearly ahead of it. It's also mostly useless.

The numbers don't add up.

ottaborra 3 days ago | root | parent | next |

Is it not true that the ARC test is designed to be one where the rules are dynamic? I.e., every one of the tests differs from the others in an absolute sense: learning about one tells you nothing of substance about another, unless of course you/the model are capable of meta-learning.

Fine-tuning has been looked down upon because all it does is rearrange weights to learn the style of the fine-tuning dataset. It does not teach the model anything new, which is in contrast to the hopes behind fine-tuning.

If a model was able to ace the ARC test just by merit of being fine-tuned, does it not imply there is something of absolute substance here? I.e., the model is capable of meta-learning, and all it needs to adapt to a new task is a bit of fine-tuning, which, I emphasize again, is the lowest tier in the ranks of types of model training.
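To make the point concrete, here is a toy sketch of the ARC setup being described: each task hides its own transformation rule, revealed only through a few input/output demonstration pairs, so knowing one task's rule transfers nothing to the next. The rules here (mirror, transpose) are hypothetical stand-ins for real ARC transformations, not actual benchmark tasks.

```python
def mirror(grid):
    """Task A's hidden rule: flip each row left-to-right."""
    return [row[::-1] for row in grid]

def transpose(grid):
    """Task B's hidden rule: swap rows and columns."""
    return [list(col) for col in zip(*grid)]

# A toy hypothesis space; the real ARC rule space is open-ended.
CANDIDATE_RULES = [mirror, transpose]

def solve(train_pairs, test_input):
    """A toy 'meta-learner': infer the rule from the demonstration
    pairs alone, then apply it to the unseen test input."""
    for rule in CANDIDATE_RULES:
        if all(rule(x) == y for x, y in train_pairs):
            return rule(test_input)
    return None  # no candidate rule explains the demonstrations

# Task A: the demonstration pair reveals the mirror rule.
task_a_train = [([[1, 2], [3, 4]], [[2, 1], [4, 3]])]
print(solve(task_a_train, [[5, 6], [7, 8]]))  # [[6, 5], [8, 7]]

# Task B: same solver, same test input, but the demonstrations imply
# a different rule, so memorizing Task A's answer would not help.
task_b_train = [([[1, 2], [3, 4]], [[1, 3], [2, 4]])]
print(solve(task_b_train, [[5, 6], [7, 8]]))  # [[5, 7], [6, 8]]
```

The per-task rule inference step is what fine-tuning alone was assumed not to be able to buy you, which is why acing ARC after fine-tuning is (on this argument) still evidence of something substantive.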

muzani 18 hours ago | root | parent | prev | next |

With all the hype from o1 and gpt-4o, sonnet-3.5 still performs better in the field. OpenAI, Google, and Qwen have been smashing all these benchmarks, but they have less market share in the field itself than the little guy ignoring benchmarks. OpenAI used to be that little guy.

revskill 2 days ago | prev |

Just like a human, I expect AI to make mistakes on first iterations. But after a while, it's chaos.