h2oGPTe Agent Tops GAIA Benchmark Test Results Dec 2024 (Graphic: Business Wire)
KUALA LUMPUR, Dec 24 (Bernama) -- The leader in open-source generative artificial intelligence (AI) and predictive AI platforms, H2O.ai, announced its h2oGPTe Agent has taken the top position on the General AI Assistants (GAIA) benchmark leaderboard with an unprecedented score of 65 per cent. This milestone places it ahead of competitors, including Google’s Langfun Agent (49 per cent), Microsoft Research (38 per cent), and Hugging Face (33 per cent), setting a new benchmark for general-purpose AI agents.
This achievement solidifies H2O.ai’s leadership in the global race to build intelligent, adaptable AI assistants capable of transforming businesses.
H2O.ai Founder and Chief Executive Officer, Sri Ambati, shared his enthusiasm, noting that AI is only 30 per cent away from matching human-level intelligence according to the GAIA benchmark.
He also emphasised the significant leap in performance, with h2oGPTe Agentic AI surpassing the previous record by 15 per cent, outperforming Google DeepMind’s researchers, and beating Microsoft Research’s agent Magentic-1 by 27 per cent.
“Agentic AI is eating SaaS and with h2oGPTe Agentic AI now being generally available, all our enterprise customers can solve a wide range of sophisticated business and research problems,” he said in a statement.
The GAIA benchmark is a critical measure of AI's ability to tackle complex, real-world tasks that require a lot of time, thought and effort from skilled humans.
It involves hundreds of challenges demanding research, data analysis, document handling and reasoning, with degree-holding human respondents achieving a score of 92 per cent and requiring several human-days to solve all 300 test set problems.
H2O.ai's h2oGPTe Agent outpaced competitors by delivering consistent robustness, accuracy and efficiency, highlighting its readiness for enterprise use cases that depend heavily on skilled human assistants.
The company’s success underscores its philosophy of simplicity and adaptability, combining advanced reasoning and planning, multimodal comprehension, and code execution to offer solutions for complex problems across various industries. It further reaffirms the company’s leadership in AI innovation aimed at reshaping business workflows with intelligent, adaptable agentic systems.
-- BERNAMA