
Which has higher "emotional intelligence," GPT or LLaMA2?

Zhao Xixi, Wed, May 22, 2024, 11:18 AM EST

A paper published on May 20 in Nature Human Behaviour reports that on tasks testing the ability to track others' mental states, known as theory of mind, two families of large language models (LLMs) perform comparably to humans, and in some cases even better.

Theory of mind is central to human social interaction, communication, and empathy. Previous studies have shown that artificial intelligence systems such as LLMs can tackle complex cognitive tasks, including multiple-choice decision-making. However, it has remained unclear whether LLMs can match or surpass humans at theory of mind, long considered a uniquely human ability.

In the new study, James Strachan and colleagues at the University Medical Center Hamburg-Eppendorf in Germany selected tasks that assess different aspects of theory of mind, including detecting false beliefs, understanding indirect speech, and recognizing faux pas (social missteps).
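By way of illustration only (the study's actual prompts and materials are not reproduced here), a classic false-belief vignette of the kind such test batteries use could be posed to a GPT model roughly as in the following sketch, written with OpenAI's Python client; the vignette wording and the model name are assumptions for the example, not the authors' method:

```python
# A minimal sketch of posing a classic false-belief vignette to an LLM.
# The vignette text and model name are illustrative assumptions, not the
# study's actual materials.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

vignette = (
    "Sally puts her marble in the basket and leaves the room. "
    "While she is away, Anne moves the marble to the box. "
    "When Sally returns, where will she look for her marble?"
)

response = client.chat.completions.create(
    model="gpt-4",  # assumed model name, for illustration
    messages=[{"role": "user", "content": vignette}],
)

# An answer consistent with theory of mind tracks Sally's false belief
# ("the basket"), not the marble's true location ("the box").
print(response.choices[0].message.content)
```

Passing such a probe requires the model to represent what Sally believes, which differs from what is actually true, rather than simply reporting the state of the world.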

The researchers then compared the performance of 1,907 people with that of two popular LLM families, both developed in the U.S.: OpenAI's GPT models and Meta's LLaMA2 models, on these tasks.

They found that the GPT models matched or exceeded the human average on tasks such as identifying indirect requests, false beliefs, and misdirection, whereas LLaMA2 fell short of human performance. On recognizing faux pas, the pattern reversed: LLaMA2 outperformed humans, while GPT performed poorly.

Strachan and colleagues suggest that LLaMA2's apparent success reflects a bias in its responses rather than genuine sensitivity to faux pas, while GPT's apparent failure stems from an overly conservative reluctance to commit to conclusions rather than from faulty reasoning.

The researchers caution that LLMs performing on par with humans in theory of mind tasks does not mean they possess human-like mental abilities or have mastered theory of mind. Still, they note that these results lay important groundwork for future research and recommend further study of how LLMs' performance at psychological inference may shape people's cognition during human-computer interaction.

For more information on the paper, visit: https://doi.org/10.1038/s41562-024-01882-z