TY - GEN
T1 - A Comprehensive Evaluation of Tool-Assisted Generation Strategies
AU - Jacovi, Alon
AU - Caciularu, Avi
AU - Herzig, Jonathan
AU - Aharoni, Roee
AU - Bohnet, Bernd
AU - Geva, Mor
N1 - Publisher Copyright: © 2023 Association for Computational Linguistics.
PY - 2023
Y1 - 2023
N2 - A growing area of research investigates augmenting language models with tools (e.g., search engines, calculators) to overcome their shortcomings (e.g., missing or incorrect knowledge, incorrect logical inferences). Various few-shot tool-usage strategies have been proposed. However, there is no systematic and fair comparison across different strategies, or between these strategies and strong baselines that do not leverage tools. We conduct an extensive empirical analysis, finding that (1) across various datasets, example difficulty levels, and models, strong no-tool baselines are competitive with tool-assisted strategies, implying that effectively using tools with in-context demonstrations is a difficult unsolved problem; (2) for knowledge-retrieval tasks, strategies that refine incorrect outputs with tools outperform strategies that retrieve relevant information ahead of or during generation; (3) tool-assisted strategies are expensive in the number of tokens they require to work, incurring additional costs by orders of magnitude, which does not translate into significant improvement in performance. Overall, our findings suggest that few-shot tool integration is still an open challenge, emphasizing the need for comprehensive evaluations of future strategies to accurately assess their benefits and costs.
UR - http://www.scopus.com/inward/record.url?scp=85183295505&partnerID=8YFLogxK
U2 - 10.18653/v1/2023.findings-emnlp.926
DO - 10.18653/v1/2023.findings-emnlp.926
M3 - Conference contribution
T3 - Findings of the Association for Computational Linguistics: EMNLP 2023
SP - 13856
EP - 13878
BT - Findings of the Association for Computational Linguistics
PB - Association for Computational Linguistics (ACL)
T2 - 2023 Findings of the Association for Computational Linguistics: EMNLP 2023
Y2 - 6 December 2023 through 10 December 2023
ER -