The most prominent AI research papers, by year:

**2017**

- **Attention Is All You Need**
  _Authors_: Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin
  _Summary_: This paper introduces the Transformer architecture, which relies entirely on self-attention mechanisms, eliminating the need for recurrence and convolutions in sequence modeling. The model has become foundational for natural language processing tasks.
  _URL_: [https://arxiv.org/abs/1706.03762](https://arxiv.org/abs/1706.03762)

**2020**

- **Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks**
  _Authors_: Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela
  _Summary_: The paper proposes Retrieval-Augmented Generation (RAG), a model that combines pre-trained parametric memory with non-parametric memory for improved performance on knowledge-intensive NLP tasks. RAG achieves state-of-the-art results on several open-domain question-answering benchmarks.
  _URL_: [https://arxiv.org/abs/2005.11401](https://arxiv.org/abs/2005.11401)

**2023**

- **Direct Preference Optimization: Your Language Model is Secretly a Reward Model**
  _Authors_: Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, Chelsea Finn
  _Summary_: This paper introduces Direct Preference Optimization (DPO), a method for fine-tuning language models to align with human preferences without reinforcement learning or explicit reward modeling. DPO simplifies the alignment process and has been shown to be stable and computationally efficient. (A minimal sketch of the DPO objective follows the 2023 entries below.)
  _URL_: [https://arxiv.org/abs/2305.18290](https://arxiv.org/abs/2305.18290)

- **Graph of Thoughts: Solving Elaborate Problems with Large Language Models**
  _Authors_: Maciej Besta, Nils Blach, Ales Kubicek, Robert Gerstenberger, Michał Podstawski, Lukas Gianinazzi, Joanna Gajda, Tomasz Lehmann, Hubert Niewiadomski, Piotr Nyczyk, Torsten Hoefler
  _Summary_: The paper introduces the Graph of Thoughts (GoT) framework, which models the information generated by large language models as an arbitrary graph. This approach enhances the problem-solving capabilities of language models by allowing more flexible and structured reasoning.
  _URL_: [https://arxiv.org/abs/2308.09687](https://arxiv.org/abs/2308.09687)
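For context on the DPO entry above, here is a minimal PyTorch sketch of the DPO objective: the policy is trained to widen the gap between its log-probability ratios (against a frozen reference model) for preferred versus rejected responses. The function name, the assumption that per-sequence log-probabilities are pre-computed, and the default β value are illustrative choices for this sketch, not the paper's reference implementation.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss over a batch of preference pairs.

    Each argument is a 1-D tensor of per-sequence log-probabilities
    (log pi(y | x), summed over tokens) for the chosen / rejected
    responses under the trained policy and the frozen reference policy.
    Names and the beta default are illustrative, not the paper's code.
    """
    # Implicit rewards: beta * log-ratio between policy and reference.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Logistic loss on the margin between chosen and rejected rewards.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()


# Toy usage with dummy per-sequence log-probabilities for two pairs.
loss = dpo_loss(torch.tensor([-12.0, -9.5]), torch.tensor([-14.0, -11.0]),
                torch.tensor([-12.5, -9.8]), torch.tensor([-13.5, -10.5]))
print(loss.item())
```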
**2024**

- **DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models**
  _Authors_: DeepSeek AI Team
  _Summary_: DeepSeekMath 7B is a domain-specific language model obtained by continuing pre-training on math-related tokens, reaching a score of 51.7% on the MATH benchmark. It demonstrates a significant advance in the mathematical reasoning capabilities of open-source language models.
  _URL_: [https://arxiv.org/abs/2402.03300](https://arxiv.org/abs/2402.03300)

- **Group Robust Preference Optimization in Reward-free RLHF**
  _Authors_: Shyam Sundhar Ramesh, Yifan Hu, Iason Chaimalas, Viraj Mehta, Pier Giuseppe Sessa, Haitham Bou-Ammar, Ilija Bogunovic
  _Summary_: The paper proposes Group Robust Preference Optimization (GRPO), a method for robustly aligning large language models to the preferences of individual groups. It addresses a limitation of traditional reinforcement learning from human feedback, which assumes a single preference model for all users.
  _URL_: [https://arxiv.org/abs/2405.20304](https://arxiv.org/abs/2405.20304)

**2025**

- **Tracing the Thoughts of a Large Language Model**
  _Authors_: Anthropic Research Team
  _Summary_: This research provides insight into the internal processes of large language models, showing that models like Claude plan their outputs several words in advance. Understanding these internal mechanisms is crucial for improving transparency and safety in AI systems.
  _URL_: [https://www.anthropic.com/research/tracing-thoughts-language-model](https://www.anthropic.com/research/tracing-thoughts-language-model)

- **Crossing the Reward Bridge: Expanding RL with Verifiable Rewards Across Diverse Domains**
  _Authors_: Tencent AI Lab Team
  _Summary_: The paper extends reinforcement learning with verifiable rewards beyond well-structured tasks, exploring its potential to improve complex reasoning abilities across diverse domains such as medicine, chemistry, and education. (A toy illustration of a verifiable reward follows this list.)
  _URL_: [https://arxiv.org/abs/2503.23829](https://arxiv.org/abs/2503.23829)
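To make the "verifiable rewards" idea above concrete, below is a toy sketch of the simplest case: a binary reward that checks a model's final answer against a known reference. The function name and the exact-match normalization are illustrative assumptions, not the paper's method; the paper's point is precisely that free-form answers in domains like medicine and education need rewards that go beyond this kind of exact matching.

```python
import re

def verifiable_reward(model_answer: str, reference_answer: str) -> float:
    """Binary verifiable reward: 1.0 if the model's final answer matches the
    reference after light normalization, else 0.0. Suitable for tasks with a
    single well-defined answer (e.g., a numeric result in a math problem).
    Illustrative sketch only, not the paper's implementation."""
    def normalize(s: str) -> str:
        return re.sub(r"\s+", " ", s.strip().lower())
    return 1.0 if normalize(model_answer) == normalize(reference_answer) else 0.0


# The scalar reward would then drive an RL update (e.g., a policy-gradient step).
print(verifiable_reward("x = 42", "X = 42"))  # 1.0
print(verifiable_reward("x = 41", "x = 42"))  # 0.0
```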