Reported by Machine Heart (机器之心)
2024 was an exciting year for AI. Over the course of the year, major tech companies and research institutions released countless studies.
From Sora at the start of the year to DeepSeek-V3 at its end, we witnessed round after round of AI releases, and AI brought us unexpected surprises.
Throughout the year, AI papers were produced in a steady stream. Which papers from the just-concluded 2024 are worth reading and rereading? Renowned machine learning and AI researcher Sebastian Raschka has compiled an LLM reading list that details the important papers published in each month.
January Papers
Paper title: Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models
Paper link: https://arxiv.org/abs/2401.00788
Paper title: A Comprehensive Study of Knowledge Editing for Large Language Models
Paper link: https://arxiv.org/abs/2401.01286
Paper title: LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
Paper link: https://arxiv.org/abs/2401.01325
Paper title: Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
Paper link: https://arxiv.org/abs/2401.01335
Paper title: LLaMA Beyond English: An Empirical Study on Language Capability Transfer
Paper link: https://arxiv.org/abs/2401.01055
Paper title: A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity
Paper link: https://arxiv.org/abs/2401.01967
Paper title: LLaMA Pro: Progressive LLaMA with Block Expansion
Paper link: https://arxiv.org/abs/2401.02415
Paper title: LLM Augmented LLMs: Expanding Capabilities through Composition
Paper link: https://arxiv.org/abs/2401.02412
Paper title: Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM
Paper link: https://arxiv.org/abs/2401.02994
Paper title: DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
Paper link: https://arxiv.org/abs/2401.02954
Paper title: Denoising Vision Transformers
Paper link: https://arxiv.org/abs/2401.02957
Paper title: Long Context Compression with Activation Beacon
Paper link: https://arxiv.org/abs/2401.03462
Paper title: Mixtral of Experts
Paper link: https://arxiv.org/abs/2401.04088
Paper title: MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
Paper link: https://arxiv.org/abs/2401.04081
Paper title: A Minimaximalist Approach to Reinforcement Learning from Human Feedback
Paper link: https://arxiv.org/abs/2401.04056
Paper title: RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation
Paper link: https://arxiv.org/abs/2401.04679
Paper title: Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
Paper link: https://arxiv.org/abs/2401.05566
Paper title: Transformers are Multi-State RNNs
Paper link: https://arxiv.org/abs/2401.06104
Paper title: A Closer Look at AUROC and AUPRC under Class Imbalance
Paper link: https://arxiv.org/abs/2401.06091
Paper title: An Experimental Design Framework for Label-Efficient Supervised Finetuning of Large Language Models
Paper link: https://arxiv.org/abs/2401.06692
Paper title: Tuning Language Models by Proxy
Paper link: https://arxiv.org/abs/2401.08565
Paper title: Scalable Pre-training of Large Autoregressive Image Models
Paper link: https://arxiv.org/abs/2401.08541
Paper title: Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering
Paper link: https://arxiv.org/abs/2401.08500
Paper title: RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture
Paper link: https://arxiv.org/abs/2401.08406
Paper title: ReFT: Reasoning with Reinforced Fine-Tuning
Paper link: https://arxiv.org/abs/2401.08967
Paper title: DiffusionGPT: LLM-Driven Text-to-Image Generation System
Paper link: https://arxiv.org/abs/2401.10061
Paper title: Self-Rewarding Language Models
Paper link: https://arxiv.org/abs/2401.10020
Paper title: VMamba: Visual State Space Model
Paper link: https://arxiv.org/abs/2401.10166
Paper title: Knowledge Fusion of Large Language Models
Paper link: https://arxiv.org/abs/2401.10491
Paper title: SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities
Paper link: https://arxiv.org/abs/2401.12168
Paper title: WARM: On the Benefits of Weight Averaged Reward Models
Paper link: https://arxiv.org/abs/2401.12187
Paper title: Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text
Paper link: https://arxiv.org/abs/2401.12070
Paper title: MambaByte: Token-free Selective State Space Model
Paper link: https://arxiv.org/abs/2401.13660
Paper title: SpacTor-T5: Pre-training T5 Models with Span Corruption and Replaced Token Detection
Paper link: https://arxiv.org/abs/2401.13160
Paper title: Rethinking Patch Dependence for Masked Autoencoders
Paper link: https://arxiv.org/abs/2401.14391
Paper title: Pix2gestalt: Amodal Segmentation by Synthesizing Wholes
Paper link: https://arxiv.org/abs/2401.14398
Paper title: Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
Paper link: https://arxiv.org/abs/2401.14405
Paper title: EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
Paper link: https://arxiv.org/abs/2401.15077
Paper title: MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
Paper link: https://arxiv.org/abs/2401.15947
Paper title: Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling
Paper link: https://arxiv.org/abs/2401.16380
Paper title: KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
Paper link: https://arxiv.org/abs/2401.18079
February Papers
Paper title: Efficient Exploration for LLMs
Paper link: https://arxiv.org/abs/2402.00396
Paper title: OLMo: Accelerating the Science of Language Models
Paper link: https://arxiv.org/abs/2402.00838
Paper title: Tiny Titans: Can Smaller Large Language Models Punch Above Their Weight in the Real World for Meeting Summarization?
Paper link: https://arxiv.org/abs/2402.00841
Paper title: Repeat After Me: Transformers are Better than State Space Models at Copying
Paper link: https://arxiv.org/abs/2402.01032
Paper title: LiPO: Listwise Preference Optimization through Learning-to-Rank
Paper link: https://arxiv.org/abs/2402.01878
Paper title: FindingEmo: An Image Dataset for Emotion Recognition in the Wild
Paper link: https://arxiv.org/abs/2402.01355
Paper title: More Agents Is All You Need
Paper link: https://arxiv.org/abs/2402.05120
Paper title: DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper link: https://arxiv.org/abs/2402.03300
Paper title: MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
Paper link: https://arxiv.org/abs/2402.03766
Paper title: A Phase Transition Between Positional and Semantic Learning in a Solvable Model of Dot-Product Attention
Paper link: https://arxiv.org/abs/2402.03902
Paper title: Scaling Laws for Downstream Task Performance of Large Language Models
Paper link: https://arxiv.org/abs/2402.04177
Paper title: MOMENT: A Family of Open Time-series Foundation Models
Paper link: https://arxiv.org/abs/2402.03885
Paper title: Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper link: https://arxiv.org/abs/2402.03749
Paper title: Self-Discover: Large Language Models Self-Compose Reasoning Structures
Paper link: https://arxiv.org/abs/2402.03620
Paper title: Grandmaster-Level Chess Without Search
Paper link: https://arxiv.org/abs/2402.04494
Paper title: Direct Language Model Alignment from Online AI Feedback
Paper link: https://arxiv.org/abs/2402.04792
Paper title: Buffer Overflow in Mixture of Experts
Paper link: https://arxiv.org/abs/2402.05526
Paper title: The Boundary of Neural Network Trainability is Fractal
Paper link: https://arxiv.org/abs/2402.06184
Paper title: ODIN: Disentangled Reward Mitigates Hacking in RLHF
Paper link: https://arxiv.org/abs/2402.07319
Paper title: Policy Improvement using Language Feedback Models
Paper link: https://arxiv.org/abs/2402.07876
Paper title: Scaling Laws for Fine-Grained Mixture of Experts
Paper link: https://arxiv.org/abs/2402.07871
Paper title: Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model
Paper link: https://arxiv.org/abs/2402.07827
Paper title: Step-On-Feet Tuning: Scaling Self-Alignment of LLMs via Bootstrapping
Paper link: https://arxiv.org/abs/2402.07610
Paper title: Suppressing Pink Elephants with Direct Principle Feedback
Paper link: https://arxiv.org/abs/2402.07896
Paper title: World Model on Million-Length Video And Language With RingAttention
Paper link: https://arxiv.org/abs/2402.08268
Paper title: Mixtures of Experts Unlock Parameter Scaling for Deep RL
Paper link: https://arxiv.org/abs/2402.08609
Paper title: DoRA: Weight-Decomposed Low-Rank Adaptation
Paper link: https://arxiv.org/abs/2402.09353
Paper title: Transformers Can Achieve Length Generalization But Not Robustly
Paper link: https://arxiv.org/abs/2402.09371
Paper title: BASE TTS: Lessons From Building a Billion-Parameter Text-to-Speech Model on 100K Hours of Data
Paper link: https://arxiv.org/abs/2402.08093
Paper title: Recovering the Pre-Fine-Tuning Weights of Generative Models
Paper link: https://arxiv.org/abs/2402.10208
Paper title: Generative Representational Instruction Tuning
Paper link: https://arxiv.org/abs/2402.09906
Paper title: FinTral: A Family of GPT-4 Level Multimodal Financial Large Language Models
Paper link: https://arxiv.org/abs/2402.10986
Paper title: OneBit: Towards Extremely Low-bit Large Language Models
Paper link: https://arxiv.org/abs/2402.11295
Paper title: LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration
Paper link: https://arxiv.org/abs/2402.11550
Paper title: Reformatted Alignment
Paper link: https://arxiv.org/abs/2402.12219
Paper title: AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
Paper link: https://arxiv.org/abs/2402.12226
Paper title: Towards Cross-Tokenizer Distillation: the Universal Logit Distillation Loss for LLMs
Paper link: https://arxiv.org/abs/2402.12030
Paper title: LoRA+: Efficient Low Rank Adaptation of Large Models
Paper link: https://arxiv.org/abs/2402.12354
Paper title: Neural Network Diffusion
Paper link: https://arxiv.org/abs/2402.13144
Paper title: YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information
Paper link: https://arxiv.org/abs/2402.13616
Paper title: LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
Paper link: https://arxiv.org/abs/2402.13753
Paper title: Large Language Models for Data Annotation: A Survey
Paper link: https://arxiv.org/abs/2402.13446
Paper title: TinyLLaVA: A Framework of Small-scale Large Multimodal Models
Paper link: https://arxiv.org/abs/2402.14289
Paper title: Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs
Paper link: https://arxiv.org/abs/2402.14740
Paper title: Genie: Generative Interactive Environments
Paper link: https://arxiv.org/abs/2402.15391
Paper title: CARTE: Pretraining and Transfer for Tabular Learning
Paper link: https://arxiv.org/abs/2402.16785
Paper title: The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper link: https://arxiv.org/abs/2402.17764
Paper title: Sora Generates Videos with Stunning Geometrical Consistency
Paper link: https://arxiv.org/abs/2402.17403
Paper title: When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method
Paper link: https://arxiv.org/abs/2402.17193
Paper title: Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Paper link: https://arxiv.org/abs/2402.19427
March Papers
Paper title: Learning and Leveraging World Models in Visual Representation Learning
Paper link: https://arxiv.org/abs/2403.00504
Paper title: Improving LLM Code Generation with Grammar Augmentation
Paper link: https://arxiv.org/abs/2403.01632
Paper title: The Hidden Attention of Mamba Models
Paper link: https://arxiv.org/abs/2403.01590
Paper title: Training-Free Pretrained Model Merging
Paper link: https://arxiv.org/abs/2403.01753
Paper title: Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
Paper link: https://arxiv.org/abs/2403.02308
Paper title: The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning
Paper link: https://arxiv.org/abs/2403.03218
Paper title: Evolution Transformer: In-Context Evolutionary Optimization
Paper link: https://arxiv.org/abs/2403.02985
Paper title: Enhancing Vision-Language Pre-training with Rich Supervisions
Paper link: https://arxiv.org/abs/2403.03346
Paper title: Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
Paper link: https://arxiv.org/abs/2403.03206
Paper title: Design2Code: How Far Are We From Automating Front-End Engineering?
Paper link: https://arxiv.org/abs/2403.03163
Paper title: ShortGPT: Layers in Large Language Models are More Redundant Than You Expect
Paper link: https://arxiv.org/abs/2403.03853
Paper title: Backtracing: Retrieving the Cause of the Query
Paper link: https://arxiv.org/abs/2403.03956
Paper title: Learning to Decode Collaboratively with Multiple Language Models
Paper link: https://arxiv.org/abs/2403.03870
Paper title: SaulLM-7B: A pioneering Large Language Model for Law
Paper link: https://arxiv.org/abs/2403.03883
Paper title: Are Language Models Puzzle Prodigies? Algorithmic Puzzles Unveil Serious Challenges in Multimodal Reasoning
Paper link: https://arxiv.org/abs/2403.03864
Paper title: 3D Diffusion Policy
Paper link: https://arxiv.org/abs/2403.03954
Paper title: MedMamba: Vision Mamba for Medical Image Classification
Paper link: https://arxiv.org/abs/2403.03849
Paper title: GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Paper link: https://arxiv.org/abs/2403.03507
Paper title: Stop Regressing: Training Value Functions via Classification for Scalable Deep RL
Paper link: https://arxiv.org/abs/2403.03950
Paper title: How Far Are We from Intelligent Visual Deductive Reasoning?
Paper link: https://arxiv.org/abs/2403.04732
Paper title: Common 7B Language Models Already Possess Strong Math Capabilities
Paper link: https://arxiv.org/abs/2403.04706
Paper title: Gemini 1.5: Unlocking Multimodal Understanding Across Millions of Tokens of Context
Paper link: https://arxiv.org/abs/2403.05530
Paper title: Is Cosine-Similarity of Embeddings Really About Similarity?
Paper link: https://arxiv.org/abs/2403.05440
Paper title: LLM4Decompile: Decompiling Binary Code with Large Language Models
Paper link: https://arxiv.org/abs/2403.05286
Paper title: Algorithmic Progress in Language Models
Paper link: https://arxiv.org/abs/2403.05812
Paper title: Stealing Part of a Production Language Model
Paper link: https://arxiv.org/abs/2403.06634
Paper title: Chronos: Learning the Language of Time Series
Paper link: https://arxiv.org/abs/2403.07815
Paper title: Simple and Scalable Strategies to Continually Pre-train Large Language Models
Paper link: https://arxiv.org/abs/2403.08763
Paper title: Language Models Scale Reliably With Over-Training and on Downstream Tasks
Paper link: https://arxiv.org/abs/2403.08540
Paper title: BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences
Paper link: https://arxiv.org/abs/2403.09347
Paper title: LocalMamba: Visual State Space Model with Windowed Selective Scan
Paper link: https://arxiv.org/abs/2403.09338
Paper title: GiT: Towards Generalist Vision Transformer through Universal Language Interface
Paper link: https://arxiv.org/abs/2403.09394
Paper title: MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Paper link: https://arxiv.org/abs/2403.09611
Paper title: RAFT: Adapting Language Model to Domain Specific RAG
Paper link: https://arxiv.org/abs/2403.10131
Paper title: TnT-LLM: Text Mining at Scale with Large Language Models
Paper link: https://arxiv.org/abs/2403.12173
Paper title: Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression
Paper link: https://arxiv.org/abs/2403.15447
Paper title: PERL: Parameter Efficient Reinforcement Learning from Human Feedback
Paper link: https://arxiv.org/abs/2403.10704
Paper title: RewardBench: Evaluating Reward Models for Language Modeling
Paper link: https://arxiv.org/abs/2403.13787
Paper title: LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models
Paper link: https://arxiv.org/abs/2403.13372
Paper title: RakutenAI-7B: Extending Large Language Models for Japanese
Paper link: https://arxiv.org/abs/2403.15484
Paper title: SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time Series
Paper link: https://arxiv.org/abs/2403.15360
Paper title: Can Large Language Models Explore In-Context?
Paper link: https://arxiv.org/abs/2403.15371
Paper title: LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement
Paper link: https://arxiv.org/abs/2403.15042
Paper title: LLM Agent Operating System
Paper link: https://arxiv.org/abs/2403.16971
Paper title: The Unreasonable Ineffectiveness of the Deeper Layers
Paper link: https://arxiv.org/abs/2403.17887
Paper title: BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text
Paper link: https://arxiv.org/abs/2403.18421
Paper title: ViTAR: Vision Transformer with Any Resolution
Paper link: https://arxiv.org/abs/2403.18361
Paper title: Long-form Factuality in Large Language Models
Paper link: https://arxiv.org/abs/2403.18802
Paper title: Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
Paper link: https://arxiv.org/abs/2403.18814
Paper title: LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning
Paper link: https://arxiv.org/abs/2403.17919
Paper title: Mechanistic Design and Scaling of Hybrid Architectures
Paper link: https://arxiv.org/abs/2403.17844
Paper title: MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions
Paper link: https://arxiv.org/abs/2403.19651
Paper title: Model Stock: All We Need Is Just a Few Fine-Tuned Models
Paper link: https://arxiv.org/abs/2403.19522
April Papers
Paper title: Do Language Models Plan Ahead for Future Tokens?
Paper link: https://arxiv.org/abs/2404.00859
Paper title: Bigger is not Always Better: Scaling Properties of Latent Diffusion Models
Paper link: https://arxiv.org/abs/2404.01367
Paper title: The Fine Line: Navigating Large Language Model Pretraining with Down-streaming Capability Analysis
Paper link: https://arxiv.org/abs/2404.01204
Paper title: Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models
Paper link: https://arxiv.org/abs/2404.04478
Paper title: Mixture-of-Depths: Dynamically Allocating Compute in Transformer-Based Language Models
Paper link: https://arxiv.org/abs/2404.02258
Paper title: Long-context LLMs Struggle with Long In-context Learning
Paper link: https://arxiv.org/abs/2404.02060
Paper title: Emergent Abilities in Reduced-Scale Generative Language Models
Paper link: https://arxiv.org/abs/2404.02204
Paper title: Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks
Paper link: https://arxiv.org/abs/2404.02151
Paper title: On the Scalability of Diffusion-based Text-to-Image Generation
Paper link: https://arxiv.org/abs/2404.02883
Paper title: BAdam: A Memory Efficient Full Parameter Training Method for Large Language Models
Paper link: https://arxiv.org/abs/2404.02827
Paper title: Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models
Paper link: https://arxiv.org/abs/2404.02747
Paper title: Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
Paper link: https://arxiv.org/abs/2404.03715
Paper title: Training LLMs over Neurally Compressed Text
Paper link: https://arxiv.org/abs/2404.03626
Paper title: CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues
Paper link: https://arxiv.org/abs/2404.03820
Paper title: ReFT: Representation Finetuning for Language Models
Paper link: https://arxiv.org/abs/2404.03592
Paper title: Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data
Paper link: https://arxiv.org/abs/2404.03862
Paper title: Sigma: Siamese Mamba Network for Multi-Modal Semantic Segmentation
Paper link: https://arxiv.org/abs/2404.04256
Paper title: AutoCodeRover: Autonomous Program Improvement
Paper link: https://arxiv.org/abs/2404.05427
Paper title: Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence
Paper link: https://arxiv.org/abs/2404.05892
Paper title: CodecLM: Aligning Language Models with Tailored Synthetic Data
Paper link: https://arxiv.org/abs/2404.05875
Paper title: MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
Paper link: https://arxiv.org/abs/2404.06395
Paper title: Elephants Never Forget: Memorization and Learning of Tabular Data in Large Language Models
Paper link: https://arxiv.org/abs/2404.06209
Paper title: LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders
Paper link: https://arxiv.org/abs/2404.05961
Paper title: Adapting LLaMA Decoder to Vision Transformer
Paper link: https://arxiv.org/abs/2404.06773
Paper title: Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
Paper link: https://arxiv.org/abs/2404.07143
Paper title: LLoCO: Learning Long Contexts Offline
Paper link: https://arxiv.org/abs/2404.07979
Paper title: JetMoE: Reaching Llama2 Performance with 0.1M Dollars
Paper link: https://arxiv.org/abs/2404.07413
Paper title: Best Practices and Lessons Learned on Synthetic Data for Language Models
Paper link: https://arxiv.org/abs/2404.07503
Paper title: Rho-1: Not All Tokens Are What You Need
Paper link: https://arxiv.org/abs/2404.07965
Paper title: Pre-training Small Base LMs with Fewer Tokens
Paper link: https://arxiv.org/abs/2404.08634
Paper title: Dataset Reset Policy Optimization for RLHF
Paper link: https://arxiv.org/abs/2404.08495
Paper title: LLM In-Context Recall is Prompt Dependent
Paper link: https://arxiv.org/abs/2404.08865
Paper title: State Space Model for New-Generation Network Alternative to Transformers: A Survey
Paper link: https://arxiv.org/abs/2404.09516
Paper title: Chinchilla Scaling: A Replication Attempt
Paper link: https://arxiv.org/abs/2404.10102
Paper title: Learn Your Reference Model for Real Good Alignment
Paper link: https://arxiv.org/abs/2404.09656
Paper title: Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study
Paper link: https://arxiv.org/abs/2404.10719
Paper title: Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies
Paper link: https://arxiv.org/abs/2404.08197
Paper title: How Faithful Are RAG Models? Quantifying the Tug-of-War Between RAG and LLMs’ Internal Prior
Paper link: https://arxiv.org/abs/2404.10198
Paper title: A Survey on Retrieval-Augmented Text Generation for Large Language Models
Paper link: https://arxiv.org/abs/2404.10981
Paper title: When LLMs are Unfit Use FastFit: Fast and Effective Text Classification with Many Classes
Paper link: https://arxiv.org/abs/2404.12365
Paper title: Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
Paper link: https://arxiv.org/abs/2404.12253
Paper title: OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data
Paper link: https://arxiv.org/abs/2404.12195
Paper title: The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
Paper link: https://arxiv.org/abs/2404.13208
Paper title: An Empirical Study of LLaMA3 Quantization: From LLMs to MLLMs
Paper link: https://arxiv.org/abs/2404.14047
Paper title: Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Paper link: https://arxiv.org/abs/2404.14219
Paper title: OpenELM: An Efficient Language Model Family with Open Training and Inference Framework
Paper link: https://arxiv.org/abs/2404.14619
Paper title: A Survey on Self-Evolution of Large Language Models
Paper link: https://arxiv.org/abs/2404.14387
Paper title: Multi-Head Mixture-of-Experts
Paper link: https://arxiv.org/abs/2404.15045
Paper title: NExT: Teaching Large Language Models to Reason about Code Execution
Paper link: https://arxiv.org/abs/2404.14662
Paper title: Graph Machine Learning in the Era of Large Language Models (LLMs)
Paper link: https://arxiv.org/abs/2404.14928
Paper title: Retrieval Head Mechanistically Explains Long-Context Factuality
Paper link: https://arxiv.org/abs/2404.15574
Paper title: Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding
Paper link: https://arxiv.org/abs/2404.16710
Paper title: Make Your LLM Fully Utilize the Context
Paper link: https://arxiv.org/abs/2404.16811
Paper title: LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report
Paper link: https://arxiv.org/abs/2405.00732
Paper title: Better & Faster Large Language Models via Multi-token Prediction
Paper link: https://arxiv.org/abs/2404.19737
Paper title: RAG and RAU: A Survey on Retrieval-Augmented Language Model in Natural Language Processing
Paper link: https://arxiv.org/abs/2404.19543
Paper title: A Primer on the Inner Workings of Transformer-based Language Models
Paper link: https://arxiv.org/abs/2405.00208
Paper title: When to Retrieve: Teaching LLMs to Utilize Information Retrieval Effectively
Paper link: https://arxiv.org/abs/2404.19705
Paper title: KAN: Kolmogorov-Arnold Networks
Paper link: https://arxiv.org/abs/2404.19756
May Papers
Paper title: Is Bigger Edit Batch Size Always Better? An Empirical Study on Model Editing with Llama-3
Paper link: https://arxiv.org/abs/2405.00664
Paper title: Self-Play Preference Optimization for Language Model Alignment
Paper link: https://arxiv.org/abs/2405.00675
Paper title: A Careful Examination of Large Language Model Performance on Grade School Arithmetic
Paper link: https://arxiv.org/abs/2405.00332
Paper title: Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models
Paper link: https://arxiv.org/abs/2405.01535
Paper title: What Matters When Building Vision-Language Models?
Paper link: https://arxiv.org/abs/2405.02246
Paper title: Is Flash Attention Stable?
Paper link: https://arxiv.org/abs/2405.02803
Paper title: vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention
Paper link: https://arxiv.org/abs/2405.04437
Paper title: xLSTM: Extended Long Short-Term Memory
Paper link: https://arxiv.org/abs/2405.04517
Paper title: You Only Cache Once: Decoder-Decoder Architectures for Language Models
Paper link: https://arxiv.org/abs/2405.05254
Paper title: DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
Paper link: https://arxiv.org/abs/2405.04434
Paper title: Fishing for Magikarp: Automatically Detecting Under-trained Tokens in Large Language Models
Paper link: https://arxiv.org/abs/2405.05417
Paper title: Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?
Paper link: https://arxiv.org/abs/2405.05904
Paper title: Value Augmented Sampling for Language Model Alignment and Personalization
Paper link: https://arxiv.org/abs/2405.06639
Paper title: PHUDGE: Phi-3 as Scalable Judge
Paper link: https://arxiv.org/abs/2405.08029
Paper title: RLHF Workflow: From Reward Modeling to Online RLHF
Paper link: https://arxiv.org/abs/2405.07863
Paper title: LoRA Learns Less and Forgets Less
Paper link: https://arxiv.org/abs/2405.09673
Paper title: Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model
Paper link: https://arxiv.org/abs/2405.09215
Paper title: Chameleon: Mixed-Modal Early-Fusion Foundation Models
Paper link: https://arxiv.org/abs/2405.09818
Paper title: Towards Modular LLMs by Building and Reusing a Library of LoRAs
Paper link: https://arxiv.org/abs/2405.11157
Paper title: SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization
Paper link: https://arxiv.org/abs/2405.11582
Paper title: MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
Paper link: https://arxiv.org/abs/2405.12130
Paper title: Attention as an RNN
Paper link: https://arxiv.org/abs/2405.13956
Paper title: Dense Connector for MLLMs
Paper link: https://arxiv.org/abs/2405.13800
Paper title: AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability
Paper link: https://arxiv.org/abs/2405.14129
Paper title: SimPO: Simple Preference Optimization with a Reference-Free Reward
Paper link: https://arxiv.org/abs/2405.14734
Paper title: Instruction Tuning With Loss Over Instructions
Paper link: https://arxiv.org/abs/2405.14394
Paper title: The Road Less Scheduled
Paper link: https://arxiv.org/abs/2405.15682
Paper title: Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training
Paper link: https://arxiv.org/abs/2405.15319
Paper title: gzip Predicts Data-dependent Scaling Laws
Paper link: https://arxiv.org/abs/2405.16684
Paper title: Trans-LoRA: Towards Data-free Transferable Parameter Efficient Finetuning
Paper link: https://arxiv.org/abs/2405.17258
Paper title: VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections
Paper link: https://arxiv.org/abs/2405.17991
Paper title: LLaMA-NAS: Efficient Neural Architecture Search for Large Language Models
Paper link: https://arxiv.org/abs/2405.18377
Paper title: Contextual Position Encoding: Learning to Count What’s Important
Paper link: https://arxiv.org/abs/2405.18719
June Papers
Paper title: Show, Don’t Tell: Aligning Language Models with Demonstrated Feedback
Paper link: https://arxiv.org/abs/2406.00888
Paper title: Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models
Paper link: https://arxiv.org/abs/2406.06563
Paper title: OLoRA: Orthonormal Low-Rank Adaptation of Large Language Models
Paper link: https://arxiv.org/abs/2406.01775
Paper title: The Geometry of Categorical and Hierarchical Concepts in Large Language Models
Paper link: https://arxiv.org/abs/2406.01506
Paper title: Towards Scalable Automated Alignment of LLMs: A Survey
Paper link: https://arxiv.org/abs/2406.01252
Paper title: Scalable MatMul-free Language Modeling
Paper link: https://arxiv.org/abs/2406.02528
Paper title: Block Transformer: Global-to-Local Language Modeling for Fast Inference
Paper link: https://arxiv.org/abs/2406.02657
Paper title: Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models
Paper link: https://arxiv.org/abs/2406.04271
Paper title: The Prompt Report: A Systematic Survey of Prompting Techniques
Paper link: https://arxiv.org/abs/2406.06608
Paper title: Transformers Need Glasses! Information Over-Squashing in Language Tasks
Paper link: https://arxiv.org/abs/2406.04267
Paper title: Are We Done with MMLU?
Paper link: https://arxiv.org/abs/2406.04127
Paper title: Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step
Paper link: https://arxiv.org/abs/2406.04314
Paper title: Boosting Large-scale Parallel Training Efficiency with C4: A Communication-Driven Approach
Paper link: https://arxiv.org/abs/2406.04594
Paper title: CRAG – Comprehensive RAG Benchmark
Paper link: https://arxiv.org/abs/2406.04744
Paper title: WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild
Paper link: https://arxiv.org/abs/2406.04770
Paper title: Mixture-of-Agents Enhances Large Language Model Capabilities
Paper link: https://arxiv.org/abs/2406.04692
Paper title: BERTs are Generative In-Context Learners
Paper link: https://arxiv.org/abs/2406.04823
Paper title: 3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination
Paper link: https://arxiv.org/abs/2406.05132
Paper title: Creativity Has Left the Chat: The Price of Debiasing Language Models
Paper link: https://arxiv.org/abs/2406.05587
Paper title: Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
Paper link: https://arxiv.org/abs/2406.06525
Paper title: Margin-aware Preference Optimization for Aligning Diffusion Models Without Reference
Paper link: https://arxiv.org/abs/2406.06424
Paper title: Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning
Paper link: https://arxiv.org/abs/2406.06469
Paper title: Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters
Paper link: https://arxiv.org/abs/2406.05955
Paper title: Self-Tuning: Instructing LLMs to Effectively Acquire New Knowledge through Self-Teaching
Paper link: https://arxiv.org/abs/2406.06326
Paper title: An Image is Worth 32 Tokens for Reconstruction and Generation
Paper link: https://arxiv.org/abs/2406.07550
Paper title: TextGrad: Automatic “Differentiation” via Text
Paper link: https://arxiv.org/abs/2406.07496
Paper title: Simple and Effective Masked Diffusion Language Models
Paper link: https://arxiv.org/abs/2406.07524
Paper title: Never Miss A Beat: An Efficient Recipe for Context Window Extension of Large Language Models with Consistent “Middle” Enhancement
Paper link: https://arxiv.org/abs/2406.07138
Paper title: Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
Paper link: https://arxiv.org/abs/2406.07522
Paper title: Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
Paper link: https://arxiv.org/abs/2406.08464
Paper title: What If We Recaption Billions of Web Images with LLaMA-3?
Paper link: https://arxiv.org/abs/2406.08478
Paper title: Large Language Model Unlearning via Embedding-Corrupted Prompts
Paper link: https://arxiv.org/abs/2406.07933
Paper title: Large Language Models Must Be Taught to Know What They Don’t Know
Paper link: https://arxiv.org/abs/2406.08391
Paper title: An Empirical Study of Mamba-based Language Models
Paper link: https://arxiv.org/abs/2406.07887
Paper title: Discovering Preference Optimization Algorithms with and for Large Language Models
Paper link: https://arxiv.org/abs/2406.08414
Paper title: Transformers Meet Neural Algorithmic Reasoners
Paper link: https://arxiv.org/abs/2406.09308
Paper title: MLKV: Multi-Layer Key-Value Heads for Memory Efficient Transformer Decoding
Paper link: https://arxiv.org/abs/2406.09297
Paper title: An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels
Paper link: https://arxiv.org/abs/2406.09415
Paper title: FouRA: Fourier Low Rank Adaptation
Paper link: https://arxiv.org/abs/2406.08798
Paper title: Bootstrapping Language Models with DPO Implicit Rewards
Paper link: https://arxiv.org/abs/2406.09760
Paper title: Be like a Goldfish, Don’t Memorize! Mitigating Memorization in Generative LLMs
Paper link: https://arxiv.org/abs/2406.10209
Paper title: Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs
Paper link: https://arxiv.org/abs/2406.10216
Paper title: THEANINE: Revisiting Memory Management in Long-term Conversations with Timeline-augmented Response Generation
Paper link: https://arxiv.org/abs/2406.10996
Paper title: Task Me Anything
Paper link: https://arxiv.org/abs/2406.11775
Paper title: How Do Large Language Models Acquire Factual Knowledge During Pretraining?
Paper link: https://arxiv.org/abs/2406.11813
Paper title: mDPO: Conditional Preference Optimization for Multimodal Large Language Models
Paper link: https://arxiv.org/abs/2406.11839
Paper title: Nemotron-4 340B Technical Report
Paper link: https://arxiv.org/abs/2406.11704
Paper title: DataComp-LM: In Search of the Next Generation of Training Sets for Language Models
Paper link: https://arxiv.org/abs/2406.11794
Paper title: Tokenization Falling Short: The Curse of Tokenization
Paper link: https://arxiv.org/abs/2406.11687
Paper title: DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence
Paper link: https://arxiv.org/abs/2406.11931
Paper title: Unveiling Encoder-Free Vision-Language Models
Paper link: https://arxiv.org/abs/2406.11832
Paper title: Iterative Length-Regularized Direct Preference Optimization: A Case Study on Improving 7B Language Models to GPT-4 Level
Paper link: https://arxiv.org/abs/2406.11817
Paper title: HARE: HumAn pRiors, a key to small language model Efficiency
Paper link: https://arxiv.org/abs/2406.11410
Paper title: Measuring memorization in RLHF for code completion
Paper link: https://arxiv.org/abs/2406.11715
Paper title: Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts
Paper link: https://arxiv.org/abs/2406.12034
Paper title: From RAGs to Rich Parameters: Probing How Language Models Utilize External Knowledge Over Parametric Information for Factual Queries
Paper link: https://arxiv.org/abs/2406.12824
Paper title: Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges
Paper link: https://arxiv.org/abs/2406.12624
Paper title: Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?
Paper link: https://arxiv.org/abs/2406.13121
Paper title: Instruction Pre-Training: Language Models are Supervised Multitask Learners
Paper link: https://arxiv.org/abs/2406.14491
Paper title: Can LLMs Learn by Teaching? A Preliminary Study
Paper link: https://arxiv.org/abs/2406.14629
Paper title: A Tale of Trust and Accuracy: Base vs. Instruct LLMs in RAG Systems
Paper link: https://arxiv.org/abs/2406.14972
Paper title: LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs
Paper link: https://arxiv.org/abs/2406.15319
Paper title: MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression
Paper link: https://arxiv.org/abs/2406.14909
Paper title: Efficient Continual Pre-training by Mitigating the Stability Gap
Paper link: https://arxiv.org/abs/2406.14833
Paper title: Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers
Paper link: https://arxiv.org/abs/2406.16747
Paper title: WARP: On the Benefits of Weight Averaged Rewarded Policies
Paper link: https://arxiv.org/abs/2406.16768
Paper title: Adam-mini: Use Fewer Learning Rates To Gain More
Paper link: https://arxiv.org/abs/2406.16793
Paper title: The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale
Paper link: https://arxiv.org/abs/2406.17557
Paper title: LongIns: A Challenging Long-context Instruction-based Exam for LLMs
Paper link: https://arxiv.org/abs/2406.17588
Paper title: Following Length Constraints in Instructions
Paper link: https://arxiv.org/abs/2406.17744
Paper title: A Closer Look into Mixture-of-Experts in Large Language Models
Paper link: https://arxiv.org/abs/2406.18219
Paper title: RouteLLM: Learning to Route LLMs with Preference Data
Paper link: https://arxiv.org/abs/2406.18665
Paper title: Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs
Paper link: https://arxiv.org/abs/2406.18629
Paper title: Dataset Size Recovery from LoRA Weights
Paper link: https://arxiv.org/abs/2406.19395
Paper title: From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data
Paper link: https://arxiv.org/abs/2406.19292
Paper title: Changing Answer Order Can Decrease MMLU Accuracy
Paper link: https://arxiv.org/abs/2406.19470
Paper title: Direct Preference Knowledge Distillation for Large Language Models
Paper link: https://arxiv.org/abs/2406.19774
Paper title: LLM Critics Help Catch LLM Bugs
Paper link: https://arxiv.org/abs/2407.00215
Paper title: Scaling Synthetic Data Creation with 1,000,000,000 Personas
Paper link: https://arxiv.org/abs/2406.20094
July Papers
Paper title: LLM See, LLM Do: Guiding Data Generation to Target Non-Differentiable Objectives
Paper link: https://arxiv.org/abs/2407.01490
Paper title: Searching for Best Practices in Retrieval-Augmented Generation
Paper link: https://arxiv.org/abs/2407.01219
Paper title: Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models
Paper link: https://arxiv.org/abs/2407.01906
Paper title: Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion
Paper link: https://arxiv.org/abs/2407.01392
Paper title: Eliminating Position Bias of Language Models: A Mechanistic Approach
Paper link: https://arxiv.org/abs/2407.01100
Paper title: MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention
Paper link: https://arxiv.org/abs/2407.02490
Paper title: TokenPacker: Efficient Visual Projector for Multimodal LLM
Paper link: https://arxiv.org/abs/2407.02392
Paper title: Reasoning in Large Language Models: A Geometric Perspective
Paper link: https://arxiv.org/abs/2407.02678
Paper title: RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs
Paper link: https://arxiv.org/abs/2407.02485
Paper title: AgentInstruct: Toward Generative Teaching with Agentic Flows
Paper link: https://arxiv.org/abs/2407.03502
Paper title: HEMM: Holistic Evaluation of Multimodal Foundation Models
Paper link: https://arxiv.org/abs/2407.03418
Paper title: Mixture of A Million Experts
Paper link: https://arxiv.org/abs/2407.04153
Paper title: Learning to (Learn at Test Time): RNNs with Expressive Hidden States
Paper link: https://arxiv.org/abs/2407.04620
Paper title: Vision language models are blind
Paper link: https://arxiv.org/abs/2407.06581
Paper title: Self-Recognition in Language Models
Paper link: https://arxiv.org/abs/2407.06946
Paper title: Inference Performance Optimization for Large Language Models on CPUs
Paper link: https://arxiv.org/abs/2407.07304
Paper title: Gradient Boosting Reinforcement Learning
Paper link: https://arxiv.org/abs/2407.08250
Paper title: FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
Paper link: https://arxiv.org/abs/2407.08608
Paper title: SpreadsheetLLM: Encoding Spreadsheets for Large Language Models
Paper link: https://arxiv.org/abs/2407.09025
Paper title: New Desiderata for Direct Preference Optimization
Paper link: https://arxiv.org/abs/2407.09072
Paper title: Context Embeddings for Efficient Answer Generation in RAG
Paper link: https://arxiv.org/abs/2407.09252
Paper title: Qwen2 Technical Report
Paper link: https://arxiv.org/abs/2407.10671
Paper title: The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism
Paper link: https://arxiv.org/abs/2407.10457
Paper title: From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients
Paper link: https://arxiv.org/abs/2407.11239
Paper title: GoldFinch: High Performance RWKV/Transformer Hybrid with Linear Pre-Fill and Extreme KV-Cache Compression
Paper link: https://arxiv.org/abs/2407.12077
Paper title: Scaling Diffusion Transformers to 16 Billion Parameters
Paper link: https://arxiv.org/abs/2407.11633
Paper title: NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window?
Paper link: https://arxiv.org/abs/2407.11963
Paper title: Patch-Level Training for Large Language Models
Paper link: https://arxiv.org/abs/2407.12665
Paper link: https://arxiv.org/abs/2407.12772
Paper title: A Survey of Prompt Engineering Methods in Large Language Models for Different NLP Tasks
Paper link: https://arxiv.org/abs/2407.12994
Paper title: Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models
Paper link: https://arxiv.org/abs/2407.12327
Paper title: Attention Overflow: Language Model Input Blur during Long-Context Missing Items Recommendation
Paper link: https://arxiv.org/abs/2407.13481
Paper title: Weak-to-Strong Reasoning
Paper link: https://arxiv.org/abs/2407.13647
Paper title: Understanding Reference Policies in Direct Preference Optimization
Paper link: https://arxiv.org/abs/2407.13709
Paper title: Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies
Paper link: https://arxiv.org/abs/2407.13623
Paper title: BOND: Aligning LLMs with Best-of-N Distillation
Paper link: https://arxiv.org/abs/2407.14622
Paper title: Compact Language Models via Pruning and Knowledge Distillation
Paper link: https://arxiv.org/abs/2407.14679
Paper title: LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference
Paper link: https://arxiv.org/abs/2407.14057
Paper title: Mini-Sequence Transformer: Optimizing Intermediate Memory for Long Sequences Training
Paper link: https://arxiv.org/abs/2407.15892
Paper title: DDK: Distilling Domain Knowledge for Efficient Large Language Models
Paper link: https://arxiv.org/abs/2407.16154
Paper title: Generation Constraint Scaling Can Mitigate Hallucination
Paper link: https://arxiv.org/abs/2407.16908
Paper title: Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach
Paper link: https://arxiv.org/abs/2407.16833
Paper title: Course-Correction: Safety Alignment Using Synthetic Preferences
Paper link: https://arxiv.org/abs/2407.16637
Paper title: Data Mixture Inference: What do BPE Tokenizers Reveal about their Training Data?
Paper link: https://arxiv.org/abs/2407.16607
Paper title: Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge
Paper link: https://arxiv.org/abs/2407.19594
Paper title: Improving Retrieval Augmented Language Model with Self-Reasoning
Paper link: https://arxiv.org/abs/2407.19813
Paper title: Apple Intelligence Foundation Language Models
Paper link: https://arxiv.org/abs/2407.21075
Paper title: ThinK: Thinner Key Cache by Query-Driven Pruning
Paper link: https://arxiv.org/abs/2407.21018
Paper title: The Llama 3 Herd of Models
Paper link: https://arxiv.org/abs/2407.21783
Paper title: Gemma 2: Improving Open Language Models at a Practical Size
Paper link: https://arxiv.org/abs/2408.00118
August Papers
Paper title: SAM 2: Segment Anything in Images and Videos
Paper link: https://arxiv.org/abs/2408.00714
Paper title: POA: Pre-training Once for Models of All Sizes
Paper link: https://arxiv.org/abs/2408.01031
Paper title: RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework
Paper link: https://arxiv.org/abs/2408.01262
Paper title: A Survey of Mamba
Paper link: https://arxiv.org/abs/2408.01129
Paper title: MiniCPM-V: A GPT-4V Level MLLM on Your Phone
Paper link: https://arxiv.org/abs/2408.01800
Paper title: RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation
Paper link: https://arxiv.org/abs/2408.02545
Paper title: Self-Taught Evaluators
Paper link: https://arxiv.org/abs/2408.02666
Paper title: BioMamba: A Pre-trained Biomedical Language Representation Model Leveraging Mamba
Paper link: https://arxiv.org/abs/2408.02600
Paper title: EXAONE 3.0 7.8B Instruction Tuned Language Model
Paper link: https://arxiv.org/abs/2408.03541
Paper title: 1.5-Pints Technical Report: Pretraining in Days, Not Months – Your Language Model Thrives on Quality Data
Paper link: https://arxiv.org/abs/2408.03506
Paper title: Conversational Prompt Engineering
Paper link: https://arxiv.org/abs/2408.04560
Paper title: Trans-Tokenization and Cross-lingual Vocabulary Transfers: Language Adaptation of LLMs for Low-Resource NLP
Paper link: https://arxiv.org/abs/2408.04303
Paper title: The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery
Paper link: https://arxiv.org/abs/2408.06292
Paper title: Hermes 3 Technical Report
Paper link: https://arxiv.org/abs/2408.11857
Paper title: Customizing Language Models with Instance-wise LoRA for Sequential Recommendation
Paper link: https://arxiv.org/abs/2408.10159
Paper title: Enhancing Robustness in Large Language Models: Prompting for Mitigating the Impact of Irrelevant Information
Paper link: https://arxiv.org/abs/2408.10615
Paper title: To Code, or Not To Code? Exploring Impact of Code in Pre-training
Paper link: https://arxiv.org/abs/2408.10914
Paper title: LLM Pruning and Distillation in Practice: The Minitron Approach
Paper link: https://arxiv.org/abs/2408.11796
Paper title: Jamba-1.5: Hybrid Transformer-Mamba Models at Scale
Paper link: https://arxiv.org/abs/2408.12570
Paper title: Controllable Text Generation for Large Language Models: A Survey
Paper link: https://arxiv.org/abs/2408.12599
Paper title: Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time
Paper link: https://arxiv.org/abs/2408.13233
Paper title: A Practitioner's Guide to Continual Multimodal Pretraining
Paper link: https://arxiv.org/abs/2408.14471
Paper title: Building and better understanding vision-language models: insights and future directions
Paper link: https://arxiv.org/abs/2408.12637
Paper title: CURLoRA: Stable LLM Continual Fine-Tuning and Catastrophic Forgetting Mitigation
Paper link: https://arxiv.org/abs/2408.14572
Paper title: The Mamba in the Llama: Distilling and Accelerating Hybrid Models
Paper link: https://arxiv.org/abs/2408.15237
Paper title: ReMamba: Equip Mamba with Effective Long-Sequence Modeling
Paper link: https://arxiv.org/abs/2408.15496
Paper title: Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling
Paper link: https://arxiv.org/abs/2408.16737
Paper title: LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models
Paper link: https://arxiv.org/abs/2409.00509
September Papers
Paper title: OLMoE: Open Mixture-of-Experts Language Models
Paper link: https://arxiv.org/abs/2409.02060
Paper title: In Defense of RAG in the Era of Long-Context Language Models
Paper link: https://arxiv.org/abs/2409.01666
Paper title: Attention Heads of Large Language Models: A Survey
Paper link: https://arxiv.org/abs/2409.03752
Paper title: LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA
Paper link: https://arxiv.org/abs/2409.02897
Paper title: How Do Your Code LLMs Perform? Empowering Code Instruction Tuning with High-Quality Data
Paper link: https://arxiv.org/abs/2409.03810
Paper title: Theory, Analysis, and Best Practices for Sigmoid Self-Attention
Paper link: https://arxiv.org/abs/2409.04431
Paper title: LLaMA-Omni: Seamless Speech Interaction with Large Language Models
Paper link: https://arxiv.org/abs/2409.06666
Paper title: What is the Role of Small Models in the LLM Era: A Survey
Paper link: https://arxiv.org/abs/2409.06857
Paper title: Policy Filtration in RLHF to Fine-Tune LLM for Code Generation
Paper link: https://arxiv.org/abs/2409.06957
Paper title: RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval
Paper link: https://arxiv.org/abs/2409.10516
Paper title: Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement
Paper link: https://arxiv.org/abs/2409.12122
Paper title: Qwen2.5-Coder Technical Report
Paper link: https://arxiv.org/abs/2409.12186
Paper title: Instruction Following without Instruction Tuning
Paper link: https://arxiv.org/abs/2409.14254
Paper title: Is Preference Alignment Always the Best Option to Enhance LLM-Based Translation? An Empirical Analysis
Paper link: https://arxiv.org/abs/2409.20059
Paper title: The Perfect Blend: Redefining RLHF with Mixture of Judges
Paper link: https://arxiv.org/abs/2409.20370
October Papers
Paper title: Addition is All You Need for Energy-efficient Language Models
Paper link: https://arxiv.org/abs/2410.00907
Paper title: Quantifying Generalization Complexity for Large Language Models
Paper link: https://arxiv.org/abs/2410.01769
Paper title: When a language model is optimized for reasoning, does it still show embers of autoregression? An analysis of OpenAI o1
Paper link: https://arxiv.org/abs/2410.01792
Paper title: Were RNNs All We Needed?
Paper link: https://arxiv.org/abs/2410.01201
Paper title: Selective Attention Improves Transformer
Paper link: https://arxiv.org/abs/2410.02703
Paper title: LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations
Paper link: https://arxiv.org/abs/2410.02707
Paper title: LLaVA-Critic: Learning to Evaluate Multimodal Models
Paper link: https://arxiv.org/abs/2410.02712
Paper title: Differential Transformer
Paper link: https://arxiv.org/abs/2410.05258
Paper title: GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
Paper link: https://arxiv.org/abs/2410.05229
Paper title: ARIA: An Open Multimodal Native Mixture-of-Experts Model
Paper link: https://arxiv.org/abs/2410.05993
Paper title: O1 Replication Journey: A Strategic Progress Report – Part 1
Paper link: https://arxiv.org/abs/2410.18982
Paper title: Long-Context LLMs Meet RAG: Overcoming Challenges for Long Inputs in RAG
Paper link: https://arxiv.org/abs/2410.05983
Paper title: From Generalist to Specialist: Adapting Vision Language Models via Task-Specific Visual Instruction Tuning
Paper link: https://arxiv.org/abs/2410.06456
Paper title: KV Prediction for Improved Time to First Token
Paper link: https://arxiv.org/abs/2410.08391
Paper title: Baichuan-Omni Technical Report
Paper link: https://arxiv.org/abs/2410.08565
Paper title: MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models
Paper link: https://arxiv.org/abs/2410.10139
Paper title: LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models
Paper link: https://arxiv.org/abs/2410.09732
Paper title: AFlow: Automating Agentic Workflow Generation
Paper link: https://arxiv.org/abs/2410.10762
Paper title: Toward General Instruction-Following Alignment for Retrieval-Augmented Generation
Paper link: https://arxiv.org/abs/2410.09584
Paper title: Pre-training Distillation for Large Language Models: A Design Space Exploration
Paper link: https://arxiv.org/abs/2410.16215
Paper title: MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models
Paper link: https://arxiv.org/abs/2410.17637
Paper title: Scalable Ranked Preference Optimization for Text-to-Image Generation
Paper link: https://arxiv.org/abs/2410.18013
Paper title: Scaling Diffusion Language Models via Adaptation from Autoregressive Models
Paper link: https://arxiv.org/abs/2410.17891
Paper title: Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback
Paper link: https://arxiv.org/abs/2410.19133
Paper title: Counting Ability of Large Language Models and Impact of Tokenization
Paper link: https://arxiv.org/abs/2410.19730
Paper title: A Survey of Small Language Models
Paper link: https://arxiv.org/abs/2410.20011
Paper title: Accelerating Direct Preference Optimization with Prefix Sharing
Paper link: https://arxiv.org/abs/2410.20305
Paper title: Mind Your Step (by Step): Chain-of-Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse
Paper link: https://arxiv.org/abs/2410.21333
Paper title: LongReward: Improving Long-context Large Language Models with AI Feedback
Paper link: https://arxiv.org/abs/2410.21252
Paper title: ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
Paper link: https://arxiv.org/abs/2410.21465
Paper title: Beyond Text: Optimizing RAG with Multimodal Inputs for Industrial Applications
Paper link: https://arxiv.org/abs/2410.21943
Paper title: CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation
Paper link: https://arxiv.org/abs/2410.23090
Paper title: What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective
Paper link: https://arxiv.org/abs/2410.23743
Paper title: GPT or BERT: why not both?
Paper link: https://arxiv.org/abs/2410.24159
Paper title: Language Models can Self-Lengthen to Generate Long Texts
Paper link: https://arxiv.org/abs/2410.23933
November Papers
Paper title: Adding Error Bars to Evals: A Statistical Approach to Language Model Evaluations
Paper link: https://arxiv.org/abs/2411.00640
Paper title: Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation
Paper link: https://arxiv.org/abs/2411.00412
Paper title: Multi-expert Prompting Improves Reliability, Safety, and Usefulness of Large Language Models
Paper link: https://arxiv.org/abs/2411.00492
Paper title: Sample-Efficient Alignment for LLMs
Paper link: https://arxiv.org/abs/2411.01493
Paper title: A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness
Paper link: https://arxiv.org/abs/2411.03350
Paper title: "Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization
Paper link: https://arxiv.org/abs/2411.02355
Paper title: Parameter-Efficient Fine-Tuning of Large Language Models for Unit Test Generation: An Empirical Study
Paper link: https://arxiv.org/abs/2411.02462
Paper title: HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems
Paper link: https://arxiv.org/abs/2411.02959
Paper title: Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination
Paper link: https://arxiv.org/abs/2411.03823
Paper title: Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding
Paper link: https://arxiv.org/abs/2411.04282
Paper title: Number Cookbook: Number Understanding of Language Models and How to Improve It
Paper link: https://arxiv.org/abs/2411.03766
Paper title: Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models
Paper link: https://arxiv.org/abs/2411.04996
Paper title: BitNet a4.8: 4-bit Activations for 1-bit LLMs
Paper link: https://arxiv.org/abs/2411.04965
Paper title: Scaling Laws for Precision
Paper link: https://arxiv.org/abs/2411.04330
Paper title: Energy Efficient Protein Language Models: Leveraging Small Language Models with LoRA for Controllable Protein Generation
Paper link: https://arxiv.org/abs/2411.05966
Paper title: Balancing Pipeline Parallelism with Vocabulary Parallelism
Paper link: https://arxiv.org/abs/2411.05288
Paper title: Toward Optimal Search and Retrieval for RAG
Paper link: https://arxiv.org/abs/2411.07396
Paper title: Large Language Models Can Self-Improve in Long-context Reasoning
Paper link: https://arxiv.org/abs/2411.08147
Paper title: Stronger Models are NOT Stronger Teachers for Instruction Tuning
Paper link: https://arxiv.org/abs/2411.07133
Paper title: Direct Preference Optimization Using Sparse Feature-Level Constraints
Paper link: https://arxiv.org/abs/2411.07618
Paper title: Cut Your Losses in Large-Vocabulary Language Models
Paper link: https://arxiv.org/abs/2411.09009
Paper title: Does Prompt Formatting Have Any Impact on LLM Performance?
Paper link: https://arxiv.org/abs/2411.10541
Paper title: SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization
Paper link: https://arxiv.org/abs/2411.11909
Paper link: https://arxiv.org/abs/2411.10958
Paper title: Bi-Mamba: Towards Accurate 1-Bit State Space Models
Paper link: https://arxiv.org/abs/2411.11843
Paper title: RedPajama: an Open Dataset for Training Large Language Models
Paper link: https://arxiv.org/abs/2411.12372
Paper title: Hymba: A Hybrid-head Architecture for Small Language Models
Paper link: https://arxiv.org/abs/2411.13676
Paper title: Loss-to-Loss Prediction: Scaling Laws for All Datasets
Paper link: https://arxiv.org/abs/2411.12925
Paper title: When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training
Paper link: https://arxiv.org/abs/2411.13476
Paper title: Multimodal Autoregressive Pre-training of Large Vision Encoders
Paper link: https://arxiv.org/abs/2411.14402
Paper title: Natural Language Reinforcement Learning
Paper link: https://arxiv.org/abs/2411.14251
Paper title: Large Multi-modal Models Can Interpret Features in Large Multi-modal Models
Paper link: https://arxiv.org/abs/2411.14982
Paper title: Tülu 3: Pushing Frontiers in Open Language Model Post-Training
Paper link: https://arxiv.org/abs/2411.15124
Paper title: MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs
Paper link: https://arxiv.org/abs/2411.15296
Paper title: LLMs Do Not Think Step-by-step In Implicit Reasoning
Paper link: https://arxiv.org/abs/2411.15862
Paper title: O1 Replication Journey – Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?
Paper link: https://arxiv.org/abs/2411.16489
Paper title: Star Attention: Efficient LLM Inference over Long Sequences
Paper link: https://arxiv.org/abs/2411.17116
Paper title: Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens
Paper link: https://arxiv.org/abs/2411.17691
Paper title: Rethinking Token Reduction in MLLMs: Towards a Unified Paradigm for Training-Free Acceleration
Paper link: https://arxiv.org/abs/2411.17686
Paper title: Reverse Thinking Makes LLMs Stronger Reasoners
Paper link: https://arxiv.org/abs/2411.19865
Paper title: Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability
Paper link: https://arxiv.org/abs/2411.19943
December Papers
Paper title: Designing Scale-Wise Transformers for Text-to-Image Synthesis
Paper link: https://arxiv.org/abs/2412.01819
Paper title: X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models
Paper link: https://arxiv.org/abs/2412.01824
Paper title: Free Process Rewards without Process Labels
Paper link: https://arxiv.org/abs/2412.01981
Paper title: Scaling Image Tokenizers with Grouped Spherical Quantization
Paper link: https://arxiv.org/abs/2412.02632
Paper title: RARE: Retrieval-Augmented Reasoning Enhancement for Large Language Models
Paper link: https://arxiv.org/abs/2412.02830
Paper title: Perception Tokens Enhance Visual Reasoning in Multimodal Language Models
Paper link: https://arxiv.org/abs/2412.03548
Paper title: Evaluating Language Models as Synthetic Data Generators
Paper link: https://arxiv.org/abs/2412.03679
Paper title: Best-of-N Jailbreaking
Paper link: https://arxiv.org/abs/2412.03556
Paper title: PaliGemma 2: A Family of Versatile VLMs for Transfer
Paper link: https://arxiv.org/abs/2412.03555
Paper title: VisionZip: Longer is Better but Not Necessary in Vision Language Models
Paper link: https://arxiv.org/abs/2412.04467
Paper title: Evaluating and Aligning CodeLLMs on Human Preference
Paper link: https://arxiv.org/abs/2412.05210
Paper title: MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale
Paper link: https://arxiv.org/abs/2412.05237
Paper title: Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling
Paper link: https://arxiv.org/abs/2412.05271
Paper title: LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods
Paper link: https://arxiv.org/abs/2412.05579
Paper title: Does RLHF Scale? Exploring the Impacts From Data, Model, and Method
Paper link: https://arxiv.org/abs/2412.06000
Paper title: Unraveling the Complexity of Memory in RL Agents: An Approach for Classification and Evaluation
Paper link: https://arxiv.org/abs/2412.06531
Paper title: Training Large Language Models to Reason in a Continuous Latent Space
Paper link: https://arxiv.org/abs/2412.06769
Paper title: AutoReason: Automatic Few-Shot Reasoning Decomposition
Paper link: https://arxiv.org/abs/2412.06975
Paper title: Large Concept Models: Language Modeling in a Sentence Representation Space
Paper link: https://arxiv.org/abs/2412.08821
Paper title: Phi-4 Technical Report
Paper link: https://arxiv.org/abs/2412.08905
Paper title: Byte Latent Transformer: Patches Scale Better Than Tokens
Paper link: https://arxiv.org/abs/2412.09871
Paper title: SCBench: A KV Cache-Centric Analysis of Long-Context Methods
Paper link: https://arxiv.org/abs/2412.10319
Paper title: Cultural Evolution of Cooperation among LLM Agents
Paper link: https://arxiv.org/abs/2412.10270
Paper title: DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
Paper link: https://arxiv.org/abs/2412.10302
Paper title: No More Adam: Learning Rate Scaling at Initialization is All You Need
Paper link: https://arxiv.org/abs/2412.11768
Paper title: Precise Length Control in Large Language Models
Paper link: https://arxiv.org/abs/2412.11937
Paper title: The Open Source Advantage in Large Language Models (LLMs)
Paper link: https://arxiv.org/abs/2412.12004
Paper title: A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges
Paper link: https://arxiv.org/abs/2412.11936
Paper title: Are Your LLMs Capable of Stable Reasoning?
Paper link: https://arxiv.org/abs/2412.13147
Paper title: LLM Post-Training Recipes, Improving Reasoning in LLMs
Paper link: https://arxiv.org/abs/2412.14135
Paper title: Hansel: Output Length Controlling Framework for Large Language Models
Paper link: https://arxiv.org/abs/2412.14033
Paper title: Mind Your Theory: Theory of Mind Goes Deeper Than Reasoning
Paper link: https://arxiv.org/abs/2412.1363
Paper title: Alignment Faking in Large Language Models
Paper link: https://arxiv.org/abs/2412.14093
Paper title: SCOPE: Optimizing Key-Value Cache Compression in Long-Context Generation
Paper link: https://arxiv.org/abs/2412.13649
Paper title: LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-Context Multitasks
Paper link: https://arxiv.org/abs/2412.15204
Paper title: Offline Reinforcement Learning for LLM Multi-Step Reasoning
Paper link: https://arxiv.org/abs/2412.16145
Paper title: Mulberry: Empowering MLLM with O1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
Paper link: https://arxiv.org/abs/2412.18319