Snack On AI

Your Daily AI Snack On - AI Research, Tools, Tutorials & Insights

SnackOnAI Blog

Inkling: Thinking Machines Lab Built a 975B MoE With Controllable Thinking Effort, Relative Position Embeddings, and Short Convolutions on the Residual Stream. The Self-Fine-Tuning Demo Is the Real Signal.

Jul 17, 2026 • 16 min read

Inkling (thinkingmachines/Inkling, open weights, July 15, 2026) is Thinking Machines Lab's first model release: a Mixture-of-Experts transformer with 975B total and 41B active parameters, a 1M-token context window, encoder-free multimodal inputs (audio as dMel spectrograms, vision as 40x40-pixel patches via a 4-layer hMLP), controllable thinking effort (a float passed at inference time), and behavior shaped by 30M+ RL rollouts.

Mohinish S

OpenScience: The Open-Source AI Workbench Launched Five Days After Claude Science. It Supports More Models, More Skills, and Runs on Your Infrastructure. The Tradeoff Is Everything That Comes With Being Five Days Old.

Jul 16, 2026 • 15 min read

OpenScience (synthetic-sciences/openscience, Apache 2.0, v1.2.5, YC W26, openscience.sh) is a model-agnostic AI workbench for scientific research that runs the full research loop: literature review, hypothesis, code, experiment, analysis, and write-up, in one continuous session. It ships 250+ editable skills across ML, computational biology, cheminformatics, and cloud compute, plus 30+ scientific databases (UniProt, PDB, ChEMBL, arXiv, OpenAlex, Semantic Scholar) as native agent tools. Any frontier or open-weight model works with a single configuration flag; switching is per-request.

Mohinish S

Atomic Task Graph: A 7B Model That Beats GPT-4 ReAct on ALFWorld and WebShop Has Nothing to Do With the 7B. It Is the Control Framework.

Jul 15, 2026 • 18 min read

ATG (arXiv:2607.01942, South China University of Technology + Tsinghua University, July 2026) is a training-free control framework that represents LLM agent planning and execution as an explicit directed acyclic graph of atomic tool-use units.

Mohinish S

DeLM: The Multi-Agent Framework That Proved the Central Orchestrator Is the Bottleneck, Not the Solution

Jul 14, 2026 • 18 min read

DeLM (yuzhenmao/DeLM, arXiv:2606.10662, Stanford University, June 2026) is a decentralized multi-agent framework where parallel agents coordinate through a shared verified context and a task queue, with no central controller.

Mohinish S

FlashInfer: The Attention Kernel Library That Proves the Bottleneck in LLM Inference Was Never the Model. It Was the Memory Access Pattern.

Jul 13, 2026 • 17 min read

FlashInfer (flashinfer-ai/flashinfer, Apache 2.0, 5.8k stars, MLSys 2025, arXiv:2501.01005) is a kernel library and kernel generator for LLM inference serving. Its three core contributions are a block-sparse composable format for heterogeneous KV-cache storage, a JIT-compiled customizable attention template system, and a load-balanced scheduling algorithm that works with CUDAGraph despite dynamic batching.

Mohinish S

M Star: Stanford and UW Built a Universal Multimodal Serving System. The Key Insight Is That Every Model, From BAGEL to V-JEPA to Qwen3-Omni, Is Just a Graph. Every Request Is Just a Walk.

Jul 12, 2026 • 14 min read

M Star (mstar-project/mstar, arXiv:2606.12688, preprint June 2026, Stanford + University of Washington + CMU) is a universal serving runtime for composite multimodal models. Its core abstraction is the Walk Graph: a model is a directed computation graph of heterogeneous components, and every request executes as a series of Walks over that graph.

Mohinish S

...
