<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Inference on Jeanphilo Blog</title><link>https://shio-chan-dev.github.io/jeanblog/zh/tags/inference/</link><description>Recent content in Inference on Jeanphilo Blog</description><generator>Hugo -- 0.159.2</generator><language>zh-cn</language><lastBuildDate>Sun, 25 Jan 2026 12:51:15 +0800</lastBuildDate><atom:link href="https://shio-chan-dev.github.io/jeanblog/zh/tags/inference/index.xml" rel="self" type="application/rss+xml"/><item><title>MQA/GQA in FlashAttention: Equivalence, Benefits, and Implementation Notes for Shared KV (with Runnable Verification)</title><link>https://shio-chan-dev.github.io/jeanblog/zh/ai/attention/flash-attention-mqa-gqa/</link><pubDate>Sun, 25 Jan 2026 12:51:15 +0800</pubDate><guid>https://shio-chan-dev.github.io/jeanblog/zh/ai/attention/flash-attention-mqa-gqa/</guid><description>Explains how FlashAttention handles MQA/GQA: shared KV, per-group computation, and memory reuse strategies, with a runnable example that verifies equivalence.</description></item><item><title>BN vs. Dropout: Key Differences Between Training and Inference</title><link>https://shio-chan-dev.github.io/jeanblog/zh/ai/llm/bn-vs-dropout-train-infer/</link><pubDate>Sat, 24 Jan 2026 16:24:44 +0800</pubDate><guid>https://shio-chan-dev.github.io/jeanblog/zh/ai/llm/bn-vs-dropout-train-infer/</guid><description>A systematic comparison of how BatchNorm and Dropout behave during training versus inference, with a minimal PyTorch example.</description></item><item><title>BLIP/BLIP-2: Practical Principles and a Minimal Inference Example</title><link>https://shio-chan-dev.github.io/jeanblog/zh/ai/blip/blip-blip2-principles-minimal-inference/</link><pubDate>Sat, 24 Jan 2026 15:40:51 +0800</pubDate><guid>https://shio-chan-dev.github.io/jeanblog/zh/ai/blip/blip-blip2-principles-minimal-inference/</guid><description>Explains the differences in principle between BLIP and BLIP-2 following the ACERS structure, with a minimal PyTorch inference example.</description></item></channel></rss>