<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Numerical-Stability on Jeanphilo Blog</title><link>https://shio-chan-dev.github.io/jeanblog/zh/tags/numerical-stability/</link><description>Recent content in Numerical-Stability on Jeanphilo Blog</description><generator>Hugo -- 0.159.2</generator><language>zh-cn</language><lastBuildDate>Sun, 25 Jan 2026 12:50:33 +0800</lastBuildDate><atom:link href="https://shio-chan-dev.github.io/jeanblog/zh/tags/numerical-stability/index.xml" rel="self" type="application/rss+xml"/><item><title>Self-Attention Computation Formulas and Softmax Numerical Stability: From Derivation to Engineering Implementation</title><link>https://shio-chan-dev.github.io/jeanblog/zh/ai/attention/self-attention-softmax-formula-and-stability/</link><pubDate>Sun, 25 Jan 2026 12:50:33 +0800</pubDate><guid>https://shio-chan-dev.github.io/jeanblog/zh/ai/attention/self-attention-softmax-formula-and-stability/</guid><description>Explains the Self-Attention computation flow, the numerical issues of softmax, and key engineering implementation points, using formulas and runnable examples.</description></item><item><title>Why Attention Is Scaled by √(d_k): From Numerical Stability to Engineering Benefits</title><link>https://shio-chan-dev.github.io/jeanblog/zh/ai/attention/why-scale-attention-by-sqrt-dk/</link><pubDate>Sat, 24 Jan 2026 16:22:25 +0800</pubDate><guid>https://shio-chan-dev.github.io/jeanblog/zh/ai/attention/why-scale-attention-by-sqrt-dk/</guid><description>Explains why QK^T in attention needs to be divided by √(d_k), with a minimal PyTorch example and engineering use cases.</description></item></channel></rss>