<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Transformer on Jeanphilo Blog</title><link>https://shio-chan-dev.github.io/jeanblog/zh/tags/transformer/</link><description>Recent content in Transformer on Jeanphilo Blog</description><generator>Hugo -- 0.159.2</generator><language>zh-cn</language><lastBuildDate>Sun, 25 Jan 2026 20:08:41 +0800</lastBuildDate><atom:link href="https://shio-chan-dev.github.io/jeanblog/zh/tags/transformer/index.xml" rel="self" type="application/rss+xml"/><item><title>Attention Is All You Need：Transformer 的核心算法与工程落地</title><link>https://shio-chan-dev.github.io/jeanblog/zh/ai/attention/attention-is-all-you-need/</link><pubDate>Sun, 25 Jan 2026 20:08:41 +0800</pubDate><guid>https://shio-chan-dev.github.io/jeanblog/zh/ai/attention/attention-is-all-you-need/</guid><description>从算法抽象、复杂度与工程约束出发，解释 Transformer 如何用注意力替代递归与卷积，并给出可运行示例与选型指南。</description></item><item><title>Self-Attention 计算公式与 Softmax 数值稳定：从推导到工程实现</title><link>https://shio-chan-dev.github.io/jeanblog/zh/ai/attention/self-attention-softmax-formula-and-stability/</link><pubDate>Sun, 25 Jan 2026 12:50:33 +0800</pubDate><guid>https://shio-chan-dev.github.io/jeanblog/zh/ai/attention/self-attention-softmax-formula-and-stability/</guid><description>用公式与可运行示例讲清 Self-Attention 的计算流程、softmax 的数值问题与工程实现要点。</description></item><item><title>CNN、RNN、LSTM 与 Transformer 的区别与适用场景</title><link>https://shio-chan-dev.github.io/jeanblog/zh/ai/architecture/cnn-rnn-lstm-transformer-comparison/</link><pubDate>Sat, 24 Jan 2026 16:28:18 +0800</pubDate><guid>https://shio-chan-dev.github.io/jeanblog/zh/ai/architecture/cnn-rnn-lstm-transformer-comparison/</guid><description>从依赖路径长度与资源复杂度两个核心概念出发，系统对比 CNN、RNN、LSTM 与 Transformer，并给出可运行示例与工程选型步骤。</description></item><item><title>ViT 结构描述：从 Patch Embedding 到 Transformer 编码器</title><link>https://shio-chan-dev.github.io/jeanblog/zh/ai/vision/vit-architecture-overview/</link><pubDate>Sat, 24 Jan 2026 16:25:35 +0800</pubDate><guid>https://shio-chan-dev.github.io/jeanblog/zh/ai/vision/vit-architecture-overview/</guid><description>系统讲清 ViT 的结构组件、工作流程与工程实践，并给出最小 PyTorch 示例。</description></item><item><title>Transformer 中可以用 BatchNorm 吗？</title><link>https://shio-chan-dev.github.io/jeanblog/zh/ai/llm/batchnorm-in-transformer/</link><pubDate>Sat, 24 Jan 2026 16:24:03 +0800</pubDate><guid>https://shio-chan-dev.github.io/jeanblog/zh/ai/llm/batchnorm-in-transformer/</guid><description>讨论 Transformer 使用 BatchNorm 的可行性、限制与工程取舍，并给出最小示例。</description></item><item><title>BN 与 LN 的区别：训练稳定性与工程取舍</title><link>https://shio-chan-dev.github.io/jeanblog/zh/ai/llm/batchnorm-vs-layernorm/</link><pubDate>Sat, 24 Jan 2026 16:23:47 +0800</pubDate><guid>https://shio-chan-dev.github.io/jeanblog/zh/ai/llm/batchnorm-vs-layernorm/</guid><description>对比 BatchNorm 与 LayerNorm 的原理、适用场景与工程代价，并提供最小 PyTorch 示例。</description></item><item><title>为什么注意力要除以 √(d_k)：从数值稳定到工程收益</title><link>https://shio-chan-dev.github.io/jeanblog/zh/ai/attention/why-scale-attention-by-sqrt-dk/</link><pubDate>Sat, 24 Jan 2026 16:22:25 +0800</pubDate><guid>https://shio-chan-dev.github.io/jeanblog/zh/ai/attention/why-scale-attention-by-sqrt-dk/</guid><description>解释注意力中 QK^T 为何需要除以 √(d_k)，并给出最小 PyTorch 示例与工程场景。</description></item><item><title>残差连接的作用：为什么深度网络离不开它</title><link>https://shio-chan-dev.github.io/jeanblog/zh/ai/llm/residual-connection-role/</link><pubDate>Sat, 24 Jan 2026 16:22:22 
+0800</pubDate><guid>https://shio-chan-dev.github.io/jeanblog/zh/ai/llm/residual-connection-role/</guid><description>解释残差连接在深度网络中的作用与原理，并提供最小可运行示例。</description></item><item><title>Attention 的复杂度与为什么需要位置编码</title><link>https://shio-chan-dev.github.io/jeanblog/zh/ai/attention/attention-complexity-and-positional-encoding/</link><pubDate>Sat, 24 Jan 2026 16:21:51 +0800</pubDate><guid>https://shio-chan-dev.github.io/jeanblog/zh/ai/attention/attention-complexity-and-positional-encoding/</guid><description>解释注意力的时间/空间复杂度，并说明位置编码对序列建模的必要性，含最小示例。</description></item><item><title>为什么使用多头注意力机制：能力、稳定性与工程取舍</title><link>https://shio-chan-dev.github.io/jeanblog/zh/ai/attention/why-multi-head-attention/</link><pubDate>Sat, 24 Jan 2026 16:20:59 +0800</pubDate><guid>https://shio-chan-dev.github.io/jeanblog/zh/ai/attention/why-multi-head-attention/</guid><description>用 ACERS 框架解释多头注意力的必要性、核心原理与工程场景，并给出最小可运行示例。</description></item><item><title>Transformer 结构描述：从编码器到解码器</title><link>https://shio-chan-dev.github.io/jeanblog/zh/ai/llm/transformer-architecture-overview/</link><pubDate>Sat, 24 Jan 2026 16:18:19 +0800</pubDate><guid>https://shio-chan-dev.github.io/jeanblog/zh/ai/llm/transformer-architecture-overview/</guid><description>用 ACERS 框架讲清 Transformer 结构、模块职责与工程场景，并给出最小可运行示例。</description></item><item><title>为什么 GPT 是 Decoder-Only：自回归生成的最佳形态</title><link>https://shio-chan-dev.github.io/jeanblog/zh/ai/llm/why-gpt-decoder-only/</link><pubDate>Sat, 24 Jan 2026 16:15:34 +0800</pubDate><guid>https://shio-chan-dev.github.io/jeanblog/zh/ai/llm/why-gpt-decoder-only/</guid><description>解释 GPT 选择 decoder-only 结构的原因，并与 encoder-only / encoder-decoder 做工程对比。</description></item><item><title>LLaMA 中 RMSNorm 相比 LayerNorm 的优势</title><link>https://shio-chan-dev.github.io/jeanblog/zh/ai/llm/rmsnorm-vs-layernorm-llama/</link><pubDate>Sat, 24 Jan 2026 15:52:58 +0800</pubDate><guid>https://shio-chan-dev.github.io/jeanblog/zh/ai/llm/rmsnorm-vs-layernorm-llama/</guid><description>从公式、复杂度与工程实践出发，解析 LLaMA 选择 RMSNorm 的原因，并给出最小 PyTorch 示例。</description></item><item><title>Self-Attention vs Cross-Attention：机制、差异与工程应用</title><link>https://shio-chan-dev.github.io/jeanblog/zh/ai/attention/self-attention-vs-cross-attention/</link><pubDate>Sat, 24 Jan 2026 15:44:12 +0800</pubDate><guid>https://shio-chan-dev.github.io/jeanblog/zh/ai/attention/self-attention-vs-cross-attention/</guid><description>用 ACERS 框架讲清 self-attention 与 cross-attention 的核心差异、公式与工程场景。</description></item></channel></rss>