Exploring Large Language Model Fine-Tuning and Its Applications: Tracking Cutting-Edge Techniques
Formulas:
$z = W^T X + b$
$z = w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b$ (a single neuron)
# Non-vectorized: accumulate the weighted sum one feature at a time
z = 0
for i in range(n):
    z += w[i] * x[i]
z += b

# Vectorized equivalent
w_t = w.T
z = np.dot(w_t, x) + b
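As a quick sanity check (a minimal sketch with made-up sizes; the variable names are illustrative), both forms produce the same value, and the vectorized form is much faster for large n because the sum runs in optimized native code rather than a Python loop:

import numpy as np

# Toy data (illustrative only): n features of a single sample
n = 4
rng = np.random.default_rng(0)
w = rng.standard_normal((n, 1))
x = rng.standard_normal((n, 1))
b = 0.5

# Loop version
z_loop = 0.0
for i in range(n):
    z_loop += w[i, 0] * x[i, 0]
z_loop += b

# Vectorized version
z_vec = (np.dot(w.T, x) + b).item()

assert np.isclose(z_loop, z_vec)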
graph LR
subgraph Neuron["Single neuron (vectorized)"]
X["Input X<br/>(shape: n × m)"] --> DOT["Linear combination z = W^T X + b"]
W["Weights W<br/>(shape: n × 1)"] --> DOT
b["Bias b<br/>(scalar, or 1 × m via broadcasting)"] --> DOT
DOT --> SIG["Activation a = sigmoid(z)"]
SIG --> Yhat["Prediction ŷ (1 × m)"]
end
Note: typically X has shape (n, m) (n is the number of features, m the number of samples) and W has shape (n, 1), so z = W^T X + b has shape (1, m).
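A quick shape check (a minimal sketch with made-up sizes) confirms this: W.T @ X is (1, m), and adding the scalar b broadcasts across the m columns without changing the shape:

import numpy as np

n, m = 3, 5                 # illustrative sizes: 3 features, 5 samples
X = np.ones((n, m))
W = np.ones((n, 1))
b = 0.1                     # scalar bias, broadcast over the m columns

z = np.dot(W.T, X) + b      # (1, n) @ (n, m) -> (1, m)
print(z.shape)              # (1, 5)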
flowchart TD
A["Inputs: X (n,m), Y (1,m), W (n,1), b"] --> B["Forward: Z = W^T X + b, shape (1,m)"]
B --> C["Activation: A = sigmoid(Z), shape (1,m)"]
C --> D["Loss: L = -1/m * sum(Y*log(A) + (1-Y)*log(1-A))"]
D --> E["Backward: dZ = A - Y, shape (1,m)"]
E --> F["Gradients: dW = 1/m * X · dZ^T, shape (n,1)<br/>db = 1/m * sum(dZ)"]
F --> G["Update: W := W - lr * dW; b := b - lr * db"]
G --> H["Repeat until convergence"]
Forward pass:
$Z = W^T X + b$, $A = \sigma(Z)$
Loss (averaged over the m samples):
$L = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log a^{(i)} + (1 - y^{(i)}) \log\left(1 - a^{(i)}\right) \right]$
Backward pass (gradients):
$dZ = A - Y$, $dW = \frac{1}{m} X\, dZ^T$, $db = \frac{1}{m} \sum dZ$
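For reference, a short derivation sketch (the standard chain-rule argument for sigmoid + cross-entropy, not spelled out above) of why dZ = A - Y, written for a single sample:

$\frac{\partial L}{\partial a} = -\frac{y}{a} + \frac{1 - y}{1 - a}, \qquad \frac{\partial a}{\partial z} = a(1 - a)$
$\frac{\partial L}{\partial z} = \left(-\frac{y}{a} + \frac{1 - y}{1 - a}\right) a(1 - a) = -y(1 - a) + (1 - y)a = a - y$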
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Toy data (illustrative only) so the snippet runs end to end:
# X shape: (n, m), Y shape: (1, m), W shape: (n, 1), b is a scalar
n, m = 3, 5
rng = np.random.default_rng(0)
X = rng.standard_normal((n, m))
Y = (rng.random((1, m)) > 0.5).astype(float)
W = np.zeros((n, 1))
b = 0.0
learning_rate = 0.1

# forward pass
Z = np.dot(W.T, X) + b                  # (1, m)
A = sigmoid(Z)                          # (1, m)

# cost: mean binary cross-entropy over the m samples
cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m

# gradients
dZ = A - Y                              # (1, m)
dW = np.dot(X, dZ.T) / m                # (n, 1)
db = np.sum(dZ) / m                     # scalar

# gradient-descent update
W = W - learning_rate * dW
b = b - learning_rate * db
The columns of X are samples and the rows are features. When taking log(1 - A), you can use np.clip(A, 1e-12, 1 - 1e-12) to avoid NaN.
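To connect the single update step above with the "repeat until convergence" box in the flowchart, here is a minimal training-loop sketch; the function name, iteration count, and learning rate are illustrative assumptions, not part of the original:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def train(X, Y, num_iterations=1000, learning_rate=0.1):
    """Gradient descent for a single sigmoid neuron (logistic regression); sketch only."""
    n, m = X.shape
    W = np.zeros((n, 1))
    b = 0.0
    for _ in range(num_iterations):
        # forward
        A = sigmoid(np.dot(W.T, X) + b)                    # (1, m)
        # clip as suggested above to keep the logs finite
        A_safe = np.clip(A, 1e-12, 1 - 1e-12)
        cost = -np.sum(Y * np.log(A_safe) + (1 - Y) * np.log(1 - A_safe)) / m
        # backward
        dZ = A - Y                                         # (1, m)
        dW = np.dot(X, dZ.T) / m                           # (n, 1)
        db = np.sum(dZ) / m
        # update
        W -= learning_rate * dW
        b -= learning_rate * db
    return W, b, cost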