Linear Equation
The linear equation (1) is:
y=\beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_3 + \cdots + \beta_kx_k + \epsilon
In matrix form, equation (1) becomes (2):
\begin{bmatrix}
y_1 \\
y_2 \\
y_3 \\
\vdots \\
y_n
\end{bmatrix}_{n\times1}=\begin{bmatrix}
1 & x_{11} & x_{12} & x_{13} & \cdots & x_{1k} \\
1 & x_{21} & x_{22} & x_{23} & \cdots & x_{2k} \\
1 & x_{31} & x_{32} & x_{33} & \cdots & x_{3k} \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_{n1} & x_{n2} & x_{n3} & \cdots & x_{nk}
\end{bmatrix}_{n\times(k+1)} \times \begin{bmatrix}
\beta_0 \\
\beta_1 \\
\beta_2 \\
\vdots \\
\beta_k
\end{bmatrix}_{(k+1)\times1} + \begin{bmatrix}
\epsilon_1 \\
\epsilon_2 \\
\epsilon_3 \\
\vdots \\
\epsilon_n
\end{bmatrix}_{n\times1}
That is:
y= X\beta + \epsilon
Linear Regression (LR)
Solution
Assumptions of linear regression:
- The independent variables x (x_1, x_2, x_3, ..., x_k) must not be strictly linearly dependent on one another (no perfect multicollinearity; a quick rank check is sketched after this list)
- The sample size n satisfies n >= 30 or n >= 3*(k+1)
- The error term \epsilon is not linearly correlated with x, and \epsilon has zero mean
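A quick way to check the first assumption is to compute the numerical rank of the design matrix: a rank smaller than the number of columns signals (near-)exact collinearity. The following is only an illustrative sketch using Eigen's ColPivHouseholderQR; the data is a made-up example in which the second column is exactly twice the first.
#include <iostream>
#include <Eigen/Dense>
using namespace Eigen;
int main() {
// Hypothetical design matrix whose 2nd column is exactly 2x the 1st,
// i.e. the columns are linearly dependent and X'X would be singular
MatrixXd X(5, 2);
X << 1, 2,
2, 4,
3, 6,
4, 8,
5, 10;
// Rank-revealing QR decomposition; rank < number of columns signals collinearity
ColPivHouseholderQR<MatrixXd> qr(X);
std::cout << "Columns: " << X.cols() << ", numerical rank: " << qr.rank() << std::endl;
if (qr.rank() < X.cols()) {
std::cout << "Warning: predictors are (nearly) linearly dependent." << std::endl;
}
return 0;
}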
From the linear equation (2) we get:
\epsilon=y-X\beta
The goal of linear regression is to find a \beta that minimizes the mean squared error of \epsilon; ordinary least squares (OLS) gives a closed-form estimate:
\hat{\beta}=(X^TX)^{-1}X^Ty
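This closed form comes from minimizing the residual sum of squares: setting the gradient of \|y-X\beta\|^2 with respect to \beta to zero gives the normal equations, which are then solved for \hat{\beta}:
\frac{\partial}{\partial\beta}\|y-X\beta\|^2 = -2X^T(y-X\beta) = 0 \quad\Rightarrow\quad X^TX\hat{\beta} = X^Ty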
Prediction of y
Once \hat{\beta} has been obtained, the prediction formula for a new input \hat{x} is:
\hat y = \hat{x}^T \hat{\beta}
Code
#include <iostream>
#include <Eigen/Dense>
using namespace Eigen;
int main() {
// Input data
// Note: the two predictor columns are deliberately not collinear here,
// otherwise X'X would be singular and the OLS formula below would break down
MatrixXd X(5, 2);
X << 1, 2,
2, 1,
3, 4,
4, 3,
5, 6;
VectorXd y(5);
y << 3, 6, 9, 12, 15;
// Add a column of ones to X for the intercept term
MatrixXd X_ones(X.rows(), X.cols() + 1);
X_ones << MatrixXd::Ones(X.rows(), 1), X;
// Calculate the least squares solution: beta = (X'X)^{-1} X'y
// (for large or ill-conditioned problems, a decomposition-based solve is preferable)
VectorXd beta = (X_ones.transpose() * X_ones).inverse() * X_ones.transpose() * y;
// Print the coefficients (intercept and slope)
std::cout << "Intercept: " << beta(0) << std::endl;
std::cout << "Slope 1: " << beta(1) << std::endl;
std::cout << "Slope 2: " << beta(2) << std::endl;
return 0;
}
Bayesian Linear Regression (BLR)
Solution
Assumptions of Bayesian linear regression:
- The residual (noise) term \epsilon follows a normal distribution, and its variance \sigma^2 follows an inverse-Gamma distribution
Under this assumption, when X is given, the distribution of the observed y, i.e. the likelihood p(y|X,\beta,\sigma^2), is:
p(y|X,\beta,\sigma^2)=\frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left(-\frac{1}{2\sigma^2}(y-X\beta)^T(y-X\beta)\right)
Combining this with Bayes' theorem:
P(A|B)=\frac{P(A)P(B|A)}{P(B)}
- P(A) is the prior probability of A; it does not take any information about B into account
- P(A|B) is the conditional probability of A given that B has occurred
- P(B|A) is the conditional probability of B given that A has occurred
- P(B) is the prior probability of B; it does not take any information about A into account
we can derive the posterior distribution (3):
p(\beta,\sigma^2|X,y) \propto p(y|X,\beta,\sigma^2)\cdot p(\beta)\cdot p(\sigma^2)
Prediction
p(\hat y|\hat x,X, y)=\int p(\hat y|\hat x, \beta, \sigma^2)p(\beta,\sigma^2|X,y)d\beta d\sigma^2
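For reference, under the common simplifying assumption of a zero-mean isotropic Gaussian prior p(\beta) \sim N(0, \alpha^{-1}I) and a known noise precision \beta_{noise} = 1/\sigma^2 (these correspond to the alpha and beta variables in the code below), the posterior and the predictive distribution take the standard closed forms:
\Sigma = (\alpha I + \beta_{noise} X^TX)^{-1}, \quad \mu = \beta_{noise}\,\Sigma X^Ty
p(\beta|X,y) \sim N(\mu, \Sigma)
p(\hat y|\hat x, X, y) \sim N\left(\hat{x}^T\mu,\ \frac{1}{\beta_{noise}} + \hat{x}^T\Sigma\,\hat{x}\right)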
Code
#include <iostream>
#include <Eigen/Dense>
using namespace Eigen;
int main() {
// Input data
MatrixXd X(5, 2);
X << 1, 2,
2, 4,
3, 6,
4, 8,
5, 10;
VectorXd y(5);
y << 3, 6, 9, 12, 15;
// Add a column of ones to X for the intercept term
MatrixXd X_ones(X.rows(), X.cols() + 1);
X_ones << MatrixXd::Ones(X.rows(), 1), X;
// Prior parameters
double alpha = 1.0; // Prior precision for the coefficients
double beta = 1.0; // Noise precision (1/sigma^2), assumed known here
// Calculate posterior parameters
int M = X_ones.cols(); // number of coefficients (including the intercept)
MatrixXd XtX = X_ones.transpose() * X_ones;
MatrixXd Xty = X_ones.transpose() * y;
MatrixXd Sigma_inv = alpha * MatrixXd::Identity(M, M) + beta * XtX;
MatrixXd Sigma = Sigma_inv.inverse();
VectorXd mu = beta * Sigma * Xty;
// Calculate predictive distribution for new input data points
MatrixXd X_star(1, 2); // New input point for prediction
X_star << 6, 12;
MatrixXd X_star_ones(X_star.rows(), X_star.cols() + 1);
X_star_ones << MatrixXd::Ones(X_star.rows(), 1), X_star;
double y_pred = (X_star_ones * mu)(0);
// Print the coefficients (intercept and slope)
std::cout << "Intercept: " << mu(0) << std::endl;
std::cout << "Slope 1: " << mu(1) << std::endl;
std::cout << "Slope 2: " << mu(2) << std::endl;
// Print the predicted output
std::cout << "Predicted Output: " << y_pred << std::endl;
return 0;
}
Gaussian Process Regression (GPR)
Solution
Assumptions of Gaussian process regression:
- The function values at any finite set of inputs x follow a joint Gaussian distribution
From the Gaussian process model, equation (4):
y=f(X)+\epsilon
and the prior distribution, equation (5):
p(f(X)) \sim N(m(X),K(X,X))
- m(X) is the mean function
- K(X,X) is the covariance matrix of the inputs X with themselves (a concrete kernel choice is given after this list)
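A common choice for the entries of K, and the one used in the code below (with hyperparameters lengthScale = \ell and variance = \sigma_f^2), is the squared-exponential kernel:
k(x, x') = \sigma_f^2 \exp\left(-\frac{\|x - x'\|^2}{2\ell^2}\right)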
the posterior distribution is obtained as:
p(f(X)|X,y) \sim N(\mu(X),\Sigma(X,X))
Prediction
p(\hat y| \hat x, X, y) \sim N(\mu(\hat x),\sigma^2(\hat x))
- \mu(\hat x) is the predictive mean
- \sigma^2(\hat x) is the predictive variance
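Concretely, writing K = K(X,X) (with a small jitter term added to its diagonal for numerical stability) and k_* = K(X,\hat x), the standard GP regression predictive equations, which the code below computes, are:
\mu(\hat x) = k_*^T K^{-1} y
\sigma^2(\hat x) = k(\hat x,\hat x) - k_*^T K^{-1} k_*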
Code
#include <iostream>
#include <Eigen/Dense>
#include <Eigen/Cholesky>
using namespace Eigen;
// Define the kernel function (Squared Exponential Kernel)
double kernel(const VectorXd& x1, const VectorXd& x2, double lengthScale, double variance) {
double squaredDistance = (x1 - x2).squaredNorm();
return variance * exp(-squaredDistance / (2.0 * lengthScale * lengthScale));
}
int main() {
// Input data
MatrixXd X(5, 1);
X << 1,
2,
3,
4,
5;
VectorXd y(5);
y << 3, 6, 9, 12, 15;
// New input point for prediction
VectorXd X_star(1);
X_star << 6;
// Kernel hyperparameters
double lengthScale = 1.0;
double variance = 1.0;
// Calculate the kernel matrix (covariance matrix)
int n = X.rows();
MatrixXd K(n, n);
for (int i = 0; i < n; i++) {
for (int j = 0; j < n; j++) {
K(i, j) = kernel(X.row(i), X.row(j), lengthScale, variance);
}
}
// Add a small jitter term to the diagonal of the kernel matrix for numerical stability
K.diagonal().array() += 1e-6;
// Compute the kernel vector for the new input point
VectorXd k_star(n);
for (int i = 0; i < n; i++) {
k_star(i) = kernel(X.row(i), X_star, lengthScale, variance);
}
// Calculate the predictive mean and variance (computing K^{-1} once and reusing it)
MatrixXd K_inv = K.inverse();
VectorXd mu = k_star.transpose() * K_inv * y;
double sigma = kernel(X_star, X_star, lengthScale, variance) - k_star.dot(K_inv * k_star);
// Print the predictive mean and variance
std::cout << "Predicted Mean: " << mu(0) << std::endl;
std::cout << "Predicted Variance: " << sigma << std::endl;
return 0;
}
Artificial Neural Network Linear Regression (ANNR)
Solution
An artificial neural network finds the weight parameters w that minimize the loss L by repeatedly propagating the prediction error (residual) backward through the network:
y_{pred}= w_0 + w_1x_1 + w_2x_2 + w_3x_3 + \dots + w_kx_k \\
L=\frac{1}{n}\sum_{i=1}^{n}(y_{pred}^i-y_{true}^i)^2
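For this linear model and the mean-squared-error loss, the gradient used in the backward pass has a simple closed form (the constant factor 2 is commonly folded into the learning rate, which is what the code below does):
\frac{\partial L}{\partial w} = \frac{2}{n} X^T (y_{pred} - y_{true})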
Solving this with a neural network proceeds in the following steps:
- Choose the network structure: for linear regression a single-layer network is usually sufficient.
- Choose the activation function: for linear regression use a linear (identity) activation.
- Forward pass: compute y_{pred}.
- Compute the loss L.
- Backward pass: use gradient descent (or another optimizer) to update the weights w.
- Correct the weight parameters w and repeat steps 3 to 6 (forward pass, loss, backward pass, update) until the loss L is minimized.
Prediction
y_{pred}= w_0 + w_1x_1 + w_2x_2 + w_3x_3 + \dots + w_kx_k
Code
#include <iostream>
#include <Eigen/Dense>
using namespace Eigen;
// Linear Activation Function (Identity Function)
MatrixXd linearActivation(const MatrixXd& x) {
return x;
}
// Neural Network Prediction (forward pass through the single linear layer)
VectorXd predict(const MatrixXd& X, const MatrixXd& W) {
return linearActivation(X * W);
}
int main() {
// Input data (X) and true output (y)
MatrixXd X(5, 2);
X << 1, 2,
2, 4,
3, 6,
4, 8,
5, 10;
VectorXd y(5);
y << 3, 6, 9, 12, 15;
// Add a column of ones to X for the bias term
MatrixXd X_bias(X.rows(), X.cols() + 1);
X_bias << MatrixXd::Ones(X.rows(), 1), X;
// Randomly initialize the weights (including bias weight)
MatrixXd W(X_bias.cols(), 1);
W.setRandom();
// Hyperparameters
double learningRate = 0.01;
int epochs = 1000;
// Neural Network Training (Gradient Descent)
for (int epoch = 0; epoch < epochs; ++epoch) {
// Forward pass: Prediction
VectorXd y_pred = predict(X_bias, W);
// Residual (error) and Mean Squared Error loss
VectorXd error = y_pred - y;
double meanSquaredError = error.squaredNorm() / X.rows();
// Optionally report training progress
if (epoch % 200 == 0) {
std::cout << "Epoch " << epoch << ", MSE: " << meanSquaredError << std::endl;
}
// Backpropagation: gradient of the MSE (constant factor absorbed into the learning rate)
MatrixXd gradient = X_bias.transpose() * error / X.rows();
W -= learningRate * gradient;
}
// Print the learned weights
std::cout << "Learned Weights:" << std::endl;
std::cout << W << std::endl;
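// Sketch of the prediction step with the learned weights for a new, hypothetical input (6, 12),
// i.e. y_pred = w_0 + w_1*x_1 + w_2*x_2 as in the prediction formula above
MatrixXd x_new(1, 3);
x_new << 1, 6, 12; // leading 1 for the bias term
std::cout << "Prediction for (6, 12): " << predict(x_new, W)(0) << std::endl;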
return 0;
}