Halo
发布于 2023-07-20 / 154 阅读 / 0 评论 / 0 点赞

线性拟合

线性方程

线线方程(1):

y=\beta_0 + \beta_1x_1 +\beta_2x_2 + \beta_3x_3+...\beta_kx_k + \epsilon

式(1)用矩阵表示为(2):

\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ ... \\ y_n \end{bmatrix}_{n\times1}=\begin{bmatrix} 1 & x_{11} & x_{12} & x_{13} & ... & x_{1k} \\ 1 & x_{21} & x_{22} & x_{23} & ... & x_{2k} \\ 1 & x_{31} & x_{32} & x_{33} & ... & x_{3k}\\ ... & ... & ... & ... & ... & ... &\\ 1 & x_{n1} & x_{n2} & x_{n3} & ... & x_{nk} \end{bmatrix}_{n\times(k+1)} \times \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ ... \\ \beta_k \end{bmatrix}_{(K+1)\times1} + \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \\ ... \\ \epsilon_n \end{bmatrix}_{n\times1}

即:

y= X\beta + \epsilon

线性回归(LR)

求解

线性回归前提:

  • 各个自变量x(x_1, x_2, x_3...x_k)之间不存在严格线性相关性
  • 样本n满足: n >= 30 或 n >=3*(k+1)
  • 干扰因子\epsilon与x无线性相关, 且\epsilon均值为零
    由线性方程(2)可得:
\epsilon=y-X\beta

线性回归的目的是找到一个\beta, 使得\epsilon的平均方差最小, 最小二乘(OLS)提供了最小平均方差的计算公式:

\beta=(X'X)^{-1}X'y

y值预测

求解后预测公式为:

\hat y = \hat x^T \beta

代码

#include <iostream>
#include <Eigen/Dense>
using namespace Eigen;
int main() {
// Input data
MatrixXd X(5, 2);
X << 1, 2,
2, 4,
3, 6,
4, 8,
5, 10;
VectorXd y(5);
y << 3, 6, 9, 12, 15;

// Add a column of ones to X for the intercept term
MatrixXd X_ones(X.rows(), X.cols() + 1);
X_ones << MatrixXd::Ones(X.rows(), 1), X;

// Calculate the least squares solution
VectorXd beta = (X_ones.transpose() * X_ones).inverse() * X_ones.transpose() * y;

// Print the coefficients (intercept and slope)
std::cout << "Intercept: " << beta(0) << std::endl;
std::cout << "Slope 1: " << beta(1) << std::endl; 
std::cout << "Slope 2: " << beta(2) << std::endl;
return 0;
}

贝叶斯回归(BLR)

求解

贝叶斯线性回归前提:

  • 残差(噪声)\xi服从正态分布, 其方差\sigma^2服从逆Gamma分布(Inverse-Gamma distribution)

在此条件下, 当X确定时, y被观察到的分布, 也就是似然函数p(y|X)为:

p(y|X,\beta,\theta^2)=\frac{1}{(2\pi\sigma^2)^{n/2}}exp(-\frac{1}{2\sigma^2}(y-X\beta)^T(y-X\beta))

结合贝叶斯定理:

P(A|B)=\frac{P(A)P(B|A)}{P(B)}
  • P(A)是A的先验概率, 它不考虑任何B方面的因素
  • P(A|B)是已知B发生后A的条件概率
  • P(B|A)是已知A发生后B的条件概率
  • P(B)是B的先验概率, 它不考虑任何A方面的因素

推导可得后验分布(3):

p(\beta,\sigma^2|X,y) \propto p(y|X,\beta,\sigma^2).p(\beta).p(\sigma^2)

预测

p(\hat y|\hat x,X, y)=\int p(\hat y|\hat x, \beta, \sigma^2)p(\beta,\sigma^2|X,y)d\beta d\sigma^2

代码

#include <iostream>
#include <Eigen/Dense>
using namespace Eigen;

int main() {
    // Input data
    MatrixXd X(5, 2);
    X << 1, 2,
         2, 4,
         3, 6,
         4, 8,
         5, 10;
    VectorXd y(5);
    y << 3, 6, 9, 12, 15;

    // Add a column of ones to X for the intercept term
    MatrixXd X_ones(X.rows(), X.cols() + 1);
    X_ones << MatrixXd::Ones(X.rows(), 1), X;

    // Prior parameters
    double alpha = 1.0; // Prior precision for the coefficients
    double beta = 1.0;  // Prior precision for the noise variance

    // Calculate posterior parameters
    double N = X_ones.rows();
    double M = X_ones.cols();
    MatrixXd XtX = X_ones.transpose() * X_ones;
    MatrixXd Xty = X_ones.transpose() * y;
    MatrixXd Sigma_inv = alpha * MatrixXd::Identity(M, M) + beta * XtX;
    MatrixXd Sigma = Sigma_inv.inverse();
    VectorXd mu = beta * Sigma * Xty;

    // Calculate predictive distribution for new input data points
    MatrixXd X_star(1, 2); // New input point for prediction
    X_star << 6, 12;

    MatrixXd X_star_ones(X_star.rows(), X_star.cols() + 1);
    X_star_ones << MatrixXd::Ones(X_star.rows(), 1), X_star;

    double y_pred = X_star_ones * mu;

    // Print the coefficients (intercept and slope)
    std::cout << "Intercept: " << mu(0) << std::endl;
    std::cout << "Slope 1: " << mu(1) << std::endl;
    std::cout << "Slope 2: " << mu(2) << std::endl;

    // Print the predicted output
    std::cout << "Predicted Output: " << y_pred << std::endl;

    return 0;
}

高斯过程回归(GPR)

求解

高斯过程回归的前提:

  • 任何有限数量的自变量x的组合都服从联合高斯分布

根据高斯模型方程(4):

y=f(X)+\epsilon

和前验分布方程(5):

p(f(X)) \sim N(m(X),K(X,X))
  • m(X)是均值函数
  • K(X,X)是X的自身的协方差矩阵

得到后验分布为:

p(f(X)|X,y) \sim N(\mu(X),\sum(X,X))

预测

p(\hat y| \hat x, X, y) \sim N(\mu(\hat x),\sigma^2(\hat x))
  • \mu(\hat x)是预测均值
  • \sigma^2(\hat x)是预测方差

代码

#include <iostream>
#include <Eigen/Dense>
#include <Eigen/Cholesky>

using namespace Eigen;

// Define the kernel function (Squared Exponential Kernel)
double kernel(const VectorXd& x1, const VectorXd& x2, double lengthScale, double variance) {
    double squaredDistance = (x1 - x2).squaredNorm();
    return variance * exp(-squaredDistance / (2.0 * lengthScale * lengthScale));
}

int main() {
    // Input data
    MatrixXd X(5, 1);
    X << 1,
         2,
         3,
         4,
         5;

    VectorXd y(5);
    y << 3, 6, 9, 12, 15;

    // New input point for prediction
    VectorXd X_star(1);
    X_star << 6;

    // Kernel hyperparameters
    double lengthScale = 1.0;
    double variance = 1.0;

    // Calculate the kernel matrix (covariance matrix)
    int n = X.rows();
    MatrixXd K(n, n);
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            K(i, j) = kernel(X.row(i), X.row(j), lengthScale, variance);
        }
    }

    // Add small noise to the diagonal of the kernel matrix for numerical stability
    K.diagonal() += 1e-6;

    // Compute the kernel vector for the new input point
    VectorXd k_star(n);
    for (int i = 0; i < n; i++) {
        k_star(i) = kernel(X.row(i), X_star, lengthScale, variance);
    }

    // Calculate the predictive mean and variance
    VectorXd mu = k_star.transpose() * K.inverse() * y;
    double sigma = kernel(X_star, X_star, lengthScale, variance) - k_star.transpose() * K.inverse() * k_star;

    // Print the predictive mean and variance
    std::cout << "Predicted Mean: " << mu(0) << std::endl;
    std::cout << "Predicted Variance: " << sigma << std::endl;

    return 0;
}

人工神经网络线性回归(ANNR)

求解

人工神经网络是利用残差网络求解出损失L最小时, 权重参数值w:

y_{pred}= w_0 + w_1x_1 + w_2x_2 +w_2x_2 +...w_kx_k \\ L=\frac{1}{n}\sum_{i=1}^{n}(y_{pred}^i-y_{true})^2

对于神经网络其解问题分为以下步骤:

  1. 选择网络结构: 对于线性回归一般选择单层神经网络.
  2. 选择选择激活函数: 对于线性回归选择线性激活函数
  3. 正向传导计算得到y_{pred}
  4. 计算损失值L
  5. 反向传导: 选择梯度下降或者其他算法迭代更新权值w
  6. 修正权重参数w, 重复执行3,4,5,6直到损失L最小

预测

y_{pred}= w_0 + w_1x_1 + w_2x_2 +w_2x_2 +...w_kx_k

代码

#include <iostream>
#include <Eigen/Dense>
using namespace Eigen;

// Linear Activation Function (Identity Function)
MatrixXd linearActivation(const MatrixXd& x) {
    return x;
}

// Neural Network Prediction
VectorXd predict(const MatrixXd& X, const MatrixXd& W) {
    return X * W;
}

int main() {
    // Input data (X) and true output (y)
    MatrixXd X(5, 2);
    X << 1, 2,
         2, 4,
         3, 6,
         4, 8,
         5, 10;

    VectorXd y(5);
    y << 3, 6, 9, 12, 15;

    // Add a column of ones to X for the bias term
    MatrixXd X_bias(X.rows(), X.cols() + 1);
    X_bias << MatrixXd::Ones(X.rows(), 1), X;

    // Randomly initialize the weights (including bias weight)
    MatrixXd W(X_bias.cols(), 1);
    W.setRandom();

    // Hyperparameters
    double learningRate = 0.01;
    int epochs = 1000;

    // Neural Network Training (Gradient Descent)
    for (int epoch = 0; epoch < epochs; ++epoch) {
        // Forward pass: Prediction
        VectorXd y_pred = predict(X_bias, W);

        // Calculate the loss (Mean Squared Error)
        VectorXd loss = y_pred - y;
        double meanSquaredError = loss.squaredNorm() / X.rows();

        // Backpropagation: Update the weights
        MatrixXd gradient = X_bias.transpose() * loss / X.rows();
        W -= learningRate * gradient;
    }

    // Print the learned weights
    std::cout << "Learned Weights:" << std::endl;
    std::cout << W << std::endl;

    return 0;
}


评论