PCA: how to draw a PCA plot

2022-03-24 11:14 | Source: compiled from the web

Determining the principal components

Now that the data has been standardized we can perform the PCA. Remember the goal of PCA:

PCA creates principal components that together capture an increasing proportion of the variance in the dataset. The first principal component captures the largest amount of variance, the second captures the next largest amount, and so on.
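
As a rough illustration of this goal, the sketch below runs base R's `prcomp()` on simulated data and prints the share of variance captured by each component, together with the running total. The simulated data and the names `data_std` and `var_explained` are assumptions made for this example; they are not part of the original analysis.

```r
# Simulated, standardized two-variable data (an assumption for illustration)
set.seed(42)
x <- rnorm(100)
y <- 0.8 * x + rnorm(100, sd = 0.5)
data_std <- scale(cbind(x, y))

# prcomp() performs the PCA; sdev holds the standard deviation of each component
pca <- prcomp(data_std)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Proportion of variance per component, then the cumulative proportion
print(round(var_explained, 3))
print(round(cumsum(var_explained), 3))
```

The per-component shares shrink from the first component to the second, while the cumulative share grows until it reaches 1.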

In other words, we want to draw a line through our data points that captures the largest amount of variance that can be captured by a line. Let me provide you with an illustration that I made.

The red dots are the data points, the blue line is the candidate principal component and the dashed gray lines are orthogonal projections of the data points on the blue line.

A bad principal component (image by author)

The figure above shows a line (blue) that is a bad candidate to be the first principal component. When the red dots are projected on the blue line they lie very close to each other, or in other words, they have a low variance.

A better candidate for the principal component is the following blue line:

A better candidate for the principal component (image by author)

When the red dots are projected onto this blue line, they are spread far apart, which means that their variance is large.
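
To make the comparison concrete, here is a minimal sketch that projects simulated points onto two candidate lines through the origin and compares the variance of the projections. The data, the helper name `projection_variance`, and the two angles (135 and 45 degrees) are assumptions chosen for illustration; they are not taken from the figures.

```r
# Simulated, standardized data with a positive relationship (an assumption)
set.seed(1)
x <- rnorm(50)
y <- x + rnorm(50, sd = 0.3)
pts <- scale(cbind(x, y))

# Hypothetical helper: variance of the orthogonal projections onto a line
# through the origin that makes `angle_deg` degrees with the x-axis
projection_variance <- function(points, angle_deg) {
  theta <- angle_deg * pi / 180
  direction <- c(cos(theta), sin(theta))  # unit vector along the line
  scores <- points %*% direction          # position of each projection on the line
  var(as.numeric(scores))
}

var_bad  <- projection_variance(pts, 135)  # line running against the trend
var_good <- projection_variance(pts, 45)   # line running along the trend

print(c(bad = var_bad, good = var_good))
```

The line that follows the trend of the points yields the larger variance, which is what makes it the better candidate.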

Hence,

To determine the first principal component, we choose the candidate line that maximizes the variance of the projections onto it.

Why do we focus so much on maximizing the amount of variance? Well, the variance is often considered a measure of the amount of information in data. Look at the following sequence: 1,1,1,1,1,1,1. The sequence does not contain much information and its variance is equal to zero. By maximizing the amount of variance that is captured by the principal component, we are maximizing the amount of information represented by the principal component.
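
A quick check of this claim in R, using the constant sequence from the text next to a second sequence whose values are arbitrary and chosen only for contrast:

```r
constant_seq <- c(1, 1, 1, 1, 1, 1, 1)  # the sequence from the text
varied_seq   <- c(1, 4, 2, 7, 3, 9, 5)  # arbitrary values, an assumption for contrast

var(constant_seq)  # 0: every value equals the mean
var(varied_seq)    # positive: the values differ from one another
```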

The following R code calculates the optimal first principal component. It does this by trying out different candidate principal components and selecting the one for which the variance of the projected observations is maximal.

```r
variances = c()

# Because we centered the variables, all principal components will go through
# the origin. The line is thus defined by exclusively the angle.

# Try all possible angles 1 -> 89 (since slope clearly > 0 based on plot)
for(angle in 1:89){
  # Line through origin with specified angle
  Line
```
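
The listing above breaks off partway through the loop. A minimal sketch of the same search could look like the following. It assumes simulated standardized data rather than the article's dataset, and names such as `data_std`, `direction`, and `best_angle` are hypothetical. The cross-check against `prcomp()` at the end is an addition for this example, not something the original code is known to do.

```r
# Simulated, standardized data with a positive slope (an assumption)
set.seed(123)
x <- rnorm(200)
y <- 0.7 * x + rnorm(200, sd = 0.6)
data_std <- scale(cbind(x, y))

# Try every whole-degree angle from 1 to 89 and record the variance
# of the points projected onto the line through the origin at that angle
angles <- 1:89
variances <- numeric(length(angles))
for (i in seq_along(angles)) {
  theta <- angles[i] * pi / 180
  direction <- c(cos(theta), sin(theta))  # unit vector along the candidate line
  projections <- data_std %*% direction   # position of each point along the line
  variances[i] <- var(as.numeric(projections))
}

# The candidate with the largest projected variance
best_angle <- angles[which.max(variances)]
print(best_angle)

# Cross-check: angle of the first principal component found by prcomp().
# The sign of a component is arbitrary, so the angle is reduced modulo 180.
pc1 <- prcomp(data_std)$rotation[, 1]
pca_angle <- (atan2(pc1[2], pc1[1]) * 180 / pi) %% 180
print(pca_angle)
```

For two standardized variables with a positive correlation, the search lands on 45 degrees, because both variables have unit variance and the first principal component weights them equally.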
