Python

Using .corr to get the correlation between two columns

Posted on 2021-01-29 19:37:06

I have the following pandas DataFrame, Top15: [image not available]

I created a column that estimates the number of citable documents per person:

Top15['PopEst'] = Top15['Energy Supply'] / Top15['Energy Supply per Capita']
Top15['Citable docs per Capita'] = Top15['Citable documents'] / Top15['PopEst']

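I want to know the correlation between the number of citable documents per capita and the energy supply per capita, so I used the .corr() method (Pearson's correlation):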

data = Top15[['Citable docs per Capita','Energy Supply per Capita']]
correlation = data.corr(method='pearson')

1 Answer

面试哥 2021-01-29

Without the actual data it is hard to answer this question, but I guess you are looking for something like this:
```
Top15['Citable docs per Capita'].corr(Top15['Energy Supply per Capita'])
```
That calculates the correlation between your two columns 'Citable docs per Capita' and 'Energy Supply per Capita'.
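As a rough sketch of that call on data shaped like yours, the snippet below builds a small stand-in for Top15. The row labels and every number in it are invented for illustration; only the column names and the two derived-column formulas come from the question.

```python
import pandas as pd

# Hypothetical stand-in for Top15: all values are made up, only the column names match.
Top15 = pd.DataFrame(
    {
        "Energy Supply": [1.0e10, 2.0e10, 1.5e10, 3.0e10],
        "Energy Supply per Capita": [100.0, 250.0, 150.0, 200.0],
        "Citable documents": [5000, 9000, 4000, 12000],
    },
    index=["W", "X", "Y", "Z"],
)

# Same derived columns as in the question.
Top15["PopEst"] = Top15["Energy Supply"] / Top15["Energy Supply per Capita"]
Top15["Citable docs per Capita"] = Top15["Citable documents"] / Top15["PopEst"]

# Series.corr returns a single float (Pearson by default), not a matrix.
r = Top15["Citable docs per Capita"].corr(Top15["Energy Supply per Capita"])
print(r)
```

The point is only the shape of the result: calling .corr on one column and passing the other column gives a single number.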

To give an example:
```
import pandas as pd

# B is exactly 2*A, so the two columns are perfectly linearly related
df = pd.DataFrame({'A': range(4), 'B': [2*i for i in range(4)]})
print(df)
#    A  B
# 0  0  0
# 1  1  2
# 2  2  4
# 3  3  6
```
Then
```
df['A'].corr(df['B'])
```
gives 1 as expected.

Now, if you change a value, e.g.
```
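# cast to float first so the assignment below does not trigger an upcasting warning in newer pandas
df['B'] = df['B'].astype(float)
df.loc[2, 'B'] = 4.5
print(df)
#    A    B
# 0  0  0.0
# 1  1  2.0
# 2  2  4.5
# 3  3  6.0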
```
the command
```
df['A'].corr(df['B'])
```
returns
```
0.99586
```
which is still close to 1, as expected.
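To see where that number comes from, here is a small sketch that computes Pearson's r by hand for the same four rows and compares it with pandas. The variable names are my own; the formula is the standard one (sum of products of the centered values, divided by the square root of the product of their sums of squares).

```python
import numpy as np
import pandas as pd

# Same values as columns A and B after the change above.
a = np.array([0.0, 1.0, 2.0, 3.0])
b = np.array([0.0, 2.0, 4.5, 6.0])

# Center both columns around their means.
da = a - a.mean()
db = b - b.mean()

# Pearson's r: sum of products of deviations over the product of their norms.
r_manual = (da * db).sum() / np.sqrt((da ** 2).sum() * (db ** 2).sum())

# pandas should agree up to floating-point rounding.
r_pandas = pd.Series(a).corr(pd.Series(b))
print(round(r_manual, 5), round(r_pandas, 5))
```

Here the sum of products is 10.25 and the two sums of squares are 5 and 21.1875, so r = 10.25 / sqrt(105.9375), which is roughly 0.99586.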

If you apply .corr directly to your dataframe, it will return all pairwise correlations between your columns; that's why you then observe 1s on the diagonal of your matrix (each column is perfectly correlated with itself).
```
df.corr()
```
will therefore return
```
          A         B
A  1.000000  0.995862
B  0.995862  1.000000
```
In the graphic you show, only the upper left corner of the correlation matrix is represented (I assume).

There can be cases where you get NaNs in your solution; check this post for an example.
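The linked post is not reproduced here, so the following is only my own illustration of two situations that I would expect to produce NaN, not necessarily the case it describes: a column with zero variance, and two columns whose non-missing rows never overlap. All data and column names are made up.

```python
import math

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "A": [0.0, 1.0, 2.0, 3.0],
        "constant": [5.0, 5.0, 5.0, 5.0],      # zero variance
        "first_half": [1.0, 2.0, np.nan, np.nan],
        "second_half": [np.nan, np.nan, 3.0, 4.0],
    }
)

# Zero variance makes the denominator of Pearson's r zero, so the result is NaN.
# NumPy may emit a RuntimeWarning here.
r_constant = df["A"].corr(df["constant"])

# Rows where either value is missing are dropped pairwise; nothing is left to correlate.
r_no_overlap = df["first_half"].corr(df["second_half"])

print(math.isnan(r_constant), math.isnan(r_no_overlap))
```

If you see NaN in your own result, checking for constant columns and for missing values in either column is a reasonable first step.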

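If you want to filter entries above or below a certain threshold, you can check this question. If you want to plot a heatmap of the correlation coefficients, you can check this answer, and if you then run into the issue of overlapping axis labels, check the following post.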
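For the threshold part, one possible approach (a sketch of mine, not taken from the linked question) is to mask the matrix with DataFrame.where, or to list the column pairs that pass the cutoff. The extra column C and the cutoff of 0.9 are arbitrary choices for illustration.

```python
from itertools import combinations

import pandas as pd

df = pd.DataFrame(
    {
        "A": [0, 1, 2, 3],
        "B": [0.0, 2.0, 4.5, 6.0],
        "C": [3.0, 1.0, 4.0, 1.0],  # made-up column, only weakly related to A and B
    }
)
corr = df.corr()
threshold = 0.9  # hypothetical cutoff

# Keep entries whose absolute value exceeds the cutoff; everything else becomes NaN.
strong = corr.where(corr.abs() > threshold)

# List each unordered column pair once, skipping the diagonal.
strong_pairs = [
    (a, b, corr.loc[a, b])
    for a, b in combinations(corr.columns, 2)
    if abs(corr.loc[a, b]) > threshold
]
print(strong)
print(strong_pairs)
```

With this data only the A/B pair passes the cutoff; the diagonal stays in the masked matrix because each column correlates perfectly with itself.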
