
Python: Creating an adjacency matrix from a dataframe

I have the following data frame:

Company Firm
125911  1
125911  2
32679   3
32679   5
32679   5
32679   8
32679   10
32679   12
43805   14
67734   8
67734   9
67734   10
67734   10
67734   11
67734   12
67734   13
74240   4
74240   6
74240   7

Basically, each firm makes an investment in a company in a specific year, which in this case is the same year for all companies. What I want to do in Python is create a simple adjacency matrix containing only 0s and 1s: an entry is 1 if the two firms have invested in the same company. So even if two firms, for example 8 and 10, have both invested in the same two companies, the entry is still 1. The resulting matrix I am looking for looks like this:

Firm 1  2   3   4   5   6   7   8   9   10  11  12  13  14
1   0   1   0   0   0   0   0   0   0   0   0   0   0   0
2   1   0   0   0   0   0   0   0   0   0   0   0   0   0
3   0   0   0   0   1   0   0   1   0   1   0   1   0   0
4   0   0   0   0   0   1   1   0   0   0   0   0   0   0
5   0   0   1   0   0   0   0   1   0   1   0   1   0   0
6   0   0   0   1   0   0   1   0   0   0   0   0   0   0
7   0   0   0   1   0   1   0   0   0   0   0   0   0   0
8   0   0   1   0   1   0   0   0   1   1   1   1   1   0
9   0   0   0   0   0   0   0   1   0   1   1   1   1   0
10  0   0   1   0   1   0   0   1   1   0   1   1   1   0
11  0   0   0   0   0   0   0   1   1   1   0   1   1   0
12  0   0   1   0   1   0   0   1   1   1   1   0   1   0
13  0   0   0   0   0   0   0   1   1   1   1   1   0   0
14  0   0   0   0   0   0   0   0   0   0   0   0   0   0
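Stated formally, with notation introduced here only for illustration (B is a firm-by-company incidence matrix and A the desired adjacency matrix; neither symbol is part of the original question), the rule above reads:

```latex
B_{fc} =
\begin{cases}
1 & \text{if firm } f \text{ invested in company } c,\\
0 & \text{otherwise,}
\end{cases}
\qquad
A_{ij} =
\begin{cases}
\min\!\left(1,\ \sum_{c} B_{ic}\,B_{jc}\right) & i \neq j,\\
0 & i = j.
\end{cases}
```

For example, firms 8 and 10 both invested in companies 32679 and 67734, so the sum for that pair is 2, yet the entry for firms 8 and 10 is still 1.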

I have seen similar questions where you can use crosstab; however, in that case each company ends up with a single row and the firms are spread across the columns instead. So I am wondering what the best and most efficient way to tackle this specific problem is. Any help is greatly appreciated.
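The crosstab mentioned above is actually a useful starting point. A minimal sketch, assuming pandas and NumPy are available and using the sample data from the question: take the crosstab with firms as rows and companies as columns as an incidence matrix, multiply it by its own transpose so that each entry counts the companies two firms have in common, then binarise and clear the diagonal. This matrix-product approach is a swapped-in technique, not taken from either answer below, and the names `incidence`, `shared` and `adjacency` are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Company": [125911, 125911, 32679, 32679, 32679, 32679, 32679, 32679,
                43805, 67734, 67734, 67734, 67734, 67734, 67734, 67734,
                74240, 74240, 74240],
    "Firm": [1, 2, 3, 5, 5, 8, 10, 12, 14, 8, 9, 10, 10, 11, 12, 13, 4, 6, 7],
})

# Firm-by-company incidence matrix: 1 if the firm invested in the company at all
# (clip collapses repeated investments, e.g. firm 5 in company 32679)
incidence = pd.crosstab(df["Firm"], df["Company"]).clip(upper=1)

# Entry (i, j) of incidence @ incidence.T = number of companies firms i and j share
shared = incidence @ incidence.T

# Binarise, then zero the diagonal so a firm is not linked to itself
adjacency = shared.gt(0).astype(int)
adjacency = adjacency.mask(np.eye(len(adjacency), dtype=bool), 0)
print(adjacency)
```

Because `shared` holds the raw counts before binarising, using it directly (with the diagonal cleared) would instead give how many companies each pair of firms has in common, in case the strength of the co-investment matters.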

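import numpy as np
import pandas as pd

# One all-ones square block per company, covering the (unique) firms that invested in it
dfs = []
for _, group in df.groupby("Company")["Firm"]:
    firms = sorted(group.unique())
    dfs.append(pd.DataFrame(1, index=firms, columns=firms))

# Stack the blocks, merge rows of the same firm, and align the columns with the rows
out = pd.concat(dfs).groupby(level=0).sum()
out = out.reindex(columns=out.index).gt(0).astype(int)
out = out.mask(np.eye(len(out), dtype=bool), 0)  # a firm is not linked to itself
print(out)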

Prints:

    1   2   3   4   5   6   7   8   9   10  11  12  13  14
1    0   1   0   0   0   0   0   0   0   0   0   0   0   0
2    1   0   0   0   0   0   0   0   0   0   0   0   0   0
3    0   0   0   0   1   0   0   1   0   1   0   1   0   0
4    0   0   0   0   0   1   1   0   0   0   0   0   0   0
5    0   0   1   0   0   0   0   1   0   1   0   1   0   0
6    0   0   0   1   0   0   1   0   0   0   0   0   0   0
7    0   0   0   1   0   1   0   0   0   0   0   0   0   0
8    0   0   1   0   1   0   0   0   1   1   1   1   1   0
9    0   0   0   0   0   0   0   1   0   1   1   1   1   0
10   0   0   1   0   1   0   0   1   1   0   1   1   1   0
11   0   0   0   0   0   0   0   1   1   1   0   1   1   0
12   0   0   1   0   1   0   0   1   1   1   1   0   1   0
13   0   0   0   0   0   0   0   1   1   1   1   1   0   0
14   0   0   0   0   0   0   0   0   0   0   0   0   0   0

import pandas as pd

dfm = df.merge(df, on="Company").query("Firm_x != Firm_y")  # every pair of co-investors
firms = sorted(df["Firm"].unique())  # firm 14 has no co-investor and would otherwise vanish
out = (pd.crosstab(dfm["Firm_x"], dfm["Firm_y"])
       .clip(upper=1)  # sharing several companies still counts as 1
       .reindex(index=firms, columns=firms, fill_value=0))
>>> out
Firm_y  1   2   3   4   5   6   7   8   9   10  11  12  13  14
Firm_x
1       0   1   0   0   0   0   0   0   0   0   0   0   0   0
2       1   0   0   0   0   0   0   0   0   0   0   0   0   0
3       0   0   0   0   1   0   0   1   0   1   0   1   0   0
4       0   0   0   0   0   1   1   0   0   0   0   0   0   0
5       0   0   1   0   0   0   0   1   0   1   0   1   0   0
6       0   0   0   1   0   0   1   0   0   0   0   0   0   0
7       0   0   0   1   0   1   0   0   0   0   0   0   0   0
8       0   0   1   0   1   0   0   0   1   1   1   1   1   0
9       0   0   0   0   0   0   0   1   0   1   1   1   1   0
10      0   0   1   0   1   0   0   1   1   0   1   1   1   0
11      0   0   0   0   0   0   0   1   1   1   0   1   1   0
12      0   0   1   0   1   0   0   1   1   1   1   0   1   0
13      0   0   0   0   0   0   0   1   1   1   1   1   0   0
14      0   0   0   0   0   0   0   0   0   0   0   0   0   0
