How do I gather information from 3 dataframes in pandas with custom headers?

I am learning pandas and doing some exercises, but without many resources.

So basically I have these 4 dataframes below:

[image]

So for every bill in the dataset, I want to know how many legislators supported the bill, how many legislators opposed it, and who the primary sponsor of the bill was.

This is what I am trying to achieve:

[image]

I was able to solve this one: Is there a way to count how many entries exists with a certain filter for python pandas?

But what I'm asking now involves 3 tables, I guess(?)

Use the following logic:

  • Join "bills" with "votes" on "bill_id"
  • Join "vote_results" to the result above on "vote_id"
  • Group by "bill_id"
  • Aggregate by filtering on vote_type and counting

I have assumed some dummy data:

import pandas as pd

# Dummy data standing in for the tables in the question
bills = pd.DataFrame(data=[[1,"Bill #1","P1"],[2,"Bill #2","P2"]], columns=["id","title","Primary Sponsor"])

legislators = pd.DataFrame(data=[[1,"Legislator A"],[2,"Legislator B"],[3,"Legislator C"]], columns=["id","name"])

votes = pd.DataFrame(data=[[1,1],[2,1],[3,1],[4,2],[5,2],[6,2]], columns=["id","bill_id"])

vote_results = pd.DataFrame(data=[[1,1,1,1],[2,2,2,2],[3,3,3,1],[4,1,4,1],[5,2,5,2],[6,3,6,2]], columns=["id","legislator_id","vote_id","vote_type"])


# Join bills -> votes -> vote_results, then count vote types per bill.
# vote_type 1 is counted as support and 2 as opposition.
result_df = bills.merge(votes.rename(columns={"id": "vote_id"}), left_on="id", right_on="bill_id") \
                 .merge(vote_results.rename(columns={"vote_id": "vote_id2"}).drop("id", axis=1), left_on="vote_id", right_on="vote_id2") \
                 .groupby(["id","title","Primary Sponsor"]) \
                 .apply(lambda x: pd.Series({
                     "supporter_count": (x.vote_type == 1).sum(),
                     "opposer_count": (x.vote_type == 2).sum(),
                     })) \
                 .reset_index()

print(result_df)

Output:

   id    title Primary Sponsor  supporter_count  opposer_count
0   1  Bill #1              P1                2              1
1   2  Bill #2              P2                1              2
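
As an alternative to the groupby/apply step above, here is a minimal sketch that produces the same counts with boolean flags and named aggregation. It reuses the dummy data from the answer and the same reading of vote_type (1 = support, 2 = oppose). The names tallies and summary, and the choice of a left join so that a bill with no recorded votes still shows up with zero counts, are my own assumptions and not part of the original answer.

```python
import pandas as pd

# Same dummy data as in the answer above.
bills = pd.DataFrame(
    [[1, "Bill #1", "P1"], [2, "Bill #2", "P2"]],
    columns=["id", "title", "Primary Sponsor"],
)
votes = pd.DataFrame(
    [[1, 1], [2, 1], [3, 1], [4, 2], [5, 2], [6, 2]],
    columns=["id", "bill_id"],
)
vote_results = pd.DataFrame(
    [[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 1],
     [4, 1, 4, 1], [5, 2, 5, 2], [6, 3, 6, 2]],
    columns=["id", "legislator_id", "vote_id", "vote_type"],
)

# Attach bill_id to every vote result, then flag each row.
# Assumption carried over from the answer: vote_type 1 = support, 2 = oppose.
tallies = (
    vote_results
    .merge(votes.rename(columns={"id": "vote_id"}), on="vote_id")
    .assign(
        supported=lambda d: d["vote_type"].eq(1),
        opposed=lambda d: d["vote_type"].eq(2),
    )
    .groupby("bill_id", as_index=False)
    .agg(supporter_count=("supported", "sum"), opposer_count=("opposed", "sum"))
)

# Left-join onto bills so a bill without any votes is kept, with counts of 0.
summary = (
    bills
    .merge(tallies, left_on="id", right_on="bill_id", how="left")
    .drop(columns="bill_id")
    .fillna({"supporter_count": 0, "opposer_count": 0})
    .astype({"supporter_count": "int64", "opposer_count": "int64"})
)

print(summary)
```

Summing a boolean column counts its True values, so no Python-level loop over each group is needed, which tends to matter once the tables get large.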
