
Making a column in a dataframe based on values in other lists

[screenshot]

I have two data frames. Each value of the 'Zip Code' column contains a zip code that is in either District 2, 5, or 7. I want to make a brand new column called 'District' in the codes dataframe that says which district each zip code belongs to. This for loop doesn't seem to be working. I have attempted to turn each of these columns into a list and then use a for loop, but this doesn't seem to work since there are more district codes than actual zip codes. It ends up raising ValueError: Length of values does not match length of index

Here is the code.

d2 = d_codes['District 2'].tolist()   
d5 = d_codes['District 5'].tolist() 
d7 = d_codes['District 7'].tolist() 
main_zips = codes['Zip Code'].tolist()

result = [] 
for value in main_zips:
    if value in d2:
        result.append("District 2")
    elif value in d5:
        result.append("District 5")
    elif value in d7:
        result.append("District 7")

codes["Result"] = result

Is there a better way to perform this task?

A small note to start: it's best to give people a fully working example of your problem. Giving some fake data will make it a lot easier for people to help you.
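As an illustration of what such a self-contained example could look like, here is a sketch that rebuilds the loop from the question on made-up data (every value below is invented; only the column names come from the question). It also points at a likely cause of the ValueError: when a zip code is in none of the three lists, the original loop appends nothing, so result ends up shorter than the dataframe. The else branch is an addition for this sketch, not part of the original code.

```python
import pandas as pd

# Made-up data standing in for the two dataframes in the question.
d_codes = pd.DataFrame({
    "District 2": [23081, 23090, 23185],
    "District 5": [20106, 20107, 20108],
    "District 7": [22901, 22902, 22903],
})
# 99999 is deliberately in none of the districts.
codes = pd.DataFrame({"Zip Code": [23081, 20107, 99999]})

d2 = d_codes["District 2"].tolist()
d5 = d_codes["District 5"].tolist()
d7 = d_codes["District 7"].tolist()

result = []
for value in codes["Zip Code"].tolist():
    if value in d2:
        result.append("District 2")
    elif value in d5:
        result.append("District 5")
    elif value in d7:
        result.append("District 7")
    else:
        # Without this branch nothing is appended for 99999, result has
        # fewer entries than codes has rows, and the assignment below fails.
        result.append(None)

codes["District"] = result
print(codes)
```

With the else branch in place the list always has one entry per row, so the assignment goes through and unmatched zip codes show up as missing values.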

I would try to get your districts into a different structure: a single dataframe, districts, with two columns, zipcode and district. Pandas melt is perfect for this:

import pandas as pd
df = pd.read_csv("fake_data.csv")
print(df.head())
   District 2   District 5   District 7
0       23081        20106        20106
1       23090        20106        20106
2       23185        20106        20106
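# Name the melted columns so they line up with the merge key used later
districts = df.melt(var_name="district", value_name="zipcode")
print(districts)
     district  zipcode
0  District 2    23081
1  District 2    23090
2  District 2    23185
3  District 5    20106
4  District 5    20106
5  District 5    20106
6  District 7    20106
7  District 7    20106
8  District 7    20106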

You can then merge your dataframes on the zipcode column.

codes = codes.merge(districts, how="left", on="zipcode")
print(codes)
   x  zipcode    district
0  1    23081  District 2
1  2    23090  District 2
2  3    20106  District 5
3  3    20106  District 5
4  3    20106  District 5
5  3    20106  District 7
6  3    20106  District 7
7  3    20106  District 7
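Since fake_data.csv isn't available to readers, here is a minimal sketch of the same two steps with the data built in memory. The district values mirror the fake data printed above; the three-row codes frame and its x column are assumptions made for the sketch. The melted columns are named explicitly through var_name and value_name so that a zipcode column exists to merge on.

```python
import pandas as pd

# Made-up district table, mirroring the fake data printed above.
df = pd.DataFrame({
    "District 2": [23081, 23090, 23185],
    "District 5": [20106, 20106, 20106],
    "District 7": [20106, 20106, 20106],
})
# Hypothetical codes frame; the x column is just filler.
codes = pd.DataFrame({"x": [1, 2, 3], "zipcode": [23081, 23090, 20106]})

# Long format: one row per (district, zipcode) pair.
districts = df.melt(var_name="district", value_name="zipcode")

# Left merge keeps every row of codes and attaches all matching districts.
merged = codes.merge(districts, how="left", on="zipcode")
print(len(codes), len(merged))  # 3 8
```

The row count grows from 3 to 8 because 20106 appears six times in districts, and each occurrence produces its own row in the result.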

There are a couple of problems, though: your screenshot shows the same zipcodes appearing in multiple districts, and you also have duplicate zipcodes. Merge will find all matches, so you'll end up with additional rows after the merge. You should fix the issue that puts the same zipcodes in multiple districts, and then deduplicate the zipcode column to ensure there's only one matching district per zipcode. Once that's done, do the merge.
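One possible way to carry out that cleanup is sketched below on made-up data. The "keep the first district listed" rule is only a placeholder assumption; which district a shared zip code really belongs to has to be decided from the source data. Passing validate="many_to_one" asks merge to raise an error if the lookup still contains a repeated zip code, instead of silently adding rows.

```python
import pandas as pd

# Hypothetical long-format lookup, as melt would produce; values are made up.
districts = pd.DataFrame({
    "district": ["District 2", "District 2", "District 5", "District 5", "District 7"],
    "zipcode": [23081, 23081, 20106, 20107, 20106],
})
codes = pd.DataFrame({"x": [1, 2, 3], "zipcode": [23081, 20107, 20106]})

# Drop exact repeats (same district and same zipcode).
districts = districts.drop_duplicates()

# Zip codes still listed under more than one district need a real decision.
n_districts = districts.groupby("zipcode")["district"].nunique()
conflicts = n_districts[n_districts > 1].index.tolist()
print(conflicts)  # [20106]

# Placeholder rule, an assumption for this sketch only: keep the first district.
lookup = districts.drop_duplicates(subset="zipcode", keep="first")

# Raises MergeError if lookup has more than one row for any zipcode.
codes = codes.merge(lookup, how="left", on="zipcode", validate="many_to_one")
print(codes)
```

After this, codes keeps its original three rows and gains exactly one district per zip code.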

Feel free to hit me up if you have any issues!
