How to add new columns to a dataframe with values taken from another dataframe?
I have two dataframes. df1.head():
    nazwa województwa   gmina  nazwa gminy  rodzaj gminy
0  Zachodniopomorskie  320101    Białogard             1
1  Zachodniopomorskie  320101    Białogard             1
2  Zachodniopomorskie  320101    Białogard             1
3  Zachodniopomorskie  320101    Białogard             1
4  Zachodniopomorskie  320101    Białogard             1
and kts_df.head():
               name         type        KTS_code  TERYT_code
0            Polska      COUNTRY  10000000000000        None
1           Bochnia  RURAL_GMINA  10011212001022     1201022
2           Drwinia  RURAL_GMINA  10011212001032     1201032
3         Iwanowice  RURAL_GMINA  10011212006032     1206032
4  Lipnica Murowana  RURAL_GMINA  10011212001042     1201042
Currently, to add a new column to df1 I am using
df['kts'] = df.apply(lambda row: self.get_kts_code(row, kts_df), axis=1)
where
def get_kts_code(self, row, kts_df: DataFrame) -> str:
    """Get the KTS code of each miasto/wieś."""
    gmina_types = {
        '1': AdministrativeUnitType.URBAN_GMINA,
        '2': AdministrativeUnitType.RURAL_GMINA,
        '4': AdministrativeUnitType.MIXED_GMINA,
        '5': AdministrativeUnitType.RURAL_AREA,
        '8': AdministrativeUnitType.DISTRICT,
        '9': AdministrativeUnitType.DELEGATION,
    }
    nazwa_gminy = row['nazwa gminy']
    gmina_type = gmina_types[str(row['rodzaj gminy'])]
    teryt = kts_df['TERYT_code'].str.contains(str(row['gmina']))
    kts_code = kts_df.loc[
        (kts_df['name'] == nazwa_gminy) & (kts_df['type'] == gmina_type) & (teryt)
    ]
    kts_code = kts_code['KTS_code'].values[0]
    return kts_code
This code works, but processing df1 with about 200k rows takes about an hour, which is too slow. Is there another way to quickly find the correct kts_code in kts_df for each row of df1?
I'm not sure I've understood your requirement correctly, but you could try the following:
Code example:
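import pandas as pd

# Numeric 'rodzaj gminy' codes and their type names, as listed in the question
gmina_types = {
    'id': [1, 2, 4, 5, 8, 9],
    'type': ['URBAN_GMINA', 'RURAL_GMINA', 'MIXED_GMINA', 'RURAL_AREA', 'DISTRICT', 'DELEGATION']
}
gmina_types_df = pd.DataFrame.from_dict(gmina_types)
# Attach the numeric id to every kts_df row via its type
kts_df = kts_df.join(gmina_types_df.set_index('type'), on='type')
# Note: this matches rows by gmina type only, not by name or TERYT code
df1 = df1.join(kts_df.set_index('id'), on='rodzaj gminy')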
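If the lookup has to match on name, type and TERYT code together, as get_kts_code in the question does, a single merge on all three keys avoids the row-by-row apply. The following is only a sketch under several assumptions that are not confirmed by the question: the 'type' column holds plain strings rather than AdministrativeUnitType members, the first six characters of TERYT_code are the gmina code (the question uses a substring match, which is looser), and 'gmina' is an integer or string that can be zero-padded to six digits. The helper name add_kts, the temporary columns _type and _teryt6, and the toy data are made up for illustration.

```python
import pandas as pd

# Toy stand-ins for the question's frames; the values are illustrative only.
df1 = pd.DataFrame({
    "nazwa gminy": ["Bochnia", "Drwinia", "Bochnia", "Nieznana"],
    "gmina": [120102, 120103, 120102, 999999],
    "rodzaj gminy": [2, 2, 2, 2],
})
kts_df = pd.DataFrame({
    "name": ["Polska", "Bochnia", "Drwinia"],
    "type": ["COUNTRY", "RURAL_GMINA", "RURAL_GMINA"],
    "KTS_code": ["10000000000000", "10011212001022", "10011212001032"],
    "TERYT_code": [None, "1201022", "1201032"],
})

# Mapping from the question, with plain strings standing in for the enum
# (assumption: kts_df['type'] stores these strings).
GMINA_TYPES = {
    "1": "URBAN_GMINA",
    "2": "RURAL_GMINA",
    "4": "MIXED_GMINA",
    "5": "RURAL_AREA",
    "8": "DISTRICT",
    "9": "DELEGATION",
}
KEYS = ["nazwa gminy", "_type", "_teryt6"]


def add_kts(df1: pd.DataFrame, kts_df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: add a 'kts' column to df1 with one left merge."""
    # Build the lookup keys on the df1 side.
    left = df1.assign(
        _type=df1["rodzaj gminy"].astype(str).map(GMINA_TYPES),
        # Assumption: gmina codes are six digits, possibly with a leading zero.
        _teryt6=df1["gmina"].astype(str).str.zfill(6),
    )
    # Build the same keys on the kts_df side; rows without a TERYT code
    # (such as the country row) cannot match and are dropped.
    right = kts_df.dropna(subset=["TERYT_code"])
    right = right.assign(_teryt6=right["TERYT_code"].str[:6])
    right = right.rename(
        columns={"name": "nazwa gminy", "type": "_type", "KTS_code": "kts"}
    )[KEYS + ["kts"]]
    # validate="many_to_one" raises if a key is duplicated in kts_df,
    # which guards against silently multiplying df1 rows.
    merged = left.merge(right, how="left", on=KEYS, validate="many_to_one")
    return merged.drop(columns=["_type", "_teryt6"])


result = add_kts(df1, kts_df)
print(result)
```

Rows of df1 with no matching entry in kts_df end up with a missing value in 'kts' instead of raising, unlike .values[0] in the original function, so it may be worth checking result['kts'].isna() afterwards.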