How to add new columns to a dataframe with values taken from another dataframe?

I have two dataframes: df1.head():

    nazwa województwa   gmina nazwa gminy  rodzaj gminy
0  Zachodniopomorskie  320101   Białogard             1
1  Zachodniopomorskie  320101   Białogard             1
2  Zachodniopomorskie  320101   Białogard             1
3  Zachodniopomorskie  320101   Białogard             1
4  Zachodniopomorskie  320101   Białogard             1

and kts_df.head():

               name         type        KTS_code TERYT_code
0            Polska      COUNTRY  10000000000000       None
1           Bochnia  RURAL_GMINA  10011212001022    1201022
2           Drwinia  RURAL_GMINA  10011212001032    1201032
3         Iwanowice  RURAL_GMINA  10011212006032    1206032
4  Lipnica Murowana  RURAL_GMINA  10011212001042    1201042

Currently, to add a new column to df1, I am using

    df['kts'] = df.apply(lambda row: self.get_kts_code(row, kts_df), axis=1)

where get_kts_code is:

    def get_kts_code(self, row, kts_df: DataFrame) -> str:
        """Get the KTS code of each miasto/wieś."""
        gmina_types = {
            '1': AdministrativeUnitType.URBAN_GMINA,
            '2': AdministrativeUnitType.RURAL_GMINA,
            '4': AdministrativeUnitType.MIXED_GMINA,
            '5': AdministrativeUnitType.RURAL_AREA,
            '8': AdministrativeUnitType.DISTRICT,
            '9': AdministrativeUnitType.DELEGATION,
        }
        nazwa_gminy = row['nazwa gminy']
        gmina_type = gmina_types[str(row['rodzaj gminy'])]
        teryt = kts_df['TERYT_code'].str.contains(str(row['gmina']))
        kts_code = kts_df.loc[
            (kts_df['name'] == nazwa_gminy) & (kts_df['type'] == gmina_type) & (teryt)
        ]
        kts_code = kts_code['KTS_code'].values[0]
        return kts_code

This code works well, but processing df1 with about 200k rows takes about an hour, which is too slow. Is there a faster way to find the correct kts_code from kts_df for each row of df1?

I'm not sure I've understood your requirement correctly, but you could try the following:

  • create a df with the corresponding gmina_type mapping
  • join kts_df with gmina_types_df to get the gmina type IDs
  • join df1 with the enriched kts_df

Code example:

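import pandas as pd

# Numeric 'rodzaj gminy' codes and their type names, as in the question's mapping
gmina_types = {
    'id': [1, 2, 4, 5, 8, 9],
    'type': ['URBAN_GMINA', 'RURAL_GMINA', 'MIXED_GMINA', 'RURAL_AREA', 'DISTRICT', 'DELEGATION']
}

gmina_types_df = pd.DataFrame.from_dict(gmina_types)

# Attach the numeric type id to every kts_df row
kts_df = kts_df.join(gmina_types_df.set_index('type'), on='type')
# Match on both gmina name and type id; joining on the id alone would pair
# each df1 row with every kts_df row of that type
df1 = df1.merge(kts_df, how='left', left_on=['nazwa gminy', 'rodzaj gminy'], right_on=['name', 'id'])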
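
For reference, here is a minimal self-contained sketch of the same idea that also uses the TERYT code as a match key, so that all three conditions from get_kts_code are covered by a single merge. It rests on several assumptions that are not confirmed by the question: the sample rows are made up for illustration, kts_df['type'] is treated as plain strings rather than AdministrativeUnitType members, and the first six characters of TERYT_code are assumed to be the gmina code (the question only checks that the code is contained somewhere in TERYT_code).

```python
import pandas as pd

# Illustrative sample rows shaped like the question's frames (made-up subset).
df1 = pd.DataFrame({
    "nazwa województwa": ["Małopolskie", "Małopolskie", "Małopolskie", "Zachodniopomorskie"],
    "gmina": [120102, 120103, 120603, 320101],
    "nazwa gminy": ["Bochnia", "Drwinia", "Iwanowice", "Białogard"],
    "rodzaj gminy": [2, 2, 2, 1],
})
kts_df = pd.DataFrame({
    "name": ["Polska", "Bochnia", "Drwinia", "Iwanowice"],
    "type": ["COUNTRY", "RURAL_GMINA", "RURAL_GMINA", "RURAL_GMINA"],
    "KTS_code": ["10000000000000", "10011212001022", "10011212001032", "10011212006032"],
    "TERYT_code": [None, "1201022", "1201032", "1206032"],
})

# Same code-to-type mapping as in the question, with strings standing in for
# the AdministrativeUnitType members (assumption).
GMINA_TYPES = {
    1: "URBAN_GMINA", 2: "RURAL_GMINA", 4: "MIXED_GMINA",
    5: "RURAL_AREA", 8: "DISTRICT", 9: "DELEGATION",
}

# Build the lookup table once: one row per (name, type, gmina code).
# Assumption: the first six characters of TERYT_code are the gmina code.
lookup = (
    kts_df.dropna(subset=["TERYT_code"])
    .assign(gmina=lambda d: d["TERYT_code"].str[:6].astype("int64"))
    .rename(columns={"name": "nazwa gminy"})
    [["nazwa gminy", "type", "gmina", "KTS_code"]]
    .drop_duplicates(subset=["nazwa gminy", "type", "gmina"])
)

# Translate the numeric type code, then resolve every row in one left merge.
left = df1.assign(type=df1["rodzaj gminy"].map(GMINA_TYPES))
merged = left.merge(
    lookup,
    on=["nazwa gminy", "type", "gmina"],
    how="left",
    validate="many_to_one",  # fails loudly if the lookup keys are not unique
)

# A left merge keeps the row order of df1, so the values line up positionally.
df1["kts"] = merged["KTS_code"].to_numpy()
```

Unlike the row-wise apply, rows without a match (the Białogard row in this sample) end up with NaN in the kts column instead of raising an IndexError, so they can be inspected afterwards with df1[df1['kts'].isna()].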
