
How to speed up transferring column values from a pandas dataframe to another dataframe

I have a pandas DataFrame such as:

[image: the input DataFrame]

After a complex process, I want a DataFrame such as:

[image: the desired output DataFrame]

So, I do this:

import pandas as pd

def complex_process(value):
    values=value.split(',')
    return ['results for '+x for x in values]

df=pd.DataFrame([['id1','a,b,c'],['id2','d'],['id3','e,f']],columns=['id','value'])

result_list=[]
id_list=[]
value_list=[]
for row in df.itertuples():
    results=complex_process(row.value)
    for result in results:
        result_list.append(result)
        id_list.append(row.id)
        value_list.append(row.value)
df_new=pd.DataFrame()
df_new['id']=id_list
df_new['value']=value_list
df_new['result']=result_list

This takes a long time with a large dataset. I tested the complex process on its own and it doesn't take very long. Is there a faster way to transfer the columns?

Doing this with lists and loops is cumbersome, and iterating over the rows of a DataFrame is computationally expensive. pandas has many built-in operations, so most of the time you shouldn't need to iterate over a DataFrame at all.

Since your complex_process function is intended as a placeholder, let's apply it to each entry of the value column using .apply, and save the results in a new column called result:

df['result'] = df.value.apply(complex_process)

Your DataFrame will look like this:

>>> df
    id  value                                         result
0  id1  a,b,c  [results for a, results for b, results for c]
1  id2      d                                [results for d]
2  id3    e,f                 [results for e, results for f]

Now you can use the convenient .explode method to expand a list-like column into rows. This repeats the other columns and the index label for each list element, so we also reset the index and drop the old one.

df_new = df.explode('result').reset_index(drop=True)
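To see why the index reset is needed, here is a small sketch on the same sample data. The name `exploded` and the hand-written lists are only for illustration, and it assumes a pandas version that provides `DataFrame.explode` (0.25 or later).

```python
import pandas as pd

# Same shape as the intermediate frame above: one list per row in 'result'.
df = pd.DataFrame({
    'id': ['id1', 'id2', 'id3'],
    'value': ['a,b,c', 'd', 'e,f'],
    'result': [
        ['results for a', 'results for b', 'results for c'],
        ['results for d'],
        ['results for e', 'results for f'],
    ],
})

# explode repeats the original index label for every element of the list.
exploded = df.explode('result')
print(exploded.index.tolist())   # [0, 0, 0, 1, 2, 2]

# reset_index(drop=True) replaces it with a fresh 0..n-1 index.
df_new = exploded.reset_index(drop=True)
print(df_new.index.tolist())     # [0, 1, 2, 3, 4, 5]
```

Without the reset, label-based lookups such as `exploded.loc[0]` would return three rows instead of one.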

Final result:

>>> df_new
    id  value         result
0  id1  a,b,c  results for a
1  id1  a,b,c  results for b
2  id1  a,b,c  results for c
3  id2      d  results for d
4  id3    e,f  results for e
5  id3    e,f  results for f
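As a sanity check, the following sketch builds the frame both ways and confirms the contents match. The names `df_loop`, `df_explode` and `same` are illustrative, and the loop is a condensed form of the one in the question. It makes no claim about speed; timings depend on the data, so it is worth measuring on your own dataset.

```python
import pandas as pd

def complex_process(value):
    # Placeholder from the question: one result per comma-separated item.
    return ['results for ' + x for x in value.split(',')]

df = pd.DataFrame([['id1', 'a,b,c'], ['id2', 'd'], ['id3', 'e,f']],
                  columns=['id', 'value'])

# Loop-based version from the question, condensed into a list of rows.
rows = [(row.id, row.value, result)
        for row in df.itertuples()
        for result in complex_process(row.value)]
df_loop = pd.DataFrame(rows, columns=['id', 'value', 'result'])

# apply + explode version; assign() leaves the original df unmodified.
df_explode = (df.assign(result=df['value'].apply(complex_process))
                .explode('result')
                .reset_index(drop=True))

# Compare the cell values row by row rather than relying on dtypes.
same = df_loop.values.tolist() == df_explode.values.tolist()
print(same)
```

Using `assign` here instead of `df['result'] = ...` is only a design choice to keep the input frame unchanged while comparing.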
