How to speed up transferring column values from one pandas dataframe to another

Question

I have a pandas dataframe such as:

And after a complex process I want a dataframe such as:

So, I do this:

import pandas as pd

def complex_process(value):
    # Placeholder: one result string per comma-separated item
    values=value.split(',')
    return ['results for '+x for x in values]

df=pd.DataFrame([['id1','a,b,c'],['id2','d'],['id3','e,f']],columns=['id','value'])

# Build three parallel lists with one entry per result
result_list=[]
id_list=[]
value_list=[]
for row in df.itertuples():
    results=complex_process(row.value)
    for result in results:
        result_list.append(result)
        id_list.append(row.id)        # repeat the row's id for every result
        value_list.append(row.value)  # repeat the row's value for every result
# Assemble the new DataFrame column by column
df_new=pd.DataFrame()
df_new['id']=id_list
df_new['value']=value_list
df_new['result']=result_list

This takes a long time with a large dataset. I tested the complex process on its own and it doesn't take very long. Is there a faster way to transfer the columns?

Answer 1

Doing this operation with lists and loops is cumbersome, and looping through a DataFrame row by row is computationally expensive. However, pandas has lots of built-in operations, so most of the time you shouldn't need to iterate through a DataFrame at all.

Since your complex_process function is intended as a placeholder, let's apply it to each value in the value column using .apply, and save the results in a new column called result:

df['result'] = df.value.apply(complex_process)

Your DataFrame will look like this:

>>> df
    id  value                                         result
0  id1  a,b,c  [results for a, results for b, results for c]
1  id2      d                                [results for d]
2  id3    e,f                 [results for e, results for f]

Now you can use the convenient .explode method to expand a list-like column into rows, one row per list element. This duplicates the values in the other columns and repeats the index labels, so we also reset the index and drop the old one.
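A minimal, self-contained check of that index behaviour, reusing the question's placeholder complex_process and sample data (the variable name exploded and the print calls are only for illustration):

```python
import pandas as pd

def complex_process(value):
    # Placeholder from the question: one result string per comma-separated item
    return ['results for ' + x for x in value.split(',')]

df = pd.DataFrame([['id1', 'a,b,c'], ['id2', 'd'], ['id3', 'e,f']],
                  columns=['id', 'value'])
df['result'] = df['value'].apply(complex_process)

# explode() gives each list element its own row and repeats the original index label
exploded = df.explode('result')
print(exploded.index.tolist())  # [0, 0, 0, 1, 2, 2]

# reset_index(drop=True) replaces the repeated labels with a fresh 0..n-1 index
print(exploded.reset_index(drop=True).index.tolist())  # [0, 1, 2, 3, 4, 5]
```

Without the reset, label-based lookups such as exploded.loc[0] would return three rows instead of one.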

df_new = df.explode('result').reset_index(drop=True)
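On pandas 1.1 or later, DataFrame.explode also accepts ignore_index=True, which renumbers the rows directly, so the reset_index step can be folded in. The sketch below assumes that pandas version and uses assign so the intermediate list column is not added to the original df (a small design choice, not something the answer requires):

```python
import pandas as pd

def complex_process(value):
    # Placeholder from the question
    return ['results for ' + x for x in value.split(',')]

df = pd.DataFrame([['id1', 'a,b,c'], ['id2', 'd'], ['id3', 'e,f']],
                  columns=['id', 'value'])

# assign() returns a new DataFrame, leaving df unchanged;
# ignore_index=True (pandas >= 1.1) labels the exploded rows 0..n-1
df_new = (
    df.assign(result=df['value'].apply(complex_process))
      .explode('result', ignore_index=True)
)
print(df_new)
```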

Final result:

>>> df_new
    id  value         result
0  id1  a,b,c  results for a
1  id1  a,b,c  results for b
2  id1  a,b,c  results for c
3  id2      d  results for d
4  id3    e,f  results for e
5  id3    e,f  results for f
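Because complex_process in the question is only a placeholder, the following applies only if the real work happens to be this simple split-and-prefix. In that case the same transformation can be written with pandas string methods (Series.str.split plus string concatenation) instead of a Python function. This is a swapped-in technique, and whether it is faster on a given dataset is worth measuring rather than assuming:

```python
import pandas as pd

df = pd.DataFrame([['id1', 'a,b,c'], ['id2', 'd'], ['id3', 'e,f']],
                  columns=['id', 'value'])

# Only valid for the toy placeholder: split each value into a list with a string method,
# explode to one row per item, then prefix every item via Series concatenation
df_new = (
    df.assign(result=df['value'].str.split(','))
      .explode('result')
      .reset_index(drop=True)
)
df_new['result'] = 'results for ' + df_new['result']
print(df_new)
```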

Answer 1 (score 2, accepted), 2021-04-15 07:05:53

如何加快将列值从 pandas dataframe 传输到另一个 dataframe

问题描述

1 个解决方案

解决方案1 2 已采纳 2021-04-15 07:05:53

解决方案1
2 已采纳 2021-04-15 07:05:53