
Is there a faster, less RAM-intensive way to pool data using Python?

(Figure: https://kin-phinf.pstatic.net/20221001_267/1664597566757fY2pz_PNG/%C8%AD%B8%E9_%C4%B8%C3%B3_2022-10-01_001049.png?type=w750)

I want to pool data as in the figure above (that is, turn each row into `Amount` copies of itself), but it takes too much time and RAM. Can I make it faster or more efficient?

My code looks like this:

# One output row per unit of Amount in each (Name, Age, Pet, Allergy) group
data = df.groupby(['Name', 'Age', 'Pet', 'Allergy']).apply(lambda x: pd.Series(range(x['Amount'].squeeze()))).reset_index()
# Same, keeping only the four key columns
data = df.groupby(['Name', 'Age', 'Pet', 'Allergy']).apply(lambda x: pd.Series(range(x['Amount'].squeeze()))).reset_index()[['Name', 'Age', 'Pet', 'Allergy']]
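For reference, here is a self-contained run of the question's groupby approach. The question does not show its own `df`, so this sketch assumes the small sample frame that the answer constructs. It makes explicit what "pooling" produces: one output row per unit of `Amount` for each key combination. It is slow on large data mainly because `apply` calls the Python lambda once per group and builds a separate Series for each.

```python
import pandas as pd

# Assumed sample data (the same frame the answer builds); each row carries an Amount.
df = pd.DataFrame({"Name": ["Male", "Female"],
                   "Age": [29, 43], "Pet": ["Cat", "Dog"],
                   "Allergy": ["Negative", "Positive"],
                   "Amount": [2, 4]})

keys = ["Name", "Age", "Pet", "Allergy"]

# The question's approach: per key group, build a Series of length Amount,
# then flatten the resulting MultiIndex back into ordinary columns.
data = (df.groupby(keys)
          .apply(lambda x: pd.Series(range(x["Amount"].squeeze())))
          .reset_index()[keys])

print(data)
```

Because `groupby` sorts by the keys, the four `Female` rows come before the two `Male` rows in the result.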

This is an abbreviated example; my actual dataset is 3.5 GB, so it takes a really long time. Is there another way to do this faster?
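A commonly used vectorized alternative (not taken from the original thread) is to repeat the row labels with `Index.repeat` and select all of them at once with `.loc`, which avoids calling a Python function per group. The sketch below assumes the same sample `df` and a unique index such as the default `RangeIndex`. Converting repetitive text columns to the `category` dtype first is an optional, assumed optimization for RAM: each value is then stored as a small integer code that points into one shared list of categories.

```python
import pandas as pd

# Assumed sample data, matching the frame used in the answer.
df = pd.DataFrame({"Name": ["Male", "Female"],
                   "Age": [29, 43], "Pet": ["Cat", "Dog"],
                   "Allergy": ["Negative", "Positive"],
                   "Amount": [2, 4]})

# Optional RAM saving: low-cardinality text columns as 'category'
# store integer codes per row instead of one Python string per row.
for col in ["Name", "Pet", "Allergy"]:
    df[col] = df[col].astype("category")

# Repeat each row label Amount times and select those rows in one step.
# Assumes df.index has unique labels (true for the default RangeIndex).
pooled = (df.loc[df.index.repeat(df["Amount"].to_numpy())]
            .drop(columns="Amount")
            .reset_index(drop=True))

print(pooled)
```

Unlike the groupby version, this keeps the original row order instead of sorting by the keys, and the categorical dtypes carry over to the repeated rows.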

I'd appreciate any help! Thank you!

You could preallocate the final DataFrame, then iterate over the original DataFrame, assigning each of its rows into the final one.

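import pandas as pd
import numpy as np

df = pd.DataFrame({"Name":["Male", "Female"],
    "Age":[29, 43], "Pet":["Cat", "Dog"],
    "Allergy":["Negative", "Positive"],
    "Amount":[2, 4]})

# Split off the repeat counts; the remaining columns are copied as-is.
amounts = df["Amount"]
df.drop("Amount", axis=1, inplace=True)
counts = amounts.sum()  # total number of output rows

# Preallocate the result once instead of growing it row by row.
new_df = pd.DataFrame(columns=df.columns, index=np.arange(counts))
new_index = 0

# Copy each source row into `amount` consecutive slots of new_df.
for amount, (_, row) in zip(amounts, df.iterrows()):
    for i in range(new_index, new_index+amount):
        new_df.iloc[i] = row
    new_index = new_index+amount

# The preallocated frame starts with object dtype; let pandas infer proper dtypes.
new_df = new_df.infer_objects()

del df, amounts, row

print(new_df)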
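If even the expanded result is too large to keep in RAM, one option is to expand and write it out in slices. The helper `expand_in_chunks` below is hypothetical (not part of pandas); it repeats row positions with `np.repeat`, so it also works when the index has duplicate labels, and it holds only one slice's output at a time. An in-memory `io.StringIO` buffer stands in for the output file, and the sample `df` is assumed.

```python
import io

import numpy as np
import pandas as pd


def expand_in_chunks(df, amount_col="Amount", chunk_rows=100_000):
    """Hypothetical helper (not a pandas API): yield expanded rows slice by slice."""
    for start in range(0, len(df), chunk_rows):
        part = df.iloc[start:start + chunk_rows]
        # Positional repeat: row k of this slice appears part[amount_col][k] times.
        positions = np.repeat(np.arange(len(part)), part[amount_col].to_numpy())
        yield part.iloc[positions].drop(columns=amount_col).reset_index(drop=True)


# Assumed sample data, matching the frame used in the answer.
df = pd.DataFrame({"Name": ["Male", "Female"],
                   "Age": [29, 43], "Pet": ["Cat", "Dog"],
                   "Allergy": ["Negative", "Positive"],
                   "Amount": [2, 4]})

buf = io.StringIO()  # stands in for a file on disk
for i, chunk in enumerate(expand_in_chunks(df, chunk_rows=1)):
    # Write the header only once, then append each expanded slice.
    chunk.to_csv(buf, header=(i == 0), index=False)

print(buf.getvalue())
```

For a 3.5 GB CSV input, `pd.read_csv` with its `chunksize` parameter returns an iterator of DataFrames, so each input chunk could be fed through the same helper in turn.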
