Trying to shuffle rows in a pandas DataFrame

Hopefully someone can help. I'm trying to shuffle the rows 15 times and save the output to Excel, but Python is only giving me 1 output instead of 15.

import pandas as pd

# create a DataFrame
sales_to_do = {'Task': ['Call with the client', 'Preparing for the calls', 'Training staff',
                        'Daily tasks (Emails, questions, chasing)'],
               'Type of task': ['Call (external - new lead)', 'Preparing communication with leads',
                                'Training', 'Call (external - new lead)']}

df = pd.DataFrame(sales_to_do)
df_shuffled = df.sample(frac=1)

def randomize():
    df = pd.DataFrame(sales_to_do)
    df_shuffled = df.sample(frac=1)
    print(df_shuffled)


for i in range(15):
    randomize()

df_shuffled.to_excel(r'C:\Users\Alex\Desktop\Output1.xlsx', index=False, header=True)

You need to review the scoping rules. You have two independent variables named `df_shuffled`, one in `randomize` and one in your main program, and you never link the two. As a result, all `randomize` does is shuffle its local DataFrame and print the result; the main program never references that ordering. At the end of your main program, you simply write out the unchanged `df_shuffled` from your first sampling.
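To make the scoping point concrete, here is a minimal sketch. The toy `data` dictionary and the names `before` and `local_result` are made up for illustration; only `df_shuffled` and `randomize` come from the question.

```python
import pandas as pd

data = {"Task": ["a", "b", "c", "d"]}  # hypothetical toy data

# Module-level name, as in the question's main program
df_shuffled = pd.DataFrame(data)
before = df_shuffled.copy()

def randomize():
    # Assigning here creates a *local* df_shuffled;
    # the module-level df_shuffled is not touched.
    df_shuffled = pd.DataFrame(data).sample(frac=1)
    return df_shuffled

local_result = randomize()

# The module-level frame still has its original row order
print(df_shuffled.equals(before))
```

The shuffled rows only reach the main program if the function returns them and the caller assigns the return value.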

Also, you haven't concatenated the 15 shufflings; it's not clear what result you expect. However, I can help with the function.

def randomize():
    df = pd.DataFrame(sales_to_do)
    df_shuffled = df.sample(frac=1)
    return df_shuffled

for i in range(15):
    df_shuffled = randomize()
    # As written, this rewrites the same file on every pass, so only the
    # last shuffle survives; adapt the output to append results per your needs
    df_shuffled.to_excel(r'C:\Users\Alex\Desktop\Output1.xlsx',
        index=False, header=True)

Something like this, where you just return the shuffled df and use `pd.concat` on a list of the results.

import pandas as pd

sales_to_do = pd.DataFrame({'id': [1, 2], 'name': ['bob', 'mike']})

def randomize(df):
    # sample(frac=1) returns all rows in random order
    return df.sample(frac=1)

# stack 15 independent shuffles into one DataFrame
df_shuffled = pd.concat([randomize(sales_to_do) for _ in range(15)])

df_shuffled.to_excel(r'C:\Users\Alex\Desktop\Output1.xlsx', index=False, header=True)
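Once the 15 shuffles are stacked, nothing in the sheet marks where one ends and the next begins. One possible variation, sketched below, tags each shuffle with a run number. The helper `shuffle_runs`, its `seed` parameter and the `run` column are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

def shuffle_runs(df, n_runs=15, seed=None):
    """Stack n_runs shuffles of df, labelling each with a 1-based 'run' column."""
    frames = []
    for i in range(n_runs):
        # A different seed per run keeps the result reproducible
        # without repeating the same order 15 times.
        state = None if seed is None else seed + i
        frames.append(df.sample(frac=1, random_state=state).assign(run=i + 1))
    return pd.concat(frames, ignore_index=True)

sales_to_do = pd.DataFrame({'id': [1, 2], 'name': ['bob', 'mike']})
combined = shuffle_runs(sales_to_do, n_runs=15, seed=0)

# Writing .xlsx files needs an Excel engine such as openpyxl installed:
# combined.to_excel("Output1.xlsx", index=False)
```

Passing `seed=None` gives a fresh random order on every call, which matches the behaviour of the plain `df.sample(frac=1)` used above.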

Another easy and quick way to do it is to use `shuffle` from `sklearn.utils`:

    from sklearn.utils import shuffle
    # a fixed random_state makes the shuffle reproducible (same order on every run)
    df = shuffle(df, random_state=0)

