
How to loop output in Python, pandas?

I want to randomly sample the data for each quarter, write each quarter's sample to its own CSV file, and repeat this for several years. The following is an example of the data:

 <table><tbody><tr><th>Event number</th><th>Month</th><th>Year</th><th>Unauthorised activity</th><th>Theft and fraud (internal)</th><th>Theft and fraud (external)</th></tr><tr><td>72</td><td>1</td><td>2015</td><td>0</td><td>1</td><td>1</td></tr><tr><td>73</td><td>2</td><td>2015</td><td>0</td><td>0</td><td>0</td></tr><tr><td>74</td><td>3</td><td>2015</td><td>0</td><td>0</td><td>1</td></tr><tr><td>75</td><td>4</td><td>2015</td><td>0</td><td>0</td><td>0</td></tr><tr><td>76</td><td>5</td><td>2015</td><td>0</td><td>0</td><td>0</td></tr><tr><td>80</td><td>6</td><td>2015</td><td>0</td><td>0</td><td>0</td></tr><tr><td>81</td><td>7</td><td>2015</td><td>0</td><td>0</td><td>1</td></tr><tr><td>83</td><td>8</td><td>2015</td><td>0</td><td>1</td><td>0</td></tr><tr><td>84</td><td>9</td><td>2015</td><td>0</td><td>0</td><td>0</td></tr><tr><td>87</td><td>10</td><td>2015</td><td>0</td><td>0</td><td>1</td></tr><tr><td>90</td><td>11</td><td>2015</td><td>0</td><td>0</td><td>0</td></tr><tr><td>103</td><td>12</td><td>2015</td><td>1</td><td>0</td><td>0</td></tr></tbody></table> 

Here is my current code:

import pandas as pd

df = pd.read_pickle('data.pkl')
df.set_index(['Claim  Number'], inplace=True)

# rows from January to March 2015
df2015q1 = df[(1 <= df.Month) & (df.Month <= 3) & (df.Year == 2015)]

# draw 200 random rows, restore index order, drop the date columns and duplicate rows
df2015q1_random = df2015q1.sample(n=200)
df2015q1_random.sort_index(inplace=True)
df2015q1_random = df2015q1_random.drop(['Month', 'Year'], axis=1)
df2015q1_random = df2015q1_random.drop_duplicates()

df2015q1_random.to_csv('2015Q1.csv')

The expected output for 2015 Q1 is 2015Q1.csv, for Q2 it is 2015Q2.csv, and so on. My output for a single quarter is currently correct, but I do not know how to write a loop for this. How can I do this for several years, say 2010 to 2016, writing each quarter to a separate file? Thanks.

Let's create a function and call it for each entry in a list of years. I haven't tested the code, so you'll have to test it yourself; it is meant to give you an idea of how this can be done. Basically, you wrap the single-quarter logic in a function so it can be reused, then loop over a list of years and call it once per year.

import pandas as pd

# create a function that will report on Q1 of a specific year
def save_file(df, year):
    dfq1 = df[(1 <= df.Month) & (df.Month <= 3) & (df.Year == year)]
    dfq1_random = dfq1.sample(n=200)
    dfq1_random.sort_index(inplace=True)
    dfq1_random = dfq1_random.drop(['Month', 'Year'], axis=1)
    dfq1_random = dfq1_random.drop_duplicates()

    dfq1_random.to_csv(str(year) + 'Q1.csv')

# load the data and call your function for each year you want reported on
df = pd.read_pickle('data.pkl')
df.set_index(['Claim  Number'], inplace=True)

list_years = [2015, 2016]
for year in list_years:
    save_file(df, year)
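
The function above only writes Q1. A minimal sketch of the same idea with the quarter as a second parameter, so one nested loop covers every quarter of every year. The names `save_quarter`, `n` and `out_dir` are hypothetical, the DataFrame is a made-up stand-in for data.pkl, and the sample size is capped at the number of rows in the quarter because `DataFrame.sample(n=...)` raises a ValueError when `n` exceeds the row count (unless `replace=True`).

```python
import os
import tempfile

import pandas as pd


def save_quarter(df: pd.DataFrame, year: int, quarter: int,
                 n: int = 200, out_dir: str = ".") -> str:
    """Sample up to n rows from one quarter of one year and write them to CSV.

    Returns the path of the written file, e.g. <out_dir>/2015Q1.csv.
    """
    # quarter q covers months 3q-2 to 3q, e.g. Q2 = April to June
    first_month = 3 * quarter - 2
    last_month = 3 * quarter
    subset = df[df.Month.between(first_month, last_month) & (df.Year == year)]
    # cap n so quarters with fewer rows than n do not raise ValueError
    sample = subset.sample(n=min(n, len(subset)))
    sample = sample.sort_index().drop(columns=["Month", "Year"]).drop_duplicates()
    path = os.path.join(out_dir, f"{year}Q{quarter}.csv")
    sample.to_csv(path)
    return path


# made-up stand-in for data.pkl: one row per month for 2015 and 2016
df = pd.DataFrame({
    "Claim Number": range(1, 25),
    "Month": list(range(1, 13)) * 2,
    "Year": [2015] * 12 + [2016] * 12,
    "Unauthorised activity": [0, 1] * 12,
}).set_index("Claim Number")

out_dir = tempfile.mkdtemp()
written = [save_quarter(df, year, quarter, out_dir=out_dir)
           for year in [2015, 2016]
           for quarter in [1, 2, 3, 4]]
print(len(written))  # 8 files: 2015Q1.csv ... 2016Q4.csv
```

For the 2010 to 2016 range in the question, the year list would be `range(2010, 2017)`.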

I would approach this with a groupby statement:

import pandas

years_of_interest = [2010, 2011, 2012, 2013, 2014, 2015, 2016]
data = {'Claim Number': ['1234x', '2345x', '34567x', '78910x', '87911x', '98732x'],
        'Month': [1, 2, 3, 2, 1, 7],
        'Year': [2010, 2010, 2013, 2014, 2015, 2015]}

df = pandas.DataFrame(data).set_index('Claim Number')
grouper = df.groupby('Year')
for year, year_data in grouper:
    if year in years_of_interest:
        # filter on the group's own Month column so the boolean mask lines up with the group
        q1_data = year_data[year_data.Month <= 3]
        # Do your other work and save
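
A slightly fuller sketch of the groupby approach, assuming you want every quarter rather than only Q1: grouping on the year and a derived quarter at the same time yields exactly one group per output file. The flag column, the `quarter` Series and the temporary output directory are illustrative assumptions; with real data you would use the frame loaded from data.pkl and your own output folder. The sample size is capped because `sample(n=200)` fails on groups with fewer than 200 rows.

```python
import os
import tempfile

import pandas as pd

# the example frame from the answer above, plus one made-up flag column so that
# something is left after Month and Year are dropped
data = {"Claim Number": ["1234x", "2345x", "34567x", "78910x", "87911x", "98732x"],
        "Month": [1, 2, 3, 2, 1, 7],
        "Year": [2010, 2010, 2013, 2014, 2015, 2015],
        "Theft and fraud (external)": [1, 0, 1, 0, 0, 1]}
df = pd.DataFrame(data).set_index("Claim Number")

years_of_interest = [2010, 2011, 2012, 2013, 2014, 2015, 2016]
out_dir = tempfile.mkdtemp()

# quarter number derived from the month: 1-3 -> 1, 4-6 -> 2, 7-9 -> 3, 10-12 -> 4
quarter = ((df["Month"] - 1) // 3 + 1).rename("Quarter")

written = []
for (year, qtr), group in df.groupby([df["Year"], quarter]):
    if year not in years_of_interest:
        continue
    # cap the sample size so a quarter with fewer than 200 rows does not raise
    sample = group.sample(n=min(200, len(group)))
    sample = sample.sort_index().drop(columns=["Month", "Year"]).drop_duplicates()
    path = os.path.join(out_dir, f"{year}Q{qtr}.csv")
    sample.to_csv(path)
    written.append(os.path.basename(path))

print(sorted(written))
```

Quarters with no rows simply produce no group, so no empty files are written.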

You can use something like this:

years = [2015, 2016]
qtrs = [1, 2, 3, 4]
for year in years:
    for qtr in qtrs:
        # quarter qtr covers months 3*qtr-2 to 3*qtr (e.g. Q2 = April to June)
        temp = df[(df.Month >= 3*qtr - 2) & (df.Month <= 3*qtr) & (df.Year == year)]
        temp_random = temp.sample(n=200)
        temp_random.sort_index(inplace=True)
        temp_random = temp_random.drop(['Month', 'Year'], axis=1)
        temp_random = temp_random.drop_duplicates()
        temp_random.to_csv(str(year) + 'Q' + str(qtr) + '.csv')
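
As an alternative to writing the month bounds by hand (which is easy to get off by one), pandas can compute the quarter itself through the `.dt.quarter` datetime accessor. A minimal sketch, assuming Year and Month hold integers; the example rows are made up, and the day is fixed to 1 only because `pd.to_datetime` needs a complete date. Grouping by Year and Quarter then gives one group per file, which can be sampled and written as in the loop above.

```python
import pandas as pd

# made-up rows: only Year and Month matter here
df = pd.DataFrame({"Month": [1, 3, 4, 6, 7, 12], "Year": [2015] * 6})

# build a date on the first day of each month and let pandas report its quarter
dates = pd.to_datetime(pd.DataFrame({"year": df["Year"], "month": df["Month"], "day": 1}))
df["Quarter"] = dates.dt.quarter

for (year, qtr), group in df.groupby(["Year", "Quarter"]):
    # each group corresponds to one output file, e.g. 2015Q2.csv holds months 4 and 6
    print(f"{year}Q{qtr}.csv", group["Month"].tolist())
```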
