How do I write output in a loop with Python pandas?

Question

I want to randomly select data for each quarter, write each quarter's output to a separate CSV file, and repeat this for several years. The following is a sample of the data:

 <table><tbody><tr><th>Event number</th><th>Month</th><th>Year</th><th>Unauthorised activity</th><th>Theft and fraud (internal)</th><th>Theft and fraud (external)</th></tr><tr><td>72</td><td>1</td><td>2015</td><td>0</td><td>1</td><td>1</td></tr><tr><td>73</td><td>2</td><td>2015</td><td>0</td><td>0</td><td>0</td></tr><tr><td>74</td><td>3</td><td>2015</td><td>0</td><td>0</td><td>1</td></tr><tr><td>75</td><td>4</td><td>2015</td><td>0</td><td>0</td><td>0</td></tr><tr><td>76</td><td>5</td><td>2015</td><td>0</td><td>0</td><td>0</td></tr><tr><td>80</td><td>6</td><td>2015</td><td>0</td><td>0</td><td>0</td></tr><tr><td>81</td><td>7</td><td>2015</td><td>0</td><td>0</td><td>1</td></tr><tr><td>83</td><td>8</td><td>2015</td><td>0</td><td>1</td><td>0</td></tr><tr><td>84</td><td>9</td><td>2015</td><td>0</td><td>0</td><td>0</td></tr><tr><td>87</td><td>10</td><td>2015</td><td>0</td><td>0</td><td>1</td></tr><tr><td>90</td><td>11</td><td>2015</td><td>0</td><td>0</td><td>0</td></tr><tr><td>103</td><td>12</td><td>2015</td><td>1</td><td>0</td><td>0</td></tr></tbody></table>
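For experimenting without data.pkl, the sample rows above can be typed into a DataFrame. This is a minimal sketch: the column names and values are copied from the table, and the Quarter column is a hypothetical helper added here for illustration (it is not in the original data). It maps months 1 to 3 to quarter 1, 4 to 6 to quarter 2, and so on, using (Month - 1) // 3 + 1.

```python
import pandas as pd

# Rows copied from the sample table in the question
sample = pd.DataFrame({
    "Event number": [72, 73, 74, 75, 76, 80, 81, 83, 84, 87, 90, 103],
    "Month": list(range(1, 13)),
    "Year": [2015] * 12,
    "Unauthorised activity": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    "Theft and fraud (internal)": [1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
    "Theft and fraud (external)": [1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0],
}).set_index("Event number")

# Hypothetical helper column: months 1-3 -> 1, 4-6 -> 2, 7-9 -> 3, 10-12 -> 4
sample["Quarter"] = (sample["Month"] - 1) // 3 + 1

# Number of rows that fall into each (Year, Quarter) pair
print(sample.groupby(["Year", "Quarter"]).size())
```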

Here is my current code:

import pandas as pd

df = pd.read_pickle('data.pkl')
df.set_index(['Claim  Number'], inplace=True)

# Keep only January to March 2015 (quarter 1)
df2015q1 = df[(1 <= df.Month) & (df.Month <= 3) & (df.Year == 2015)]

# Pick 200 random rows, restore index order, drop the date columns and duplicate rows
df2015q1_random = df2015q1.sample(n=200)
df2015q1_random.sort_index(inplace=True)
df2015q1_random = df2015q1_random.drop(['Month', 'Year'], axis=1)
df2015q1_random = df2015q1_random.drop_duplicates()

df2015q1_random.to_csv('2015Q1.csv')

The expected output for 2015 quarter 1 is 2015Q1.csv, for quarter 2 it is 2015Q2.csv, and so on. My output for a single quarter is correct, but I do not know how to write a loop for this. How can I do this for several years, say 2010 to 2016, and write each quarter to a different file? Thanks.
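Any loop over quarters needs two small pieces: the month range of a quarter (quarter q covers months 3q-2 through 3q) and the output file name in the 2015Q1.csv format described above. A sketch of both, where quarter_months and quarter_filename are hypothetical helper names. Note that range(2010, 2017) is used because Python's range excludes its stop value.

```python
def quarter_months(qtr):
    """Return (first_month, last_month), inclusive, for quarter qtr in 1..4."""
    if qtr not in (1, 2, 3, 4):
        raise ValueError(f"quarter must be 1-4, got {qtr}")
    return 3 * qtr - 2, 3 * qtr


def quarter_filename(year, qtr):
    """Build names like '2015Q1.csv'."""
    return f"{year}Q{qtr}.csv"


# Every (year, quarter) pair from 2010 to 2016 inclusive
jobs = [(year, qtr) for year in range(2010, 2017) for qtr in range(1, 5)]

for year, qtr in jobs[:4]:
    print(quarter_filename(year, qtr), quarter_months(qtr))
```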

Answer 1

Let's create a function and call it for each item in a list of years. I haven't tested the code, so you'll have to do that yourself; it is meant to give you an idea of how this can be done. Basically, you wrap the work in a function for reusability and then loop over a list of years to get a set of results.

import pandas as pd

# Create a function that reports on a specific year
# (as written it only selects quarter 1, i.e. months 1 to 3)
def save_file(df, year):
    dfq1 = df[(1 <= df.Month) & (df.Month <= 3) & (df.Year == year)]
    dfq1_random = dfq1.sample(n=200)
    dfq1_random.sort_index(inplace=True)
    dfq1_random = dfq1_random.drop(['Month', 'Year'], axis=1)
    dfq1_random = dfq1_random.drop_duplicates()

    dfq1_random.to_csv(str(year) + 'Q1.csv')

# Load the data and call the function for each year you want reported on
df = pd.read_pickle('data.pkl')
df.set_index(['Claim  Number'], inplace=True)

list_years = [2015, 2016]
for year in list_years:
    save_file(df, year)
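The function above only ever writes Q1. One way to extend it, sketched below, is to pass the quarter as a second argument and loop over both years and quarters. save_quarter, the small synthetic df, its Value column, and the temporary output directory are hypothetical stand-ins for data.pkl. Capping the sample with min(n, len(subset)) is also an added assumption: DataFrame.sample raises ValueError when n exceeds the number of rows, which is likely for some quarters over a 2010 to 2016 range.

```python
import os
import tempfile

import pandas as pd


def save_quarter(df, year, qtr, n=200, out_dir="."):
    """Sample up to n rows for one quarter and write e.g. 2015Q1.csv."""
    first, last = 3 * qtr - 2, 3 * qtr
    subset = df[df.Month.between(first, last) & (df.Year == year)]
    if subset.empty:
        return None  # nothing recorded for this quarter, so skip the file
    # sample() raises ValueError if n exceeds the row count, so cap it
    picked = subset.sample(n=min(n, len(subset)))
    picked = picked.sort_index().drop(columns=["Month", "Year"]).drop_duplicates()
    path = os.path.join(out_dir, f"{year}Q{qtr}.csv")
    picked.to_csv(path)
    return path


# Synthetic frame standing in for data.pkl (assumed Month/Year layout)
df = pd.DataFrame({
    "Month": [1, 2, 4, 7, 11, 3],
    "Year": [2015, 2015, 2015, 2016, 2016, 2016],
    "Value": [1, 0, 1, 1, 0, 1],
}, index=[10, 11, 12, 13, 14, 15])

out_dir = tempfile.mkdtemp()
written = []
for year in [2015, 2016]:
    for qtr in range(1, 5):
        path = save_quarter(df, year, qtr, out_dir=out_dir)
        if path is not None:
            written.append(os.path.basename(path))
print(written)
```

Quarters with no rows are skipped rather than written as empty files; drop the subset.empty check if an empty CSV per quarter is preferred.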

Answer 2

I would approach this with groupby:

import pandas

years_of_interest = [2010, 2011, 2012, 2013, 2014, 2015, 2016]
data = {'Claim Number': ['1234x', '2345x', '34567x', '78910x', '87911x', '98732x'],
        'Month': [1, 2, 3, 2, 1, 7],
        'Year': [2010, 2010, 2013, 2014, 2015, 2015]}

df = pandas.DataFrame(data).set_index('Claim Number')
grouper = df.groupby('Year')
for year, year_data in grouper:
    if year in years_of_interest:
        # Filter on the group's own Month column so the mask matches its rows
        q1_data = year_data[year_data.Month <= 3]
        # Do your other work and save
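The groupby answer stops at Q1. A sketch that finishes the idea by grouping on Year and a derived Quarter column, so each (year, quarter) group becomes one file. The Quarter and Value columns, the temporary output directory, and capping the sample at the group size are assumptions added for illustration.

```python
import os
import tempfile

import pandas as pd

years_of_interest = list(range(2010, 2017))  # 2010 to 2016 inclusive

# Synthetic stand-in for the real data (assumed Month and Year columns)
df = pd.DataFrame({
    "Claim Number": ["1234x", "2345x", "34567x", "78910x", "87911x", "98732x"],
    "Month": [1, 2, 3, 2, 1, 7],
    "Year": [2010, 2010, 2013, 2014, 2015, 2015],
    "Value": [1, 0, 1, 1, 0, 1],
}).set_index("Claim Number")

# Keep the wanted years and derive the quarter: months 1-3 -> 1, ..., 10-12 -> 4
wanted = df[df["Year"].isin(years_of_interest)].assign(
    Quarter=lambda d: (d["Month"] - 1) // 3 + 1
)

out_dir = tempfile.mkdtemp()
written = []
# Each (Year, Quarter) group becomes one CSV file
for (year, qtr), group in wanted.groupby(["Year", "Quarter"]):
    picked = group.sample(n=min(200, len(group)))  # cap at the group size
    picked = picked.sort_index().drop(columns=["Month", "Year", "Quarter"]).drop_duplicates()
    name = f"{year}Q{qtr}.csv"
    picked.to_csv(os.path.join(out_dir, name))
    written.append(name)
print(written)
```

Because groupby only yields groups that actually contain rows, quarters with no data simply produce no file.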

Answer 3

You can use something like this:

years = [2015, 2016]
qtrs = [1, 2, 3, 4]
for year in years:
    for qtr in qtrs:
        # Quarter qtr covers months 3*qtr-2 through 3*qtr
        temp = df[(df.Month <= 3*qtr) & (df.Month >= 3*qtr - 2) & (df.Year == year)]
        # sample() raises ValueError if a quarter has fewer than 200 rows, so cap n
        temp_random = temp.sample(n=min(200, len(temp)))
        temp_random.sort_index(inplace=True)
        temp_random = temp_random.drop(['Month', 'Year'], axis=1)
        temp_random = temp_random.drop_duplicates()
        temp_random.to_csv(str(year) + 'Q' + str(qtr) + '.csv')
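Because sample() draws a different subset on every run, re-running the loop overwrites each CSV with new random rows. If the files need to be reproducible, DataFrame.sample accepts a random_state argument. A minimal sketch, where sample_quarter, the seed value 42, and the synthetic df are arbitrary choices for illustration:

```python
import pandas as pd


def sample_quarter(df, year, qtr, n=200, seed=None):
    """Return up to n random rows from one quarter; a fixed seed makes it repeatable."""
    mask = df.Month.between(3 * qtr - 2, 3 * qtr) & (df.Year == year)
    subset = df[mask]
    return subset.sample(n=min(n, len(subset)), random_state=seed).sort_index()


# Synthetic data: every month of 2015 and 2016
df = pd.DataFrame({"Month": list(range(1, 13)) * 2,
                   "Year": [2015] * 12 + [2016] * 12})

first = sample_quarter(df, 2015, 2, n=2, seed=42)
second = sample_quarter(df, 2015, 2, n=2, seed=42)
print(first.equals(second))  # same seed and same data give the same rows
```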

3 answers

Answer 1: score 2, posted 2017-12-05 11:18:00

Answer 2: score -1, posted 2017-12-05 11:50:57

Answer 3: score -1, accepted, posted 2017-12-05 12:35:05
