
Sorting rows when exporting a pandas CSV file

I have a CSV file with 700k rows, and I need to create an additional CSV that contains only the data I need, sorted into order.

For example, my original CSV file has data that looks a bit like this:

Name     Code   Date        Area
Peter     01    01/01/2016  Wales
Peter     02    01/02/2017  England
Peter     34    25/02/2018  Wales
Paul      65    01/12/2015  Scotland
Paul      12    02/12/2015  Scotland
Simon     12    23/08/2016  England
Simon     12    28/09/2016  Wales
Simon     12    27/10/2018  England

What I need is one row per person, with that person's codes listed in the order the codes were made (oldest date first). I should point out that I sorted the dataset in Excel so the dates are listed in the correct order (oldest first), to see if that would help.

So the output I need should look like this:

Name   Codes   
Peter  01,02,34
Paul   65,12
Simon  12,12,12

The codes are listed in order by oldest date.

I'm not interested in the Area, as it has no relevance to the final answer.

I have successfully managed to get the names and codes into the relevant columns, but for some reason the codes are not listed by oldest date.

I have tried searching for this but can't seem to word it correctly to get a relevant result.

Does anyone know why the sequence does not export correctly?

The code I am using is:

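import pandas as pd
# Join each person's codes into one comma-separated string
df2 = df2.groupby('Name')['Code'].apply(', '.join).reset_index()
df2
df2.to_csv(r'Filelocation.csv', index=False, header=True)  # returns None, so no assignment needed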

Thanks
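A minimal sketch of one way the sequence can come out wrong, using hypothetical rows (not the real file). `groupby` keeps the rows inside each group in their existing order, so `', '.join` simply follows the row order of `df2`. If the frame was sorted while `Date` was still text (which is how `read_csv` loads it by default), the sort is alphabetical rather than chronological. Whether this is exactly what happened with the original file is an assumption.

```python
import pandas as pd

# Hypothetical rows for one person; Date is still plain text here
df2 = pd.DataFrame({
    "Name": ["Peter", "Peter", "Peter"],
    "Code": ["34", "02", "01"],
    "Date": ["25/02/2018", "01/02/2017", "15/06/2016"],
})

# Text sort compares character by character: "01/02/2017" < "15/06/2016" < "25/02/2018"
# groupby then keeps that (wrong) row order inside the Peter group
by_text = df2.sort_values("Date").groupby("Name")["Code"].apply(", ".join)

# After parsing day-first dates, the sort is truly chronological
df2["Date"] = pd.to_datetime(df2["Date"], format="%d/%m/%Y")
by_date = df2.sort_values("Date").groupby("Name")["Code"].apply(", ".join)

print(by_text["Peter"])
print(by_date["Peter"])
```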

I think the following might work:

# Joins each person's codes in their existing row order
df2.groupby('Name')['Code'].apply(lambda x: "%s" % ', '.join(x))

As I don't know what df2 holds for you, I created a DataFrame and verified your code, which gives me the correct sequence. Try it and let me know what doesn't work for you:

import pandas as pd

df = pd.DataFrame({'Name': ['Peter', 'Peter', 'Peter', 'Paul', 'Paul', 'Simon', 'Simon', 'Simon'],
                   'Code': ['01', '02', '34', '65', '12', '12', '12', '12']})
df

    Code    Name
0   01  Peter
1   02  Peter
2   34  Peter
3   65  Paul
4   12  Paul
5   12  Simon
6   12  Simon
7   12  Simon

dfn = df.groupby('Name')['Code'].apply(', '.join).reset_index()
dfn

    Name    Code
0   Paul    65, 12
1   Peter   01, 02, 34
2   Simon   12, 12, 12

dfn = dfn.set_index('Name')  # Use column Name as the index (set_index also removes it from the columns)
dfn

        Code
Name    
Paul    65, 12
Peter   01, 02, 34
Simon   12, 12, 12

dfn = dfn.loc[df.Name.unique()] # Bringing dataframe into your desired order
dfn

        Code
Name    
Peter   01, 02, 34
Paul    65, 12
Simon   12, 12, 12


dfn.to_csv('sample.csv')
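As an alternative to the `.loc[df.Name.unique()]` step, `groupby` accepts `sort=False`, which keeps the groups in the order they first appear in the data instead of sorting the names alphabetically. A minimal sketch on the same sample frame:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Peter', 'Peter', 'Peter', 'Paul', 'Paul', 'Simon', 'Simon', 'Simon'],
                   'Code': ['01', '02', '34', '65', '12', '12', '12', '12']})

# sort=False: groups come out in order of first appearance (Peter, Paul, Simon);
# rows inside each group keep their original order as well
dfn = df.groupby('Name', sort=False)['Code'].apply(', '.join)

print(dfn)
```

This gives the same ordering without setting and re-indexing by `Name`.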


Is this what you need? If not, leave a comment.

The problem is likely that you didn't specify dayfirst=True when converting your series to datetime:

df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
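To see what `dayfirst=True` changes, here is a small sketch on a few dates from the question. Passing an explicit `format` is shown as a stricter alternative; that is a suggestion on my part, not part of the original answer.

```python
import pandas as pd

dates = pd.Series(["01/01/2016", "01/02/2017", "25/02/2018"])

# dayfirst=True reads "01/02/2017" as 1 February 2017 rather than 2 January
parsed = pd.to_datetime(dates, dayfirst=True)

# An explicit format gives the same result and raises if a row doesn't match it
strict = pd.to_datetime(dates, format="%d/%m/%Y")

print(parsed.tolist())
```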

You can then sort by Date and perform a groupby operation as normal:

res = df.sort_values('Date')\
        .groupby('Name')['Code']\
        .agg(lambda x: ','.join(map(str, x)))
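Putting it together from reading the file to writing the result, here is a minimal end-to-end sketch. It assumes the real file is comma-separated with the columns shown in the question (an in-memory `StringIO` stands in for it). Reading `Code` as text is an addition of mine: without it `read_csv` turns `01` into the integer `1`, and `map(str, x)` would then produce `1` rather than `01`. The stable `mergesort` keeps file order for rows that share the same date.

```python
import io

import pandas as pd

# Stand-in for the real 700k-row file (comma-separated layout is an assumption)
CSV_TEXT = """Name,Code,Date,Area
Peter,01,01/01/2016,Wales
Peter,02,01/02/2017,England
Peter,34,25/02/2018,Wales
Paul,65,01/12/2015,Scotland
Paul,12,02/12/2015,Scotland
Simon,12,23/08/2016,England
Simon,12,28/09/2016,Wales
Simon,12,27/10/2018,England
"""

# dtype keeps codes as text so leading zeros survive
df = pd.read_csv(io.StringIO(CSV_TEXT), dtype={"Code": str})
df["Date"] = pd.to_datetime(df["Date"], dayfirst=True)

res = (
    df.sort_values("Date", kind="mergesort")  # stable, chronological sort
      .groupby("Name")["Code"]                # rows keep the sorted order per name
      .agg(",".join)                          # one comma-joined string per person
      .rename("Codes")
)

out = io.StringIO()  # replace with a real path, e.g. r'Filelocation.csv'
res.to_csv(out, header=True)
print(out.getvalue())
```

Because each joined value contains commas, `to_csv` quotes it in the output file.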
