I have a CSV file with 700k rows, and I need to create a second CSV that contains only the data I need, sorted into order.
So for example my original csv file has data that looks a bit like this.
Name Code Date Area
Peter 01 01/01/2016 Wales
Peter 02 01/02/2017 England
Peter 34 25/02/2018 Wales
Paul 65 01/12/2015 Scotland
Paul 12 02/12/2015 Scotland
Simon 12 23/08/2016 England
Simon 12 28/09/2016 Wales
Simon 12 27/10/2018 England
What I need to do is create a unique row for a person but list the codes based on when the code was made (oldest date first). I should point out that I sorted the dataset in Excel to list the dates in the correct order (oldest first) to see if that would help.
So the output I need should look like this:
Name Codes
Peter 01,02,34
Paul 65,12
Simon 12,12,12
The codes are listed in order by oldest date.
I'm not interested in the Area, as it has no relevance to the final answer.
I have successfully managed to get the names and codes into the relevant columns but for some reason the codes are not listed by oldest date.
I have tried searching for this but can't seem to word it correctly to get a relevant result.
Does anyone know why the sequence does not export correctly?
The code I am using is:
df2 = df2.groupby ('Name')['Code'].apply(', '.join).reset_index()
df2
export_csv = df2.to_csv(r'Filelocation.csv', index = None, header = True)
Thanks
I think the following might work:
df2.groupby('Name')['Code'].apply(lambda x: "%s" % ', '.join(x))
Since I don't know what df2 holds, I created a dataframe and ran your code against it, and it gives me the correct sequence. Try the following and let me know what doesn't work for you:
import pandas as pd
df = pd.DataFrame({'Name': ['Peter', 'Peter', 'Peter', 'Paul', 'Paul', 'Simon', 'Simon', 'Simon'],
                   'Code': ['01', '02', '34', '65', '12', '12', '12', '12']})
df
Code Name
0 01 Peter
1 02 Peter
2 34 Peter
3 65 Paul
4 12 Paul
5 12 Simon
6 12 Simon
7 12 Simon
dfn = df.groupby ('Name')['Code'].apply(', '.join).reset_index()
dfn
Name Code
0 Paul 65, 12
1 Peter 01, 02, 34
2 Simon 12, 12, 12
dfn.index = dfn.Name # Mapping index with column Name
dfn.drop(columns=['Name'], inplace=True) # Dropping column Name
dfn
Code
Name
Paul 65, 12
Peter 01, 02, 34
Simon 12, 12, 12
dfn = dfn.loc[df.Name.unique()] # Bringing dataframe into your desired order
dfn
Code
Name
Peter 01, 02, 34
Paul 65, 12
Simon 12, 12, 12
dfn.to_csv('sample.csv')
Is this what you need? If not, comment.
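A shorter route to the same ordering, as a minimal sketch on the same sample dataframe: groupby sorts the group keys alphabetically by default, and passing sort=False keeps the groups in the order the names first appear, so the re-indexing step is not needed. The variable name ordered is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Peter', 'Peter', 'Peter', 'Paul', 'Paul', 'Simon', 'Simon', 'Simon'],
    'Code': ['01', '02', '34', '65', '12', '12', '12', '12'],
})

# sort=False keeps groups in order of first appearance
# instead of sorting the group keys alphabetically.
ordered = df.groupby('Name', sort=False)['Code'].apply(', '.join).reset_index()
print(ordered)
```

Within each group the rows keep the order they had in the dataframe, so the codes come out in whatever order the input was sorted.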
The problem is likely that you didn't specify dayfirst=True when converting your series to datetime:
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
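To see why this matters, here is a small sketch using one date from the question. A string like 01/02/2017 is ambiguous, and pandas reads it month-first unless told otherwise. The variable names are just for illustration.

```python
import pandas as pd

# '01/02/2017' is ambiguous: pandas assumes month-first by default.
month_first = pd.to_datetime('01/02/2017')
# dayfirst=True makes pandas read it as 1 February 2017.
day_first = pd.to_datetime('01/02/2017', dayfirst=True)

print(month_first.date())  # 2017-01-02
print(day_first.date())    # 2017-02-01
```

If the dates are parsed month-first, any later sort on the column puts rows in the wrong order, even though the data looked correct in Excel.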
You can then sort by Date and perform a groupby operation as normal:
res = df.sort_values('Date')\
.groupby('Name')['Code']\
.agg(lambda x: ','.join(map(str, x)))
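Putting the pieces together, here is a minimal end-to-end sketch. It assumes a comma-separated file with the four columns from the question. The rows are deliberately shuffled and read from an in-memory string so the example runs on its own; for the real file you would pass its path to pd.read_csv. The output column name Codes is taken from the desired output in the question.

```python
import io
import pandas as pd

# Sample rows from the question, shuffled on purpose to show the sort working.
raw = io.StringIO(
    "Name,Code,Date,Area\n"
    "Simon,12,27/10/2018,England\n"
    "Peter,34,25/02/2018,Wales\n"
    "Paul,12,02/12/2015,Scotland\n"
    "Peter,01,01/01/2016,Wales\n"
    "Simon,12,23/08/2016,England\n"
    "Paul,65,01/12/2015,Scotland\n"
    "Peter,02,01/02/2017,England\n"
    "Simon,12,28/09/2016,Wales\n"
)

# Read Code as text so leading zeros such as '01' survive.
df = pd.read_csv(raw, dtype={'Code': str})
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)

res = (
    df.sort_values('Date', kind='mergesort')  # stable sort: ties keep file order
      .groupby('Name')['Code']
      .apply(','.join)
      .reset_index(name='Codes')
)
print(res)
```

The names come out alphabetically here because groupby sorts the group keys by default. Calling res.to_csv with index=False would then write the two columns without the row index.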