Python pandas: iterating over the rows of two columns, returning each repeated value once with its corresponding values joined in a single row
For instance, I have a .csv file with thousands of rows like the ones below:
year,name
1992,Alex
1992,Anna
1993,Max
1993,Bob
1993,Tom
and so on...
I want my output to be:
year name
1992 Alex, Anna
1993 Max, Bob, Tom
This looks simple, but I'm not able to combine the corresponding rows into a single row with the values separated by a comma ','.
You can achieve this by using groupby and aggregation. Try the code below:
# Group by year; keep the year as a regular column and join the names with ", "
df = df.groupby("year").agg({
    "year": "first",
    "name": ", ".join
})
You can then save the DataFrame to a CSV file, leaving out the index:
df.to_csv("output.csv",index=False)
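For reference, the whole flow from reading to writing can be sketched end to end. This is a minimal sketch, not the answer author's exact code: it assumes the sample rows from the question, reads them from an in-memory string rather than a real file, and uses `reset_index()` instead of the `"year": "first"` entry to keep the year as a column. The names `csv_text`, `combined`, and `output_text` are made up for illustration.

```python
import io

import pandas as pd

# Sample rows from the question, held in memory instead of a real .csv file.
csv_text = "year,name\n1992,Alex\n1992,Anna\n1993,Max\n1993,Bob\n1993,Tom\n"
df = pd.read_csv(io.StringIO(csv_text))

# Join all names of each year with ", ", then turn the year index back into a column.
combined = df.groupby("year")["name"].agg(", ".join).reset_index()
print(combined)

# With no path argument, to_csv returns the CSV text; pass a filename to write a file.
output_text = combined.to_csv(index=False)
print(output_text)
```

Because the joined names contain commas, the default CSV writer wraps them in double quotes, so the file will hold `"Alex, Anna"` rather than the bare text.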
This may help you:
df = df.groupby('year')['name'].unique().reset_index()
df['name'] = df['name'].apply(lambda x: ', '.join(x))
Output:
   year           name
0  1992     Alex, Anna
1  1993  Max, Bob, Tom
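Note that `unique()` drops names that repeat within the same year, which may or may not be what you want. The sketch below contrasts it with a plain join on hypothetical data in which "Alex" appears twice in 1992; the variable names `with_unique` and `keep_all` are illustrative only.

```python
import pandas as pd

# Hypothetical data: "Alex" is listed twice for 1992.
df = pd.DataFrame({
    "year": [1992, 1992, 1992, 1993],
    "name": ["Alex", "Anna", "Alex", "Max"],
})

# unique() keeps the first occurrence of each name per year.
with_unique = df.groupby("year")["name"].unique().apply(", ".join).reset_index()

# A plain join keeps every row, duplicates included.
keep_all = df.groupby("year")["name"].agg(", ".join).reset_index()

print(with_unique)
print(keep_all)
```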
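How about this one?
import pandas as pd

# Sample data: years stored as strings, plus a numeric column to sum.
x = pd.DataFrame.from_dict({'year': ['1992', '1992', '1993', '1993', '1993'],
                            'name': ['ALEX', 'ANNA', 'MAX', 'BOB', 'TOM'],
                            'col': range(5)})
print(x)

# Per year: collect the distinct names into a tuple (set order is arbitrary) and sum 'col'.
a = x.groupby('year').agg({'name': lambda names: tuple(set(names)), 'col': 'sum'})
print(a)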
Result:
                 name  col
year
1992     (ALEX, ANNA)    1
1993  (BOB, TOM, MAX)    9
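Since a `set` has no guaranteed order, the names inside each tuple can come out in a different order from run to run. If a stable order matters, one option is to sort the set first. The following is a sketch of that variation using pandas named aggregation, not part of the answer above.

```python
import pandas as pd

x = pd.DataFrame({
    "year": ["1992", "1992", "1993", "1993", "1993"],
    "name": ["ALEX", "ANNA", "MAX", "BOB", "TOM"],
    "col": range(5),
})

# Named aggregation: output_column=(input_column, function).
# sorted() turns the unordered set into an alphabetically ordered list.
a = x.groupby("year").agg(
    name=("name", lambda names: tuple(sorted(set(names)))),
    col=("col", "sum"),
)
print(a)
```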