How to convert multiple rows into a single row for the same ID using pandas
I have a text file in the format below. It contains unique IDs, and each unique ID has four rows. I need to convert those rows into a single row per ID, so if the file has 8 rows, the output should have 2 rows. The file has no header, and I need to do this using pandas.
xyz,name,,,12345
2nd street,add,,,12345
xyx@mail.com,email,,,12345
575xxx5678,contact,,,12345
Output:
xyz,name,,,12345,2nd street,add,,,12345,xyx@mail.com,email,,,12345,575xxx5678,contact,,,12345
Take 12345 as the unique ID. Can someone help me resolve this? It would be great. Thanks in advance.
Suppose you have this file.csv:
www,contact,,,99999
xyz,name,,,12345
2nd street,add,,,12345
xyx@mail.com,email,,,12345
575xxx5678,contact,,,12345
qqq,contact,,,99999
To read it into pandas:
import pandas as pd

df = pd.read_csv("file.csv", names=["col1", "col2", "col3", "col4", "ID"])
print(df)
Prints:
           col1     col2  col3  col4     ID
0           www  contact   NaN   NaN  99999
1           xyz     name   NaN   NaN  12345
2    2nd street      add   NaN   NaN  12345
3  xyx@mail.com    email   NaN   NaN  12345
4    575xxx5678  contact   NaN   NaN  12345
5           qqq  contact   NaN   NaN  99999
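Note that read_csv has parsed the ID column as integers and turned the empty fields into NaN. If you would rather keep every field as text (for example, to preserve leading zeros in an ID), one option is to pass dtype=str and keep_default_na=False. A minimal sketch, assuming the sample data above; it reads from io.StringIO instead of file.csv so it runs on its own, and the names SAMPLE and df_text are only illustrative:

```python
import io

import pandas as pd

# Stand-in for file.csv so the example is self-contained
SAMPLE = """\
www,contact,,,99999
xyz,name,,,12345
2nd street,add,,,12345
xyx@mail.com,email,,,12345
575xxx5678,contact,,,12345
qqq,contact,,,99999
"""

df_text = pd.read_csv(
    io.StringIO(SAMPLE),
    names=["col1", "col2", "col3", "col4", "ID"],
    dtype=str,              # read every column as text, including ID
    keep_default_na=False,  # keep empty fields as "" instead of NaN
)

first_row = df_text.iloc[0].tolist()
print(first_row)
```

With these options the empty third and fourth fields stay empty strings, and the ID stays a string rather than a number.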
Then, to convert it to your desired output:
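x = (
    df.assign(ID2=df["ID"])  # copy ID, because groupby moves "ID" into the index
    .groupby("ID")
    .agg(list)  # one list per column for each ID
    # interleave the column lists back into row order and flatten them
    .apply(lambda row: [v for values in zip(*row) for v in values], axis=1)
)
# IDs with fewer rows than the longest group are padded with empty fields
pd.DataFrame(x.tolist()).to_csv("output.txt", sep=",", header=False, index=False)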
This creates output.txt:
xyz,name,,,12345,2nd street,add,,,12345,xyx@mail.com,email,,,12345.0,575xxx5678,contact,,,12345.0
www,contact,,,99999,qqq,contact,,,99999,,,,,,,,,,
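In this output, two of the IDs on the first line appear as 12345.0 and the second line ends in a run of empty fields. That happens because the two IDs have a different number of rows: building a DataFrame from lists of unequal length pads the shorter one with NaN, and a numeric column containing NaN is stored as float. If that is unwanted, one possible alternative is to read everything as text and write one joined line per ID without going through a rectangular DataFrame. A minimal sketch under those assumptions; the function name merge_rows_by_id is hypothetical, and the data is read from io.StringIO in place of file.csv:

```python
import io

import pandas as pd

# Stand-in for file.csv so the example is self-contained
SAMPLE = """\
www,contact,,,99999
xyz,name,,,12345
2nd street,add,,,12345
xyx@mail.com,email,,,12345
575xxx5678,contact,,,12345
qqq,contact,,,99999
"""

COLUMNS = ["col1", "col2", "col3", "col4", "ID"]


def merge_rows_by_id(source):
    """Return one comma-joined line per ID (hypothetical helper)."""
    # Read every field as text so IDs are never converted to floats
    df = pd.read_csv(source, names=COLUMNS, dtype=str, keep_default_na=False)
    merged = []
    # sort=False keeps the IDs in order of first appearance
    for _, group in df.groupby("ID", sort=False):
        # Flatten the rows of this ID, in file order, into one list of fields
        fields = [
            value
            for row in group.itertuples(index=False, name=None)
            for value in row
        ]
        merged.append(",".join(fields))
    return merged


merged = merge_rows_by_id(io.StringIO(SAMPLE))
for line in merged:
    print(line)
```

This plain join does no CSV quoting, so it assumes that no field contains a comma or a line break. To write the result to a file, "\n".join(merged) can be written out with an ordinary open(...) call.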