How to rearrange the rows of a dataframe so that each row starts with the same string
I have this dataframe:
mp4 mp3 txt csv
123IT_DB1.mp4 123IT_DB1.mp3 123IT_DB1.txt 123IT_FDG_DB1.csv
NaN 123IT_DB1_2.mp3 NaN NaN
123IT_DB1_2.mp4 NaN NaN NaN
NaN NaN 123IT_DB_2.txt NaN
NaN NaN NaN 123IT_GUY_DB1_2.csv
234IT_DB1.mp4 NaN 234IT_DB1.txt 234IT_FDG_DB1.csv
234IT_DB1_2.mp4 234IT_DB1.mp3 NaN NaN
345IT_DB1.mp4 345IT_DB1.mp3 345IT_DB1.txt 345IT_FDG_DB1.csv
345IT_DB1_2.mp4 345IT_DB1_2.mp3 NaN NaN
345IT_DB1_3.mp4 NaN NaN NaN
456IT_DB1.mp4 456IT_DB1.mp3 456IT_DB1.txt 456_DB1.csv
I want to rearrange this dataframe so that all values that start with the same string (the part before the first underscore) are on the same row. However, if more than one value starts with that string, then that row should contain only that element, and the rest of the columns should be blank. The resulting output should look like this:
mp4 mp3 txt csv
123IT_DB1.mp4 123IT_DB1.mp3 123IT_DB1.txt 123IT_FDG_DB1.csv
123IT_DB1_2.mp4 123IT_DB1_2.mp3 123IT_DB_2.txt 123IT_2_DB1.csv
234IT_DB1.mp4 234IT_DB1.mp3 234IT_DB1.txt 234IT_FDG_DB1.csv
234IT_DB1_2.mp4 NaN NaN NaN
345IT_DB1.mp4 345IT_DB1.mp3 345IT_DB1.txt 345IT_FDG_DB1.csv
345IT_DB1_2.mp4 345IT_DB1_2.mp3 NaN NaN
345IT_DB1_3.mp4 NaN NaN NaN
456IT_DB1.mp4 456IT_DB1.mp3 456IT_DB1.txt 456_DB1.csv
As you can see, I can't just delete the NaNs because I need some of them to stay. Any help would be much appreciated.
To get to your target, split each file name into a head and a tail, groupby() those, then use cumcount() to get an incremental number for each file within its group:
import io
import pandas as pd
df = pd.read_csv(io.StringIO("""1 2 3 4
123IT_DB1.mp4 123IT_DB1.mp3 123IT_DB1.txt 123IT_FDG_DB1.csv
NaN 123IT_DB1_2.mp3 NaN NaN
123IT_DB1_2.mp4 NaN NaN NaN
NaN NaN 123IT_DB_2.txt NaN
NaN NaN NaN 123IT_GUY_DB1_2.csv
234IT_DB1.mp4 NaN 234IT_DB1.txt 234IT_FDG_DB1.csv
234IT_DB1_2.mp4 234IT_DB1.mp3 NaN NaN
345IT_DB1.mp4 345IT_DB1.mp3 345IT_DB1.txt 345IT_FDG_DB1.csv
345IT_DB1_2.mp4 345IT_DB1_2.mp3 NaN NaN
345IT_DB1_3.mp4 NaN NaN NaN
456IT_DB1.mp4 456IT_DB1.mp3 456IT_DB1.txt 456_DB1.csv"""), sep=r"\s+")
# change from a table to a list, then create helper columns:
# h = head (before the first underscore), t = tail (after the last underscore), o = original name
df2 = df.rename_axis("col", axis=1).unstack().reset_index(drop=True).dropna().apply(lambda s: {
    "h": s.split(".")[0].split("_")[0],
    "t": s.split(".")[0].split("_")[-1],
    "o": s}).apply(pd.Series).sort_values(["h", "t", "o"])
# number each file within its (h, t) group, then pivot that number back into columns
df2 = (df2.assign(col=df2.groupby(["h", "t"])["o"].transform("cumcount") + 1)
          .set_index(["col", "h", "t"])
          .unstack(0)
          .reset_index(drop=True)
          .droplevel(0, axis=1))
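To see what the helper columns and cumcount() produce before the final reshape, here is a minimal sketch on four of the file names from the question. It computes the same h and t keys as the lambda above, but with pandas string methods instead of apply; the frame name `demo` is just for illustration.

```python
import pandas as pd

# A few file names from the question, one per row (column "o" as in the answer).
demo = pd.DataFrame({"o": ["123IT_DB1.mp4", "123IT_DB1_2.mp3", "123IT_DB1.mp3", "123IT_DB1_2.mp4"]})

# Same keys as the answer's lambda, computed on the name without its extension:
# h = text before the first underscore, t = text after the last underscore.
stem = demo["o"].str.split(".").str[0]
demo["h"] = stem.str.split("_").str[0]
demo["t"] = stem.str.split("_").str[-1]

# Sort, then number the rows inside each (h, t) group 1, 2, 3, ...
demo = demo.sort_values(["h", "t", "o"])
demo["col"] = demo.groupby(["h", "t"]).cumcount() + 1
print(demo[["o", "h", "t", "col"]])
```

The two `_2` files share the key (123IT, 2) and get columns 1 and 2; the two DB1 files share (123IT, DB1) and also get 1 and 2, so after the unstack they end up on separate rows.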
| | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| 0 | 123IT_DB1_2.mp3 | 123IT_DB1_2.mp4 | 123IT_DB_2.txt | 123IT_GUY_DB1_2.csv |
| 1 | 123IT_DB1.mp3 | 123IT_DB1.mp4 | 123IT_DB1.txt | 123IT_FDG_DB1.csv |
| 2 | 234IT_DB1_2.mp4 | nan | nan | nan |
| 3 | 234IT_DB1.mp3 | 234IT_DB1.mp4 | 234IT_DB1.txt | 234IT_FDG_DB1.csv |
| 4 | 345IT_DB1_2.mp3 | 345IT_DB1_2.mp4 | nan | nan |
| 5 | 345IT_DB1_3.mp4 | nan | nan | nan |
| 6 | 345IT_DB1.mp3 | 345IT_DB1.mp4 | 345IT_DB1.txt | 345IT_FDG_DB1.csv |
| 7 | 456_DB1.csv | nan | nan | nan |
| 8 | 456IT_DB1.mp3 | 456IT_DB1.mp4 | 456IT_DB1.txt | nan |
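Within each row the files appear in plain string order, because the code sorts on o before numbering. In ASCII, '.' (0x2E) sorts before digits such as '1' (0x31), which sort before '_' (0x5F). That is why the .mp3 file lands in column 1 ahead of the .mp4, and why 123IT_DB1.mp3 would come before 123IT_DB1_2.mp3 and 123IT_DB_2.txt if they shared a group. A quick check with Python's built-in sorted():

```python
# File names from the 123IT group, deliberately out of order.
names = ["123IT_FDG_DB1.csv", "123IT_DB_2.txt", "123IT_DB1_2.mp3", "123IT_DB1.mp4", "123IT_DB1.mp3"]

# Plain string comparison: '.' < '1' < '_' decides the DB1 / DB1_2 / DB_2 order.
ordered = sorted(names)
print(ordered)
# ['123IT_DB1.mp3', '123IT_DB1.mp4', '123IT_DB1_2.mp3', '123IT_DB_2.txt', '123IT_FDG_DB1.csv']
```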
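If you group only on the head (the part before the first underscore), as the question describes, drop the tail:
# change from a table to a list, create a column with the head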
df2 = df.rename_axis("col", axis=1).unstack().reset_index(drop=True).dropna().apply(lambda s: {
"h":s.split(".")[0].split("_")[0],
"o":s}).apply(pd.Series).sort_values(["h","o"])
# work out ordering of file, then transform back into a table
df2 = (df2.assign(col=df2.groupby(["h"])["o"].transform("cumcount") + 1)
          .set_index(["col", "h"])
          .unstack(0)
          .reset_index(drop=True)
          .droplevel(0, axis=1))
| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| 0 | 123IT_DB1.mp3 | 123IT_DB1.mp4 | 123IT_DB1.txt | 123IT_DB1_2.mp3 | 123IT_DB1_2.mp4 | 123IT_DB_2.txt | 123IT_FDG_DB1.csv | 123IT_GUY_DB1_2.csv |
| 1 | 234IT_DB1.mp3 | 234IT_DB1.mp4 | 234IT_DB1.txt | 234IT_DB1_2.mp4 | 234IT_FDG_DB1.csv | nan | nan | nan |
| 2 | 345IT_DB1.mp3 | 345IT_DB1.mp4 | 345IT_DB1.txt | 345IT_DB1_2.mp3 | 345IT_DB1_2.mp4 | 345IT_DB1_3.mp4 | 345IT_FDG_DB1.csv | nan |
| 3 | 456_DB1.csv | nan | nan | nan | nan | nan | nan | nan |
| 4 | 456IT_DB1.mp3 | 456IT_DB1.mp4 | 456IT_DB1.txt | nan | nan | nan | nan | nan |
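Both results order each row alphabetically instead of keeping the question's mp4 / mp3 / txt / csv columns. If you want the file-type layout back, one option (a sketch, not part of the answer above) is to keep the same (h, t) key and pivot on the file extension instead of on a running number. It assumes each (h, t, extension) combination occurs at most once, which holds for the sample data; DataFrame.pivot raises a ValueError otherwise.

```python
import io

import pandas as pd

# The question's sample data, in the same whitespace-separated layout.
raw = """mp4 mp3 txt csv
123IT_DB1.mp4 123IT_DB1.mp3 123IT_DB1.txt 123IT_FDG_DB1.csv
NaN 123IT_DB1_2.mp3 NaN NaN
123IT_DB1_2.mp4 NaN NaN NaN
NaN NaN 123IT_DB_2.txt NaN
NaN NaN NaN 123IT_GUY_DB1_2.csv
234IT_DB1.mp4 NaN 234IT_DB1.txt 234IT_FDG_DB1.csv
234IT_DB1_2.mp4 234IT_DB1.mp3 NaN NaN
345IT_DB1.mp4 345IT_DB1.mp3 345IT_DB1.txt 345IT_FDG_DB1.csv
345IT_DB1_2.mp4 345IT_DB1_2.mp3 NaN NaN
345IT_DB1_3.mp4 NaN NaN NaN
456IT_DB1.mp4 456IT_DB1.mp3 456IT_DB1.txt 456_DB1.csv"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Flatten to one file name per entry and drop the NaN cells.
names = df.unstack().dropna().reset_index(drop=True)

# Same (h, t) key as the answer, plus the file extension.
stem = names.str.split(".").str[0]
tidy = pd.DataFrame({
    "h": stem.str.split("_").str[0],
    "t": stem.str.split("_").str[-1],
    "ext": names.str.split(".").str[-1],
    "o": names,
})

# One row per (h, t) key, one column per extension, in the question's column order.
wide = (
    tidy.pivot(index=["h", "t"], columns="ext", values="o")
    .reindex(columns=df.columns)
    .reset_index(drop=True)
)
print(wide)
```

To show blank cells instead of NaN, as the question asks, call `wide.fillna("")`. Like the answer, this still puts 456_DB1.csv on its own row, because its head is 456 rather than 456IT; merging it into the 456IT row would need a different key.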