Compare a row (pandas) with the next row using a for loop, and if not the same get a value from a column
I have this pandas DataFrame:
full path name time
0 C:\Users\User\Desktop\Test1\1.txt 1.txt 10:20
1 C:\Users\User\Desktop\Test1\1.txt 1.txt 10:25
2 C:\Users\User\Desktop\Test1\Test2\1.txt 1.txt 10:30
3 C:\Users\User\Desktop\Test1\1.txt 1.txt 10:40
4 C:\Users\User\Desktop\Test1\2.txt 2.txt 10:50
5 C:\Users\User\Desktop\Test1\Test2\1.txt 2.txt 10:60
I want to compare each row with the next row that has the same name. If the path changes, I want to record the time of the change and the folder the file was moved to. For example, comparing the first row with the second row shows no change in 'name' or 'full path', so it should be skipped. Comparing the second row with the third row, the name is the same but the path has changed, so I need the time of the third row and its folder ("10:30 - Test2") in a new column.
The desired output is:
full path name time time_when_path_changed
0 C:\Users\User\Desktop\Test1\1.txt 1.txt 10:20
1 C:\Users\User\Desktop\Test1\1.txt 1.txt 10:25
2 C:\Users\User\Desktop\Test1\Test2\1.txt 1.txt 10:30 10:30 - Test2
3 C:\Users\User\Desktop\Test1\1.txt 1.txt 10:40 10:40 - Test1
4 C:\Users\User\Desktop\Test1\2.txt 2.txt 10:50
5 C:\Users\User\Desktop\Test1\Test2\1.txt 2.txt 10:60 10:60 - Test2
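For reference, the row-by-row comparison described above could be sketched with an explicit loop. This is only an illustration under stated assumptions: the frame is rebuilt by hand from the table in the question (including the "10:60" value exactly as posted), and `path_changes` is a hypothetical helper name, not part of pandas.

```python
import pandas as pd

# Sample data copied from the question; "10:60" is kept exactly as posted.
df = pd.DataFrame({
    "full path": [
        r"C:\Users\User\Desktop\Test1\1.txt",
        r"C:\Users\User\Desktop\Test1\1.txt",
        r"C:\Users\User\Desktop\Test1\Test2\1.txt",
        r"C:\Users\User\Desktop\Test1\1.txt",
        r"C:\Users\User\Desktop\Test1\2.txt",
        r"C:\Users\User\Desktop\Test1\Test2\1.txt",
    ],
    "name": ["1.txt", "1.txt", "1.txt", "1.txt", "2.txt", "2.txt"],
    "time": ["10:20", "10:25", "10:30", "10:40", "10:50", "10:60"],
})


def path_changes(frame):
    """Hypothetical helper: compare every row with the row directly before it."""
    if len(frame) == 0:
        return []
    result = [""]  # the first row has nothing to compare against
    for i in range(1, len(frame)):
        prev, cur = frame.iloc[i - 1], frame.iloc[i]
        if cur["name"] == prev["name"] and cur["full path"] != prev["full path"]:
            # The folder is the path component just before the file name.
            folder = cur["full path"].split("\\")[-2]
            result.append(f"{cur['time']} - {folder}")
        else:
            result.append("")
    return result


df["time_when_path_changed"] = path_changes(df)
print(df[["name", "time", "time_when_path_changed"]])
```

A loop like this is easy to follow but slow on large frames, which is why a vectorized comparison is usually preferred.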
EDITED:
Yes, @erfan, it worked perfectly for the problem I described. However, I wrote the names in order (1 1 1), and when I have a data frame like the one below it did not work. I also modified the desired output. Do you have a solution for this as well?
Thanks in advance.
full path name time
0 C:\Users\User\Desktop\Test1\1.txt 1.txt 10:20
1 C:\Users\User\Desktop\Test1\1.txt 1.txt 10:25
2 C:\Users\User\Desktop\Test1\2.txt 2.txt 10:50
2 C:\Users\User\Desktop\Test1\Test2\1.txt 1.txt 10:30
3 C:\Users\User\Desktop\Test1\1.txt 1.txt 10:40
5 C:\Users\User\Desktop\Test1\Test2\2.txt 2.txt 10:60
Desired output:
   full path                                name   time   moved to "Test2"  moved to "Test1"
0  C:\Users\User\Desktop\Test1\1.txt        1.txt  10:20
1  C:\Users\User\Desktop\Test1\1.txt        1.txt  10:25
2  C:\Users\User\Desktop\Test1\2.txt        2.txt  10:50
3  C:\Users\User\Desktop\Test1\Test2\1.txt  1.txt  10:30  10:30
5  C:\Users\User\Desktop\Test1\1.txt        1.txt  10:40                    10:40
5  C:\Users\User\Desktop\Test1\Test2\2.txt  2.txt  10:60  10:60
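We can use the following logic:
1. full path is not equal to the row before
2. name is equal to the row before (same group)
If points 1 and 2 are both true, we take time + the deepest folder in the path:

import numpy as np

# m1: the path differs from the previous row (the first row is compared with itself)
m1 = df["full path"].ne(df["full path"].shift(1, fill_value=df["full path"].iloc[0]))
# m2: the name matches the previous row, so both rows describe the same file
m2 = df["name"].eq(df["name"].shift(fill_value=df["name"].iloc[0]))
# deepest folder: the path component just before the file name (n is keyword-only in pandas 2.x)
folder = df["full path"].str.rsplit("\\", n=2).str[-2]
df["time_when_path_changed"] = np.where(m1 & m2, df["time"] + " - " + folder, "")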
full path name time \
0 C:\Users\User\Desktop\Test1\1.txt 1.txt 10:20
1 C:\Users\User\Desktop\Test1\1.txt 1.txt 10:25
2 C:\Users\User\Desktop\Test1\Test2\1.txt 1.txt 10:30
3 C:\Users\User\Desktop\Test1\1.txt 1.txt 10:40
4 C:\Users\User\Desktop\Test1\2.txt 2.txt 10:50
5 C:\Users\User\Desktop\Test1\Test2\1.txt 2.txt 10:60
time_when_path_changed
0
1
2 10:30 - Test2
3 10:40 - Test1
4
5 10:60 - Test2
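The masks above compare each row with the row physically before it, so they miss a move when rows for different names are interleaved, as in the edited data. One possible way to handle that case, offered as a sketch rather than a definitive answer, is to take the previous path per name with `groupby(...).shift()`. The frame below is rebuilt by hand from the edited table with a default index (the posted index labels repeat), and the `moved to "..."` column names are copied from the desired output.

```python
import pandas as pd

# Edited sample data from the question, with a fresh 0..5 index.
df = pd.DataFrame({
    "full path": [
        r"C:\Users\User\Desktop\Test1\1.txt",
        r"C:\Users\User\Desktop\Test1\1.txt",
        r"C:\Users\User\Desktop\Test1\2.txt",
        r"C:\Users\User\Desktop\Test1\Test2\1.txt",
        r"C:\Users\User\Desktop\Test1\1.txt",
        r"C:\Users\User\Desktop\Test1\Test2\2.txt",
    ],
    "name": ["1.txt", "1.txt", "2.txt", "1.txt", "1.txt", "2.txt"],
    "time": ["10:20", "10:25", "10:50", "10:30", "10:40", "10:60"],
})

# Previous path recorded for the same name, wherever that earlier row sits.
prev_path = df.groupby("name")["full path"].shift()

# A move: an earlier row exists for this name and its path was different.
moved = prev_path.notna() & df["full path"].ne(prev_path)

# Folder that directly contains the file.
folder = df["full path"].str.rsplit("\\", n=2).str[-2]

# One column per destination folder, holding the time of the move.
for target in folder[moved].unique():
    df[f'moved to "{target}"'] = df["time"].where(moved & folder.eq(target), "")

print(df.drop(columns="full path"))
```

Because the comparison is done within each name group, the row order of different files no longer matters, although rows for the same file are still assumed to be in chronological order.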