Pandas: Remove Rows Where the Value of One Column Appears on Any Row in Another
Example data:
000000008,2, 1,000000010
000000009,1, 1,000000011
000000010,1, 1,000000008
000000011,2, 1,000000032
000000012,3, 1,000000009
000000013,2, 1,000000108
You can see that some values in the first column also appear in the fourth column. I want to remove those rows where the value in the fourth column also appears on any row in the first column.
Therefore, in this example, the following rows should be removed:
000000008,2, 1,000000010
000000010,1, 1,000000008
000000012,3, 1,000000009
000000009,1, 1,000000011
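As a quick check of the rule, here is a minimal sketch in plain Python (no pandas), assuming the data is exactly the six comma-separated lines above. It collects every first-column value into a set and flags a row when its fourth value is in that set. The names `rows`, `first_col`, `removed` and `kept` are illustrative only.

```python
import csv
import io

T = """000000008,2, 1,000000010
000000009,1, 1,000000011
000000010,1, 1,000000008
000000011,2, 1,000000032
000000012,3, 1,000000009
000000013,2, 1,000000108"""

# Parse each line and strip the stray spaces (e.g. " 1")
rows = [[field.strip() for field in rec] for rec in csv.reader(io.StringIO(T))]

# Every value that appears anywhere in the first column
first_col = {row[0] for row in rows}

# A row is removed when its fourth value also occurs in the first column
removed = [row for row in rows if row[3] in first_col]
kept = [row for row in rows if row[3] not in first_col]

for row in removed:
    print(",".join(row))
```

This flags the same four rows listed above (in file order), leaving only the rows that start with 000000011 and 000000013.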
Code starting point:
import numpy as np
import pandas as pd
from io import StringIO

# Sample data from the question
T = u'''000000008,2, 1,000000010
000000009,1, 1,000000011
000000010,1, 1,000000008
000000011,2, 1,000000032
000000012,3, 1,000000009
000000013,2, 1,000000108'''

# The data has no header row, so the columns are labelled 0..3
df = pd.read_csv(StringIO(T), header=None)
print(df)
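With its default settings, `read_csv` infers integer columns, so the zero-padded IDs lose their leading zeros (000000008 becomes 8). If the padding matters, one option is to read every field as text. This is a sketch assuming the IDs should be compared as strings; `dtype` and `skipinitialspace` are standard `read_csv` parameters, and `df_str` is just an illustrative name.

```python
from io import StringIO

import pandas as pd

T = u'''000000008,2, 1,000000010
000000009,1, 1,000000011
000000010,1, 1,000000008
000000011,2, 1,000000032
000000012,3, 1,000000009
000000013,2, 1,000000108'''

# dtype=str keeps every field as text, so leading zeros survive;
# skipinitialspace=True drops the space after the comma in ", 1"
df_str = pd.read_csv(StringIO(T), header=None, dtype=str, skipinitialspace=True)
print(df_str)

# The same membership filter works on the string columns
print(df_str[~df_str[3].isin(df_str[0])])
```

Because both columns are strings here, `isin` compares "000000010" with "000000010" rather than 10 with 10; the kept rows are the same.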
IIUC (if I understand correctly), from your description, you can do:
df[~df.iloc[:,3].isin(df.iloc[:,0])]
Which returns:
    0  1  2    3
3  11  2  1   32
5  13  2  1  108
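To see why those two rows survive, the one-liner can be split into its boolean mask. This is a sketch of the same `isin` approach with the intermediate mask given an illustrative name, `in_first`.

```python
from io import StringIO

import pandas as pd

T = '''000000008,2, 1,000000010
000000009,1, 1,000000011
000000010,1, 1,000000008
000000011,2, 1,000000032
000000012,3, 1,000000009
000000013,2, 1,000000108'''
df = pd.read_csv(StringIO(T), header=None)

# True where the fourth column's value occurs anywhere in the first column
in_first = df.iloc[:, 3].isin(df.iloc[:, 0])
print(in_first.tolist())  # [True, True, True, False, True, False]

# ~ inverts the mask, keeping only rows whose fourth value is NOT in column 0
result = df[~in_first]
print(result)
```

Only the rows with 32 and 108 in the fourth column have `False` in the mask, so they are the only ones kept.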
Contrary to your desired output, this removes the row with 000000011 in the fourth column, but not the one with 000000108, because 000000011 is found in both columns, but 000000108 is not.
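The difference between those two values comes down to a set-membership test against the first column. A minimal sketch, assuming the default integer parsing, so 000000011 and 000000108 are read as 11 and 108; `first_col` and `checks` are illustrative names.

```python
from io import StringIO

import pandas as pd

T = '''000000008,2, 1,000000010
000000009,1, 1,000000011
000000010,1, 1,000000008
000000011,2, 1,000000032
000000012,3, 1,000000009
000000013,2, 1,000000108'''
df = pd.read_csv(StringIO(T), header=None)

# All values that occur in the first column
first_col = set(df.iloc[:, 0])

# For each value: which rows hold it in the fourth column, and is it in column 0?
checks = {}
for value in (11, 108):
    rows = df.index[df.iloc[:, 3] == value].tolist()
    checks[value] = value in first_col
    print(value, "rows:", rows, "in first column:", checks[value])
```

11 is both a fourth-column value (row 1) and a first-column value (row 3), so row 1 is dropped; 108 never appears in the first column, so row 5 is kept.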
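Notice: The technical posts on this site are published under the CC BY-SA 4.0 license. If you repost this content, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.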