Assign a label to rows when a value in the row matches a value in another row

Question

I am doing some data cleaning and came across this problem. An example of my dataframe (some entries are missing) is:

A	B	C
1	7
1		8
1		9
1
2	2	5
2		5
		3
4	5	9
	5

and my expected output dataframe is:

A	B	C	Label
1	7		0
1		8	0
1		9	0
1			0
2	2	5	1
2		5	1
		3	2
4	5	9	0
	5		0

Are there ways in pandas/dplyr to get this output?

Edit: For example, the value 1 appears in the first 4 rows of column A, so these rows should have the same label. The 3rd row and the second-to-last row both have the value 9 in column C, so they should also have the same label. The last 2 rows both have the value 5 in column B, so they should share a label as well. The third-to-last row has no value that matches a value in the same column of any other row, so it gets a unique label.
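
To make the rule in the edit concrete, here is a minimal Python sketch that lists which rows are linked by a shared value in the same column. It assumes the example dataframe is written out as plain dicts with None for missing entries, and the helper name `shared_values` is made up for this illustration.

```python
from collections import defaultdict

# The example dataframe as plain rows; None marks a missing entry.
rows = [
    {"A": 1, "B": 7, "C": None},
    {"A": 1, "B": None, "C": 8},
    {"A": 1, "B": None, "C": 9},
    {"A": 1, "B": None, "C": None},
    {"A": 2, "B": 2, "C": 5},
    {"A": 2, "B": None, "C": 5},
    {"A": None, "B": None, "C": 3},
    {"A": 4, "B": 5, "C": 9},
    {"A": None, "B": 5, "C": None},
]

def shared_values(rows):
    """Map each (column, value) pair to the 0-based rows that contain it,
    keeping only pairs that occur in more than one row."""
    seen = defaultdict(list)
    for i, row in enumerate(rows):
        for col, val in row.items():
            if val is not None:
                seen[(col, val)].append(i)
    return {key: idx for key, idx in seen.items() if len(idx) > 1}

links = shared_values(rows)
for key, idx in sorted(links.items()):
    print(key, idx)
# ('A', 1) [0, 1, 2, 3]
# ('A', 2) [4, 5]
# ('B', 5) [7, 8]
# ('C', 5) [4, 5]
# ('C', 9) [2, 7]
```

Row 6 (0-based), the one holding only C = 3, appears in no link, which is why it ends up with a label of its own.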

Answer 1

The R code below will accomplish the task. In your example, there is a slight conflict with your comment. Since rows 8 and 9 have label 0 and contain a 5 in at least one column, rows 5 and 6 should retroactively be changed to label 0, because they also contain a 5 in at least one column.

library(dplyr)  # mutate() at the end comes from dplyr

text="A B   C
1   7   NA
1   NA  8
1   NA  9
1   NA NA   
2   2   5
2   NA  5
3 NA NA
4   5   9
5 NA NA"
df=read.table(text=text, header=TRUE)
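# my_maps holds one set of values per group; seed it with the non-NA values of row 1.
my_maps=list()
my_maps[[1]]=unique(unlist(df[1,]))
my_maps[[1]]=my_maps[[1]][!is.na(my_maps[[1]])]
# Add each later row's values to every group it overlaps, or start a new group.
for (i in 2:nrow(df)) {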
  currow=df[i,]
  currow=unique(unlist(currow))
  currow=currow[!is.na(currow)]
  boo=TRUE
  for (k in 1:length(my_maps)) {
    if (length(intersect(my_maps[[k]], currow))>0) {
      my_maps[[k]]=union(my_maps[[k]], currow)
      boo=FALSE
    }
  }
  if (boo) {
    my_maps[[length(my_maps)+1]]=currow
  }
}

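# Merge groups that share a value. After a merge, rescan from i+1: removing
# element j shifts the later groups down by one, and the enlarged group i may
# now overlap groups that were already checked.
i=1
while (i < length(my_maps)) {
  j=i+1
  while (j <= length(my_maps)) {
    if (length(intersect(my_maps[[i]], my_maps[[j]]))>0) {
      my_maps[[i]]=union(my_maps[[i]], my_maps[[j]])
      my_maps[[j]]=NULL
      j=i+1
    } else {
      j=j+1
    }
  }
  i=i+1
}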

# Label each row with the index of the first group that contains one of its values.
label=c()
for (i in 1:nrow(df)) {
  for (j in 1:length(my_maps)) {
    if (length((intersect(my_maps[[j]], unique(unlist(df[i,])))))>0) {
      label[i]=j
      break
    }
  }
}

# Shift the labels so they start at 0, as in the question, then print the result.
df=mutate(df, label=label-1)
df

  A  B  C label
1 1  7 NA     0
2 1 NA  8     0
3 1 NA  9     0
4 1 NA NA     0
5 2  2  5     0
6 2 NA  5     0
7 3 NA NA     1
8 4  5  9     0
9 5 NA NA     0
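
Since the question also asks about pandas, here is a rough Python sketch of the same grouping idea, using a small union-find in place of the list merging above. It is not part of the answer's code: `label_rows` and its `same_column` flag are names invented for this illustration, and the rows are plain dicts (None for missing entries) rather than a real DataFrame.

```python
# The question's dataframe as plain rows; None marks a missing entry.
rows = [
    {"A": 1, "B": 7, "C": None},
    {"A": 1, "B": None, "C": 8},
    {"A": 1, "B": None, "C": 9},
    {"A": 1, "B": None, "C": None},
    {"A": 2, "B": 2, "C": 5},
    {"A": 2, "B": None, "C": 5},
    {"A": None, "B": None, "C": 3},
    {"A": 4, "B": 5, "C": 9},
    {"A": None, "B": 5, "C": None},
]

def label_rows(rows, same_column=True):
    """Give rows that are linked by a shared value the same label.

    same_column=True: a value links two rows only if it sits in the same
    column in both. same_column=False: the column is ignored.
    """
    parent = list(range(len(rows)))

    def find(i):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_row = {}  # key -> index of the first row that contained it
    for i, row in enumerate(rows):
        for col, val in row.items():
            if val is None:
                continue
            key = (col, val) if same_column else val
            if key in first_row:
                # Join this row's group with the group of the earlier row.
                parent[find(i)] = find(first_row[key])
            else:
                first_row[key] = i

    # Number the groups 0, 1, 2, ... in order of first appearance.
    labels, seen = [], {}
    for i in range(len(rows)):
        labels.append(seen.setdefault(find(i), len(seen)))
    return labels

print(label_rows(rows))                     # [0, 0, 0, 0, 1, 1, 2, 0, 0]
print(label_rows(rows, same_column=False))  # [0, 0, 0, 0, 0, 0, 1, 0, 0]
```

With `same_column=True` the labels match the expected output in the question, because the 5s in rows 5 and 6 are in column C while the 5s in rows 8 and 9 are in column B. With `same_column=False` any shared value links rows, which gives the same labels as the output above.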

Solution 1
0 2021-03-06 15:08:59
