Add value to DF if column value matches value in list of another DF
I have a DataFrame DF1:
+---------------+
| colName|
+---------------+
| a|
| m|
| f|
| o|
+---------------+
and another DataFrame DF2:
+---------------+
| col|
+---------------+
| [a,b,b,c,d]|
| [e,f,g,h]|
| [i,j,k]|
| [l,m,n,o,p]|
+---------------+
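To make the example reproducible, here is a minimal sketch that builds the two frames in pandas. It assumes both are pandas DataFrames and that `col` holds plain Python lists; the tables above are printed in Spark's `show()` style, so that is an assumption rather than something the question confirms.

```python
import pandas as pd

# Assumed pandas reconstruction of the two tables shown above.
DF1 = pd.DataFrame({"colName": ["a", "m", "f", "o"]})
DF2 = pd.DataFrame({
    "col": [
        ["a", "b", "b", "c", "d"],
        ["e", "f", "g", "h"],
        ["i", "j", "k"],
        ["l", "m", "n", "o", "p"],
    ]
})

print(DF1)
print(DF2)
```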
If any element of the list stored in DF2.col is also in DF1.colName, the new DataFrame (or DF2 itself) should look like this:
+---------------+---------------+
| col| bool|
+---------------+---------------+
| [a,b,b,c,d]| 1| #Since "a" was in `DF1.colName`
| [e,f,g,h]| 1| #Since "f" was in `DF1.colName`
| [i,j,k]| 0| #Since no element was in `DF1.colName`
| [l,m,n,o,p]| 1| #Since "m" and "o" were in `DF1.colName`
+---------------+---------------+
I have tried using a UserDefinedFunction and the pandas function isin(), but to no avail. Anything that helps me get this done would be greatly appreciated. Thank you.
You can convert the values to sets and use isdisjoint:
s = set(DF1.colName)
DF2['bool'] = DF2['col'].apply(lambda x: not set(x).isdisjoint(s)).astype(int)
print (DF2)
col bool
0 [a, b, b, c, d] 1
1 [e, f, g, h] 1
2 [i, j, k] 0
3 [l, m, n, o, p] 1
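To see why negating `isdisjoint` gives the wanted flag, here is a small sketch on plain Python lists, with no pandas involved. The helper name `has_match` is hypothetical and only used for illustration; the values are copied from the question.

```python
# Lookup values taken from DF1.colName in the question.
s = {"a", "m", "f", "o"}

def has_match(items, lookup):
    """Return 1 if any element of items is in lookup, else 0."""
    # isdisjoint is True when the two collections share no element,
    # so its negation means "at least one common element".
    return int(not lookup.isdisjoint(items))

rows = [
    ["a", "b", "b", "c", "d"],
    ["e", "f", "g", "h"],
    ["i", "j", "k"],
    ["l", "m", "n", "o", "p"],
]
flags = [has_match(row, s) for row in rows]
print(flags)
```

`isdisjoint` accepts any iterable, so the list does not strictly need to be converted to a set first.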
Or use intersection and convert the result to bool, so that an empty set becomes False, then convert to integers to map True, False to 1, 0:
s = set(DF1.colName)
DF2['bool'] = DF2['col'].apply(lambda x: bool(s.intersection(x))).astype(int)
print (DF2)
col bool
0 [a, b, b, c, d] 1
1 [e, f, g, h] 1
2 [i, j, k] 0
3 [l, m, n, o, p] 1
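One side benefit of the intersection variant is that it also tells you which elements matched, which can help when checking the result by hand. A minimal sketch on plain lists, with the values copied from the question:

```python
# Lookup values taken from DF1.colName in the question.
s = {"a", "m", "f", "o"}

rows = [
    ["a", "b", "b", "c", "d"],
    ["e", "f", "g", "h"],
    ["i", "j", "k"],
    ["l", "m", "n", "o", "p"],
]

results = []
for row in rows:
    common = s.intersection(row)  # set of shared elements, possibly empty
    # An empty set is falsy, so bool() gives False and int() gives 0.
    results.append((sorted(common), int(bool(common))))

print(results)
```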
Try this:
df2['bool'] = df2.col.apply(lambda x: any(df1.colName.isin(x))).astype(int)
print(df2)
Output:
col bool
0 [a, b, b, c, d] 1
1 [e, f, g, h] 1
2 [i, j, k] 0
3 [l, m, n, o, p] 1
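The version above calls `isin` once per row. If that becomes slow on a large frame, one possible alternative (not from the answer above) is to explode the lists, run a single `isin`, and collapse back per row. This is a sketch that assumes pandas 0.25 or later (for `Series.explode`) and a unique index on `df2`:

```python
import pandas as pd

df1 = pd.DataFrame({"colName": ["a", "m", "f", "o"]})
df2 = pd.DataFrame({
    "col": [
        ["a", "b", "b", "c", "d"],
        ["e", "f", "g", "h"],
        ["i", "j", "k"],
        ["l", "m", "n", "o", "p"],
    ]
})

# One row per list element; the original row index is repeated.
exploded = df2["col"].explode()

# Flag every element once, then reduce to one flag per original row.
matched = exploded.isin(df1["colName"]).groupby(level=0).any()
df2["bool"] = matched.astype(int)

print(df2)
```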
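With pyspark, you can check for common elements with array_intersect, then take the size of the resulting array and turn it into a flag with a when + otherwise case expression: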
from pyspark.sql import functions as F

# Collect DF1.colName to the driver as a plain Python list.
arr = df1.select("colName").rdd.flatMap(lambda x:x).collect()
size = F.size(F.array_intersect("col",F.array([F.lit(i) for i in arr])))
df2.withColumn("t",F.when(size>0,1).otherwise(0)).show()
+---------------+---+
| col| t|
+---------------+---+
|[a, b, b, c, d]| 1|
| [e, f, g, h]| 1|
| [i, j, k]| 0|
|[l, m, n, o, p]| 1|
+---------------+---+