Check if a string is in another column in pandas

Question

Below is my DataFrame:

df= pd.DataFrame({'col1': ['[7]', '[30]', '[0]', '[7]'], 'col2': ['[0%, 7%]', '[30%]', '[30%, 7%]', '[7%]']})

col1    col2    
[7]     [0%, 7%]
[30]    [30%]
[0]     [30%, 7%]
[7]     [7%]

The goal is to check whether the col1 value is contained in col2. Below is what I tried:

df['test'] = df.apply(lambda x: str(x.col1) in str(x.col2), axis=1)

Below is the expected output:

col1    col2       col3
[7]     [0%, 7%]   True
[30]    [30%]      True
[0]     [30%, 7%]  False
[7]     [7%]       True

Answer 1

Use Series.str.extractall to get the numbers and reshape with Series.unstack, so the result can be compared using DataFrame.isin with DataFrame.any:

df['test'] = (df['col2'].str.extractall(r'(\d+)')[0].unstack()
                        .isin(df['col1'].str.strip('[]'))
                        .any(axis=1))
print(df)
   col1       col2   test
0   [7]   [0%, 7%]   True
1  [30]      [30%]   True
2   [0]  [30%, 7%]  False
3   [7]       [7%]   True
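
To see why the chain works, it can help to name the intermediate pieces. This is a minimal sketch, assuming the sample DataFrame from the question; the names `wide` and `targets` are made up for illustration and are not part of the answer:

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['[7]', '[30]', '[0]', '[7]'],
    'col2': ['[0%, 7%]', '[30%]', '[30%, 7%]', '[7%]'],
})

# One row per original row, one column per number found in col2
# (rows with fewer numbers are padded with NaN).
wide = df['col2'].str.extractall(r'(\d+)')[0].unstack()

# col1 without the surrounding brackets, still as strings.
targets = df['col1'].str.strip('[]')

# Passing a Series to DataFrame.isin compares row by row via the index.
df['test'] = wide.isin(targets).any(axis=1)
print(df)
```

One thing to keep in mind: a col2 value with no digits at all produces no rows in extractall, so that row would likely end up as NaN in `test` rather than False.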

Answer 2

You can extract the numbers from both columns and join them, then use eval + groupby + any to check whether each id has at least one match:

(df['col2'].str.extractall(r'(?P<col2>\d+)').droplevel(1)
   .join(df['col1'].str[1:-1])
   .eval('col2 == col1')
   .groupby(level=0).any()
)

Output:

0     True
1     True
2    False
3     True
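
The same idea can be spelled out step by step. This is only a sketch, assuming the question's sample data; `long_form`, `pairs` and `matches` are illustrative names, and a plain `==` comparison stands in for `.eval('col2 == col1')`:

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['[7]', '[30]', '[0]', '[7]'],
    'col2': ['[0%, 7%]', '[30%]', '[30%, 7%]', '[7%]'],
})

# Long format: one row per number in col2, indexed by the original row id.
long_form = df['col2'].str.extractall(r'(?P<col2>\d+)').droplevel(1)

# Attach the bracket-free col1 value to every extracted number.
pairs = long_form.join(df['col1'].str[1:-1])

# A row id matches if any of its extracted numbers equals col1.
matches = (pairs['col2'] == pairs['col1']).groupby(level=0).any()
print(matches)
```

Both sides are compared as strings here, so '7' only matches '7' and never '07'.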

Answer 3

One approach:

import ast

# convert to integer list
col2_lst = df["col2"].str.replace("%", "").apply(ast.literal_eval)

# check list containment
df["col3"] = [all(bi in a for bi in b) for a, b in zip(col2_lst, df["col1"].apply(ast.literal_eval))]

print(df)

Output:

   col1       col2   col3
0   [7]   [0%, 7%]   True
1  [30]      [30%]   True
2   [0]  [30%, 7%]  False
3   [7]       [7%]   True
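
For a single row, the conversion and the containment check look roughly like this. It is a small sketch using only the standard library; the variable names are illustrative:

```python
import ast

col1_raw = "[7]"
col2_raw = "[0%, 7%]"

# Dropping the percent signs leaves a valid Python list literal.
col2_list = ast.literal_eval(col2_raw.replace("%", ""))
col1_list = ast.literal_eval(col1_raw)

# True only if every value from col1 appears in col2's list.
contained = all(value in col2_list for value in col1_list)

print(col2_list, col1_list, contained)
```

Because the values become integers, this compares numbers rather than text, and it assumes every cell parses as a list literal once the percent signs are gone.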

Answer 4

You can also replace the square brackets with word boundaries (\b) and use re.search, like this:

import re
#...
df.apply(lambda x: bool(re.search(x['col1'].replace("[",r"\b").replace("]",r"\b"), x['col2'])), axis=1)
# => 0     True
#    1     True
#    2    False
#    3     True
#    dtype: bool

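This works because \b7\b finds a match in [0%, 7%], since the 7 is neither preceded nor followed by a letter, digit, or underscore. No match is found in [30%, 7%], because \b0\b does not match a zero that comes right after a digit (here, 3).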
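
The boundary behaviour described above can be checked on individual strings. This is a minimal sketch; `col1_in_col2` is a hypothetical helper name, not something from the answer:

```python
import re


def col1_in_col2(col1: str, col2: str) -> bool:
    # '[7]' becomes r'\b7\b': each bracket turns into a word-boundary assertion.
    pattern = col1.replace("[", r"\b").replace("]", r"\b")
    return re.search(pattern, col2) is not None


print(col1_in_col2("[7]", "[0%, 7%]"))   # 7 stands alone
print(col1_in_col2("[0]", "[30%, 7%]"))  # 0 is glued to the 3
print(col1_in_col2("[0]", "[0%, 7%]"))   # 0 stands alone
```

The pattern is not escaped, so this assumes col1 only ever holds digits between the brackets, as in the sample data.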

Answer 1: 1 2021-11-08 11:03:02
Answer 2: 1 2021-11-08 11:06:08
Answer 3: 1 2021-11-08 11:07:23
Answer 4: 1 2021-11-08 11:13:44
