Count the number of rows that contain two exact strings

Question

This is my df1:

import pandas as pd

df1 = pd.DataFrame(
    [
        ["apple,orange,milk"],
        ["orange,watermelon,apple"],
        ["milk,banana,apple"]
    ], 
    columns=['fruits']
)

df1

0 apple,orange,milk
1 orange,watermelon,apple
2 milk,banana,apple

This is my df2:

df2 = pd.DataFrame(["apple","orange","banana"], columns=['fruits'])

df2

0 apple
1 orange
2 banana

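I want to find the number of rows in which two exact strings appear together. For example, count the rows in which both apple and milk appear. This is my code: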

for i,row in df2.iterrows():
    for j,rows in df1.iterrows():
        b = (rows.str.contains('(?:\s|\S|[,;])milk(?:\s|\S|[,;])') & rows.str.contains('(?:\s|\S|[,;])+df2.iloc[i]+(?:\s|\S|[,;])')).sum()
        if b>0:
            c=c+1
    print(c)

The output I get from this is always 0:

0
0
0

The output should be:

2
1
1
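For reference, the loop above likely prints 0 for three reasons: `df2.iloc[i]` sits inside the quoted pattern, so the regex looks for the literal text `df2.iloc[i]` rather than the fruit name; `(?:\s|\S|[,;])` matches any single character, so the milk pattern also demands a character before and after `milk` and fails whenever milk is the first or last item; and `c` is never reset between search terms. A minimal sketch of the same nested-loop idea with those points addressed, assuming "exact string" means a whole comma-separated item (so `apple` should not match `pineapple`):

```python
import pandas as pd

df1 = pd.DataFrame(
    [["apple,orange,milk"], ["orange,watermelon,apple"], ["milk,banana,apple"]],
    columns=["fruits"],
)
df2 = pd.DataFrame(["apple", "orange", "banana"], columns=["fruits"])

# Split each cell into its comma-separated items once, so membership
# tests compare whole items instead of substrings.
row_items = [set(cell.split(",")) for cell in df1["fruits"]]

counts = []
for term in df2["fruits"]:
    c = 0  # reset the counter for every search term
    for items in row_items:
        if "milk" in items and term in items:
            c += 1
    counts.append(c)
    print(c)
```

This prints 2, 1, 1, matching the expected output. Splitting once up front avoids rebuilding a regex for every (term, row) pair.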

Answer 1

First, your DataFrame constructor doesn't work: it is misspelled and given the wrong input. Corrected:

import pandas as pd

df1 = pd.DataFrame(["apple,orange,milk", "orange,watermelon,apple", "milk,banana,apple"])
df2 = pd.DataFrame(["apple", "orange", "banana"])

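Second, your question is unclear. If I were to rephrase it, I would say: "I want to find the number of times two search terms, from a set of search terms, appear in the same cell." However, I'm not 100% sure that's any clearer. That said...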

Create a function that takes the two specific strings as arguments (along with what's needed to identify where to search):

def find2(df, col, s1, s2):
    # One boolean mask per search term; & keeps rows containing both; sum counts them
    return sum(df[col].str.contains(s1) & df[col].str.contains(s2))

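It asks the whole column which rows contain search term s1 and which contain search term s2, then intersects the two results and sums the number of matching rows. Run it: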

df2[0].apply(lambda i: find2(df1, 0, 'milk', i))
Out[10]: 
0    2
1    1
2    1
Name: 0, dtype: int64
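Note that `str.contains` matches substrings and treats its pattern as a regex by default, so `'apple'` would also match a cell such as `'pineapple,milk'`. A minimal sketch of an exact-item variant, assuming items are separated by commas with no surrounding spaces; `find2_exact` is a hypothetical name, and the extra `pineapple,milk` row is added purely for illustration:

```python
import re

import pandas as pd


def find2_exact(df, col, s1, s2):
    """Count rows whose comma-separated items include both s1 and s2 exactly."""
    def item_pattern(s):
        # Match s only as a whole item: preceded by start-of-string or a comma,
        # followed by a comma or end-of-string. re.escape neutralises regex
        # metacharacters in the search term.
        return r"(?:^|,)" + re.escape(s) + r"(?:,|$)"

    has1 = df[col].str.contains(item_pattern(s1))
    has2 = df[col].str.contains(item_pattern(s2))
    return int((has1 & has2).sum())


df1 = pd.DataFrame(
    ["apple,orange,milk", "orange,watermelon,apple", "milk,banana,apple", "pineapple,milk"]
)
df2 = pd.DataFrame(["apple", "orange", "banana"])

result = df2[0].apply(lambda term: find2_exact(df1, 0, "milk", term))
print(result.tolist())
```

With the illustrative row, `find2` would count 3 rows for apple, while this variant still counts 2 because `pineapple` is not the item `apple`.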

Answer 2

First, as @ifly6 posted, the DataFrame creation in your question needs to be fixed.

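Second, I assume (unlike the other answer) that you want to find, for each row of df1, how many contiguous strings defined by df2 occur in it. One solution is to first build all possible contiguous strings from df2, then iterate through df1 to check whether any of them match and how many words the match contains. For example: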

import pandas as pd
import itertools

def contiguous_indices(xs):
    # Yield every contiguous slice xs[i:j] with 0 <= i < j <= len(xs)
    n = len(xs)
    indices = list(range(n+1))
    for i,j in itertools.combinations(indices,2):
        yield xs[i:j]

df1=pd.DataFrame(["apple,orange,milk","orange,watermelon,apple","milk,banana,apple"])
df2=pd.DataFrame(["apple","orange","banana"])

# Define the list of possible contiguous strings in df2
s_list = []
for indx_list in contiguous_indices(range(df2[0].size)):
    s = ''
    for indx in indx_list:
        s += df2[0][indx] + ','
    s_list.append(s[:-1])
print(s_list) 
# ['apple', 'apple,orange', 'apple,orange,banana', 'orange', 'orange,banana', 'banana']

# Iterate through df1 and count max number of contiguous strings matches
for i, s1 in df1.iterrows():
    c_max = 0
    s_save = ''
    for s in s_list:
        if s in s1[0] and len(s.split(',')) > c_max:
            c_max = len(s.split(','))
            s_save = s
    print(i, c_max, s_save)

The output will be:

0 2 apple,orange
1 1 apple
2 1 apple
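Because `s in s1[0]` is a plain substring test, a run such as `apple,orange` would also be counted inside a cell like `pineapple,orange`. A minimal sketch of the same contiguous-run idea that compares whole comma-separated items instead; `contiguous_runs` and `longest_run` are hypothetical helper names, and the runs are generated in the same order as `s_list` above:

```python
import pandas as pd


def contiguous_runs(items):
    """Yield every contiguous slice items[i:j] with i < j."""
    n = len(items)
    for i in range(n):
        for j in range(i + 1, n + 1):
            yield items[i:j]


def longest_run(row_items, runs):
    """Return the longest run that appears as consecutive items in row_items."""
    best = []
    for run in runs:
        k = len(run)
        # Slide a window of length k over the row's items and compare whole lists.
        found = any(row_items[p:p + k] == run for p in range(len(row_items) - k + 1))
        if found and k > len(best):
            best = run
    return best


df1 = pd.DataFrame(["apple,orange,milk", "orange,watermelon,apple", "milk,banana,apple"])
df2 = pd.DataFrame(["apple", "orange", "banana"])

runs = list(contiguous_runs(df2[0].tolist()))
results = []
for i, cell in df1[0].items():
    best = longest_run(cell.split(","), runs)
    results.append((i, len(best), ",".join(best)))
    print(i, len(best), ",".join(best))
```

On this data it reproduces the output above; the difference only shows up when a search term is a substring of a longer item.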


Question description

2 solutions

Solution 1
2 2020-02-26 15:29:52

Solution 2
0 2020-02-26 16:10:33
