Delete rows in a dataframe based on row value

Question

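I have some Word documents that were converted to strings before being read into dataframes. Each dataframe is only one column wide but many rows long. They all look like this: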

0| this document is a survey
1| please fill in fully
2| Send back to address on the bottom of the sheet
etc....

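The beginning of each dataframe is gibberish that I don't need, so I need to delete all rows before the row that contains the value "Question". However, that row is not at the same index in every dataframe, so I can't just drop the first 20 rows, because that would affect each dataframe differently.

How can I delete all rows before "Question" in each dataframe?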

Answer 1

Assuming you only need to keep the rows from the first occurrence of "Question" onward, this approach should do the trick:

Dummy data and setup

import pandas as pd

data = {
    'x': [
          'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k'
    ]
}

df = pd.DataFrame(data)
df

Output:

Solution

Here I keep all rows from the first entry that starts with the letter "f" onward (the matching row included):

df[df.x.str.startswith('f').cumsum() > 0]

Output:

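Explanation

The solution relies on two main pandas features:

Series.str.startswith returns a boolean array that is True for every cell starting with the given string ("f" in this example, but "Question" works too).
cumsum() treats True as 1 and False as 0, so the running total is 0 for every row before the first match and at least 1 for that row and every row after it.

Indexing the original dataframe with the resulting boolean mask (cumsum() > 0) gives the solution.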
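To make the two steps concrete, the sketch below prints the intermediate mask and running count for the dummy data above. It rebuilds the same `df` so it runs on its own; the names `mask`, `running` and `result` are introduced only for illustration.

```python
import pandas as pd

# Same dummy data as above: one column 'x' holding the letters a..k
df = pd.DataFrame({'x': list('abcdefghijk')})

# Step 1: True only where the cell starts with 'f' (index 5 here)
mask = df.x.str.startswith('f')

# Step 2: running count of matches seen so far (True counts as 1, False as 0)
running = mask.cumsum()

# Rows before the first match have a count of 0; rows from it onward have >= 1
result = df[running > 0]

print(running.tolist())   # → [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
print(result.x.tolist())  # → ['f', 'g', 'h', 'i', 'j', 'k']
```

Because the count never drops back to 0, a later row that does not match (such as "g") is still kept, which is what separates this from plain filtering with `df[mask]`.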

Answer 2

Another option is to use str.contains(). Using a toy pandas Series:

import pandas as pd

# create a toy Series (one value per row)
d = ["nothing", "target is here", "help", "more_words"]
df = pd.Series(data=d)

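If you want to keep all rows from a given word onward (inclusive), say "here", you can do it like this:

# check each row for the plain substring "here"
keyword_bool = df.str.contains("here", regex=False)
# index label of the first matching row
idx = keyword_bool[keyword_bool].index[0]

# slice from that label onward; .loc is label-based, so this also works with a non-default index
df = df.loc[idx:]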
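Since the question involves several dataframes with the keyword at a different position in each, either approach can be wrapped in a function and applied to every frame. This is a sketch, not part of either answer: `drop_rows_before` is a hypothetical helper name, the text is assumed to live in each frame's first (and only) column, and a frame without the keyword is returned unchanged instead of raising the `IndexError` that `.index[0]` would raise on an empty selection.

```python
import pandas as pd


def drop_rows_before(frame: pd.DataFrame, keyword: str) -> pd.DataFrame:
    """Hypothetical helper: keep rows from the first row containing `keyword` onward.

    Assumes the text to search is in the frame's first column. If no row
    contains the keyword, the frame is returned unchanged.
    """
    text = frame.iloc[:, 0].astype(str)
    # Rows containing the keyword as a plain substring (no regex)
    hits = text.str.contains(keyword, regex=False)
    if not hits.any():
        return frame
    # Same cumsum trick as Answer 1: 0 before the first hit, >= 1 from it onward
    return frame[hits.cumsum() > 0]


# Two made-up frames where "Question" sits at a different index in each
frames = [
    pd.DataFrame(["this document is a survey", "Question 1", "answer"]),
    pd.DataFrame(["please fill in fully", "Send back", "Question 1", "answer"]),
]

cleaned = [drop_rows_before(f, "Question") for f in frames]
for c in cleaned:
    print(c[0].tolist())  # → ['Question 1', 'answer'] for both frames
```

The original index labels are preserved (the second frame keeps labels 2 and 3); call `.reset_index(drop=True)` on the result if a fresh 0-based index is preferred.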

Delete rows in a dataframe based on row value

Question

2 answers

Answer 1
0 votes, accepted, 2020-08-12 09:16:08

Dummy data and setup

Solution

Explanation

Answer 2
0 votes, 2020-08-12 09:36:20
