Delete rows in a dataframe based on a row value

Question

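I have some Word documents that were converted to strings before being read into dataframes. Each dataframe is only one column wide but many rows long. They all look something like this: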

0| this document is a survey
1| please fill in fully
2| Send back to address on the bottom of the sheet
etc....

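The start of each dataframe is gibberish that I don't need, so I need to delete all rows before the row containing the value "Question". However, that row does not sit at the same index in every dataframe, so I can't just delete the first 20 rows, because that would affect each dataframe differently.

How can I delete all rows before "Question" in each dataframe?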

Answer 1

Assuming you only need to keep the rows from the first occurrence of "Question" onward, this approach should do the trick:

Dummy data and setup

import pandas as pd

data = {
    'x': [
          'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k'
    ]
}

df = pd.DataFrame(data)
df


Solution

Here I keep all rows from the first entry that starts with the letter 'f' onward:

df[df.x.str.startswith('f').cumsum() > 0]


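Explanation

This solution relies on two main pandas features:

pd.Series.str.startswith returns a boolean Series that is True for every cell starting with the given string ('f' in this example, but "Question" works the same way).
cumsum() treats the boolean values as integers and accumulates them, so every row from the first occurrence onward has a value greater than zero.

Indexing the original dataframe with the resulting boolean mask gives the solution.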
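To make the two steps concrete, here is a small sketch that exposes the intermediate values on a shortened variant of the dummy data. The names `mask`, `running`, and `result`, and the extra entry 'f2', are illustrative assumptions and not part of the original answer.

```python
import pandas as pd

# Shortened dummy data; 'f2' is added to show that later matches do no harm
df = pd.DataFrame({'x': ['a', 'b', 'f', 'g', 'f2', 'h']})

# Step 1: True where the cell starts with 'f'
mask = df.x.str.startswith('f')
# mask    -> [False, False, True, False, True, False]

# Step 2: running count of matches seen so far
running = mask.cumsum()
# running -> [0, 0, 1, 1, 2, 2]

# Rows with a running count above zero are the first match and everything after it
result = df[running > 0]
print(result)
```

Because the comparison is `> 0`, the matching row itself is kept along with all rows below it.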

Answer 2

Another option is to use str.contains(). Using a toy pandas Series:

import pandas as pd

# create a toy Series (named df here for consistency with the question)
d = ["nothing", "target is here", "help", "more_words"]
df = pd.Series(data=d)

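If you want to keep all rows from a given word onward (inclusive), say "here", you can do it like this:

# check rows to determine whether they contain "here"
keyword_bool = df.str.contains("here", regex=False)
# index label of the first match; with the default RangeIndex this is also its position
idx = keyword_bool[keyword_bool].index[0]

# slice the Series from the first match onward (assumes the default RangeIndex)
df = df.iloc[idx:]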
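As a sketch of how this approach might be wrapped for reuse, the hypothetical helper below (`keep_from_first_match` is not a pandas function) locates the first match by position, so it does not depend on the index labels, and returns an empty Series when the keyword never appears. That no-match behaviour is an assumption; raising an error would be an equally reasonable choice. The sample strings are made up for illustration.

```python
import pandas as pd

def keep_from_first_match(s: pd.Series, keyword: str) -> pd.Series:
    """Return s from the first element containing keyword onward (inclusive)."""
    # Literal substring test; missing values count as non-matches
    hits = s.str.contains(keyword, regex=False, na=False).to_numpy()
    if not hits.any():
        # Assumption: no match yields an empty Series instead of an error
        return s.iloc[0:0]
    first_pos = hits.argmax()  # position of the first True
    return s.iloc[first_pos:]

# Made-up sample with a non-default index
s = pd.Series(
    ["intro text", "please fill in fully", "Question 1: name?", "Question 2: age?"],
    index=[10, 20, 30, 40],
)
trimmed = keep_from_first_match(s, "Question")
print(trimmed)
```

For a one-column dataframe as in the question, the same helper could be applied to that column and the resulting index used to select rows, for example `df.loc[trimmed.index]`.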

Answer 1
0 Accepted 2020-08-12 09:16:08

Answer 2
0 2020-08-12 09:36:20
