Pandas: select the rows of a DataFrame that contain a regular expression in ANY column

Question

Good morning,

Given a DataFrame containing text data, for example:

df = pandas.DataFrame({
    'a':['first', 'second', 'third'], 
    'b':['null', 'third', 'first']})

I can select the rows that contain the word 'first' with:

df.a.str.contains('first') | df.b.str.contains('first')

which yields

0     True
1    False
2     True
dtype: bool

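To apply the same condition to dozens of columns I could use isin, but that does not seem to work when I need to replace 'first' with a regular expression, e.g. regex = '(?=.*first)(?=.*second)'.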

Is there a more pythonic and elegant way to select across multiple columns than chaining many single-column df.<column_name>.str.contains(regex) conditions with | in the code? Thanks

Answer 1

Why not use applymap on the whole DataFrame? It works element by element rather than column by column, but it makes if-else conditions easier to apply (I hope):

In [62]: l = ['first', 'second']

In [63]: df
Out[63]: 
        a      b
0   first   null
1  second  third
2   third  first

In [64]: df.applymap(lambda v: True if v in l else False)
Out[64]: 
       a      b
0   True  False
1   True  False
2  False   True
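
The boolean frame above still has one value per cell, while the question asks for rows. A minimal sketch of the missing step, reducing across columns with any(axis=1) and indexing the frame with the result, could look like this. It uses df.isin as a stand-in for the membership lambda, and the one-word list `words` is a hypothetical choice for illustration (with l = ['first', 'second'] every row of the sample frame matches).

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['first', 'second', 'third'],
    'b': ['null', 'third', 'first']})

# Hypothetical word list for illustration; not the list used in the answer.
words = ['second']

# Elementwise membership test over the whole frame (whole-cell matches only).
mask = df.isin(words)

# Collapse to one boolean per row: True if any column matched.
row_mask = mask.any(axis=1)

# Boolean indexing keeps only the matching rows.
selected = df[row_mask]
print(selected)
```

The same any(axis=1) reduction applies unchanged to the frame returned by applymap.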

Update:

(Thanks to @Pythonic for this update)

We can also pass a regular expression to applymap, as shown:

import re
regex = '(^fi)'
df.applymap(lambda v: bool(re.search(regex, v)))
Out[38]: 
       a      b
0   True  False
1  False  False
2  False   True
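
As an alternative sketch (not the method used in this answer), the same regex test can be written with the vectorized Series.str.contains, applied to each column via DataFrame.apply and then reduced to a row mask. The pattern is written as '^fi' without the parentheses, because str.contains emits a warning for patterns with capture groups; na=False is an added assumption so that missing values count as non-matches.

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['first', 'second', 'third'],
    'b': ['null', 'third', 'first']})

# Same pattern as in the answer, minus the capture group.
regex = '^fi'

# Run the vectorized regex test on every column; the result is a boolean DataFrame.
mask = df.apply(lambda col: col.str.contains(regex, na=False))

# True for rows where at least one column matches the pattern.
row_mask = mask.any(axis=1)

selected = df[row_mask]
print(selected)
```

This answers the original question directly: one expression covers all columns, however many there are.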

The following example uses re flags:

In [44]: df = pandas.DataFrame({
   ....:     'a':['First', 'second', 'NULL'], 
   ....:     'b':['null', 'third', 'first']})

In [45]: regex_ignore_case = re.compile('(^fi)', flags=re.IGNORECASE)

In [46]: df.applymap(lambda v: bool(re.search(regex_ignore_case, v)))
Out[46]: 
       a      b
0   True  False
1  False  False
2  False   True
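
For comparison, a hedged sketch of the case-insensitive variant using Series.str.contains, which accepts re flags through its flags parameter when the pattern is given as a plain string. Again this is an alternative to the applymap approach above, not part of the original answer.

```python
import re

import pandas as pd

df = pd.DataFrame({
    'a': ['First', 'second', 'NULL'],
    'b': ['null', 'third', 'first']})

# Pass the pattern as a string and the flag separately.
mask = df.apply(lambda col: col.str.contains('^fi', flags=re.IGNORECASE))

# Keep the rows where any column starts with "fi" in any letter case.
row_mask = mask.any(axis=1)
selected = df[row_mask]
print(selected)
```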

Answer 1 (2, accepted), 2015-05-09 13:43:56
