Delete rows based on a string in a table

Question

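My code for deleting rows based on a partial string is not working.

It is very simple code. It runs without errors but does not delete the rows I want.

The original table in the PDF looks like this: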

Chemical	Value	Unit	Type
Fluoride	0.23	ug/L	Lab
Mercury	0.15	ug/L	Lab
Sum of Long Chained Polymers	0.33
Partialsum of Short Chained Polymers	0.40

What I did:

import csv 
import tabula

dfs = tabula.read_pdf("Test.pdf", pages='all')
file = "Test.pdf"
tables = tabula.read_pdf(file, pages=2, stream=True, multiple_tables=True)

table1 = tables[1]
table1.drop('Unit', axis=1, inplace=True) 
table1.drop('Type', axis=1, inplace=True)
discard = ['sum','Sum']
table1[~table1.Chemical.str.contains('|'.join(discard))]
print(table1)
table1.to_csv('test.csv')

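The result is that it drops the 2 columns I don't want, so that part is fine. But it does not delete the rows that contain the word "sum" or "Sum". Any insights?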

Answer 1

You are close. You did drop the rows, but you didn't save the result.

import pandas as pd

example = {'Chemical': ['Fluoride', 'Mercury', 'Sum of Long Chained Polymers',
                'Partialsum of Short Chained Polymers'], 
            'Value': [0.23, 0.15, 0.33, 0.4], 
            'Unit': ['ug/L', 'ug/L', '', ''], 
            'Type': ['Lab', 'Lab', '', '']}

table1 = pd.DataFrame(example)
table1.drop('Unit', axis=1, inplace=True)
table1.drop('Type', axis=1, inplace=True)
discard = ['sum','Sum']
table1 = table1[~table1.Chemical.str.contains('|'.join(discard))]
print(table1)
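
To see why the assignment matters, here is a minimal sketch (the names mask and filtered are only illustrative, not from the question). Indexing with a boolean mask builds a new DataFrame and leaves the original one untouched, so the result has to be bound to a name:

```python
import pandas as pd

table1 = pd.DataFrame({
    'Chemical': ['Fluoride', 'Mercury', 'Sum of Long Chained Polymers',
                 'Partialsum of Short Chained Polymers'],
    'Value': [0.23, 0.15, 0.33, 0.4],
})

# Boolean mask: True for rows whose Chemical contains "sum" or "Sum".
mask = table1['Chemical'].str.contains('sum|Sum')

# Indexing returns a new DataFrame; table1 itself is not modified.
filtered = table1[~mask]

print(len(table1))    # 4, the original still has every row
print(len(filtered))  # 2, only Fluoride and Mercury remain
```

Without the assignment, the filtered frame is created and immediately thrown away, which matches what the question describes.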

Answer 2

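You can use pd.Series.str.contains with the argument case=False to ignore case.

Also, it is not a hard rule, but using inplace=True is generally considered bad practice, partly because it leads to exactly the kind of confusion you are experiencing.

Given df: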

                               Chemical  Value  Unit  Type
0                              Fluoride   0.23  ug/L   Lab
1                               Mercury   0.15  ug/L   Lab
2          Sum of Long Chained Polymers   0.33   NaN   NaN
3  Partialsum of Short Chained Polymers   0.40   NaN   NaN

Doing:

df = (df.drop(['Unit', 'Type'], axis=1)
        .loc[~df.Chemical.str.contains('sum', case=False)])

Output:

   Chemical  Value
0  Fluoride   0.23
1   Mercury   0.15
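
One thing that may come up with tables extracted from a PDF: if the Chemical column itself has missing values, str.contains returns a missing value for those rows, and the mask is no longer strictly boolean. The sketch below assumes such a missing name (an illustrative case that is not in the question's data) and passes na=False so those rows count as "no match":

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second row has no chemical name (assumption for illustration).
df = pd.DataFrame({
    'Chemical': ['Fluoride', np.nan, 'Sum of Long Chained Polymers'],
    'Value': [0.23, 0.15, 0.33],
})

# na=False treats missing names as non-matching, so the mask holds only True/False.
mask = df['Chemical'].str.contains('sum', case=False, na=False)

# Keep every row that does not match; the row with the missing name is kept too.
result = df.loc[~mask]
print(result)
```

Whether rows with a missing name should be kept or dropped depends on the data; passing na=True instead would drop them.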
