pandas.read_csv: skip rows until a certain string is found

Question

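In R, there is a commonly used function called fread for reading tsv/csv/... files. It has a very useful parameter called skip, which lets you specify a string; the line where that string is found is then used as the header (handy if you give it a substring of the column-names row).

I was wondering whether there is a similar function in Python, since it seems very useful.

Cheers!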

Answer 1

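One technique I sometimes use (e.g. to filter out faulty data, when none of the other great features of pandas.read_csv() seem to handle the case at hand) is to define a subclass of io.TextIOWrapper.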

In your case, you could write:

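import io


class SkipUntilMatchWrapper(io.TextIOWrapper):
    """Text wrapper that discards lines until matcher(line) is true."""

    def __init__(self, f, matcher, include_matching=False):
        super().__init__(f, line_buffering=True)
        self.f = f
        self.matcher = matcher
        self.include_matching = include_matching
        self.has_matched = False

    def read(self, size=None):
        # Consume lines until the first match; later calls are plain reads.
        while not self.has_matched:
            line = self.readline()
            if not line:
                # End of file without a match: stop instead of looping forever.
                break
            if self.matcher(line):
                self.has_matched = True
                if self.include_matching:
                    return line
        return super().read(size)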

Let's try it on a simple example:

import numpy as np

# make an example
with open('sample.csv', 'w') as f:
    print('garbage 1', file=f)
    print('garbage 2', file=f)
    print('and now for some data', file=f)
    print('a,b,c', file=f)
    x = np.random.randint(0, 10, size=(5, 3))
    np.savetxt(f, x, fmt='%d', delimiter=',')

Reading it back:

import pandas as pd

with open('sample.csv', 'rb') as f_orig:
    with SkipUntilMatchWrapper(f_orig, lambda s: 'a,b,c' in s, include_matching=True) as f:
        df = pd.read_csv(f)
>>> df
   a  b  c
0  2  7  8
1  7  3  3
2  3  6  9
3  0  6  0
4  4  0  9

Alternatively:

with open('sample.csv', 'rb') as f_orig:
    with SkipUntilMatchWrapper(f_orig, lambda s: 'for some data' in s) as f:
        df = pd.read_csv(f)
>>> df
   a  b  c
0  2  7  8
1  7  3  3
2  3  6  9
3  0  6  0
4  4  0  9
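
If wrapping the file object feels like too much, a simpler two-pass variant may be enough in many cases: scan the file once to find the line number of the first line containing the string, then pass that number to the skiprows parameter of pandas.read_csv(). This is a different technique from the wrapper above, shown only as a minimal sketch. The helper names find_header_row and read_csv_skip_until are hypothetical (they are not part of pandas), and the sketch assumes a file on disk that can be opened twice and a preamble without multi-line quoted fields.

```python
import pandas as pd


def find_header_row(lines, marker):
    """Return the 0-based index of the first line that contains `marker`."""
    for i, line in enumerate(lines):
        if marker in line:
            return i
    raise ValueError(f"marker {marker!r} not found")


def read_csv_skip_until(path, marker, encoding="utf-8", **kwargs):
    """Read a CSV file, using the first line containing `marker` as the header.

    Hypothetical helper: extra keyword arguments are forwarded to pd.read_csv.
    """
    # First pass: locate the header line.
    with open(path, encoding=encoding) as f:
        header_row = find_header_row(f, marker)
    # Second pass: an integer skiprows skips that many lines at the start,
    # so the matching line becomes the header row.
    return pd.read_csv(path, skiprows=header_row, encoding=encoding, **kwargs)
```

With the sample.csv file from above, read_csv_skip_until('sample.csv', 'a,b') should give the same frame as the include_matching=True call. Unlike the wrapper, this reads the preamble twice and does not work on a stream that cannot be reopened.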
