Python Pandas read_table with line continuation

Question

Is it possible for pandas to read a text file that contains line continuations?

For example, say I have a text file, 'read_table.txt', that looks like this:

col1, col2
a, a string
b, a very long \
   string
c, another string

If I invoke read_table on the file, I get this:

>>> pandas.read_table('read_table.txt', delimiter=',')
        col1             col2
0          a         a string
1          b    a very long \
2     string              NaN
3          c   another string

I'd like to get this:

        col1                  col2
0          a              a string
1          b    a very long string
2          c        another string

Answer 1

Use escapechar:

import pandas as pd
df = pd.read_table('in.txt', delimiter=',', escapechar="\\")

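As DSM pointed out, the escaped newline is kept in the field value. You can strip it afterwards with df.col2 = df.col2.str.replace(r"\n\s*", "", regex=True). The regex=True argument is needed on pandas 2.0 and later, where str.replace treats the pattern as a literal string by default. If the header has a space after the comma, as in the sample file, also pass skipinitialspace=True to read_table so that the column is named col2 rather than ' col2'.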

Answer 2

I couldn't get the escapechar option to work as Padraic suggested, probably because I'm stuck on a Windows box at the moment (note the tell-tale \r):

        col1             col2
0          a         a string
1          b   a very long \r
2     string              NaN
3          c   another string

What I did get to work correctly was a regex pass:

import pandas as pd
import re
import StringIO    # python 2 on this machine, embarrassingly (on Python 3 use io.StringIO)

with open('read_table.txt') as f_in:
    file_string = f_in.read()

# Drop each trailing backslash, the newline after it, and the next line's leading whitespace
subbed_str = re.sub(r'\\\n\s*', '', file_string)

df = pd.read_table(StringIO.StringIO(subbed_str), delimiter=',')

This yielded your desired output:

  col1                 col2
0    a             a string
1    b   a very long string
2    c       another string
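For anyone on Python 3, the same regex pass can be sketched roughly as follows. This is only an illustration under a few assumptions: the names join_continuations and read_continued_csv are hypothetical helpers, not part of pandas; the file is assumed to be comma-separated with a backslash as the last character of a continued line; and the pattern also accepts Windows \r\n line endings, which the snippet above did not need because text mode had already translated them.

```python
import io
import re

# A backslash at end of line, the line break itself (\n or Windows \r\n),
# and any spaces or tabs that indent the continued line.
_CONTINUATION = re.compile(r"\\\r?\n[ \t]*")


def join_continuations(text):
    """Return text with backslash-continued lines merged into single lines."""
    return _CONTINUATION.sub("", text)


def read_continued_csv(path, **kwargs):
    """Hypothetical helper: merge continuations, then hand the text to pandas."""
    import pandas as pd  # imported lazily so join_continuations works without pandas

    # newline="" leaves \r\n untouched so the regex sees the raw line endings.
    with open(path, newline="") as f_in:
        cleaned = join_continuations(f_in.read())

    # skipinitialspace drops the blank after each comma ("a, a string").
    kwargs.setdefault("skipinitialspace", True)
    return pd.read_csv(io.StringIO(cleaned), **kwargs)
```

Calling read_continued_csv('read_table.txt') should then give the three-row frame shown above. Matching only spaces and tabs after the line break, rather than \s*, is a deliberate choice so that a blank line following a continuation is not swallowed as well.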

Very cool question. Thanks for sharing it!

Answer 1
2 Accepted 2015-08-19 18:30:21

Answer 2
1 2015-08-19 19:17:23
