
Python Pandas read_table with line continuation

Is it possible for pandas to read a text file that contains line continuations?

For example, say I have a text file, 'read_table.txt', that looks like this:

col1, col2
a, a string
b, a very long \
   string
c, another string

If I invoke read_table on the file, I get this:

>>> pandas.read_table('read_table.txt', delimiter=',')
        col1             col2
0          a         a string
1          b    a very long \
2     string              NaN
3          c   another string

I'd like to get this:

        col1                  col2
0          a              a string
1          b    a very long string
2          c        another string

Use escapechar:

import pandas as pd
df = pd.read_table('in.txt', delimiter=',', escapechar='\\')

As DSM pointed out, that keeps the escaped newline inside the value. You can remove it with df.col2 = df.col2.str.replace(r"\n\s*", "", regex=True). The regex=True argument is needed on pandas 2.0 and later, where str.replace matches literally by default.

I couldn't get the escapechar option to work as Padraic suggested, probably because I'm stuck on a Windows box at the moment (note the tell-tale \r):

        col1             col2
0          a         a string
1          b   a very long \r
2     string              NaN
3          c   another string
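One possible explanation, which I have not verified against the parser itself, is that the file has CRLF line endings, so the backslash escapes only the \r and the \n that follows still ends the row. A minimal sketch under that assumption: check for CRLF and normalize the bytes before parsing. The helper name normalize_newlines and the sample bytes are made up for illustration.

```python
import io


def normalize_newlines(data: bytes) -> bytes:
    """Convert Windows (CRLF) line endings to Unix (LF)."""
    return data.replace(b"\r\n", b"\n")


# Hypothetical sample mimicking the question's file saved with CRLF endings.
sample = (
    b"col1, col2\r\n"
    b"a, a string\r\n"
    b"b, a very long \\\r\n"
    b"   string\r\n"
    b"c, another string\r\n"
)

uses_crlf = b"\r\n" in sample          # True when the file has Windows endings
buffer = io.BytesIO(normalize_newlines(sample))
print(uses_crlf)
```

If the assumption holds, the normalized buffer could be passed to pd.read_table(buffer, delimiter=',', escapechar='\\') in place of the file name.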

What I did get to work correctly was a regex pass:

import io
import re

import pandas as pd

# Read the whole file; text mode normalizes '\r\n' line endings to '\n'.
with open('read_table.txt') as f_in:
    file_string = f_in.read()

# Drop each backslash-newline pair plus the indentation that follows it.
subbed_str = re.sub(r'\\\n\s*', '', file_string)

df = pd.read_table(io.StringIO(subbed_str), delimiter=',')

This yielded your desired output:

  col1                 col2
0    a             a string
1    b   a very long string
2    c       another string
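If the file is too large to comfortably read in one go, the same idea can be sketched line by line. This is only an illustration, not something tested against the original file: join_continuations is a hypothetical helper, and it assumes a continuation is marked by a backslash as the very last character of a line.

```python
def join_continuations(lines):
    """Yield logical lines, merging each line that ends in a backslash
    with the line after it (the continuation's leading whitespace is dropped)."""
    pending = None
    for raw in lines:
        line = raw.rstrip("\r\n")            # tolerate both LF and CRLF endings
        if pending is not None:
            line = pending + line.lstrip()   # glue the continuation on
            pending = None
        if line.endswith("\\"):
            pending = line[:-1]              # hold the text before the backslash
        else:
            yield line + "\n"
    if pending is not None:                  # input ended on a dangling backslash
        yield pending + "\n"


# Sample text shaped like the question's file, with one CRLF continuation.
sample = (
    "col1, col2\n"
    "a, a string\n"
    "b, a very long \\\r\n"
    "   string\n"
    "c, another string\n"
)
cleaned = "".join(join_continuations(sample.splitlines(keepends=True)))
print(cleaned)
```

The cleaned text can then be wrapped in io.StringIO and handed to pd.read_table with delimiter=','. Adding skipinitialspace=True should also strip the space after each comma, so the second column is named col2 instead of ' col2'.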

Very cool question. Thanks for sharing it!


 