Python Pandas read_table with line continuation
Is it possible for pandas to read a text file that contains line continuations?
For example, say I have a text file, 'read_table.txt', that looks like this:
col1, col2
a, a string
b, a very long \
string
c, another string
If I invoke read_table on the file I get this:
>>> pandas.read_table('read_table.txt', delimiter=',')
col1 col2
0 a a string
1 b a very long \
2 string NaN
3 c another string
I'd like to get this:
col1 col2
0 a a string
1 b a very long string
2 c another string
Use escapechar:
df = pd.read_table('in.txt', delimiter=',',escapechar="\\")
As DSM pointed out, that keeps the newline inside the field. You can strip it afterwards with df.col2 = df.col2.str.replace(r"\n\s*", "", regex=True). The regex=True argument is needed on pandas 2.0 and later, where str.replace treats the pattern as a literal string by default.
I couldn't get the escapechar option to work as Padraic suggested, probably because I'm stuck on a Windows box at the moment (note the tell-tale \r):
col1 col2
0 a a string
1 b a very long \r
2 string NaN
3 c another string
What I did get to work correctly was a regex pass:
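import re
from io import StringIO  # Python 3; on Python 2 this was `import StringIO`

import pandas as pd

# Read the whole file so continuations can be joined before parsing.
with open('read_table.txt') as f_in:
    file_string = f_in.read()

# Drop each backslash-newline pair plus the next line's leading whitespace.
subbed_str = re.sub(r'\\\n\s*', '', file_string)
df = pd.read_table(StringIO(subbed_str), delimiter=',')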
This yielded your desired output:
col1 col2
0 a a string
1 b a very long string
2 c another string
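If the file might be read with its Windows line endings intact (for example in binary mode or with newline=''), the same regex pass can be wrapped in a small helper that also accepts \r\n. This is only a sketch: join_continuations is a made-up name, not a pandas function, and the sample text is just the question's file written out with \r\n endings.

```python
import re
from io import StringIO

# Backslash, optional carriage return, newline, then any leading
# spaces or tabs on the continued line.
_CONTINUATION = re.compile(r'\\\r?\n[ \t]*')


def join_continuations(text):
    """Hypothetical helper: splice backslash-continued lines together."""
    return _CONTINUATION.sub('', text)


# The question's sample file, assumed here to use Windows line endings.
raw = (
    'col1, col2\r\n'
    'a, a string\r\n'
    'b, a very long \\\r\n'
    '   string\r\n'
    'c, another string\r\n'
)

# A file-like object that can be handed to pandas.read_table(buffer, delimiter=',').
buffer = StringIO(join_continuations(raw))
```

Matching only spaces and tabs after the newline, rather than \s*, stops the pattern from swallowing a blank line that happens to follow a continuation.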
Very cool question. Thanks for sharing it!