Complex delimited columns in Pandas read_csv
I'm trying to read some log files using Pandas. The columns are delimited by whitespace, and some columns consist of single-quoted strings that contain whitespace (e.g. 'string '). I am having a hard time reading these files with read_csv. For example (using some dummy data):
import pandas as pd
from io import StringIO
data = StringIO("""\
1 2 'asdf ' 3
4 5 'asdfg ' 4
""")
columns = ['a','b','c','d']
# sep=r"\s+" splits on runs of whitespace (the replacement for the deprecated delim_whitespace=True)
df = pd.read_csv(data, sep=r"\s+", names=columns)
For the first row, this results in the fields 1 | 2 | 'asdf | ' | 3, whereas I would prefer to have 1 | 2 | asdf | 3. The behavior makes total sense, but I can't find a way to make read_csv parse such files "correctly" (as I want it).
Is this at all possible?
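To see why the quoted string gets broken up, compare a plain whitespace split with a quote-aware split of the first line. read_csv's default quotechar is '"', so single quotes are treated as ordinary characters. Python's standard shlex module is used here purely as an illustration of quote-aware tokenizing; it is not what read_csv uses internally.

```python
import shlex

line = "1 2 'asdf ' 3"

# Plain whitespace split: the single quotes are ordinary characters,
# so 'asdf ' is cut at its inner space into two tokens
naive = line.split()

# Quote-aware split: the single-quoted 'asdf ' stays one token, trailing space included
quoted = shlex.split(line)

print(naive)   # ['1', '2', "'asdf", "'", '3']
print(quoted)  # ['1', '2', 'asdf ', '3']
```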
You have to use the quotechar argument when parsing with read_csv:
# quotechar="'" makes the parser treat '...' as a single field, even if it contains spaces
df = pd.read_csv(filename, quotechar="'", sep=r"\s+", names=columns)
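A self-contained sketch of that call on the question's dummy data, with a StringIO standing in for the file. It uses sep=r"\s+", which pandas treats the same as delim_whitespace=True (deprecated since pandas 2.2). The quoted field keeps its inner trailing space:

```python
import pandas as pd
from io import StringIO

# Dummy data from the question: whitespace-delimited, with single-quoted fields
data = StringIO("""\
1 2 'asdf ' 3
4 5 'asdfg ' 4
""")
columns = ["a", "b", "c", "d"]

# quotechar="'" keeps 'asdf ' together as one field; sep=r"\s+" splits on whitespace
df = pd.read_csv(data, quotechar="'", sep=r"\s+", names=columns)

# Column c still carries the trailing space that was inside the quotes
print(df["c"].tolist())  # ['asdf ', 'asdfg ']
```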
This will, however, leave extra whitespace in column c. You can get rid of it using:
df.c = df.c.str.strip()
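As an alternative to stripping afterwards (not part of the original answer), read_csv's converters parameter can apply str.strip to column c while parsing. A minimal sketch, again on the question's dummy data:

```python
import pandas as pd
from io import StringIO

data = StringIO("""\
1 2 'asdf ' 3
4 5 'asdfg ' 4
""")
columns = ["a", "b", "c", "d"]

# converters maps a column label to a function that is applied to each raw string
# value of that column, so the whitespace is removed during parsing
df = pd.read_csv(
    data,
    quotechar="'",
    sep=r"\s+",
    names=columns,
    converters={"c": str.strip},
)

print(df["c"].tolist())  # ['asdf', 'asdfg']
```

The result is the same as the post-hoc str.strip(). The converter only runs on column c, so the numeric columns are still type-inferred as usual.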