Complex delimiters in pandas read_csv

Question

I'm trying to read some log files with pandas. The columns are delimited by whitespace, and some columns contain single-quoted strings that themselves include whitespace (e.g. 'string '). I am having a hard time reading these files with read_csv. For example (using some dummy data):

import pandas as pd
from io import StringIO

data = StringIO("""\
  1   2   'asdf    ' 3
  4   5   'asdfg   ' 4  
""")

columns = ['a','b','c','d']
df = pd.read_csv(data, delim_whitespace=True, names=columns)

For the first row, this yields the fields 1, 2, 'asdf, ', 3, whereas I would like to get 1, 2, asdf, 3. The behavior makes total sense, but I can't find a way to make read_csv parse such files "correctly" (that is, the way I want).

Is this at all possible?

Answer 1

Pass the quotechar argument to read_csv so that each single-quoted string is parsed as one field:

df = pd.read_csv(filename, quotechar = "'", delim_whitespace=True, names=columns)

This still leaves the extra whitespace from inside the quotes in column c. You can strip it with:

df.c = df.c.str.strip()
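
Putting the two steps together on the dummy data from the question, a minimal sketch might look like the following. It assumes a recent pandas release, where delim_whitespace is deprecated and sep=r"\s+" is the suggested equivalent; on older versions delim_whitespace=True should behave the same way. The trailing spaces on the second data line are dropped here for simplicity.

```python
from io import StringIO

import pandas as pd

# Dummy data from the question: whitespace-delimited columns, with one
# single-quoted column that contains padding spaces inside the quotes.
data = StringIO(
    "  1   2   'asdf    ' 3\n"
    "  4   5   'asdfg   ' 4\n"
)
columns = ["a", "b", "c", "d"]

# quotechar="'" keeps each quoted string together as a single field.
# sep=r"\s+" is used in place of delim_whitespace=True (an assumption
# about the installed pandas version, see the note above).
df = pd.read_csv(data, sep=r"\s+", quotechar="'", names=columns)

# Remove the padding that was inside the quotes.
df["c"] = df["c"].str.strip()

print(df)
```

For real log files, replace data with the file path.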

Answer 1 · 3 · Accepted · 2018-01-18 19:21:55
