Loading a text file with uneven commas into a pandas DataFrame

Question

15/09/2017, 10:20 - Jatin: Robin is the meeting on???
15/09/2017, 10:23 - Robin: No
15/09/2017, 10:23 - Robin: Thanks for the update
15/09/2017, 10:23 - Robin: can we expect it soon
15/09/2017, 10:24 - Jatin: it will be this weekend, most likely
15/09/2017, 10:24 - Jatin: kindly be prepared
15/09/2017, 10:24 - Robin: Sure no issues
15/09/2017, 10:26 - Jatin: good luck

I have a data file that looks like this, and I want to load it into a pandas DataFrame. The problem is that if I do

pd.read_csv("file.txt")

it throws an error:

Error tokenizing data. C error: Expected 2 fields in line 695, saw 3

Can anyone suggest the simplest way to do this with pandas?

Answer 1

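It looks like the file you are trying to load is a WhatsApp chat exported by email. I did something similar, and this is the code that worked for me.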

import pandas as pd

# Load every line into a single column and give it a convenient name to use later;
# header=None keeps the first message from being consumed as the header row
attempt_load = pd.read_table("WhatsApp Chat with Panda.txt", header=None, names=["namesake"])
name = []
message = []
for line in attempt_load["namesake"]:
    # The timestamp and the " - " separator take up the first 20 characters of
    # each line, so the name starts at index 20
    name.append(line[20:25])   # both names are 5 characters long, so this slice is the name
    message.append(line[27:])  # skip the ": " after the name and keep the rest

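If you also need the timestamp, you can do something similar.

Limitation: this will not work if the names have different lengths. The workaround I found is to rename the contacts in the chat so they have the same length before exporting the file to email.

I'm sure someone will offer a more dynamic and cleaner fix.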
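One possible shape for such a fix, as a minimal sketch: match each line against a regular expression instead of slicing at fixed offsets, so names of any length work. The pattern assumes every line looks like "<date>, <time> - <name>: <message>" as in the question's sample; `LINE_RE` and `parse_chat` are illustrative names, and lines that do not match (for example system notices) are simply skipped.

```python
import io
import re

import pandas as pd

# Sample lines copied from the question; a real export would be opened from disk.
sample = """15/09/2017, 10:20 - Jatin: Robin is the meeting on???
15/09/2017, 10:23 - Robin: No
15/09/2017, 10:24 - Jatin: it will be this weekend, most likely
"""

# Assumed layout: "<dd/mm/yyyy>, <hh:mm> - <name>: <message>"
LINE_RE = re.compile(r"^(\d{2}/\d{2}/\d{4}, \d{2}:\d{2}) - ([^:]+): (.*)$")


def parse_chat(lines):
    """Build a DataFrame with timestamp, name and message columns."""
    rows = []
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match:
            rows.append(match.groups())
        # Non-matching lines are ignored in this sketch.
    df = pd.DataFrame(rows, columns=["timestamp", "name", "message"])
    df["timestamp"] = pd.to_datetime(df["timestamp"], format="%d/%m/%Y, %H:%M")
    return df


chat = parse_chat(io.StringIO(sample))
print(chat)
```

With a real file, `parse_chat(open("file.txt", encoding="utf-8"))` would be the equivalent call, assuming the export is UTF-8 encoded.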

Answer 2

Alternatively, specify the separator more explicitly:

pd.read_csv('test.txt', names=['timestamp', 'text'], sep=' - ')

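This raises a warning about falling back to the python engine, because a multi-character separator is treated as a regular expression, which the C engine does not support. It is only a warning, though the python engine may be slower on very large files.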
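As a rough sketch of how this could be taken one step further, the `text` column can be split into a name and a message, and the timestamp parsed into a datetime. The column names `name` and `message` are illustrative, the date format is assumed from the sample in the question, and `engine="python"` is passed explicitly so the fallback warning is not emitted. Note the assumption that no message itself contains " - "; such a line would produce an extra field.

```python
import io

import pandas as pd

# Sample lines copied from the question; a real export would be read from a path.
sample = """15/09/2017, 10:20 - Jatin: Robin is the meeting on???
15/09/2017, 10:23 - Robin: No
15/09/2017, 10:24 - Jatin: it will be this weekend, most likely
"""

df = pd.read_csv(
    io.StringIO(sample),
    names=["timestamp", "text"],
    sep=" - ",
    engine="python",  # multi-character separators need the python engine
)

# Split only on the first ": " so colons inside the message are preserved.
parts = df["text"].str.split(": ", n=1, expand=True)
df["name"] = parts[0]
df["message"] = parts[1]
df = df.drop(columns="text")

# Assumed format: day/month/year, 24-hour time.
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%d/%m/%Y, %H:%M")
print(df)
```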


Answer 1: 0, accepted, 2018-06-18 17:03:25
Answer 2: 0, 2018-06-18 17:21:15
