Removing Timestamp from Pandas Column
I'm working with the following DataFrame column, which contains Date | TimeStamp | Name | Message as a single string:
59770 [08/10/18, 5:57:43 PM] Luke: Message
59771 [08/10/18, 5:57:48 PM] Luke: Message
59772 [08/10/18, 5:57:50 PM] Luke: Message
I'm trying to remove the timestamp from the column, and my expected output is:
59770 Luke: Message
59771 Luke: Message
59772 Luke: Message
I tried using:
import re
df.iloc[:,0] = list(map(lambda x : re.sub(".*\d:\d\d\s[a|p]m","", x)[12:],df.iloc[:,0]))
But since the length of each string is different, this method makes it worse.
Please advise.
You can use the string extract function (Series.str.extract). Here are a couple of options, depending on how you want the results:
import pandas as pd

# Sample data: the time portion of each row, followed by name and message
df = pd.DataFrame({'text': ['5:57:43 PM] Luke: Message',
                            '5:57:48 PM] Luke: Message',
                            '5:57:50 PM] Luke: Message']})
df['text'].str.extract(r'\s*(.{10})](.*)')
            0               1
0  5:57:43 PM   Luke: Message
1  5:57:48 PM   Luke: Message
2  5:57:50 PM   Luke: Message
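Note that .{10} only matches a time that is exactly 10 characters wide, so a value such as 10:57:48 PM would not split at the bracket as intended. As a rough sketch, and assuming the time never contains a closing bracket, the fixed width can be swapped for "everything up to the first ]". The second sample row here is made up for illustration and is not from the question:

```python
import pandas as pd

# Hypothetical sample data with times of different widths
df = pd.DataFrame({'text': ['5:57:43 PM] Luke: Message',
                            '10:57:48 PM] Luke: Message']})

# [^\]]+ matches everything up to the first closing bracket,
# so the time may be any length; \s* drops the space after the bracket
parts = df['text'].str.extract(r'\s*([^\]]+)\]\s*(.*)')
print(parts)
```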
Or if you want the name and messages separate:或者,如果您希望将名称和消息分开:
df['text'].str.extract(r'\s*(.{10})](.*):(.*)')
            0      1         2
0  5:57:43 PM   Luke   Message
1  5:57:48 PM   Luke   Message
2  5:57:50 PM   Luke   Message
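The sample data above starts at the time, whereas the rows in the question begin with the full [date, time] block. If the goal is simply to drop that block, one possible approach is Series.str.replace with a regular expression. This is a minimal sketch, assuming the column is named 'text' (the question does not give a name) and that every row starts with a single bracketed timestamp:

```python
import pandas as pd

# Rows copied from the question; the column name 'text' is an assumption
df = pd.DataFrame({'text': ['[08/10/18, 5:57:43 PM] Luke: Message',
                            '[08/10/18, 5:57:48 PM] Luke: Message',
                            '[08/10/18, 5:57:50 PM] Luke: Message']})

# Remove a leading "[...]" block and any whitespace that follows it
cleaned = df['text'].str.replace(r'^\[[^\]]*\]\s*', '', regex=True)
print(cleaned)

# Optionally split what is left into name and message at the first colon
parts = cleaned.str.extract(r'(?P<name>[^:]+):\s*(?P<message>.*)')
print(parts)
```

Because the pattern is anchored at the start of the string and stops at the first closing bracket, it does not depend on the length of the timestamp or of the message.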