Copy a range of text from a csv column in pandas and Python

Question

I have a csv file that I have imported into Pandas. It has almost 45 columns of data, and each column has more than 100 rows. I need to select only the range of text that starts with a date stamp and ends with the next date stamp.

Example:

<GMT2015-09-01 00:03:29GMT> Hi Rajiv<GMT2015-09-01 19:08:15GMT> Hi Ram <GMT2015-09-01 19:08:15GMT>

In a structure like this, I need to select only the first paragraph, from the first date stamp to the next one, into a new data frame.

Answer 1

I think you can split the data in column Ticket Description on < and > and then select from the output DataFrame with iloc. Finally, you can strip the leading and trailing whitespace.

Notice: This works well only if < and > appear solely at the start and end of each datetime.

import pandas as pd

df = pd.DataFrame({'Ticket Description':['<GMT2015-09-01 00:03:29GMT> Hi Rajiv<GMT2015-09-01 19:08:15GMT> Hi Ram <GMT2015-09-01 19:08:15GMT> ']})
print (df)
                                  Ticket Description
0  <GMT2015-09-01 00:03:29GMT> Hi Rajiv<GMT2015-0...

print (df['Ticket Description'].str.split(r'[<>]', expand=True).iloc[:, 2].str.strip())
0    Hi Rajiv
Name: 2, dtype: object
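If the message text itself can contain < or >, splitting on those characters shifts the positions and iloc[:, 2] no longer points at the first message. A minimal sketch, assuming every stamp has the form <GMTYYYY-MM-DD HH:MM:SSGMT> as in the question's example, splits on the whole stamp instead. The names STAMP and first_message are illustrative and not part of the original answer.

```python
import re
import pandas as pd

# Assumed stamp format, taken from the example in the question.
STAMP = r"<GMT\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}GMT>"

def first_message(text):
    """Return the text between the first and second stamp, or None."""
    parts = re.split(STAMP, text)
    # parts[0] is whatever precedes the first stamp; parts[1] follows it.
    # Require at least two stamps (three parts) so the message is closed.
    if len(parts) < 3:
        return None
    return parts[1].strip()

df = pd.DataFrame({'Ticket Description': [
    '<GMT2015-09-01 00:03:29GMT> Hi <b>Rajiv</b><GMT2015-09-01 19:08:15GMT> Hi Ram <GMT2015-09-01 19:08:15GMT> ',
    'no stamps here',
]})
result = df['Ticket Description'].apply(first_message)
print(result)
```

Rows without two stamps come back as missing values rather than raising, which may be easier to filter afterwards with dropna().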

Answer 2

Regex and pandas apply should achieve what you want. I'm assuming you want only the text between the very first and second timestamp. I have created a dataframe with your message in two rows, except that the text of the second row starts with 2. In the regex, >(.+?)< searches for one or more characters surrounded by > and <. The ? makes it non-greedy, so it does not run from the first timestamp all the way to the last and instead stops at the first match.

Sample code below:

import pandas as pd
import re

data = pd.DataFrame({"id":[1,2],"ticket_desc":[r"<GMT2015-09-01 00:03:29GMT> Hi Rajiv, As part of our job Request for your approval. Thanks <GMT2015-09-01 19:08:15GMT> Hi Ram, Request Approved Thanks <GMT2015-09-01 19:08:15GMT>.",r"<GMT2015-09-01 00:03:29GMT> 2Hi Rajiv, As part of our job Request for your approval. Thanks <GMT2015-09-01 19:08:15GMT> Hi Ram, Request Approved Thanks <GMT2015-09-01 19:08:15GMT>."]})
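
def finder(x):
    # Return the text between the first > and the next <.
    # Raises IndexError if the pattern does not match at all.
    return re.findall(">(.+?)<", x)[0]

data["ticket_desc"] = data["ticket_desc"].apply(finder)
print(data["ticket_desc"][0])
print(data["ticket_desc"][1])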

Output:

 Hi Rajiv, As part of our job Request for your approval. Thanks 
 2Hi Rajiv, As part of our job Request for your approval. Thanks 
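The same pattern can also be applied without a Python-level function. A minimal sketch, assuming the column holds strings: Series.str.extract returns the first match of the capture group for each row and leaves a missing value where nothing matches, so rows without stamps do not raise. The third row and the variable name first are added here for illustration only.

```python
import pandas as pd

data = pd.DataFrame({
    "id": [1, 2, 3],
    "ticket_desc": [
        "<GMT2015-09-01 00:03:29GMT> Hi Rajiv, Request for your approval. Thanks <GMT2015-09-01 19:08:15GMT> Hi Ram <GMT2015-09-01 19:08:15GMT>.",
        "<GMT2015-09-01 00:03:29GMT> 2Hi Rajiv <GMT2015-09-01 19:08:15GMT>",
        "free text without any stamp",  # illustrative row with no match
    ],
})

# Same non-greedy pattern as in the answer; expand=False returns a Series.
# str.strip() removes the spaces that surround the captured text.
first = data["ticket_desc"].str.extract(r">(.+?)<", expand=False).str.strip()
print(first)
```

Because the result is an ordinary Series, it can be assigned to a new column or collected into a new data frame, as the question asks.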

2 answers

Answer 1
0 Accepted 2016-06-17 07:02:22

Answer 2
0 2016-06-18 04:47:40