Copy a range of text from a CSV column in pandas and Python
I have a CSV file that I have imported into pandas. It has almost 45 columns of data, and each column has more than 100 rows. I need to select only the range of text that starts with a date stamp and ends with the next date stamp.
Example:
<GMT2015-09-01 00:03:29GMT> Hi Rajiv<GMT2015-09-01 19:08:15GMT> Hi Ram <GMT2015-09-01 19:08:15GMT>
So, given this structure, I need to select only the first paragraph, from the first date stamp to the next one, into a new DataFrame.
I think you can split the data in the Ticket Description column on < and >, and then select the column you need from the resulting DataFrame with iloc. Finally, you can strip the leading and trailing whitespace.
Notice: this works well only if < and > appear solely at the start and end of each datetime.
import pandas as pd
df = pd.DataFrame({'Ticket Description':['<GMT2015-09-01 00:03:29GMT> Hi Rajiv<GMT2015-09-01 19:08:15GMT> Hi Ram <GMT2015-09-01 19:08:15GMT> ']})
print (df)
Ticket Description
0 <GMT2015-09-01 00:03:29GMT> Hi Rajiv<GMT2015-0...
print (df['Ticket Description'].str.split(r'[<>]', expand=True).iloc[:, 2].str.strip())
0 Hi Rajiv
Name: 2, dtype: object
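The notice above matters in practice: if the message text itself contains < or >, splitting on those characters cuts the message short. A minimal sketch of one alternative is to anchor on the timestamp itself with str.extract. It assumes every stamp looks exactly like <GMTYYYY-MM-DD HH:MM:SSGMT>; the second row and the first_message column name are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Ticket Description": [
        "<GMT2015-09-01 00:03:29GMT> Hi Rajiv<GMT2015-09-01 19:08:15GMT> Hi Ram <GMT2015-09-01 19:08:15GMT> ",
        # Hypothetical row: the message contains a bare "<".
        "<GMT2015-09-02 10:00:00GMT> if a < b then ok <GMT2015-09-02 11:00:00GMT> done <GMT2015-09-02 12:00:00GMT>",
    ]
})

# Assumed stamp layout: <GMTYYYY-MM-DD HH:MM:SSGMT>
STAMP = r"<GMT\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}GMT>"

# Lazily capture everything between the first stamp and the next one.
pattern = STAMP + r"(.*?)" + STAMP

# With one capture group and expand=False, str.extract returns a Series.
first_message = (
    df["Ticket Description"]
    .str.extract(pattern, expand=False)
    .str.strip()
)

# Put the extracted text into a new DataFrame, as the question asks.
result = pd.DataFrame({"first_message": first_message})
print(result)
```

On the second row, splitting on [<>] would return only "if a", while the timestamp-anchored pattern keeps the whole message. Rows that do not contain two stamps come back as missing values rather than raising an error.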
A regex and pandas apply should achieve what you want. I'm assuming you want only the text between the very first and second timestamps. I have created a DataFrame containing your message twice, except that the second copy starts with 2. The >(.+?)< in the regex searches for one or more characters surrounded by a > and a <. The ? makes the match non-greedy, so it does not run from the first timestamp all the way to the last; it stops at the first < it reaches.
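To see what the ? changes, here is a small sketch that runs both the non-greedy and the greedy form of the pattern on the sample string from the question, using only the standard re module. The variable names are illustrative.

```python
import re

text = "<GMT2015-09-01 00:03:29GMT> Hi Rajiv<GMT2015-09-01 19:08:15GMT> Hi Ram <GMT2015-09-01 19:08:15GMT>"

# Non-greedy: each match stops at the nearest following "<".
lazy = re.findall(r">(.+?)<", text)

# Greedy: the match runs on to the last "<" in the string.
greedy = re.findall(r">(.+)<", text)

print(lazy)    # [' Hi Rajiv', ' Hi Ram ']
print(greedy)  # [' Hi Rajiv<GMT2015-09-01 19:08:15GMT> Hi Ram ']
```

The non-greedy version yields one entry per message, so taking element [0] gives the first message only. The captured text keeps its surrounding spaces, so a strip() may still be wanted.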
Sample code below:
import pandas as pd
import re
data = pd.DataFrame({"id":[1,2],"ticket_desc":[r"<GMT2015-09-01 00:03:29GMT> Hi Rajiv, As part of our job Request for your approval. Thanks <GMT2015-09-01 19:08:15GMT> Hi Ram, Request Approved Thanks <GMT2015-09-01 19:08:15GMT>.",r"<GMT2015-09-01 00:03:29GMT> 2Hi Rajiv, As part of our job Request for your approval. Thanks <GMT2015-09-01 19:08:15GMT> Hi Ram, Request Approved Thanks <GMT2015-09-01 19:08:15GMT>."]})
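def finder(x):
    # Return the text between the first ">" and the next "<".
    return re.findall(r">(.+?)<", x)[0]

# Replace each description with its first message.
data["ticket_desc"] = data["ticket_desc"].apply(finder)
print(data["ticket_desc"][0])
print(data["ticket_desc"][1])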
Output:
Hi Rajiv, As part of our job Request for your approval. Thanks
2Hi Rajiv, As part of our job Request for your approval. Thanks
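One caveat with the function above: indexing [0] raises an IndexError when a cell has no match, and re.findall raises a TypeError on missing values. With 45 columns of real data, both cases seem likely. The following is a sketch of a more defensive variant, not the answer's original code; the extra rows and the first_message column name are assumptions for illustration.

```python
import re
import pandas as pd

data = pd.DataFrame({
    "id": [1, 2, 3],
    "ticket_desc": [
        "<GMT2015-09-01 00:03:29GMT> Hi Rajiv, Request for your approval. Thanks <GMT2015-09-01 19:08:15GMT> Hi Ram <GMT2015-09-01 19:08:15GMT>.",
        "no timestamps here",  # hypothetical row without any stamp
        None,                  # hypothetical missing value
    ],
})

def finder(x):
    """Return the text between the first '>' and the next '<', or None."""
    if not isinstance(x, str):
        return None  # skip NaN/None cells
    match = re.search(r">(.+?)<", x)
    return match.group(1).strip() if match else None

# Build a new DataFrame instead of overwriting the original column.
new_df = data[["id"]].assign(first_message=data["ticket_desc"].apply(finder))
print(new_df)
```

re.search stops at the first match, so it does the same job as re.findall(...)[0] without building the full list of matches.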