Pandas DataFrame: remove unwanted parts from strings before and after the part I want to keep

Question

In my data_cleaner dataset I have the column (feature) 'Project ID'. It identifies the project and has the format 'code/YEAR/code'. I'm only interested in the project's year, so I want to get rid of everything before the first / and everything after the second /.

Project ID  
AGPG/2013/1 
AGPG/2013/10
AGPG/2013/12
AGPG/2013/18
AGPG/2013/19

The closest I got was to strip what comes before the year with

data_cleaner['Project ID'] = data_cleaner['Project ID'].str.strip("AGPG")

(but further down the column there are other letter codes, so this is not scalable)

And then I did

data_cleaner['Project ID'] = data_cleaner['Project ID'].str.strip('/')

This got rid of the first part, but I can't manage to get rid of what comes after the year.

Project ID  
2013/1  
2013/10
2013/12
2013/18
2013/19
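A note on why the strip approach above does not generalize: str.strip treats its argument as a set of characters to remove from both ends, not as a literal prefix. The small sketch below illustrates this; the sample values other than "AGPG/2013/1" are made up for the illustration.

```python
import pandas as pd

# Hypothetical sample: only the first value comes from the question,
# the "GAP" and "XYZ" prefixes are invented to show the behaviour.
s = pd.Series(["AGPG/2013/1", "GAP/2014/10", "XYZ/2015/12"])

# strip("AGPG") removes any leading/trailing characters from the set {A, G, P},
# so "GAP" is stripped too, while "XYZ" is left untouched.
stripped = s.str.strip("AGPG")
print(stripped.tolist())
```

Because the result depends on which letters happen to appear in each code, a split or a regex on the / separators is the more robust choice.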

I read this post, but it didn't help me: Pandas DataFrame: remove unwanted parts from strings in a column

Answer 1

I believe you need to split on / and select the second element of each resulting list:

data_cleaner['Project ID'] = data_cleaner['Project ID'].str.split('/').str[1]
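As a minimal self-contained sketch of the split approach, assuming every value follows the 'code/YEAR/code' format from the question (the "XYZ/2014/3" row and the 'Year' column name are made up for the illustration):

```python
import pandas as pd

# Small stand-in for the asker's data_cleaner frame; the last row is hypothetical.
data_cleaner = pd.DataFrame(
    {"Project ID": ["AGPG/2013/1", "AGPG/2013/10", "XYZ/2014/3"]}
)

# str.split("/") turns each string into a list such as ["AGPG", "2013", "1"];
# .str[1] then picks the second element of every list.
parts = data_cleaner["Project ID"].str.split("/")
data_cleaner["Year"] = parts.str[1]

print(data_cleaner)
```

Writing to a separate column keeps the original IDs around; assigning back to 'Project ID', as in the line above, overwrites them.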

Or extract with a regex. The pattern /(\d{4})/ captures a four-digit number that sits between two slashes:

data_cleaner['Project ID'] = data_cleaner['Project ID'].str.extract(r'/(\d{4})/', expand=False)

print(data_cleaner)
  Project ID
0       2013
1       2013
2       2013
3       2013
4       2013
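One difference worth knowing between the two approaches: with str.extract, rows that do not match the pattern come back as missing values rather than raising an error. A short sketch, where the "no-year-here" value is a made-up example of a malformed ID:

```python
import pandas as pd

# Hypothetical sample; the last entry deliberately does not match the pattern.
ids = pd.Series(["AGPG/2013/1", "XYZ/2014/3", "no-year-here"])

# A raw string avoids Python treating \d as a string escape.
# expand=False with a single capture group returns a Series.
years = ids.str.extract(r"/(\d{4})/", expand=False)

print(years)
# Non-matching rows are missing, so they can be found with isna().
print(years.isna().tolist())
```

The extracted years are still strings; if numbers are needed, they can be converted afterwards, for example with pd.to_numeric.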

Answer 1: score 2, accepted, 2018-04-10 12:53:20