Python pandas dataframe regex to extract substring from object

I created a DataFrame in Python with the pandas module from a CSV file. By default, pandas stored the string column with the object dtype. From that string I want to create another column, which I am trying to do with a regex. However, because the column is of type object, I get an error:

import re
import pandas as pd

data = pd.read_csv(r'Desktop\train.csv')
desig = re.search(r'(\w+), (\w+). (\w+)', data['Name']).group(1)

TypeError: expected string or buffer

How can I extract that portion of the string from the object column?

Thanks.
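The TypeError arises because re.search expects a single string, while data['Name'] is a whole pandas Series. The sketch below reproduces the error on a few hypothetical rows (the real train.csv is not shown, so the "Surname, Title. Given name" layout is an assumption) and then applies the same regex element-wise with Series.apply. This row-by-row approach is an alternative to the vectorised answer, not the answer's method; the helper first_group is a hypothetical name.

```python
import re

import pandas as pd

# Hypothetical sample rows in an assumed "Surname, Title. Given name" layout;
# the question's real train.csv is not shown.
data = pd.DataFrame({"Name": ["Smith, Mr. John", "Doe, Mrs. Jane", "no match here"]})

pattern = re.compile(r"(\w+), (\w+)\. (\w+)")

# Passing the whole Series reproduces the question's TypeError.
try:
    pattern.search(data["Name"])
except TypeError as exc:
    print("TypeError:", exc)


def first_group(name):
    """Return capture group 1 for one string, or None when the pattern does not match."""
    match = pattern.search(name)
    return match.group(1) if match else None


# Applying the function element-wise works because each call receives one string.
data["desig"] = data["Name"].apply(first_group)
print(data)
```

Series.apply still loops in Python, one call per row, which is why the vectorised .str methods are usually preferred for this kind of extraction.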

You want to use the vectorised string operations available through the column's (Series') .str accessor:

# str.extract returns one column per capture group; [0] keeps the first one, like .group(1)
data['desig'] = data['Name'].str.extract(r'(\w+), (\w+)\. (\w+)')[0]

str.extract actually returns a DataFrame with one column per capture group, here three. Rows where the pattern does not match get NaN.
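Because str.extract returns one column per group, it can fill several new columns at once. The sketch below, again on hypothetical rows with an assumed "Surname, Title. Given name" layout, uses named groups so the returned DataFrame gets readable column names (surname, title and given are illustrative choices), joins them back onto the frame, and shows that a single group with expand=False yields a Series instead of a DataFrame.

```python
import pandas as pd

# Hypothetical sample rows; the real train.csv from the question is not shown.
data = pd.DataFrame({"Name": ["Smith, Mr. John", "Doe, Mrs. Jane", "no match here"]})

# Named groups (?P<name>...) become the column names of the returned DataFrame.
parts = data["Name"].str.extract(r"(?P<surname>\w+), (?P<title>\w+)\. (?P<given>\w+)")

# Join all three columns back onto the original frame; unmatched rows are NaN.
data = data.join(parts)

# With a single group and expand=False, str.extract returns a Series instead.
title_only = data["Name"].str.extract(r", (\w+)\.", expand=False)

print(data)
print(title_only)
```

The dot is escaped as \. here so it matches a literal period; an unescaped . matches any character.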
