
Using string methods on dataframes in Python Pandas?

I have a dataframe with strings in the following format:

data.description[4000]=['Conduit, PVC Utility Type DB 60 TC-6, 1-1/2"                                   LF   .050   $.86   $1.90   $2.76']

The strings vary in length, but I would like to break each one up by splitting at the ' LF ' substring. The desired output would be:

data2=['Conduit, PVC Utility Type DB 60 TC-6, 1-1/2"','LF',.050,'$.86','$1.90','$2.76']

Suppose I have a list of units:

units=['CLF','LF','EA']

How could I search the dataframe strings and break each one up in the format above? Splitting on the unit as a delimiter almost works, but I lose the unit itself. It also leaves me with two strings that need further splitting, which seems to require a row-by-row function.

Is there a better way to do this?

You can use the string method split directly on the column containing the text:

df['text'].str.split('(CLF|LF|EA)')

The capturing parentheses in the pattern keep the delimiter in the result.
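The effect of the parentheses is easiest to see with `re.split` from the standard library, which treats capturing groups the same way. This is only an illustrative sketch with one sample string:

```python
import re

text = 'aaaaaaa LF bbbbbbbb'

# Non-capturing group: the matched delimiter is discarded.
without_unit = re.split(r'(?:CLF|LF|EA)', text)

# Capturing group: the matched delimiter is returned as its own element.
with_unit = re.split(r'(CLF|LF|EA)', text)

print(without_unit)  # ['aaaaaaa ', ' bbbbbbbb']
print(with_unit)     # ['aaaaaaa ', 'LF', ' bbbbbbbb']
```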

Example:

import pandas as pd

units = '(CLF|LF|EA)'  # capturing group keeps the matched unit
df = pd.DataFrame({'text': ['aaaaaaa LF bbbbbbbb', '123456 CLF 78910', '!!!!!!!! EA @@@@@@@@@@']})
df.text.str.split(units)

returns:

0       [aaaaaaa , LF,  bbbbbbbb]
1          [123456 , CLF,  78910]
2    [!!!!!!!! , EA,  @@@@@@@@@@]
Name: text, dtype: object
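
To get all the way to the output asked for in the question, one possible approach is `str.extract` with named groups, which returns one column per group without a row-by-row function. This is a rough sketch, not a definitive solution: it assumes every row ends with exactly four whitespace-separated fields after the unit, and the column names `item`, `rate`, `cost_1`, `cost_2` and `cost_3` are hypothetical labels chosen for illustration.

```python
import re
import pandas as pd

units = ['CLF', 'LF', 'EA']

# Sample frame built from the string in the question.
df = pd.DataFrame({'description': [
    'Conduit, PVC Utility Type DB 60 TC-6, 1-1/2"                                   LF   .050   $.86   $1.90   $2.76',
]})

# Build the alternation from the list; re.escape guards against special characters.
unit_alt = '|'.join(re.escape(u) for u in units)

# The unit must be surrounded by whitespace, so 'LF' cannot match inside 'CLF'.
# Group names below are hypothetical labels, not taken from the original data.
pattern = (
    r'^(?P<item>.*?)\s+'
    rf'(?P<unit>{unit_alt})\s+'
    r'(?P<rate>\S+)\s+'
    r'(?P<cost_1>\S+)\s+'
    r'(?P<cost_2>\S+)\s+'
    r'(?P<cost_3>\S+)\s*$'
)

parts = df['description'].str.extract(pattern)
parts['rate'] = pd.to_numeric(parts['rate'])  # '.050' becomes a float
print(parts)
```

Rows that do not fit the pattern come back as NaN in every column, which makes malformed descriptions easy to spot.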
