Using string methods on a Python Pandas dataframe?

Question

I have a dataframe with the following string format.

data.description[4000]=['Conduit, PVC Utility Type DB 60 TC-6, 1-1/2"                                   LF   .050   $.86   $1.90   $2.76']

The string varies in size, but I would like it to be broken up by splitting at the ' LF ' substring. The desired output would be

data2=['Conduit, PVC Utility Type DB 60 TC-6, 1-1/2"','LF',.050,'$.86','$1.90','$2.76']

If I were to have a list of units

units=['CLF','LF','EA']

How could I search the dataframe string and break it into the aforementioned format? Splitting on the unit delimiter would sort of work, but I would lose the units. It gives me 2 strings which can be further split, but it seems that it would require a row-by-row function.

Is there a better way to do this?

Answer 1

You can use the string method split directly on the column with the text:

df['text'].str.split('(CLF|LF|EA)')

You can use capturing parentheses in the pattern to keep the delimiter in the result.

Example:

import pandas as pd

# Capturing group so the matched unit is kept in the split result
units = '(CLF|LF|EA)'
df = pd.DataFrame({'text': ['aaaaaaa LF bbbbbbbb', '123456 CLF 78910', '!!!!!!!! EA @@@@@@@@@@']})
df.text.str.split(units)

returns:

0       [aaaaaaa , LF,  bbbbbbbb]
1          [123456 , CLF,  78910]
2    [!!!!!!!! , EA,  @@@@@@@@@@]
Name: text, dtype: object
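
To get closer to the output the question asks for, one option is to build the pattern from the list of units and pull the pieces into columns with str.extract. This is only a minimal sketch, not part of the original answer: it assumes the text lives in a column named description, that each row contains exactly one unit surrounded by whitespace, and the value_ column prefix is a made-up name for illustration.

```python
import re
import pandas as pd

units = ['CLF', 'LF', 'EA']

# Escape each unit and try longer ones first, so 'CLF' is tested before 'LF'.
alternatives = '|'.join(re.escape(u) for u in sorted(units, key=len, reverse=True))

# Requiring whitespace on both sides keeps a unit from matching inside a longer word.
pattern = r'^(?P<description>.*?)\s+(?P<unit>' + alternatives + r')\s+(?P<rest>.*)$'

# Sample row taken from the question (column name is an assumption).
df = pd.DataFrame({'description': [
    'Conduit, PVC Utility Type DB 60 TC-6, 1-1/2"      LF   .050   $.86   $1.90   $2.76',
]})

# One column per named group: description, unit, rest.
parts = df['description'].str.extract(pattern)

# Split the trailing values on runs of whitespace, one column per value.
values = parts.pop('rest').str.split(expand=True).add_prefix('value_')  # hypothetical prefix

result = pd.concat([parts, values], axis=1)
print(result)
```

Everything here is vectorised, so no row-by-row function is needed. The extracted values stay as strings; converting them to numbers (for example by stripping the '$') would be a separate step. Rows where no unit is found come back as NaN in every column.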
