Using string methods on dataframes in Python Pandas?
I have a dataframe with strings in the following format:
data.description[4000]=['Conduit, PVC Utility Type DB 60 TC-6, 1-1/2" LF .050 $.86 $1.90 $2.76']
The strings vary in length, but I would like to break each one up by splitting it at the ' LF ' substring. The desired output would be:
data2=['Conduit, PVC Utility Type DB 60 TC-6, 1-1/2"','LF',.050,'$.86','$1.90','$2.76']
If I have a list of units:
units=['CLF','LF','EA']
How could I search the dataframe strings and break them up in the format above? Splitting with the unit as the delimiter almost works, but I would lose the units. It also gives me two strings that could be split further, but that seems to require a row-by-row function.
Is there a better way to do this?
You can use the string method split directly on the column with the text:
df['text'].str.split('(CLF|LF|EA)')
You can use capturing parentheses to keep the delimiter.
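The effect of the parentheses can be seen by comparing a capturing group with a non-capturing group `(?:...)`. With the default `regex=None`, pandas treats a pattern longer than one character as a regular expression, and Python's regex split returns the text of captured groups as extra list elements. A minimal sketch on one hypothetical sample string:

```python
import pandas as pd

s = pd.Series(['aaaaaaa LF bbbbbbbb'])

# Non-capturing group (?:...): the unit is used as a separator and discarded.
dropped = s.str.split('(?:CLF|LF|EA)')

# Capturing group (...): the matched unit is kept as its own list element.
kept = s.str.split('(CLF|LF|EA)')

print(dropped[0])  # ['aaaaaaa ', ' bbbbbbbb']
print(kept[0])     # ['aaaaaaa ', 'LF', ' bbbbbbbb']
```

Note that the spaces around the unit stay attached to the neighbouring pieces.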
Example:
import pandas as pd

units = '(CLF|LF|EA)'
df = pd.DataFrame({'text': ['aaaaaaa LF bbbbbbbb', '123456 CLF 78910', '!!!!!!!! EA @@@@@@@@@@']})
df.text.str.split(units)
returns:
0 [aaaaaaa , LF, bbbbbbbb]
1 [123456 , CLF, 78910]
2 [!!!!!!!! , EA, @@@@@@@@@@]
Name: text, dtype: object
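To get all the way to the list in the question without a row-by-row function, the same split can be limited to the first unit and the remaining numbers split on whitespace, all with vectorized string methods. This is a sketch under stated assumptions: the unit is always surrounded by whitespace, and the column names `item`, `unit` and `value_0`... are hypothetical labels chosen for illustration.

```python
import pandas as pd

# One row from the question; more rows work the same way.
data = pd.DataFrame({'description': [
    'Conduit, PVC Utility Type DB 60 TC-6, 1-1/2" LF .050 $.86 $1.90 $2.76',
]})

# Assumption: the unit is surrounded by whitespace. Requiring \s on both
# sides stops 'LF' or 'EA' from matching inside an ordinary word.
units = r'\s(CLF|LF|EA)\s'

# n=1 splits only at the first unit; the capturing group keeps the unit,
# so each row becomes [item text, unit, remaining numbers].
parts = data['description'].str.split(units, n=1)

# .str[i] picks the i-th list element in every row without an explicit loop.
item = parts.str[0].rename('item')
unit = parts.str[1].rename('unit')
# Split the remaining numbers on whitespace into separate columns.
numbers = parts.str[2].str.split(expand=True).add_prefix('value_')

result = pd.concat([item, unit, numbers], axis=1)
print(result.iloc[0].tolist())
# ['Conduit, PVC Utility Type DB 60 TC-6, 1-1/2"', 'LF', '.050', '$.86', '$1.90', '$2.76']
```

Rows that contain no unit end up with NaN in the `unit` and `value_` columns instead of raising an error.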