How to remove everything after the last occurrence of a character in a DataFrame?
I have a DataFrame DF that looks like this (this is a sample):
EQ1 EQ2 EQ3
0 Apple.fruit Oranage.eatable.fruit NaN
1 Pear.eatable.fruit Banana.fruit NaN
2 Orange.fruit Tomato.eatable Potato.eatable.vegetable
3 Kiwi.eatable Pear.fruit Cabbage.vegetable
<And so on.. It is a large Dataframe>
I would like to remove everything AFTER the LAST occurrence of the dot (.) in every element of DF, and save the result under a different name, say df_temp.
Desired output:
EQ1 EQ2 EQ3
0 Apple Oranage.eatable NaN
1 Pear.eatable Banana NaN
2 Orange Tomato Potato.eatable
3 Kiwi Pear Cabbage
<And so on>
This is what I tried:
df_temp=".".join(DF.split(".")[:-1])
Unfortunately, this seems to work only with strings and not with a DataFrame. Do I have to tweak this line a bit to achieve what I want? Someone please help!
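For context, the attempted line fails because split and join are str methods, and a DataFrame has no split method of its own. A minimal sketch of keeping the same one-liner and applying it cell by cell is shown below; the helper name drop_last_part and the reduced two-column frame are assumptions made for illustration, not part of the original post.

```python
import numpy as np
import pandas as pd


def drop_last_part(value: str) -> str:
    """Hypothetical helper: the question's one-liner, applied to one string."""
    return ".".join(value.split(".")[:-1])


# Reduced sample frame (illustrative subset of the data in the question).
DF = pd.DataFrame({
    "EQ1": ["Apple.fruit", "Kiwi.eatable"],
    "EQ3": [np.nan, "Cabbage.vegetable"],
})

# Series.map calls the helper once per cell; na_action="ignore" skips NaN cells,
# which would otherwise raise because a float has no split method.
df_temp = DF.apply(lambda col: col.map(drop_last_part, na_action="ignore"))
print(df_temp)
```

This runs a Python function per cell, so on a large frame the vectorised .str methods are likely to be the better fit.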
You could do:
df_temp = df.apply(lambda x: x.str.split('.').str[:-1].str.join('.'))
Output:
EQ1 EQ2 EQ3
0 Apple Oranage.eatable NaN
1 Pear.eatable Banana NaN
2 Orange Tomato Potato.eatable
3 Kiwi Pear Cabbage
See the string method docs.
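A closely related sketch, offered as a variant rather than as the answer's own method: split once from the right with str.rsplit and keep the left part. The frame below is rebuilt from the sample in the question, with values copied as posted.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question (values copied as posted).
df = pd.DataFrame({
    "EQ1": ["Apple.fruit", "Pear.eatable.fruit", "Orange.fruit", "Kiwi.eatable"],
    "EQ2": ["Oranage.eatable.fruit", "Banana.fruit", "Tomato.eatable", "Pear.fruit"],
    "EQ3": [np.nan, np.nan, "Potato.eatable.vegetable", "Cabbage.vegetable"],
})

# Split at most once, starting from the right, then keep the first piece.
# NaN cells stay NaN because the .str methods propagate missing values.
df_temp = df.apply(lambda col: col.str.rsplit(".", n=1).str[0])
print(df_temp)
```

The two versions differ on a value that contains no dot at all: the split/join form turns it into an empty string, while the rsplit form leaves it unchanged.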
You could use extract.
df_temp = df.apply(lambda x: x.str.extract(r'.*\.([^\.]*)', expand=False))
df_new = df.apply(lambda x: x.str.extract(r'(.*)\.[^\.]*', expand=False))
df_temp looks like:
EQ1 EQ2 EQ3
0 fruit fruit NaN
1 fruit fruit NaN
2 fruit eatable vegetable
3 eatable fruit vegetable
df_new looks like:
EQ1 EQ2 EQ3
0 Apple Oranage.eatable NaN
1 Pear.eatable Banana NaN
2 Orange Tomato Potato.eatable
3 Kiwi Pear Cabbage
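A self-contained sketch of the df_new pattern on the sample data, next to a str.replace variant that deletes the final ".suffix" instead of capturing the prefix. The name df_alt and the replace-based variant are assumptions added for comparison, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question (values copied as posted).
df = pd.DataFrame({
    "EQ1": ["Apple.fruit", "Pear.eatable.fruit", "Orange.fruit", "Kiwi.eatable"],
    "EQ2": ["Oranage.eatable.fruit", "Banana.fruit", "Tomato.eatable", "Pear.fruit"],
    "EQ3": [np.nan, np.nan, "Potato.eatable.vegetable", "Cabbage.vegetable"],
})

# Greedy (.*) runs up to the last dot; the capture group keeps what precedes it.
df_new = df.apply(lambda col: col.str.extract(r"(.*)\.[^.]*", expand=False))

# Variant: remove the last dot and everything after it.
df_alt = df.apply(lambda col: col.str.replace(r"\.[^.]*$", "", regex=True))

print(df_new)
print(df_alt)
```

On this sample both give the same result. They diverge for a value with no dot: extract returns NaN because the pattern does not match, while replace leaves the value as it is.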