

Remove specific words (both prefix and postfix) from a pandas DataFrame column

I want to clean up the Address column by removing specific words from the start (prefix) and the end (postfix) of each string.

Sample Input:
ID                  Address
1                   PALLABI- F #1st Floor, SEC #10, Pallabi, MIRPUR
2                   H#22(2nd floor),Extended Rupnagar Area, Pallabi Mirpur, Dhaka.
3                   Uttar khan-F #3rd floor, Kuripara, Dhaka
4                   F-1,H-43,Chalabon,D.khan, Uttarkhan

PREFIX

ID 1 starts with the word PALLABI-, and I want to remove that part. The same goes for ID 3, where Uttar khan- should be removed. After removal, the Uttar khan or PALLABI part should be appended to the end of the string, but only if the rest of the address doesn't already contain that word (ID 1 already mentions Pallabi, so nothing is appended there).
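A minimal sketch of the prefix rule in plain Python, assuming the only prefixes to handle are PALLABI and Uttar khan followed by a hyphen (the names `PREFIXES`, `PREFIX_RE` and `move_prefix` are hypothetical). The containment check is case-insensitive, so the Pallabi already present in ID 1 counts as a match.

```python
import re

# Assumption: these are the only prefixes to handle; extend the list as needed.
PREFIXES = ["PALLABI", "Uttar khan"]

# One of the prefixes at the very start, then a hyphen with optional spaces around it.
PREFIX_RE = re.compile(
    r"^\s*(" + "|".join(re.escape(p) for p in PREFIXES) + r")\s*-\s*",
    flags=re.IGNORECASE,
)


def move_prefix(address: str) -> str:
    """Drop a leading 'WORD-' and append WORD at the end only if the rest lacks it."""
    match = PREFIX_RE.match(address)
    if match is None:
        return address  # no known prefix: leave the address unchanged
    word = match.group(1)         # the prefix as written, e.g. 'Uttar khan'
    rest = address[match.end():]  # everything after the hyphen
    # Case-insensitive check, so 'PALLABI' is found in '..., Pallabi, ...'.
    if word.lower() not in rest.lower():
        rest = f"{rest}, {word}"
    return rest


print(move_prefix("Uttar khan-F #3rd floor, Kuripara"))
# → F #3rd floor, Kuripara, Uttar khan
```

If the trailing Dhaka is also being removed, do that before this step; otherwise ID 3 would come out as `F #3rd floor, Kuripara, Dhaka, Uttar khan`.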

POSTFIX

The other part is to remove Dhaka from the end of the string.
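One way to handle the postfix is a regular expression anchored at the end of the string. This is a sketch, assuming Dhaka may be preceded by a comma and spaces and followed by an optional period, as in IDs 2 and 3; `DHAKA_SUFFIX_RE` and `strip_dhaka` are hypothetical names.

```python
import re

# Assumption: 'Dhaka' may be preceded by a comma/spaces and followed by an optional period.
DHAKA_SUFFIX_RE = re.compile(r"[\s,]*\bDhaka\.?\s*$", flags=re.IGNORECASE)


def strip_dhaka(address: str) -> str:
    """Remove a trailing 'Dhaka' together with its leading comma and trailing period."""
    return DHAKA_SUFFIX_RE.sub("", address)


print(strip_dhaka("H#22(2nd floor),Extended Rupnagar Area, Pallabi Mirpur, Dhaka."))
# → H#22(2nd floor),Extended Rupnagar Area, Pallabi Mirpur
```

Because of the `$` anchor, a Dhaka in the middle of an address is left alone.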

Output:
ID                  Address
1                   F #1st Floor, SEC #10, Pallabi, MIRPUR
2                   H#22(2nd floor),Extended Rupnagar Area, Pallabi Mirpur
3                   F #3rd floor, Kuripara, Uttar khan
4                   F-1,H-43,Chalabon,D.khan, Uttarkhan

Thanks in advance.

You can apply a function to the column:

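def my_function(string, word='your_word', replacement=''):
    # if your word is at the start of the string
    if string.startswith(word):
        # replace only that leading occurrence (the replacement can be nothing (''))
        string = replacement + string[len(word):].lstrip()

    return string

# df is your existing DataFrame; extra keyword arguments to apply() are
# passed on to the function, e.g. .apply(my_function, word='PALLABI-')
df['column'] = df['column'].apply(my_function)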
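Filling in the template above for this question's data might look like the following. It is a sketch, not the answerer's code: the trailing Dhaka is removed first with the vectorized `Series.str.replace`, and the prefix rule then runs through `apply`. The regex patterns and the `move_prefix` helper are assumptions based only on the sample rows.

```python
import re

import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Address": [
        "PALLABI- F #1st Floor, SEC #10, Pallabi, MIRPUR",
        "H#22(2nd floor),Extended Rupnagar Area, Pallabi Mirpur, Dhaka.",
        "Uttar khan-F #3rd floor, Kuripara, Dhaka",
        "F-1,H-43,Chalabon,D.khan, Uttarkhan",
    ],
})

# Assumption: these are the only prefixes to move; extend as needed.
PREFIX_RE = re.compile(r"^\s*(PALLABI|Uttar khan)\s*-\s*", flags=re.IGNORECASE)


def move_prefix(address: str) -> str:
    # Strip 'WORD-' at the start; re-append WORD only if the rest doesn't mention it.
    m = PREFIX_RE.match(address)
    if m is None:
        return address
    word, rest = m.group(1), address[m.end():]
    return rest if word.lower() in rest.lower() else f"{rest}, {word}"


# Step 1 (vectorized): drop a trailing ', Dhaka' or ', Dhaka.' first,
# so a re-appended prefix word ends up as the last item.
df["Address"] = df["Address"].str.replace(
    r"[\s,]*\bDhaka\.?\s*$", "", regex=True, case=False
)
# Step 2: apply the per-row prefix logic.
df["Address"] = df["Address"].apply(move_prefix)
print(df.to_string(index=False))
```

The prefix step stays in `apply` because the "append only if not already present" check compares each row's own prefix against its own remaining text.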

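Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you need to repost them, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.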

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM