
Pandas: How to remove words in string which appear before a certain word from another column

I have a large CSV file with a column containing strings. Near the beginning of each string there is an ID number, which also appears in another column, as shown below.

0      Home /buy /York /Warehouse /P000166770Ou...             P000166770
1      Home /buy /York /Plot /P000165923A plot of la...        P000165923
2      Home /buy /London /Commercial /P000165504A str...       P000165504
                         ...                        
804    Brand new apartment on the first floor, situat...       P000185616

I want to remove all text up to and including the ID number, so here we would get:

0      Ou...             
1      A plot of la...        
2      A str...       
                         ...                        
804    Brand new apartment on the first floor, situat...       

I tried something like

df['column_one'].str.split(df['column_two'])

and

df['column_one'].str.replace(df['column_two'],'')

You could replace the pattern using a regex as follows:

>>> import pandas as pd
>>> my_pattern = r"^(Alpha|Beta|QA|Prod)\s[A-Z0-9]{7}"
>>> my_series = pd.Series(['Alpha P17089OText starts here'])
>>> my_series.str.replace(my_pattern, '', regex=True)
0    Text starts here
dtype: object

There is a bit of work to be done to determine the nature of your pattern. I would suggest experimenting a bit with https://regex101.com/
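As a rough sketch of how the regex idea might be adapted to the data in the question: the pattern below assumes every ID is the letter P followed by exactly nine digits, which is only inferred from the sample rows. The descriptions are hypothetical stand-ins, since the real ones are truncated in the question.

```python
import pandas as pd

# Hypothetical sample rows modelled on the question; the real text is truncated there.
descriptions = pd.Series([
    "Home /buy /York /Warehouse /P000166770Sample text",
    "Brand new apartment on the first floor",
])

# Assumption: an ID is "P" followed by exactly nine digits.
# The lazy ^.*? consumes everything up to and including the first such ID.
id_pattern = r"^.*?P\d{9}"

# Rows without an ID do not match the pattern and are left unchanged.
cleaned = descriptions.str.replace(id_pattern, "", regex=True)
print(cleaned.tolist())
```

This variant never looks at the second column, so it only suits data where the ID format is fixed and cannot occur by accident earlier in the text.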

To extend your split() idea:

df.apply(lambda x: x['column_one'].split(x['column_two'])[1], axis=1)

0    Text starts here

I managed to get it to work using:

df.apply(lambda x: x['column1'].split(x['column2'])[1] if x['column2'] in x['column1'] else x['column1'], axis=1)

This also works when the ID is not in the description. Thanks for the help!
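For comparison, the same logic can be sketched without `apply`, using `str.partition` in a list comprehension. The helper name `strip_through_id`, the `cleaned` column, and the sample descriptions are hypothetical. The column names follow the snippet above.

```python
import pandas as pd

# Hypothetical sample data; the description text is a stand-in.
df = pd.DataFrame({
    "column1": [
        "Home /buy /York /Plot /P000165923Sample description",
        "Brand new apartment on the first floor",
    ],
    "column2": ["P000165923", "P000185616"],
})

def strip_through_id(text, id_):
    """Return the text after the first occurrence of id_, or text unchanged if absent."""
    _head, sep, tail = text.partition(id_)
    # sep is the empty string when id_ does not occur in text.
    return tail if sep else text

df["cleaned"] = [
    strip_through_id(text, id_)
    for text, id_ in zip(df["column1"], df["column2"])
]
print(df["cleaned"].tolist())
```

`partition` splits only at the first occurrence, so an ID that is repeated later in the description stays in the result.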

Here is one way to do it, by applying a regex to each row based on its code:

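import re

def ext(row):
    # Escape the code so any regex metacharacters in it are matched literally
    mch = re.findall(r"{0}(.*)".format(re.escape(row['code'])), row['txt'])
    if len(mch) > 0:
        rtn = mch.pop()   # text that follows the code
    else:
        rtn = row['txt']  # code not found: keep the original text
    return rtn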

df['ext'] = df.apply(ext, axis=1) 
df
0                                               Ou...
1                                     A plot of la...
2                                            A str...
3    Brand new apartment on the first floor situat...


    x   txt                                                 code            ext
0   0   Home /buy /York /Warehouse / P000166770 Ou...        P000166770     Ou...
1   1   Home /buy /York /Plot /P000165923A plot of la...     P000165923     A plot of la...
2   2   Home /buy /London /Commercial /P000165504A str...    P000165504     A str...
3   804     Brand new apartment on the first floor situat... P000185616     Brand new apartment on the first floor situat...
