Concatenate two columns excepting strings from a list (pandas)
I concatenate two columns for the rows where the strings in the "words" column are not already present in the "sentence" column. My code is:
import string

import numpy as np

# Cast once, up front, rather than on every call to check()
df['sentence'] = df['sentence'].astype(str)
df['words'] = df['words'].astype(str)

def check(row):
    # Keep the words of 'sentence' that do not appear in 'words'
    left = row['sentence'].split()
    right = [v.lower() for v in row['words'].split()]
    unmatched = []
    for y in left:
        # Strip punctuation and compare case-insensitively
        word = "".join(x for x in y.lower() if x not in string.punctuation)
        if word not in right:
            unmatched.append(y)
    return " ".join(unmatched)

mask = df['type'] == 'Is there a match with the Words?'
df.loc[mask, 'new'] = df.loc[mask, :].apply(check, axis=1)
# NOTE: `c` (a boolean condition) is not defined in the post as given
df['new'] = np.where(c, df['new'] + ' ' + df['words'], df['new'])
df['new'] = df['new'].str.replace('nan', '')
df['new'] = df['new'].fillna("")
In addition, I want to skip the concatenation for any row whose "words" value is one of the strings in this list:
restricted = ['not present', 'for sale', 'unknown']
Here is an example of what the result should look like:
      words             sentence                    output
0   unknown  This is a new paint       This is a new paint
1     brown   This is a new item  This is a new item brown
2  for sale   The product is new        The product is new
The output produced by the code above is:
output
This is a new paint unknown
This is a new item brown
The product is new for sale
Given:
      words             sentence
0   unknown  This is a new paint
1     brown   This is a new item
2  for sale   The product is new
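To reproduce this locally, the frame shown above could be built as follows. This is only a minimal sketch: it assumes both columns hold plain strings and that the index is the default 0..2 range.

```python
import pandas as pd

# Sample data copied from the question; string dtype is an assumption
df = pd.DataFrame({
    "words": ["unknown", "brown", "for sale"],
    "sentence": [
        "This is a new paint",
        "This is a new item",
        "The product is new",
    ],
})
```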
Doing:
restricted = ['not present', 'for sale', 'unknown']
mask = df.words.str.contains('|'.join(restricted))
df['output'] = df.sentence.where(mask, df.sentence + ' ' + df.words)
print(df)
Output:
      words             sentence                    output
0   unknown  This is a new paint       This is a new paint
1     brown   This is a new item  This is a new item brown
2  for sale   The product is new        The product is new
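One caveat: `str.contains` does substring (regex) matching, so a value such as "unknown color" would also be treated as restricted. If only exact values should block the concatenation, `isin` may be a better fit. The sketch below shows both options. The helper name `concat_unless_restricted` and its `exact` flag are hypothetical, not part of pandas, and it assumes "sentence" contains no missing values.

```python
import re

import pandas as pd

def concat_unless_restricted(df, restricted, exact=True):
    """Return 'sentence', with 'words' appended unless 'words' is restricted.

    Hypothetical helper for illustration; assumes 'sentence' has no NaN.
    """
    words = df["words"].fillna("").astype(str)
    if exact:
        # Whole-value, case-insensitive match
        lowered = [r.lower() for r in restricted]
        mask = words.str.strip().str.lower().isin(lowered)
    else:
        # Substring match; escape entries so they are not read as regex syntax
        pattern = "|".join(re.escape(r) for r in restricted)
        mask = words.str.contains(pattern, case=False, regex=True)
    combined = (df["sentence"] + " " + words).str.strip()
    # where() keeps 'sentence' where mask is True, otherwise uses 'combined'
    return df["sentence"].where(mask, combined)

sample = pd.DataFrame({
    "words": ["unknown", "brown", "for sale"],
    "sentence": ["This is a new paint", "This is a new item", "The product is new"],
})
restricted = ["not present", "for sale", "unknown"]
sample["output"] = concat_unless_restricted(sample, restricted)
```

With `exact=True`, a row whose "words" value is "unknown color" would still be concatenated; with `exact=False` it would be skipped, matching the behaviour of the `str.contains` approach.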