Remove everything after specific words in a DataFrame using Python
I have a list of specific words, and I need to remove everything that comes after them in each value.
Input data:
text_dir
abc School, Uk
xyz College, USA
Pqr University, Berlin
Required output:
text_dir
abc School
xyz College
Pqr University
Code snippet:
spl_character=['School', 'college', 'university']
df['text_dir'] = df['text_dir'].str.split(spl_character).str[0]
This gives the error:
TypeError: unhashable type: 'list'
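`Series.str.split` expects a single string (or compiled regex) as the separator, not a list. One way around this, sketched below as an alternative and not the asker's code, is to join the keywords into one regex alternation and use `Series.str.extract` to keep everything up to and including the first keyword. The case-insensitive flag is an assumption added so that the lowercase 'college' and 'university' in the list still match 'College' and 'University'; rows that contain no keyword would come back as NaN.

```python
import re

import pandas as pd

df = pd.DataFrame({"text_dir": ["abc School, Uk", "xyz College, USA", "Pqr University, Berlin"]})
spl_character = ['School', 'college', 'university']

# Build one alternation pattern from the list instead of passing the list itself.
alternation = "|".join(re.escape(word) for word in spl_character)
# Lazy prefix followed by a keyword: captures text up to the FIRST keyword.
pattern = rf"^(.*?(?:{alternation}))"

# A single capture group with expand=False returns a Series, not a DataFrame.
df["text_dir"] = df["text_dir"].str.extract(pattern, flags=re.IGNORECASE, expand=False)
print(df)
```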
IIUC (if I understand correctly):
pat = f'(?i)^(.*)({"|".join(spl_character)}).*$'
df.text_dir.str.replace(pat, r'\1\2', regex=True)
0 abc School
1 xyz College
2 Pqr University
Name: text_dir, dtype: object
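To make the pattern above easier to follow, the sketch below builds the same `pat` from the question's keyword list and applies it with the standard library's `re.sub` to plain strings. This is only an illustration outside pandas, not part of the original answer.

```python
import re

spl_character = ['School', 'college', 'university']
# (?i) makes the whole pattern case-insensitive; group 1 is the text before
# the keyword, group 2 is the keyword, and ".*$" consumes the rest.
pat = f'(?i)^(.*)({"|".join(spl_character)}).*$'
print(pat)

values = ["abc School, Uk", "xyz College, USA", "Pqr University, Berlin"]
# Replacing the full match with r'\1\2' keeps only the two captured groups.
results = [re.sub(pat, r'\1\2', value) for value in values]
print(results)
```

Because `(.*)` is greedy, a value that contains a keyword more than once keeps everything up to the last occurrence; a lazy `(.*?)` would stop at the first one instead.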
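I modified your input slightly (the keywords are capitalized and include the trailing comma) and used a regular expression to solve the problem.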
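import pandas as pd
import re

text_dir = ["abc School, Uk", "xyz College, USA", "Pqr University, Berlin"]
# Keywords include the trailing comma so each one matches a whole token
spl_character = ['School,', 'College,', 'University,']

df = pd.DataFrame()
df['text_dir'] = text_dir

final_list = []
for item in df.text_dir:
    for character in spl_character:
        # Only run the regex when the keyword appears as a separate token
        if character in item.split(' '):
            # Match everything from the start up to and including the keyword
            val_re = re.compile("^(.*)" + re.escape(character))
            val_match = val_re.search(item)
            final_list.append(val_match.group())
            break  # one result per row
df['text_dir'] = final_list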
Output:
text_dir
0 abc School,
1 xyz College,
2 Pqr University,
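The loop above keeps the trailing comma because the keywords include it. A regex is not strictly required for this; the sketch below uses plain `str.find` instead. `cut_after_keyword` is a hypothetical helper, and it assumes the keywords are written without commas and with the same capitalization as the data, since the search is case-sensitive.

```python
import pandas as pd

def cut_after_keyword(text, keywords):
    """Hypothetical helper: return text up to and including the earliest keyword,
    or the text unchanged if no keyword occurs (case-sensitive)."""
    positions = [(text.find(k), k) for k in keywords if k in text]
    if not positions:
        return text
    start, keyword = min(positions)  # earliest occurrence wins
    return text[:start + len(keyword)]

df = pd.DataFrame({"text_dir": ["abc School, Uk", "xyz College, USA", "Pqr University, Berlin"]})
keywords = ['School', 'College', 'University']
df['text_dir'] = df['text_dir'].apply(lambda x: cut_after_keyword(x, keywords))
print(df)
```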
import pandas as pd
text_dir = ["abc School, Uk", "xyz College, USA", "Pqr University, Berlin"]
df = pd.DataFrame()
df['text_dir'] = text_dir
text_dir
0 abc School, Uk
1 xyz College, USA
2 Pqr University, Berlin
Using a lambda function:
# Reformat values for column "text_dir" using a lambda function
df['text_dir'] = df['text_dir'].apply(lambda x: x.split(',')[0])
Output
text_dir
0 abc School
1 xyz College
2 Pqr University
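The lambda above splits on the first comma rather than on the keywords, so it relies on every value having a comma right after the part to keep. With a single string separator, the vectorized `Series.str.split` that the question originally tried works as well; a minimal sketch, assuming the same three-row DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"text_dir": ["abc School, Uk", "xyz College, USA", "Pqr University, Berlin"]})
# Split at most once on ',' and keep the left-hand part of each value.
df['text_dir'] = df['text_dir'].str.split(',', n=1).str[0]
print(df)
```

`n=1` limits the split to the first comma, so any later commas stay in the discarded part.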