Efficient way to replace values in column if part of those values are in predefined lists in pandas
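So I've actually already solved this problem, but the way I'm doing it is probably not the most efficient.
For one column in my database, Industry, I want to replace values. If a value contains "tech", "technology", or something similar, I want to replace that value with the word "Technology".
I followed the basic algorithm below using apply. It loops through a predefined list (e.g. science) and checks whether any of its values are present in the current Industry cell, replacing the cell if so.
It then does the same for the next list. So far I only have two lists, but I'll probably have over a dozen once I'm done.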
def industry_convert(row):
    science = ["research", "science", "scientific", "scientist", "academia", "education", "academic"]
    tech = ["technology", "tech", "software"]

    # Replace the cell if any science keyword occurs in it (case-insensitive substring match).
    for v in science:
        if v.lower() in row.Industry.lower():
            row.Industry = "Research, Science, & Education"

    # Same check for the tech keywords; this runs second, so it wins if both lists match.
    for v in tech:
        if v.lower() in row.Industry.lower():
            row.Industry = "Technology"

    return row

df = df.apply(industry_convert, axis=1)
I'm just wondering whether this is the best way to do it, or whether there is a more Pythonic or pandas way?
EDIT:
This is what some of the Industry column looks like:
Industry
Research Scientist
Science: Education
Tech
Technical Assistance
Technology
Medical
Hospitality
And this is what it looks like after applying the code:
Industry
Research, Science, & Education
Research, Science, & Education
Technology
Technology
Technology
Medical
Hospitality
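For anyone who wants to try the approaches on this page, the sample above can be rebuilt as a small DataFrame. This is only a sketch for reproduction; the names df and expected are illustrative and not part of the original post.

```python
import pandas as pd

# Sample Industry values copied from the question.
df = pd.DataFrame({
    "Industry": [
        "Research Scientist",
        "Science: Education",
        "Tech",
        "Technical Assistance",
        "Technology",
        "Medical",
        "Hospitality",
    ]
})

# Desired values after replacement, also copied from the question.
expected = [
    "Research, Science, & Education",
    "Research, Science, & Education",
    "Technology",
    "Technology",
    "Technology",
    "Medical",
    "Hospitality",
]
```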
Tell me if this works; I updated the for loops in your function.
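# Lowercase the keyword lists once, at module level, instead of on every row.
science = list(map(lambda x: x.lower(), ["research", "science", "scientific", "scientist", "academia", "education", "academic"]))
tech = list(map(lambda x: x.lower(), ["technology", "tech", "software"]))

def industry_convert(row):
    industry = row.Industry.lower()
    # Substring match, as in the question: any keyword contained in the cell triggers the replacement.
    if any(v in industry for v in science):
        row.Industry = "Research, Science, & Education"
    # Check the tech list here (not science again); it runs second, so it wins if both match.
    if any(v in industry for v in tech):
        row.Industry = "Technology"
    return row

df = df.apply(industry_convert, axis=1)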
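The lists are lowercased only once, outside the function, so that work is not repeated for every row, and the explicit for loops are no longer needed. Hope it works, happy coding ^-^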
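A possible variation on the same idea, offered only as a sketch: since a single column is involved, the function can take one string and be applied to that column with Series.map, rather than to whole rows with axis=1. The helper name categorize is hypothetical, and the sketch assumes the column contains no missing values.

```python
import pandas as pd

science = ["research", "science", "scientific", "scientist", "academia", "education", "academic"]
tech = ["technology", "tech", "software"]

def categorize(industry: str) -> str:
    """Return the replacement label for one Industry value (hypothetical helper)."""
    text = industry.lower()
    label = industry  # default: keep the original value
    if any(word in text for word in science):
        label = "Research, Science, & Education"
    # Checked second, so a tech match overrides a science match, as in the question.
    if any(word in text for word in tech):
        label = "Technology"
    return label

df = pd.DataFrame({"Industry": ["Research Scientist", "Science: Education", "Tech",
                                "Technical Assistance", "Technology", "Medical", "Hospitality"]})

# Apply the function to the single column instead of to every row of the frame.
df["Industry"] = df["Industry"].map(categorize)
```

This is still a Python-level loop over the values, so it is a tidier form of apply rather than a vectorized solution.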
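Personally, I would use str.contains and .loc to assign the new values.
This will be many times faster than looping over and checking each row individually (which is an anti-pattern with respect to the pandas API).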
science = ["research", "science", "scientific", "scientist", "academia", "education", "academic"]
tech = ["technology", "tech", "software"]

# Join the keywords into one regex alternation and match case-insensitively.
df.loc[df['Industry'].str.contains('|'.join(science), regex=True, case=False),
       'industry_new'] = "Research, Science, & Education"

# Assigned second, so a tech match overrides a science match.
df.loc[df['Industry'].str.contains('|'.join(tech), regex=True, case=False),
       'industry_new'] = "Technology"

# Rows that matched neither list keep their original value.
df['industry_new'] = df['industry_new'].fillna(df['Industry'])

print(df)
               Industry                    industry_new
0    Research Scientist  Research, Science, & Education
1    Science: Education  Research, Science, & Education
2                  Tech                      Technology
3  Technical Assistance                      Technology
4            Technology                      Technology
5               Medical                         Medical
6           Hospitality                     Hospitality
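Since the question mentions that there may eventually be a dozen or more lists, the same str.contains and .loc idea could be driven by a dictionary instead of one statement per list. The following is a minimal sketch under that assumption; the categories mapping is a hypothetical structure, and re.escape is added only as a precaution in case a keyword contains regex metacharacters.

```python
import re
import pandas as pd

# Hypothetical mapping: replacement label -> keywords that trigger it.
# Later entries override earlier ones when a value matches more than one list.
categories = {
    "Research, Science, & Education": ["research", "science", "scientific", "scientist",
                                       "academia", "education", "academic"],
    "Technology": ["technology", "tech", "software"],
}

df = pd.DataFrame({"Industry": ["Research Scientist", "Science: Education", "Tech",
                                "Technical Assistance", "Technology", "Medical", "Hospitality"]})

# Start from the original values so unmatched rows are left unchanged.
df["industry_new"] = df["Industry"]

for label, keywords in categories.items():
    # One case-insensitive regex alternation per category.
    pattern = "|".join(re.escape(k) for k in keywords)
    mask = df["Industry"].str.contains(pattern, case=False, regex=True, na=False)
    df.loc[mask, "industry_new"] = label
```

Adding another category then only means adding an entry to the dictionary; the loop runs once per category, not once per row.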