Split one column into multiple columns by multiple delimiters in Pandas
Given a dataframe as follows:
                                     player   score
0   Sergio Agüero Forward — Manchester City  209.98
1            Eden Hazard Midfield — Chelsea  274.04
2          Alexis Sánchez Forward — Arsenal  223.86
3     Yaya Touré Midfield — Manchester City  197.91
4  Angel María Midfield — Manchester United  132.23
How can I split player into three new columns: name, position, and team?
                                     player   score    name  position               team
0   Sergio Agüero Forward — Manchester City  209.98  Sergio   Forward    Manchester City
1            Eden Hazard Midfield — Chelsea  274.04    Eden  Midfield            Chelsea
2          Alexis Sánchez Forward — Arsenal  223.86  Alexis   Forward            Arsenal
3     Yaya Touré Midfield — Manchester City  197.91    Yaya  Midfield    Manchester City
4  Angel María Midfield — Manchester United  132.23   Angel  Midfield  Manchester United
I have considered using df[['name_position', 'team']] = df['player'].str.split(pat=' — ', expand=True) to split it into two columns, and then splitting name_position into name and position. But is there a better solution?
Many thanks.
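For reference, the two-step approach described above might look like the following minimal sketch. It assumes the name is the first word and the position is the last word before the dash, as in the desired output; the intermediate variables parts and words are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "player": [
        "Sergio Agüero Forward — Manchester City",
        "Eden Hazard Midfield — Chelsea",
        "Alexis Sánchez Forward — Arsenal",
        "Yaya Touré Midfield — Manchester City",
        "Angel María Midfield — Manchester United",
    ],
    "score": [209.98, 274.04, 223.86, 197.91, 132.23],
})

# Step 1: split on the spaced dash into "name + position" and "team".
parts = df["player"].str.split(" — ", expand=True)

# Step 2: split the left part on whitespace and pick the first and last words.
words = parts[0].str.split()
df["name"] = words.str[0]       # assumption: the first word is the name
df["position"] = words.str[-1]  # assumption: the last word is the position
df["team"] = parts[1]

print(df[["name", "position", "team"]])
```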
If you like, you can also use str.extract with named capture groups:
print(df["player"].str.extract(r"(?P<name>.*?)\s.*?\s(?P<position>[A-Za-z]+)\s—\s(?P<team>.*)"))
     name  position               team
0  Sergio   Forward    Manchester City
1    Eden  Midfield            Chelsea
2  Alexis   Forward            Arsenal
3    Yaya  Midfield    Manchester City
4   Angel  Midfield  Manchester United
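To get the extracted columns back onto the original dataframe, one option is to join the result to df. This is a sketch that reuses the same pattern; the variable names pattern, extracted and result are made up for illustration, and df.join is only one of several ways to attach the columns.

```python
import pandas as pd

df = pd.DataFrame({
    "player": [
        "Sergio Agüero Forward — Manchester City",
        "Eden Hazard Midfield — Chelsea",
        "Alexis Sánchez Forward — Arsenal",
        "Yaya Touré Midfield — Manchester City",
        "Angel María Midfield — Manchester United",
    ],
    "score": [209.98, 274.04, 223.86, 197.91, 132.23],
})

# Named groups become the column names of the extracted frame.
pattern = r"(?P<name>.*?)\s.*?\s(?P<position>[A-Za-z]+)\s—\s(?P<team>.*)"
extracted = df["player"].str.extract(pattern)

# Both frames share the same index, so join aligns them row by row.
result = df.join(extracted)
print(result)
```

Rows that do not match the pattern would get NaN in all three new columns, so it may be worth checking extracted.isna().any() on real data.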
You can use string.split() to split a Python string on whitespace. This breaks your text into 'words', and you can then simply pick the ones you want, like this:
string = "Sergio Agüero Forward — Manchester City"
words = string.split()  # ['Sergio', 'Agüero', 'Forward', '—', 'Manchester', 'City']
name = words[0]
position = words[2]
# The team may be one word or several, so join everything after the dash.
team = " ".join(words[4:])
For more complex patterns, you can use regular expressions, a powerful tool for finding patterns in strings.
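As a rough illustration of the regular-expression route on a single string, here is a sketch using Python's built-in re module. The pattern and the helper name parse_player are assumptions made for this example, not part of any library; it takes the first word as the name and the last word before the dash as the position.

```python
import re

# First word -> name, last word before the dash -> position, rest -> team.
PLAYER_RE = re.compile(
    r"(?P<name>\S+)\s+.*?(?P<position>\S+)\s+—\s+(?P<team>.+)"
)

def parse_player(text):
    """Return a dict with name, position and team, or None if there is no match."""
    match = PLAYER_RE.fullmatch(text)
    return match.groupdict() if match else None

print(parse_player("Sergio Agüero Forward — Manchester City"))
print(parse_player("Eden Hazard Midfield — Chelsea"))
```

Unlike fixed indexing into string.split(), this does not depend on the name having exactly two words.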
Hope this helps :)