Remove everything after the first whitespace in a pandas DataFrame
Here is the dataframe:
State RegionName
0 NY New York
1 CA Los Angeles
2 IL Chicago 865
3 PA Philadelphia Wrin
4 AZ Phoenix City
I want the output to look like this:
State RegionName
0 NY New
1 CA Los
2 IL Chicago
3 PA Philadelphia
4 AZ Phoenix
How can I do this without using a for loop?
Use Series.str.split, then select the first element of each resulting list by indexing with .str[0]:
print (df['RegionName'].str.split())
0 [New, York]
1 [Los, Angeles]
2 [Chicago, 865]
3 [Philadelphia, Wrin]
4 [Phoenix, City]
Name: RegionName, dtype: object
df['RegionName'] = df['RegionName'].str.split().str[0]
print (df)
State RegionName
0 NY New
1 CA Los
2 IL Chicago
3 PA Philadelphia
4 AZ Phoenix
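A self-contained version of the split approach is sketched below; it rebuilds the sample frame from the question so it can be run as-is. Passing n=1 to str.split is an optional tweak (not part of the original answer) that stops after the first split, so the rest of a long string is not split needlessly; the result of .str[0] is the same.

```python
import pandas as pd

# Rebuild the sample frame from the question
df = pd.DataFrame({
    "State": ["NY", "CA", "IL", "PA", "AZ"],
    "RegionName": ["New York", "Los Angeles", "Chicago 865",
                   "Philadelphia Wrin", "Phoenix City"],
})

# Split on whitespace at most once, then keep the first piece of each list
df["RegionName"] = df["RegionName"].str.split(n=1).str[0]
print(df)
```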
Here's an alternative using pd.Series.str.extract (note that values containing no whitespace at all do not match this pattern and become NaN):
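# .*? is non-greedy, so the match stops at the FIRST whitespace (a greedy .* would stop at the last one)
df['RegionName'] = df['RegionName'].str.extract(r'^(.*?)\s', expand=False)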
But my first instinct would be to use what @jezrael mentioned.
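Whether the capture group is greedy only matters once a value contains more than one space, which the question's sample data never does. The sketch below uses hypothetical values (not from the question) to compare a greedy (.*) group with a non-greedy (.*?) group; both patterns require a trailing whitespace, so a single word such as "Chicago" yields NaN either way.

```python
import pandas as pd

# Hypothetical values; "Salt Lake City" contains two spaces, "Chicago" none
s = pd.Series(["New York", "Salt Lake City", "Chicago"])

# Greedy: (.*) extends as far as possible, i.e. up to the LAST whitespace
greedy = s.str.extract(r"(.*)\s", expand=False)

# Non-greedy: (.*?) stops as early as possible, i.e. at the FIRST whitespace
lazy = s.str.extract(r"^(.*?)\s", expand=False)

print(greedy.tolist())
print(lazy.tolist())
```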
You could also use str.extract to capture the start of the string while excluding whitespace, using the regex ^[^\s]+:
df['RegionName'] = df['RegionName'].str.extract(r'(^[^\s]+)', expand=False)
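Because ^[^\s]+ does not require a trailing space, single-word values are kept intact. One caveat, sketched below on hypothetical values: the ^ anchor means a value that starts with a space does not match, so stripping first with str.strip may be worth it if such values can occur. expand=False makes str.extract return a Series rather than a one-column DataFrame.

```python
import pandas as pd

# Hypothetical values: a single word and one with a leading space
s = pd.Series(["New York", "Chicago", " Phoenix City"])

# Capture the leading run of non-whitespace characters
first = s.str.extract(r"(^[^\s]+)", expand=False)

# A leading space makes ^[^\s]+ fail to match, so strip surrounding whitespace first
first_stripped = s.str.strip().str.extract(r"(^[^\s]+)", expand=False)

print(first.tolist())
print(first_stripped.tolist())
```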
You can replace the extra words with '' using str.replace:
# regex=True is required on pandas >= 2.0, where str.replace defaults to regex=False
df["RegionName"] = df.RegionName.str.replace(r'\s.*', '', regex=True)
df
State RegionName
0 NY New
1 CA Los
2 IL Chicago
3 PA Philadelphia
4 AZ Phoenix
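The replace approach deletes the first whitespace character and everything after it, so a value with no whitespace simply passes through unchanged instead of becoming NaN. The sketch below uses hypothetical values (not from the question) to show that, and passes regex=True explicitly because pandas 2.0 changed the default to regex=False, under which r"\s.*" would be matched as a literal string.

```python
import pandas as pd

# Hypothetical values, including one with no whitespace at all
s = pd.Series(["New York", "Salt Lake City", "Boston"])

# \s.* matches the first whitespace character plus the rest of the string;
# replacing that match with "" leaves only the first word
result = s.str.replace(r"\s.*", "", regex=True)
print(result.tolist())  # ['New', 'Salt', 'Boston']
```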