How do I create a new column in a dataframe from an existing column using conditions?
I have one column containing all the data, which looks something like this (values that need to be separated have a mark like (c)):
UK (c)
London
Wales
Liverpool
US (c)
Chicago
New York
San Francisco
Seattle
Australia (c)
Sydney
Perth
And I want it split into two columns looking like this:
London UK
Wales UK
Liverpool UK
Chicago US
New York US
San Francisco US
Seattle US
Sydney Australia
Perth Australia
Question 2: What if the countries did not have a pattern like (c)?
Step by step with endswith and ffill + str.strip
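df['country'] = df.loc[df.city.str.endswith('(c)'), 'city']
df['country'] = df['country'].ffill()
df = df[df.city.ne(df.country)].copy()
# Remove the literal "(c)" marker, then trim the leftover whitespace
df['country'] = df['country'].str.replace('(c)', '', regex=False).str.strip()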
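To try these steps end to end, here is a self-contained sketch on a shortened copy of the sample data. It assumes the column is named city, as in the snippet. The marker is removed with a literal str.replace before str.strip, because str.strip('(c)') on its own strips the individual characters (, c and ) from both ends and leaves the trailing space behind.

```python
import pandas as pd

df = pd.DataFrame({'city': ['UK (c)', 'London', 'Wales', 'US (c)', 'Chicago']})

# Country header rows keep their value; all other rows become NaN.
df['country'] = df.loc[df['city'].str.endswith('(c)'), 'city']
# Carry each country down to the city rows beneath it.
df['country'] = df['country'].ffill()
# Drop the header rows, where city and country are still identical.
result = df[df['city'].ne(df['country'])].copy()
# Remove the literal "(c)" marker, then trim the leftover whitespace.
result['country'] = (
    result['country'].str.replace('(c)', '', regex=False).str.strip()
)
print(result)
```

The .copy() call is there so that the final assignment modifies an independent frame rather than a slice of df.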
extract and ffill
Start with extract and ffill, then remove redundant rows.
df['country'] = (
df['data'].str.extract(r'(.*)\s+\(c\)', expand=False).ffill())
df[~df['data'].str.contains('(c)', regex=False)].reset_index(drop=True)
data country
0 London UK
1 Wales UK
2 Liverpool UK
3 Chicago US
4 New York US
5 San Francisco US
6 Seattle US
7 Sydney Australia
8 Perth Australia
Where,
df['data'].str.extract(r'(.*)\s+\(c\)', expand=False).ffill()
0 UK
1 UK
2 UK
3 UK
4 US
5 US
6 US
7 US
8 US
9 Australia
10 Australia
11 Australia
Name: country, dtype: object
The pattern r'(.*)\s+\(c\)' matches strings of the form "country (c)" and extracts the country name. Anything not matching this pattern is replaced with NaN, so you can conveniently forward fill on rows.
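The intermediate result is easier to see side by side. This is a small sketch on a shortened copy of the sample data; the column names extracted and filled are only labels chosen for the illustration.

```python
import pandas as pd

s = pd.Series(['UK (c)', 'London', 'Wales', 'US (c)', 'Chicago'], name='data')

# Rows of the form "<country> (c)" yield the country; all others yield NaN.
extracted = s.str.extract(r'(.*)\s+\(c\)', expand=False)
# Forward fill propagates each country to the rows that follow it.
filled = extracted.ffill()

print(pd.DataFrame({'data': s, 'extracted': extracted, 'filled': filled}))
```

Only the two header rows produce a value in extracted; ffill supplies the rest.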
split with np.where and ffill
This splits on "(c)".
u = df['data'].str.split(r'\s+\(c\)')
df['country'] = pd.Series(np.where(u.str.len() == 2, u.str[0], np.nan), index=df.index).ffill()
df[~df['data'].str.contains('(c)', regex=False)].reset_index(drop=True)
data country
0 London UK
1 Wales UK
2 Liverpool UK
3 Chicago US
4 New York US
5 San Francisco US
6 Seattle US
7 Sydney Australia
8 Perth Australia
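The reason the length check works is that splitting a header row on the marker leaves two pieces (the country and an empty string), while a city row stays in one piece. A minimal sketch on a shortened copy of the sample data, assuming the column is named data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'data': ['UK (c)', 'London', 'Wales', 'US (c)', 'Chicago']})

# Header rows split into [country, ''], city rows into [city].
u = df['data'].str.split(r'\s+\(c\)')
parts = u.str.len()

# Keep the first piece only where the split produced two pieces.
country = pd.Series(
    np.where(parts == 2, u.str[0], np.nan), index=df.index
).ffill()
print(pd.DataFrame({'data': df['data'], 'parts': parts, 'country': country}))
```

Passing index=df.index keeps the new Series aligned with df even when df does not have the default 0..n-1 index.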
You can first use str.extract to locate the cities ending in (c) and extract the country name, and ffill to populate a new country column.
The same extracted matches can be used to locate the rows to be dropped, i.e. rows which are notna:
# Raw string avoids invalid-escape warnings; \s* keeps the trailing space out of the match
m = df.city.str.extract(r'^(.*?)\s*(?=\(c\)$)', expand=False)
ix = m[m.notna()].index
df['country'] = m.ffill()
df.drop(ix)
city country
1 London UK
2 Wales UK
3 Liverpool UK
5 Chicago US
6 New York US
7 San Francisco US
8 Seattle US
10 Sydney Australia
11 Perth Australia
You can use np.where with str.contains too:
mask = df['places'].str.contains('(c)', regex = False)
df['country'] = np.where(mask, df['places'], np.nan)
df['country'] = df['country'].str.replace('(c)', '', regex=False).str.strip().ffill()
df = df[~mask]
df
places country
1 London UK
2 Wales UK
3 Liverpool UK
5 Chicago US
6 New York US
7 San Francisco US
8 Seattle US
10 Sydney Australia
11 Perth Australia
str.contains looks for (c) and returns True for each row where it is present. Where this condition is True, np.where copies the value from places into the country column; all other rows get NaN, which ffill then fills from the country row above.
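As a small illustration of the mask and what np.where produces from it, here is a sketch on a shortened copy of the sample data. It assumes the column is named places, as in the snippet, and treats (c) as a literal string rather than a regular expression.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'places': ['UK (c)', 'London', 'US (c)', 'Chicago']})

# True for the header rows, False for the city rows.
mask = df['places'].str.contains('(c)', regex=False)

# Header rows keep their text; city rows become NaN.
raw = pd.Series(np.where(mask, df['places'], np.nan), index=df.index)
# Strip the literal marker and surrounding whitespace, then forward fill.
country = raw.str.replace('(c)', '', regex=False).str.strip().ffill()

print(pd.DataFrame({'places': df['places'], 'mask': mask, 'country': country}))
```

The same mask is reused at the end of the answer (df[~mask]) to drop the header rows.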
You could do the following:
import pandas as pd
data = ['UK (c)','London','Wales','Liverpool','US (c)','Chicago','New York','San Francisco','Seattle','Australia (c)','Sydney','Perth']
df = pd.DataFrame(data, columns=['city'])
# Country rows get their name without the marker; city rows get None
df['country'] = df.city.apply(lambda x: x.replace('(c)', '').strip() if '(c)' in x else None)
df['country'] = df['country'].ffill()
df = df[~df['city'].str.contains('(c)', regex=False)]
Output
+-----+----------------+-----------+
| | city | country |
+-----+----------------+-----------+
| 1 | London | UK |
| 2 | Wales | UK |
| 3 | Liverpool | UK |
| 5 | Chicago | US |
| 6 | New York | US |
| 7 | San Francisco | US |
| 8 | Seattle | US |
| 10 | Sydney | Australia |
| 11 | Perth | Australia |
+-----+----------------+-----------+
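For Question 2, the approaches above do not apply directly, since they all rely on the (c) marker. One possible sketch, assuming you can supply a list of known country names (the countries list below is hypothetical and not part of the original data), is to build the mask with isin instead of a string pattern:

```python
import pandas as pd

# Hypothetical lookup list; in practice this would come from your own data.
countries = ['UK', 'US', 'Australia']

df = pd.DataFrame(
    {'city': ['UK', 'London', 'Wales', 'US', 'Chicago', 'Australia', 'Sydney']}
)

# True for rows whose value is a known country name.
is_country = df['city'].isin(countries)
# Keep the country rows, blank out the rest, then forward fill.
df['country'] = df['city'].where(is_country).ffill()
# Drop the country header rows.
result = df[~is_country].reset_index(drop=True)
print(result)
```

This only works when no city shares its exact name with an entry in the list; otherwise that city would be treated as a header row.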