How do I create a new column in a DataFrame from an existing column using a condition?

Question

I have one column containing all the data, which looks something like this (the values that need to be separated out carry a marker, (c)):

UK (c)
London
Wales
Liverpool
US (c)
Chicago
New York
San Francisco
Seattle
Australia (c)
Sydney
Perth

I want it split into two columns, like this:

London          UK
Wales           UK
Liverpool       UK
Chicago         US
New York        US
San Francisco   US
Seattle         US
Sydney          Australia
Perth           Australia

Question 2: What if the countries did not have a marker like (c)?

Answer 1

Step by step with endswith and ffill + str.strip

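# Copy the marker rows ("UK (c)", ...) into a new column; all other rows get NaN
df['country'] = df.loc[df['city'].str.endswith('(c)'), 'city']
# Forward-fill so each city inherits the country above it
df['country'] = df['country'].ffill()
# Drop the marker rows, where city and country hold the same value
df = df[df['city'].ne(df['country'])].copy()
# Remove the literal "(c)" marker, then trim the leftover whitespace
df['country'] = df['country'].str.replace('(c)', '', regex=False).str.strip()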
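
A note on `str.strip`: when given an argument, it treats that argument as a set of characters to remove from both ends, not as a literal suffix, and it stops at the first character outside that set (including a space). The sketch below uses made-up sample strings, one deliberately lowercase, to show the effect.

```python
import pandas as pd

# Illustrative values only; 'cyprus (c)' is lowercase on purpose
s = pd.Series(['UK (c)', 'cyprus (c)'])

# strip('(c)') removes any of '(', 'c', ')' from both ends
as_charset = s.str.strip('(c)')

# Removing the literal marker and then trimming whitespace is safer
as_literal = s.str.replace('(c)', '', regex=False).str.strip()

print(as_charset.tolist())  # ['UK ', 'yprus ']
print(as_literal.tolist())  # ['UK', 'cyprus']
```

With capitalised country names the character-set behaviour only leaves a trailing space, but the literal replacement avoids both problems.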

Answer 2

`extract` and `ffill`

Start with `extract` and `ffill`, then remove the redundant rows.

df['country'] = (
    df['data'].str.extract(r'(.*)\s+\(c\)', expand=False).ffill())
df[~df['data'].str.contains('(c)', regex=False)].reset_index(drop=True)

            data    country
0         London         UK
1          Wales         UK
2      Liverpool         UK
3        Chicago         US
4       New York         US
5  San Francisco         US
6        Seattle         US
7         Sydney  Australia
8          Perth  Australia

where

df['data'].str.extract(r'(.*)\s+\(c\)', expand=False).ffill()

0            UK
1            UK
2            UK
3            UK
4            US
5            US
6            US
7            US
8            US
9     Australia
10    Australia
11    Australia
Name: country, dtype: object

The pattern r'(.*)\s+\(c\)' matches strings of the form "country (c)" and extracts the country name. Anything that does not match the pattern becomes NaN, so you can conveniently forward-fill down the rows.
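
As a minimal, self-contained sketch of that behaviour (using a shortened copy of the sample data, and the column name `data` from this answer), the intermediate result before and after the forward fill looks like this:

```python
import pandas as pd

# Shortened sample: two marker rows, three city rows
df = pd.DataFrame({'data': ['UK (c)', 'London', 'Wales', 'US (c)', 'Chicago']})

# Rows matching "country (c)" yield the country; the rest yield NaN
extracted = df['data'].str.extract(r'(.*)\s+\(c\)', expand=False)

# Each NaN takes the last country seen above it
filled = extracted.ffill()

print(extracted.tolist())  # ['UK', nan, nan, 'US', nan]
print(filled.tolist())     # ['UK', 'UK', 'UK', 'US', 'US']
```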

`split` with `np.where` and `ffill`

This splits each value on the " (c)" marker; only the country rows split into two parts.

u = df['data'].str.split(r'\s+\(c\)')
df['country'] = pd.Series(np.where(u.str.len() == 2, u.str[0], np.nan)).ffill()

df[~df['data'].str.contains('(c)', regex=False)].reset_index(drop=True)

            data    country
0         London         UK
1          Wales         UK
2      Liverpool         UK
3        Chicago         US
4       New York         US
5  San Francisco         US
6        Seattle         US
7         Sydney  Australia
8          Perth  Australia
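
To see why the check `u.str.len() == 2` picks out the country rows, it can help to look at the split result directly. This is a small sketch on two sample values, assuming a pandas version in which a multi-character pattern passed to `str.split` is treated as a regular expression (the default behaviour):

```python
import pandas as pd

s = pd.Series(['UK (c)', 'London'])

# A marker row splits into the country plus an empty tail;
# a city row has nothing to split on and stays as a single piece
u = s.str.split(r'\s+\(c\)')

print(u.tolist())            # [['UK', ''], ['London']]
print(u.str.len().tolist())  # [2, 1]
```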

Answer 3

You can first use `str.extract` to find the rows ending in (c) and extract the country name, then `ffill` to populate a new `country` column.

The same extracted matches can be used to locate the rows to drop, i.e. the rows where the match is not NaN (`notna`):

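# Capture the country name from rows ending in "(c)"; other rows give NaN
m = df['city'].str.extract(r'^(.*?)\s*\(c\)$', expand=False)
# Index labels of the marker rows, which are dropped at the end
ix = m[m.notna()].index
df['country'] = m.ffill()
df.drop(ix)

             city    country
1          London         UK
2           Wales         UK
3       Liverpool         UK
5         Chicago         US
6        New York         US
7   San Francisco         US
8         Seattle         US
10         Sydney  Australia
11          Perth  Australia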

Answer 4

You can also use `np.where` with `str.contains`:

mask = df['places'].str.contains('(c)', regex=False)
# Keep the value only on marker rows; every other row becomes NaN
df['country'] = np.where(mask, df['places'], np.nan)
# Remove the literal "(c)" marker and surrounding whitespace, then forward-fill
df['country'] = df['country'].str.replace('(c)', '', regex=False).str.strip().ffill()
df = df[~mask]
df

           places    country
1          London         UK
2           Wales         UK
3       Liverpool         UK
5         Chicago         US
6        New York         US
7   San Francisco         US
8         Seattle         US
10         Sydney  Australia
11          Perth  Australia

`str.contains` looks for the literal text (c) and returns True for each row that contains it. Where the mask is True, `np.where` copies the value into the country column; every other row gets NaN, which `ffill` then fills from the country row above.
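
A short sketch of the mask and the `np.where` step on their own, using a four-row sample (the variable names `places` and `country` are only for illustration):

```python
import numpy as np
import pandas as pd

places = pd.Series(['UK (c)', 'London', 'US (c)', 'Chicago'])

# True on rows containing the literal text "(c)"
mask = places.str.contains('(c)', regex=False)

# Marker rows keep their value; all other rows become NaN
country = pd.Series(np.where(mask, places, np.nan), index=places.index)

print(mask.tolist())            # [True, False, True, False]
print(country.isna().tolist())  # [False, True, False, True]
```

Passing `regex=False` matters here: as a regular expression, `(c)` would be a capture group matching any lowercase "c".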

Answer 5

You could do the following:

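import pandas as pd

data = ['UK (c)','London','Wales','Liverpool','US (c)','Chicago','New York','San Francisco','Seattle','Australia (c)','Sydney','Perth']
df = pd.DataFrame(data, columns=['city'])
# Marker rows yield the country name (marker and whitespace removed); other rows yield None
df['country'] = df['city'].apply(lambda x: x.replace('(c)', '').strip() if '(c)' in x else None)
# Forward-fill the country, then drop the marker rows
df['country'] = df['country'].ffill()
df = df[~df['city'].str.contains('(c)', regex=False)]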

Output

+-----+----------------+-----------+
|     |     city       |  country  |
+-----+----------------+-----------+
|  1  | London         | UK        |
|  2  | Wales          | UK        |
|  3  | Liverpool      | UK        |
|  5  | Chicago        | US        |
|  6  | New York       | US        |
|  7  | San Francisco  | US        |
|  8  | Seattle        | US        |
| 10  | Sydney         | Australia |
| 11  | Perth          | Australia |
+-----+----------------+-----------+
