How do I create a new column in a dataframe from an existing column using conditions?

I have one column containing all the data, which looks something like this (values that need to be separated out carry a marker such as (c)):

UK (c)
London
Wales
Liverpool
US (c)
Chicago
New York
San Francisco
Seattle
Australia (c)
Sydney
Perth

And I want it split into two columns, like this:

London          UK
Wales           UK
Liverpool       UK
Chicago         US
New York        US
San Francisco   US
Seattle         US
Sydney          Australia
Perth           Australia

Question 2: What if the countries did not have a pattern like (c)?

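Step by step with endswith, ffill and str.replace

# Copy the marked rows into a new column, then forward fill over the city rows.
df['country'] = df.loc[df.city.str.endswith('(c)'), 'city']
df['country'] = df['country'].ffill()
# Drop the country rows themselves.
df = df[df.city.ne(df.country)].copy()
# Remove the trailing "(c)" marker and the space before it.
df['country'] = df['country'].str.replace(r'\s*\(c\)$', '', regex=True)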
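
One caveat with using `str.strip('(c)')` to remove the marker: it treats its argument as a set of characters to strip from both ends, not as a literal suffix. The sketch below uses made-up values (the lowercase `cuba (c)` is hypothetical and not part of the question's data) to compare it with a regex replacement anchored at the end of the string.

```python
import pandas as pd

# Made-up sample; "cuba (c)" is a hypothetical lowercase value used to expose the issue.
s = pd.Series(['UK (c)', 'cuba (c)'])

# strip('(c)') removes any of the characters "(", "c", ")" from both ends.
stripped = s.str.strip('(c)')

# An anchored regex removes only a trailing "(c)" plus the whitespace before it.
replaced = s.str.replace(r'\s*\(c\)$', '', regex=True)

print(stripped.tolist())   # ['UK ', 'uba ']
print(replaced.tolist())   # ['UK', 'cuba']
```

With the question's capitalised country names the only visible difference is the trailing space, but the anchored replacement is safer if the data is less regular.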

extract and ffill

Start with extract and ffill, then remove the redundant rows.

df['country'] = (
    df['data'].str.extract(r'(.*)\s+\(c\)', expand=False).ffill())
df[~df['data'].str.contains('(c)', regex=False)].reset_index(drop=True)

            data    country
0         London         UK
1          Wales         UK
2      Liverpool         UK
3        Chicago         US
4       New York         US
5  San Francisco         US
6        Seattle         US
7         Sydney  Australia
8          Perth  Australia

Where,

df['data'].str.extract(r'(.*)\s+\(c\)', expand=False).ffill()

0            UK
1            UK
2            UK
3            UK
4            US
5            US
6            US
7            US
8            US
9     Australia
10    Australia
11    Australia
Name: country, dtype: object

The pattern r'(.*)\s+\(c\)' matches strings of the form "country (c)" and extracts the country name. Anything that does not match the pattern becomes NaN, so you can conveniently forward fill down the rows.
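
As a rough illustration of the two steps described above, the following sketch runs the same pattern on a shortened copy of the question's data (the variable names `extracted` and `filled` are only for this example).

```python
import pandas as pd

df = pd.DataFrame({'data': ['UK (c)', 'London', 'Wales', 'US (c)', 'Chicago']})

# Rows without the "(c)" marker do not match, so extract returns NaN for them.
extracted = df['data'].str.extract(r'(.*)\s+\(c\)', expand=False)

# Forward fill copies each country down over the NaN rows that follow it.
filled = extracted.ffill()

print(extracted.isna().tolist())  # [False, True, True, False, True]
print(filled.tolist())            # ['UK', 'UK', 'UK', 'US', 'US']
```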


split with np.where and ffill

This splits each value on " (c)"; only the marked country rows split into two parts.

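u = df['data'].str.split(r'\s+\(c\)')
# Keep the first piece only where the split produced two parts.
# Passing index=df.index keeps the result aligned if df does not have a default index.
df['country'] = pd.Series(np.where(u.str.len() == 2, u.str[0], np.nan), index=df.index).ffill()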

df[~df['data'].str.contains('(c)', regex=False)].reset_index(drop=True)

            data    country
0         London         UK
1          Wales         UK
2      Liverpool         UK
3        Chicago         US
4       New York         US
5  San Francisco         US
6        Seattle         US
7         Sydney  Australia
8          Perth  Australia
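
To see why the length check in the split approach picks out the country rows, here is a small sketch on three sample values (the variable names are only for this example).

```python
import pandas as pd

s = pd.Series(['UK (c)', 'London', 'New York'])

# A pattern longer than one character is treated as a regex by str.split by default.
parts = s.str.split(r'\s+\(c\)')

# Marked rows split into the name plus an empty tail; other rows stay in one piece.
print(list(parts.iloc[0]))       # ['UK', '']
print(parts.str.len().tolist())  # [2, 1, 1]
```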

You can first use str.extract to locate the rows ending in (c) and extract the country name, then ffill to populate a new country column.

The same extracted matches can be used to locate the rows to be dropped, i.e. the rows where the match is not NaN (notna):

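# Raw string avoids invalid-escape warnings; \s* keeps the space before "(c)" out of the capture.
m = df.city.str.extract(r'^(.*?)\s*(?=\(c\)$)', expand=False)
# Rows where the pattern matched are the country rows to drop.
ix = m[m.notna()].index
df['country'] = m.ffill()
df.drop(ix)

             city    country
1          London         UK
2           Wales         UK
3       Liverpool         UK
5         Chicago         US
6        New York         US
7   San Francisco         US
8         Seattle         US
10         Sydney  Australia
11          Perth  Australia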
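
The lookahead in this answer can be checked with the standard `re` module. The sketch below compares a lookahead-only pattern with a variant that also skips whitespace before the marker; the helper `capture` is hypothetical and exists only for this illustration.

```python
import re

plain = re.compile(r'^(.*?)(?=\(c\)$)')        # lookahead only
trimmed = re.compile(r'^(.*?)\s*(?=\(c\)$)')   # also skips whitespace before "(c)"

def capture(pattern, value):
    """Return the first group, or None when the value has no trailing "(c)"."""
    match = pattern.search(value)
    return match.group(1) if match else None

print(repr(capture(plain, 'UK (c)')))    # 'UK '
print(repr(capture(trimmed, 'UK (c)')))  # 'UK'
print(capture(trimmed, 'London'))        # None
```

Without the `\s*`, the captured country name keeps the space that precedes the marker.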

You can use np.where with str.contains too:

mask = df['places'].str.contains('(c)', regex=False)
df['country'] = np.where(mask, df['places'], np.nan)
# regex=True is needed for the escaped pattern in recent pandas; \s* also drops the space before "(c)".
df['country'] = df['country'].str.replace(r'\s*\(c\)$', '', regex=True).ffill()
df = df[~mask]
df

           places    country
1          London         UK
2           Wales         UK
3       Liverpool         UK
5         Chicago         US
6        New York         US
7   San Francisco         US
8         Seattle         US
10         Sydney  Australia
11          Perth  Australia

str.contains looks for the literal text (c) and returns True for each row where it is present. Where the condition is True, that row's value is copied into the country column; the remaining rows get NaN and are then filled by ffill.
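
The `regex=False` argument matters here. As a regular expression, `(c)` is a capture group that matches a lone "c", so city names containing that letter would be flagged as well. A small sketch on three sample values:

```python
import warnings
import pandas as pd

s = pd.Series(['UK (c)', 'London', 'Chicago'])

# Literal search: only the marked row matches.
literal = s.str.contains('(c)', regex=False)

# As a regex, "(c)" is a capture group matching a lone "c", so "Chicago" matches too.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # pandas warns about match groups here
    as_regex = s.str.contains('(c)')

print(literal.tolist())   # [True, False, False]
print(as_regex.tolist())  # [True, False, True]
```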

You could do the following:

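import pandas as pd

data = ['UK (c)','London','Wales','Liverpool','US (c)','Chicago','New York','San Francisco','Seattle','Australia (c)','Sydney','Perth']
df = pd.DataFrame(data, columns=['city'])
# Keep the country name (marker and surrounding whitespace removed) on marked rows, None elsewhere.
df['country'] = df.city.apply(lambda x: x.replace('(c)', '').strip() if '(c)' in x else None)
# fillna(method='ffill') is deprecated in recent pandas; ffill() does the same job.
df['country'] = df['country'].ffill()
df = df[~df['city'].str.contains('(c)', regex=False)]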

Output

+-----+----------------+-----------+
|     |     city       |  country  |
+-----+----------------+-----------+
|  1  | London         | UK        |
|  2  | Wales          | UK        |
|  3  | Liverpool      | UK        |
|  5  | Chicago        | US        |
|  6  | New York       | US        |
|  7  | San Francisco  | US        |
|  8  | Seattle        | US        |
| 10  | Sydney         | Australia |
| 11  | Perth          | Australia |
+-----+----------------+-----------+
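
For Question 2, where the country rows carry no marker, the patterns above have nothing to match. One possible sketch, assuming you can supply the country names yourself (the `countries` list below is a hypothetical input, not something present in the original data), is to build the mask with `isin` and keep the forward fill and row filter as before.

```python
import pandas as pd

df = pd.DataFrame({'city': ['UK', 'London', 'Wales', 'US', 'Chicago', 'Australia', 'Sydney']})

# Hypothetical lookup list; in practice it has to come from you or a reference table.
countries = ['UK', 'US', 'Australia']

is_country = df['city'].isin(countries)

# Keep the value only on country rows, then forward fill it over the cities below.
df['country'] = df['city'].where(is_country).ffill()

# Drop the country rows themselves.
result = df[~is_country].reset_index(drop=True)
print(result)
```

This only works when the lookup list covers every country row, and a city that shares a name with a listed country would be misclassified.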
