
python dataframe create one column based on another column

I would like to create another column in a dataframe.

The dataframe looks like the following. sub_id is part of id: id is the 'parent' of sub_id, and each id group includes a row for the id itself plus rows for the items it contains.

id has no name, but each sub_id has a corresponding name.

I would like to find, for each id, the row whose sub_id equals that id, and use that row's name as the id's name.

import pandas as pd

df = pd.DataFrame({'id':[1,1,1,2,2],
                   'sub_id':[12,1,13,23,2],
                   'name':['pear','fruit','orange','cat','animal']})
print(df)
   id  sub_id    name
0   1      12    pear
1   1       1   fruit
2   1      13  orange
3   2      23     cat
4   2       2  animal

I would like to create another column, id_name, to get:

   id  sub_id    name id_name
0   1      12    pear   fruit
1   1       1   fruit   fruit
2   1      13  orange   fruit
3   2      23     cat  animal
4   2       2  animal  animal

I have no idea how this could be achieved efficiently. The only approach I can think of is to merge the dataframe twice, but I think there is a better way.

If you use Series.where to replace the names in rows where id does not match sub_id with missing values, then GroupBy.transform with 'first' works, because 'first' returns the first non-missing value in each group:

df['id_name'] = (df['name'].where(df['id'].eq(df['sub_id']))
                           .groupby(df['id'])
                           .transform('first'))

Or filter the rows with a boolean mask to build a helper Series indexed by id, then map it onto the id column with Series.map:

s = df[df['id'].eq(df['sub_id'])].set_index('id')['name']
df['id_name'] = df['id'].map(s)
print (df)
   id  sub_id    name id_name
0   1      12    pear   fruit
1   1       1   fruit   fruit
2   1      13  orange   fruit
3   2      23     cat  animal
4   2       2  animal  animal

Details:

print (df['name'].where(df['id'].eq(df['sub_id'])))
0       NaN
1     fruit
2       NaN
3       NaN
4    animal
Name: name, dtype: object


print (s)
id
1     fruit
2    animal
Name: name, dtype: object
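
To make the comparison concrete, here is a minimal self-contained sketch that runs both approaches on the sample data and checks that they agree. The variable names is_parent, lookup, via_transform and via_map are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2],
                   'sub_id': [12, 1, 13, 23, 2],
                   'name': ['pear', 'fruit', 'orange', 'cat', 'animal']})

# Rows where sub_id equals id carry the name of the parent id.
is_parent = df['id'].eq(df['sub_id'])

# Approach 1: blank out non-parent names, then broadcast the first
# non-missing name within each id group.
via_transform = (df['name'].where(is_parent)
                           .groupby(df['id'])
                           .transform('first'))

# Approach 2: build an id -> name lookup from the parent rows and map it.
lookup = df.loc[is_parent].set_index('id')['name']
via_map = df['id'].map(lookup)

# Compare the values only; the two Series carry different names.
same_result = via_transform.tolist() == via_map.tolist()
print(same_result)
```

If an id group had no row with sub_id equal to id, both approaches would leave id_name missing for that group rather than raise an error.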

Are your IDs unique?

If so, you can use GroupBy.transform to get the minimum sub_id per group, then map it to the corresponding name. This assumes the sub_id values are unique and that the parent row has the smallest sub_id in its group, as in the sample data:

df['id_name'] = (df.groupby('id')['sub_id'].transform('min')
                   .map(df.set_index('sub_id')['name'])
                )

Output:

   id  sub_id    name id_name
0   1      12    pear   fruit
1   1       1   fruit   fruit
2   1      13  orange   fruit
3   2      23     cat  animal
4   2       2  animal  animal
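
The min-based approach depends on the parent row having the smallest sub_id in its group. The sketch below uses hypothetical data (not from the question) where that does not hold, to show how the result can differ from the equality-based lookup. The names min_based, lookup and equality_based are illustrative.

```python
import pandas as pd

# Hypothetical data: for id 5 the parent row (sub_id == 5) is NOT
# the smallest sub_id in its group.
df = pd.DataFrame({'id': [5, 5, 7],
                   'sub_id': [3, 5, 7],
                   'name': ['leaf', 'plant', 'stone']})

# Min-based approach: picks sub_id 3 for group 5, so it maps to 'leaf'.
min_based = (df.groupby('id')['sub_id'].transform('min')
               .map(df.set_index('sub_id')['name']))

# Equality-based lookup: uses the row where sub_id equals id.
lookup = df.loc[df['id'].eq(df['sub_id'])].set_index('id')['name']
equality_based = df['id'].map(lookup)

print(min_based.tolist())
print(equality_based.tolist())
```

When the ordering of sub_id values is not guaranteed, the equality-based lookup is probably the safer choice.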
