Create new column with data that has same column
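I have a DataFrame similar to the one below. How can I add a new column that lists, for each row, the names of the other rows that share the same value in one of the columns? For example:
Given this: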
name building
a blue
b white
c blue
d red
e blue
f red
How do I get this?
name building in_building_with
a blue [c, e]
b white []
c blue [a, e]
d red [f]
e blue [a, c]
f red [d]
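To try the answers on this page, the sample frame from the question can be built roughly like this. It is a minimal sketch: the default integer index is an assumption, chosen because it matches the 0 to 5 index shown in the outputs on this page.

```python
import pandas as pd

# Sample data copied from the question; the default RangeIndex (0..5) is assumed
df = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e", "f"],
    "building": ["blue", "white", "blue", "red", "blue", "red"],
})
print(df)
```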
This is the only approach I could think of (probably the worst one):
r = df.groupby('building')['name'].agg(dict)
df['in_building_with'] = df.apply(lambda x: [r[x['building']][i] for i in (r[x['building']].keys()-[x.name])], axis=1)
df:
name building in_building_with
0 a blue [c, e]
1 b white []
2 c blue [a, e]
3 d red [f]
4 e blue [a, c]
5 f red [d]
r:
building
blue {0: 'a', 2: 'c', 4: 'e'}
red {3: 'd', 5: 'f'}
white {1: 'b'}
dtype: object
The key step is r[x['building']].keys()-[x.name], which removes the current row's index from the keys of its building's group.
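The set difference on the dictionary keys can be tried in isolation. This is a small sketch using one group copied from r above; the sorted() call is my addition, only there to make the order deterministic, since a set has no guaranteed order.

```python
# One group from r, as shown above: row index -> name
blue = {0: "a", 2: "c", 4: "e"}

# dict.keys() returns a set-like view, so "-" accepts any iterable
# and yields a plain set of the remaining keys
others = blue.keys() - [2]  # drop the current row's index (x.name in the answer)

# Look the names up again; sorted() is only here to fix the order
names = [blue[i] for i in sorted(others)]
print(names)  # ['a', 'e']
```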
If the order doesn't matter, you can do this:
# create groups
groups = df.groupby('building').transform(dict.fromkeys).squeeze()
# remove value from each group
df['in_building_with'] = [list(group.keys() - (e,)) for e, group in zip(df['name'], groups)]
print(df)
Output
name building in_building_with
0 a blue [e, c]
1 b white []
2 c blue [e, a]
3 d red [f]
4 e blue [a, c]
5 f red [d]
Maybe a bit late, but here is a more concise way that avoids an explicit for loop.
Thanks to @Pygirl's answer, which this improves on:
r = df.groupby('building')['name'].agg(set)
df['in_building_with'] = df.apply(lambda x: list(r[x['building']] - {x['name']}), axis=1)
print(df)
Output:
name building in_building_with
0 a blue [e, c]
1 b white []
2 c blue [e, a]
3 d red [f]
4 e blue [a, c]
5 f red [d]
Let us use boolean indexing with loc inside a list comprehension to get the names from the rows that have the same building value:
df['in_building_with'] = [
    [*df.loc[df['building'].eq(y) & df['name'].ne(x), 'name']]
    for x, y in df.to_numpy()
]
Result:
name building in_building_with
0 a blue [c, e]
1 b white []
2 c blue [a, e]
3 d red [f]
4 e blue [a, c]
5 f red [d]
Code:
import pandas as pd

# collect one list per row, then assign the new column in a single step
# (this avoids chained assignment such as df['col'][index] = value)
result = []
for index, row in df.iterrows():
    data = []
    # indices of all rows whose building matches the current row's building
    building_list = df[df['building'] == row['building']].index.tolist()
    for i in building_list:
        # the list also contains the current row itself, so skip it
        if df.at[i, 'name'] != row['name']:
            data.append(df.at[i, 'name'])
    result.append(data)
df['in_building_with'] = result
print(df)
Result:
name building in_building_with
0 a blue [c, e]
1 b white []
2 c blue [a, e]
3 d red [f]
4 e blue [a, c]
5 f red [d]