
Create a new column listing the rows that share the same column value

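I have a DataFrame similar to the one below. How can I add a new column that holds the names of the other rows sharing the same value in one of the columns? For example: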

Given this:

  name  building 
  a     blue
  b     white
  c     blue
  d     red
  e     blue
  f     red
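
To try the answers below, the table above can be built as a DataFrame. A minimal sketch, with the values copied from the question and the variable name df chosen to match the code in the answers:

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e", "f"],
    "building": ["blue", "white", "blue", "red", "blue", "red"],
})

print(df)
```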

How do I get this?

  name  building  in_building_with
  a     blue      [c, e]
  b     white     []
  c     blue      [a, e]
  d     red       [f]
  e     blue      [a, c]
  f     red       [d]

This is the only approach I could come up with (probably the worst one):

r = df.groupby('building')['name'].agg(dict)
df['in_building_with'] = df.apply(lambda x: [r[x['building']][i] for i in (r[x['building']].keys() - [x.name])], axis=1)

df:

  name building in_building_with
0    a     blue           [c, e]
1    b    white               []
2    c     blue           [a, e]
3    d      red              [f]
4    e     blue           [a, c]
5    f      red              [d]

Approach:

  1. Build a dictionary per building that maps each row index in that building to the name at that index.

building
blue     {0: 'a', 2: 'c', 4: 'e'}
red              {3: 'd', 5: 'f'}
white                    {1: 'b'}
dtype: object

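  2. Subtract the current row's index from those keys, because you want every row in the building except the current one. Inside apply with axis=1, x.name is the row's index label, not the value of the 'name' column.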

r[x['building']].keys()-[x.name]

  3. Look up the values at the remaining indices and put them in a list.
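
The three steps above can also be written out one at a time. This is only a sketch: it uses a plain loop instead of apply, builds the per-building dictionaries with a dict comprehension rather than agg(dict), and sorts the remaining indices so the output order is stable. Those choices are assumptions made for illustration, not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e", "f"],
    "building": ["blue", "white", "blue", "red", "blue", "red"],
})

# Step 1: for every building, map row index -> name.
r = {building: dict(names) for building, names in df.groupby("building")["name"]}

in_building_with = []
for idx, building in df["building"].items():
    members = r[building]
    # Step 2: drop the current row's index (idx plays the role of x.name).
    other_idx = members.keys() - {idx}
    # Step 3: look up the names at the remaining indices.
    in_building_with.append([members[i] for i in sorted(other_idx)])

df["in_building_with"] = in_building_with
print(df)
```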

If the order does not matter, you can do this:

# create groups
groups = df.groupby('building').transform(dict.fromkeys).squeeze()

# remove value from each group
df['in_building_with'] = [list(group.keys() - (e,)) for e, group in zip(df['name'], groups)]

print(df)

Output

  name building in_building_with
0    a     blue           [e, c]
1    b    white               []
2    c     blue           [e, a]
3    d      red              [f]
4    e     blue           [a, c]
5    f      red              [d]
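
Because the groups are handled as sets, the order inside each list (for example [e, c]) is not guaranteed. If a stable order is wanted, one option is to sort each result. A sketch under that assumption, using a plain dict of sets (the name by_building is made up for this example):

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e", "f"],
    "building": ["blue", "white", "blue", "red", "blue", "red"],
})

# One set of names per building.
by_building = {building: set(names) for building, names in df.groupby("building")["name"]}

# Remove the row's own name, then sort so the order is deterministic.
df["in_building_with"] = [
    sorted(by_building[building] - {name})
    for name, building in zip(df["name"], df["building"])
]

print(df)
```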

This may be a bit late, but here is a more concise way that avoids explicit for-loops.

Credit to @Pygirl's answer, which this builds on and improves:

r = df.groupby('building')['name'].agg(set)
df['in_building_with'] = df.apply(lambda x: list(r[x['building']] - {x['name']}), axis=1)

print(df)

Output:

  name building in_building_with
0    a     blue           [e, c]
1    b    white               []
2    c     blue           [e, a]
3    d      red              [f]
4    e     blue           [a, c]
5    f      red              [d]
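
Note that apply with axis=1 still calls the lambda once per row in Python. If the goal is to let pandas do the matching itself, one alternative (not taken from the answers here) is to merge the frame with itself on building. A sketch, assuming every name is unique; the names pairs, others and name_other are made up for this example:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e", "f"],
    "building": ["blue", "white", "blue", "red", "blue", "red"],
})

# Pair every row with every row in the same building.
pairs = df.merge(df, on="building", suffixes=("", "_other"))
# Drop the pairs that match a row with itself (assumes names are unique).
pairs = pairs[pairs["name"] != pairs["name_other"]]
# Collect the other names for each name.
others = pairs.groupby("name")["name_other"].agg(list)

# Names with no neighbours (such as 'b') are absent from `others`, so fall
# back to an empty list; sort each list for a stable order.
df["in_building_with"] = [sorted(others.get(name, [])) for name in df["name"]]

print(df)
```

The merge produces one row per pair, so this trades memory for fewer Python-level calls when buildings contain many rows.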

Let's use boolean indexing with loc inside a list comprehension to get the names from the rows that have the same building value:

df['in_building_with'] = [
    [*df.loc[df['building'].eq(y) & df['name'].ne(x), 'name']] for x, y in df.to_numpy()]

Result:

  name building in_building_with
0    a     blue           [c, e]
1    b    white               []
2    c     blue           [a, e]
3    d      red              [f]
4    e     blue           [a, c]
5    f      red              [d]
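
Unpacking x, y from df.to_numpy() only works while the frame has exactly two columns. If there are more, zipping the two relevant columns is one way around that. A sketch, where the floor column is a hypothetical extra column that does not appear in the question:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e", "f"],
    "building": ["blue", "white", "blue", "red", "blue", "red"],
    "floor": [1, 2, 3, 1, 2, 3],  # hypothetical extra column
})

# Iterate over only the two columns that matter, whatever else the frame holds.
df["in_building_with"] = [
    df.loc[df["building"].eq(building) & df["name"].ne(name), "name"].tolist()
    for name, building in zip(df["name"], df["building"])
]

print(df)
```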

Code:

import pandas as pd
# df is assumed to already hold the 'name' and 'building' columns from the question
# collect one list per row, then create the extra column in a single assignment
in_building_with = []
for index, row in df.iterrows():
    data = []
    # indices of the rows that have the same building (color) as the current row
    building_list = df[df['building'] == row['building']].index.tolist()
    for i in building_list:
        # the list also contains the current row's index, hence the check below
        if df.at[i, 'name'] != row['name']:
            data.append(df.at[i, 'name'])
    in_building_with.append(data)
# assigning the whole column at once avoids chained assignment (df[col][index] = ...)
df['in_building_with'] = in_building_with

print(df)

Result:

  name building in_building_with
0    a     blue           [c, e]
1    b    white               []
2    c     blue           [a, e]
3    d      red              [f]
4    e     blue           [a, c]
5    f      red              [d]


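Notice: the technical posts on this site are published under the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original source address. For any questions, contact: yoyou2525@163.com.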

 