Create a new column with data from rows that share the same column value

Question

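I have a DataFrame similar to the one below. How can I add a new column that holds the names of the rows sharing the same value in one of the columns? For example: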

Given this:

  name  building 
  a     blue
  b     white
  c     blue
  d     red
  e     blue
  f     red

How do I get this?

  name  building  in_building_with
  a     blue      [c, e]
  b     white     []
  c     blue      [a, e]
  d     red       [f]
  e     blue      [a, c]
  f     red       [d]
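
To try the answers below, the sample frame can be rebuilt roughly like this. This is a minimal sketch: it assumes both columns hold plain strings and that the index is the default 0..5, neither of which the question states explicitly.

```python
import pandas as pd

# Sample data copied from the question; the default integer index is an assumption.
df = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e", "f"],
    "building": ["blue", "white", "blue", "red", "blue", "red"],
})
print(df)
```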

Answer 1

This is the only approach I could think of (and probably the worst one):

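# map each building to a {row index: name} dict
r = df.groupby('building')['name'].agg(dict)
# for each row, take the names at every index in its building except its own
df['in_building_with'] = df.apply(lambda x: [r[x['building']][i] for i in (r[x['building']].keys() - [x.name])], axis=1)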

df:

  name building in_building_with
0    a     blue           [c, e]
1    b    white               []
2    c     blue           [a, e]
3    d      red              [f]
4    e     blue           [a, c]
5    f      red              [d]

Approach:

Build a dictionary for each building that maps the index of every row in that building to its name.

building
blue     {0: 'a', 2: 'c', 4: 'e'}
red              {3: 'd', 5: 'f'}
white                    {1: 'b'}
dtype: object

Subtract the current row's index from those keys, because you want the indexes of every element in the building except the current one.

r[x['building']].keys()-[x.name]

Get the values at those indexes and put them in a list.
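
The three steps can be traced for a single row as follows. This is only a sketch under assumptions: the frame is rebuilt from the question's sample with a default integer index, and the per-building dictionaries are built with `to_dict()` in a comprehension rather than `agg(dict)`, which should give the same mapping.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e", "f"],
    "building": ["blue", "white", "blue", "red", "blue", "red"],
})

# Step 1: for each building, a {row index: name} dictionary.
r = {building: names.to_dict() for building, names in df.groupby("building")["name"]}

# Step 2: for row 0 (name 'a', building 'blue'), drop the row's own index.
row = df.loc[0]
others = r[row["building"]].keys() - [row.name]

# Step 3: look up the names at the remaining indexes.
neighbours = [r[row["building"]][i] for i in sorted(others)]
print(neighbours)  # ['c', 'e']
```

Sorting the remaining indexes is an addition here; it keeps the result in row order, since a set difference does not guarantee any ordering.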

Answer 2

If the order does not matter, you could do this:

# create groups
groups = df.groupby('building').transform(dict.fromkeys).squeeze()

# remove value from each group
df['in_building_with'] = [list(group.keys() - (e,)) for e, group in zip(df['name'], groups)]

print(df)

Output

  name building in_building_with
0    a     blue           [e, c]
1    b    white               []
2    c     blue           [e, a]
3    d      red              [f]
4    e     blue           [a, c]
5    f      red              [d]
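
If the order does matter, one possible variant (a sketch, not part of the answer above) collects each building's names as a list and filters out the current name, so the result follows row order. It assumes names are unique within a building; a repeated name would be filtered out everywhere it appears.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e", "f"],
    "building": ["blue", "white", "blue", "red", "blue", "red"],
})

# Names per building, kept in row order.
members = df.groupby("building")["name"].agg(list)

# For each row, keep every name in its building except its own.
df["in_building_with"] = [
    [n for n in members[building] if n != name]
    for name, building in zip(df["name"], df["building"])
]
print(df)
```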

Answer 3

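This may be a bit late, but here is a more concise way that avoids an explicit for-loop (apply with axis=1 still visits each row internally).

Thanks to @Pygirl for the answer, which this builds on: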

r = df.groupby('building')['name'].agg(set)
df['in_building_with'] = df.apply(lambda x: list(r[x['building']] - {x['name']}), axis=1)

print(df)

Output:

  name building in_building_with
0    a     blue           [e, c]
1    b    white               []
2    c     blue           [e, a]
3    d      red              [f]
4    e     blue           [a, c]
5    f      red              [d]

Answer 4

Let's use boolean indexing with loc inside a list comprehension to get the names from the rows that have the same building value:

df['in_building_with'] = [
    [*df.loc[df['building'].eq(y) & df['name'].ne(x), 'name']] for x, y in df.to_numpy()]

Result:

  name building in_building_with
0    a     blue           [c, e]
1    b    white               []
2    c     blue           [a, e]
3    d      red              [f]
4    e     blue           [a, c]
5    f      red              [d]
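
To see what the comprehension evaluates on each pass, the mask for one row can be built by hand. This is a sketch using the question's sample data with the values for row 'a' written out literally; the variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e", "f"],
    "building": ["blue", "white", "blue", "red", "blue", "red"],
})

# Rows in the same building as 'a' (blue) ...
same_building = df["building"].eq("blue")
# ... excluding 'a' itself.
not_self = df["name"].ne("a")

mask = same_building & not_self
print(mask.tolist())            # [False, False, True, False, True, False]
print([*df.loc[mask, "name"]])  # ['c', 'e']
```

Note that unpacking with `for x, y in df.to_numpy()` relies on the frame having exactly two columns, name first and building second.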

Answer 5

Code:

import pandas as pd

# collect one list per row, then assign the new column in a single step
in_building_with = []
for index, row in df.iterrows():
    data = []
    # index labels of every row that has the same building as this row
    building_list = df[df['building'] == row['building']].index.tolist()
    for i in building_list:
        # the list also contains the current row's index, hence the check below
        if df.at[i, 'name'] != row['name']:
            data.append(df.at[i, 'name'])
    in_building_with.append(data)
df['in_building_with'] = in_building_with

print(df)

Result:

  name building in_building_with
0    a     blue           [c, e]
1    b    white               []
2    c     blue           [a, e]
3    d      red              [f]
4    e     blue           [a, c]
5    f      red              [d]
