How to create a new column in Pandas by repeating the values of another column?

Question

I'm a beginner in Python, and I have a large DataFrame that looks like this:

import pandas as pd
df = pd.DataFrame({'Total': [10, 10, 10, 10, 10, 10, 10, 10, 10, 10],
                   'Type': ['Child', 'Boy', 'Girl', 'Senior', '', '', '', '', '', ''],
                   'Count': [4, 5, 1, 0, '', '', '', '', '', '']})
# Reorder the columns; the result has to be assigned back to take effect
df = df[["Total", "Type", "Count"]]
df

Output:

   Total    Type    Count
0   10     Child    4
1   10       Boy    5
2   10      Girl    1
3   10     Senior   0
4   10      
5   10      
6   10      
7   10      
8   10      
9   10

I want to get something like this:

    Total   Type    Count   New
0   10     Child       4    Child
1   10       Boy       5    Child
2   10      Girl       1    Child
3   10    Senior       0    Child
4   10                      Boy
5   10                      Boy
6   10                      Boy
7   10                      Boy
8   10                      Boy
9   10                      Girl

I don't know how to create a new column that repeats each value of Type n times, where n is the corresponding value of Count.

Thanks!

Answer 1

Use repeat, after replacing the blanks in Count with 0:

# astype(int) guards against Count staying object dtype after replace
df['New'] = df['Type'].repeat(df['Count'].replace('', 0).astype(int)).values
df
Out[657]: 
  Count  Total    Type    New
0     4     10   Child  Child
1     5     10     Boy  Child
2     1     10    Girl  Child
3     0     10  Senior  Child
4           10            Boy
5           10            Boy
6           10            Boy
7           10            Boy
8           10            Boy
9           10           Girl
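
A slightly more defensive sketch of the same repeat idea, in case it helps. It assumes that every blank in Count should be treated as 0 and that the counts add up to the number of rows (4 + 5 + 1 + 0 = 10 here); the explicit length check and the use of pd.to_numeric are my additions, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'Total': [10] * 10,
    'Type': ['Child', 'Boy', 'Girl', 'Senior', '', '', '', '', '', ''],
    'Count': [4, 5, 1, 0, '', '', '', '', '', ''],
})

# Convert Count to integers; blanks become NaN first and then 0.
counts = pd.to_numeric(df['Count'], errors='coerce').fillna(0).astype(int)

# The repeated values only fit the frame if the counts add up to its length.
if counts.sum() != len(df):
    raise ValueError("sum of Count must equal the number of rows")

# Drop the repeated index so the values are assigned by position.
df['New'] = df['Type'].repeat(counts).to_numpy()
print(df)
```

Without the check, a mismatch between the sum of Count and the number of rows surfaces as a length error at assignment time, which can be harder to read.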

Answer 2

Not sure if this is the fastest way, but it is a simple one:

from itertools import chain
import pandas as pd

df = pd.DataFrame({'Total': [10, 10, 10, 10, 10, 10, 10, 10, 10, 10], \
                    'Type': ['Child', 'Boy', 'Girl', 'Senior', '', '', '', '', '', ''], \
                    'Count': [4, 5, 1, 0, '', '', '', '', '', '']})
df['New'] = list(chain.from_iterable([t] * c for t, c in zip(df.Type, df.Count) if c))
print(df)

Output:

  Count  Total    Type    New
0     4     10   Child  Child
1     5     10     Boy  Child
2     1     10    Girl  Child
3     0     10  Senior  Child
4           10            Boy
5           10            Boy
6           10            Boy
7           10            Boy
8           10            Boy
9           10           Girl
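
The chain approach above needs the counts to add up to exactly len(df). If that might not hold in your real data, one possible variation is to pad or truncate the repeated values to a fixed length. This is only a sketch: repeat_to_length and its fill parameter are hypothetical names I made up for illustration, and it assumes blank or zero counts should simply be skipped.

```python
from itertools import chain, islice, repeat


def repeat_to_length(values, counts, length, fill=''):
    """Repeat each value by its count, then pad or truncate to `length`."""
    # Skip blank ('') and zero counts, as in the answer above.
    repeated = chain.from_iterable(
        repeat(v, c) for v, c in zip(values, counts) if c
    )
    # Append an endless supply of the fill value, then cut at `length`.
    padded = chain(repeated, repeat(fill))
    return list(islice(padded, length))


types = ['Child', 'Boy', 'Girl', 'Senior', '', '', '', '', '', '']
counts = [4, 5, 1, 0, '', '', '', '', '', '']
new_column = repeat_to_length(types, counts, len(types))
print(new_column)
```

With the DataFrame from the question this would be used as df['New'] = repeat_to_length(df.Type, df.Count, len(df)).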

Answer 3

Try this:

df['New']= sum((df[df['Type']!=''].apply(lambda x: x['Count']*[x['Type']],axis=1)).values,[])

Output:

  Count  Total    Type    New
0     4     10   Child  Child
1     5     10     Boy  Child
2     1     10    Girl  Child
3     0     10  Senior  Child
4           10            Boy
5           10            Boy
6           10            Boy
7           10            Boy
8           10            Boy
9           10           Girl
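
One thing to be aware of: sum(..., []) creates a new list at every step, so it can get slow on a large DataFrame. A rough sketch of the same row filter with a flat list comprehension instead, assuming (as in the question) that Count holds plain integers wherever Type is not blank:

```python
import pandas as pd

df = pd.DataFrame({
    'Total': [10] * 10,
    'Type': ['Child', 'Boy', 'Girl', 'Senior', '', '', '', '', '', ''],
    'Count': [4, 5, 1, 0, '', '', '', '', '', ''],
})

# Keep only the rows that actually carry a Type.
mask = df['Type'] != ''

# Emit each Type once per unit of its Count, directly into one flat list.
df['New'] = [
    t
    for t, c in zip(df.loc[mask, 'Type'], df.loc[mask, 'Count'])
    for _ in range(c)
]
print(df)
```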

Answer 4

Try the code below. I multiplied each df['Type'] value by its df['Count'], flattened the resulting lists, and then created a new column from the flat list:

import pandas as pd
df = pd.DataFrame({'Total': [10, 10, 10, 10, 10, 10, 10, 10, 10, 10],
                   'Type': ['Child', 'Boy', 'Girl', 'Senior', '', '', '', '', '', ''],
                   'Count': [4, 5, 1, 0, '', '', '', '', '', '']})
# Repeat each Type as a space-separated string, then split it back into a list.
# Rows whose Count is a blank string are skipped; this assumes Type has no spaces.
dropped = [((x + ' ') * y).split() for x, y in zip(df['Type'], df['Count']) if not isinstance(y, str)]
# Flatten the list of lists into one list
df['New'] = sum(dropped, [])
print(df)

Output:

  Count  Total    Type    New
0     4     10   Child  Child
1     5     10     Boy  Child
2     1     10    Girl  Child
3     0     10  Senior  Child
4           10            Boy
5           10            Boy
6           10            Boy
7           10            Boy
8           10            Boy
9           10           Girl

Answer 5

This is one way using itertools.chain and itertools.repeat:

from itertools import chain, repeat

# calculate number of non-blank rows
n = (df['Type'] != '').sum()

# extract values for these rows
vals = df[['Type', 'Count']].iloc[:n].values

# iterate and repeat values
df['New'] = list(chain.from_iterable(repeat(*row) for row in vals))

print(df)

  Count  Total    Type    New
0     4     10   Child  Child
1     5     10     Boy  Child
2     1     10    Girl  Child
3     0     10  Senior  Child
4           10            Boy
5           10            Boy
6           10            Boy
7           10            Boy
8           10            Boy
9           10           Girl
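
For comparison, the same slicing idea can probably be written with NumPy's np.repeat instead of itertools. This is a sketch under the same assumption as above, namely that all non-blank rows come first, so the first n rows hold the usable Type and Count values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Total': [10] * 10,
    'Type': ['Child', 'Boy', 'Girl', 'Senior', '', '', '', '', '', ''],
    'Count': [4, 5, 1, 0, '', '', '', '', '', ''],
})

# Number of non-blank rows (assumed to sit at the top of the frame).
n = (df['Type'] != '').sum()

# Take the first n values of each column; Count needs an integer dtype.
types = df['Type'].to_numpy()[:n]
counts = df['Count'].to_numpy()[:n].astype(int)

# np.repeat repeats each element of types by the matching entry in counts.
df['New'] = np.repeat(types, counts)
print(df)
```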
