How to create a new column in Pandas that repeats the values of another column a given number of times?
I'm a beginner in Python, and I have a big DataFrame which looks like this:
import pandas as pd
df = pd.DataFrame({'Total': [10, 10, 10, 10, 10, 10, 10, 10, 10, 10], \
'Type': ['Child', 'Boy', 'Girl', 'Senior', '', '', '', '', '', ''], \
'Count': [4, 5, 1, 0, '', '', '', '', '', '']})
df = df[["Total", "Type", "Count"]]
df
Output:
Total Type Count
0 10 Child 4
1 10 Boy 5
2 10 Girl 1
3 10 Senior 0
4 10
5 10
6 10
7 10
8 10
9 10
I want to get something like this:
Total Type Count New
0 10 Child 4 Child
1 10 Boy 5 Child
2 10 Girl 1 Child
3 10 Senior 0 Child
4 10 Boy
5 10 Boy
6 10 Boy
7 10 Boy
8 10 Boy
9 10 Girl
I don't know how to create a new column that repeats each value of Type n times, where n is the corresponding value in Count.
Thanks!
Use repeat, after replacing the blanks in Count with 0. The astype(int) cast is needed because Count is an object column and repeat requires integer counts:
df['New'] = df['Type'].repeat(df['Count'].replace('', 0).astype(int)).values
df
Out[657]:
Count Total Type New
0 4 10 Child Child
1 5 10 Boy Child
2 1 10 Girl Child
3 0 10 Senior Child
4 10 Boy
5 10 Boy
6 10 Boy
7 10 Boy
8 10 Boy
9 10 Girl
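The assignment above only works when the counts add up to the number of rows; otherwise pandas raises a length-mismatch error. Below is a minimal sketch of the same repeat approach with that check made explicit. Using pd.to_numeric to coerce the blanks and raising a ValueError on a mismatch are assumptions of this sketch, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'Total': [10] * 10,
    'Type': ['Child', 'Boy', 'Girl', 'Senior', '', '', '', '', '', ''],
    'Count': [4, 5, 1, 0, '', '', '', '', '', ''],
})

# Turn blanks (and any other non-numeric entries) into NaN, then into 0.
counts = pd.to_numeric(df['Count'], errors='coerce').fillna(0).astype(int)

# The repeated values fit into the new column only if the counts
# add up to the number of rows in the DataFrame.
if counts.sum() != len(df):
    raise ValueError(f"counts sum to {counts.sum()}, but df has {len(df)} rows")

# repeat() keeps the original index labels, so assign the raw values.
df['New'] = df['Type'].repeat(counts).to_numpy()
print(df)
```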
Not sure if this is the fastest way, but it is a simple one:
from itertools import chain
import pandas as pd
df = pd.DataFrame({'Total': [10, 10, 10, 10, 10, 10, 10, 10, 10, 10], \
'Type': ['Child', 'Boy', 'Girl', 'Senior', '', '', '', '', '', ''], \
'Count': [4, 5, 1, 0, '', '', '', '', '', '']})
df['New'] = list(chain.from_iterable([t] * c for t, c in zip(df.Type, df.Count) if c))
print(df)
Output:
Count Total Type New
0 4 10 Child Child
1 5 10 Boy Child
2 1 10 Girl Child
3 0 10 Senior Child
4 10 Boy
5 10 Boy
6 10 Boy
7 10 Boy
8 10 Boy
9 10 Girl
Try this:
df['New']= sum((df[df['Type']!=''].apply(lambda x: x['Count']*[x['Type']],axis=1)).values,[])
Output:
Count Total Type New
0 4 10 Child Child
1 5 10 Boy Child
2 1 10 Girl Child
3 0 10 Senior Child
4 10 Boy
5 10 Boy
6 10 Boy
7 10 Boy
8 10 Boy
9 10 Girl
Try the code below. I multiplied df['Type'] by df['Count'], flattened the resulting list of lists, and then created a new column from the flat list:
import pandas as pd
df = pd.DataFrame({'Total': [10, 10, 10, 10, 10, 10, 10, 10, 10, 10], \
'Type': ['Child', 'Boy', 'Girl', 'Senior', '', '', '', '', '', ''], \
'Count': [4, 5, 1, 0, '', '', '', '', '', '']})
# Build one list per row with a numeric Count, e.g. 'Boy Boy ' -> ['Boy', 'Boy'];
# rows whose Count is a blank string are skipped. Assumes Type values contain no spaces.
dropped = [((x + ' ') * y).split() for x, y in zip(df['Type'], df['Count']) if not isinstance(y, str)]
# Flatten the list of lists into a single list
df['New'] = sum(dropped, [])
print(df)
Output:
Count Total Type New
0 4 10 Child Child
1 5 10 Boy Child
2 1 10 Girl Child
3 0 10 Senior Child
4 10 Boy
5 10 Boy
6 10 Boy
7 10 Boy
8 10 Boy
9 10 Girl
This is one way using itertools.chain and itertools.repeat:
from itertools import chain, repeat
# calculate number of non-blank rows
n = (df['Type'] != '').sum()
# extract values for these rows
vals = df[['Type', 'Count']].iloc[:n].values
# iterate and repeat values
df['New'] = list(chain.from_iterable(repeat(*row) for row in vals))
print(df)
Count Total Type New
0 4 10 Child Child
1 5 10 Boy Child
2 1 10 Girl Child
3 0 10 Senior Child
4 10 Boy
5 10 Boy
6 10 Boy
7 10 Boy
8 10 Boy
9 10 Girl
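To see what repeat(*row) does in the answer above, here is a small standard-library sketch without pandas. The rows list is a hand-written stand-in for the (Type, Count) pairs extracted from the DataFrame, introduced here only for illustration.

```python
from itertools import chain, repeat

# Hypothetical stand-in for df[['Type', 'Count']].iloc[:n].values
rows = [('Child', 4), ('Boy', 5), ('Girl', 1), ('Senior', 0)]

# repeat(value, n) yields value n times; with n = 0 it yields nothing,
# which is why 'Senior' never appears in the result.
print(list(repeat('Girl', 1)))    # ['Girl']
print(list(repeat('Senior', 0)))  # []

# chain.from_iterable concatenates the per-row iterators into one sequence.
flat = list(chain.from_iterable(repeat(*row) for row in rows))
print(flat)
```

Because the counts add up to 10, the flattened list has exactly one entry per row of the DataFrame, which is what lets it be assigned as a column.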