How to create a new column in Pandas that repeats values according to another column?
I am a beginner in Python and I have a large DataFrame that looks like this:
import pandas as pd
df = pd.DataFrame({'Total': [10, 10, 10, 10, 10, 10, 10, 10, 10, 10],
                   'Type': ['Child', 'Boy', 'Girl', 'Senior', '', '', '', '', '', ''],
                   'Count': [4, 5, 1, 0, '', '', '', '', '', '']})
# Reassign so the column order is actually applied
df = df[["Total", "Type", "Count"]]
df
Output:
   Total    Type  Count
0     10   Child      4
1     10     Boy      5
2     10    Girl      1
3     10  Senior      0
4     10
5     10
6     10
7     10
8     10
9     10
I want something like this:
   Total    Type  Count    New
0     10   Child      4  Child
1     10     Boy      5  Child
2     10    Girl      1  Child
3     10  Senior      0  Child
4     10                   Boy
5     10                   Boy
6     10                   Boy
7     10                   Boy
8     10                   Boy
9     10                  Girl
I don't know how to create a new column that repeats each Type value n times, where n is the corresponding Count.
Thanks!
Use repeat, after using replace to turn the blanks in Count into 0:
df['New']=df.Type.repeat(df.Count.replace('',0).astype(int)).values
df
Out[657]:
   Count  Total    Type    New
0      4     10   Child  Child
1      5     10     Boy  Child
2      1     10    Girl  Child
3      0     10  Senior  Child
4            10            Boy
5            10            Boy
6            10            Boy
7            10            Boy
8            10            Boy
9            10           Girl
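To see why `.values` is needed in the line above, it may help to look at what `Series.repeat` returns on its own. This is only a minimal sketch; the names `types`, `counts`, `repeated` and `new_column` are hypothetical stand-ins and not part of the original answer:

```python
import pandas as pd

# Hypothetical stand-ins for the non-blank part of df.Type and df.Count
types = pd.Series(['Child', 'Boy', 'Girl', 'Senior'])
counts = [4, 5, 1, 0]

# Each element is repeated counts[i] times; a count of 0 drops the element
repeated = types.repeat(counts)

# The original index labels are repeated too, so they are no longer unique
print(repeated.index.tolist())
print(repeated.tolist())

# Converting to a plain array drops those labels, so the result can be
# assigned by position to a column of the same length
new_column = repeated.to_numpy()
print(len(new_column))
```

Assigning `repeated` directly, without dropping its index, would make pandas try to align on the duplicated labels instead of filling the column by position.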
Not sure if this is the fastest way, but it is simple:
from itertools import chain
import pandas as pd
df = pd.DataFrame({'Total': [10, 10, 10, 10, 10, 10, 10, 10, 10, 10], \
'Type': ['Child', 'Boy', 'Girl', 'Senior', '', '', '', '', '', ''], \
'Count': [4, 5, 1, 0, '', '', '', '', '', '']})
df['New'] = list(chain.from_iterable([t] * c for t, c in zip(df.Type, df.Count) if c))
print(df)
Output:
   Count  Total    Type    New
0      4     10   Child  Child
1      5     10     Boy  Child
2      1     10    Girl  Child
3      0     10  Senior  Child
4            10            Boy
5            10            Boy
6            10            Boy
7            10            Boy
8            10            Boy
9            10           Girl
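Note that this approach relies on the counts adding up to exactly the number of rows (4 + 5 + 1 + 0 = 10 here); otherwise the column assignment fails with a length mismatch. The following is a sketch only, using a hypothetical helper `expand_types` that pads with empty strings or truncates so the result always fits. Whether padding or truncating is acceptable depends on your data, and the smaller sample frame is made up for illustration:

```python
from itertools import chain, islice, repeat

import pandas as pd


def expand_types(types, counts, length, fill=''):
    """Repeat each type by its count, then pad or truncate to `length`.

    Hypothetical helper; blank ('') and zero counts are skipped.
    """
    expanded = chain.from_iterable(
        [t] * c for t, c in zip(types, counts) if c
    )
    # Append an endless supply of `fill`, then cut at `length` items
    return list(islice(chain(expanded, repeat(fill)), length))


# Made-up sample: the counts sum to 5, one short of the 6 rows
df = pd.DataFrame({
    'Total': [10] * 6,
    'Type': ['Child', 'Boy', '', '', '', ''],
    'Count': [2, 3, '', '', '', ''],
})
df['New'] = expand_types(df['Type'], df['Count'], len(df))
print(df['New'].tolist())
```

With this sample the last row of `New` is left as an empty string instead of raising an error.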
Try this:
df['New']= sum((df[df['Type']!=''].apply(lambda x: x['Count']*[x['Type']],axis=1)).values,[])
Output:
   Count  Total    Type    New
0      4     10   Child  Child
1      5     10     Boy  Child
2      1     10    Girl  Child
3      0     10  Senior  Child
4            10            Boy
5            10            Boy
6            10            Boy
7            10            Boy
8            10            Boy
9            10           Girl
Try the code below. I multiply df['Type'] by df['Count'], then flatten the list and create a new column from the flat list:
import pandas as pd
df = pd.DataFrame({'Total': [10, 10, 10, 10, 10, 10, 10, 10, 10, 10],
                   'Type': ['Child', 'Boy', 'Girl', 'Senior', '', '', '', '', '', ''],
                   'Count': [4, 5, 1, 0, '', '', '', '', '', '']})
# Repeat each Type value Count times, skipping rows whose Count is a blank string
dropped = [[x] * y for x, y in zip(df['Type'], df['Count']) if not isinstance(y, str)]
# Flatten the list of lists into one list and assign it as the new column
df['New'] = sum(dropped, [])
print(df)
Output:
   Count  Total    Type    New
0      4     10   Child  Child
1      5     10     Boy  Child
2      1     10    Girl  Child
3      0     10  Senior  Child
4            10            Boy
5            10            Boy
6            10            Boy
7            10            Boy
8            10            Boy
9            10           Girl
Here is one way using itertools.chain and itertools.repeat:
from itertools import chain, repeat
# calculate number of non-blank rows
n = (df['Type'] != '').sum()
# extract values for these rows
vals = df[['Type', 'Count']].iloc[:n].values
# iterate and repeat values
df['New'] = list(chain.from_iterable(repeat(*row) for row in vals))
print(df)
   Count  Total    Type    New
0      4     10   Child  Child
1      5     10     Boy  Child
2      1     10    Girl  Child
3      0     10  Senior  Child
4            10            Boy
5            10            Boy
6            10            Boy
7            10            Boy
8            10            Boy
9            10           Girl
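As a quick sanity check, the `repeat`-based approach and the `itertools` approach can be compared on the sample data. This is a self-contained sketch, not part of any of the answers: the `pd.to_numeric(...).fillna(0).astype(int)` conversion is an assumption swapped in for `replace('', 0)` so that the counts are plain integers, and the names `via_repeat` and `via_itertools` are hypothetical:

```python
from itertools import chain, repeat

import pandas as pd

df = pd.DataFrame({
    'Total': [10] * 10,
    'Type': ['Child', 'Boy', 'Girl', 'Senior', '', '', '', '', '', ''],
    'Count': [4, 5, 1, 0, '', '', '', '', '', ''],
})

# Approach 1: Series.repeat, with blank counts converted to 0
counts = pd.to_numeric(df['Count'], errors='coerce').fillna(0).astype(int)
via_repeat = df['Type'].repeat(counts.to_numpy()).tolist()

# Approach 2: itertools.chain + itertools.repeat over the non-blank rows
n = (df['Type'] != '').sum()
vals = df[['Type', 'Count']].iloc[:n].values
via_itertools = list(chain.from_iterable(repeat(*row) for row in vals))

# Both should produce the same 10 values
print(via_repeat == via_itertools)

df['New'] = via_repeat
print(df['New'].tolist())
```

Both approaches assume that the blank rows come after the filled ones and that the counts sum to the number of rows.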