Python Pandas: Put existing column values into new columns if a condition is true

Question

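I want to modify my pandas DataFrame so that if a value in the `can` column equals 'Group Total', the `cv1` and `cvs1` values from that row are placed in new `pv1` and `pvs1` columns for the rows above it. If `pty_n` = 'Independent', I want the `pv1` and `pvs1` values to be the same as the `cv1` and `cvs1` values in the same row. Here is an illustration:

However, right now, what I am getting looks like this: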

{'rg': {0: 'Oceania', 1: 'Oceania', 2: 'Oceania', 3: 'Oceania', 4: 'Oceania', 5: 'Oceania', 6: 'Oceania', 7: 'Oceania', 8: 'Oceania', 9: 'Oceania'}, 'ctr_n': {0: 'Australia', 1: 'Australia', 2: 'Australia', 3: 'Australia', 4: 'Australia', 5: 'Australia', 6: 'Australia', 7: 'Australia', 8: 'Australia', 9: 'Australia'}, 'ctr': {0: '', 1: '', 2: '', 3: '', 4: '', 5: '', 6: '', 7: '', 8: '', 9: ''}, 'yr': {0: '2019', 1: '2019', 2: '2019', 3: '2019', 4: '2019', 5: '2019', 6: '2019', 7: '2019', 8: '2019', 9: '2019'}, 'mn': {0: '06', 1: '06', 2: '06', 3: '06', 4: '06', 5: '06', 6: '06', 7: '06', 8: '06', 9: '06'}, 'sub': {0: '-990', 1: '-990', 2: '-990', 3: '-990', 4: '-990', 5: '-990', 6: '-990', 7: '-990', 8: '-990', 9: '-990'}, 'cst_n': {0: 'Canberra, ACT', 1: 'Canberra, ACT', 2: 'Canberra, ACT', 3: 'Canberra, ACT', 4: 'Canberra, ACT', 5: 'Canberra, ACT', 6: 'Canberra, ACT', 7: 'Canberra, ACT', 8: 'Canberra, ACT', 9: 'Canberra, ACT'}, 'cst': {0: '', 1: '', 2: '', 3: '', 4: '', 5: '', 6: '', 7: '', 8: '', 9: ''}, 'can': {0: 'Ticket Votes', 1: 'SESELJA, Zed', 2: 'GUNNING, Robert', 3: 'Group Total', 4: 'Ticket Votes', 5: 'KYBURZ, Penny', 6: 'DAVIDSON, Emma', 7: 'Group Total', 8: 'Ticket Votes', 9: 'PESEC, Anthony'}, 'pty_n': {0: 'Liberal', 1: 'Liberal', 2: 'Liberal', 3: 'Liberal', 4: 'The Greens', 5: 'The Greens', 6: 'The Greens', 7: 'The Greens', 8: '\xa0', 9: '\xa0'}, 'cv1': {0: '21,209', 1: '2,142', 2: '1,001', 3: '24,352', 4: '14,637', 5: '5,719', 6: '875', 7: '21,231', 8: '1,404', 9: '3,225'}, 'cvs1': {0: '24.15', 1: '2.44', 2: '1.14', 3: '27.73', 4: '16.67', 5: '6.51', 6: '1.00', 7: '24.17', 8: '1.60', 9: '3.67'}, 'vv1': {0: '87,828', 1: '87,828', 2: '87,828', 3: '87,828', 4: '87,828', 5: '87,828', 6: '87,828', 7: '87,828', 8: '87,828', 9: '87,828'}, 'pv1': {0: '24,352', 1: '24,352', 2: '24,352', 3: '24,352', 4: '24,352', 5: '24,352', 6: '24,352', 7: '24,352', 8: '24,352', 9: '24,352'}, 'pvs1': {0: '27.73', 1: '27.73', 2: '27.73', 3: '27.73', 4: '27.73', 5: '27.73', 
6: '27.73', 7: '27.73', 8: '27.73', 9: '27.73'}}

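How can I modify my code so that the result looks like the first image rather than the second? For context, this will be applied to more than 20,000 rows in a pandas DataFrame, where the 'pty_n' values change with no fixed pattern (e.g. 4 rows of Liberal, 4 rows of The Greens, 7 rows of Labor, 2 rows of Citizens). Thanks!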

aust19 = pd.DataFrame({
'rg' : region,
'ctr_n' : ctrname,
'ctr' : ctrcode,
'yr' : year,
'mn' : month,
'sub' : sub,
'cst_n': constituencies,
'cst' : cstcode,
'can': candidates,
'pty_n': partynames,
'cv1': canvotes,
'cvs1': canshare,
'vv1': totalvotes 
})

real_pv1 = None
real_pvs1 = None

for idx, row in aust19.iloc[::-1].iterrows():
    if row.can == "Group Total":
        real_pv1 = row.cv1
        real_pvs1 = row.cvs1
    else:
        aust19.loc[idx].pv1 = real_pv1
        aust19.loc[idx].pvs1 = real_pvs1

    aust19['pv1'] = real_pv1
    aust19['pvs1'] = real_pvs1
    
aust19.to_csv("austtbd.csv")

Answer 1

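The general pattern I use for this kind of thing is:

dataframe.loc[condition, destination_columns] = dataframe.loc[condition, source_columns].to_numpy()

This takes advantage of pandas' vectorized operations. The `.to_numpy()` call is needed when the source and destination column names differ: if a DataFrame is assigned through `.loc`, pandas aligns it to the destination by column label, so without it the destination would be filled with NaN instead of the source values.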
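As a minimal sketch of that pattern, here it is applied to a made-up frame. The names `df`, `flag`, `src1`, `src2`, `dst1` and `dst2` are hypothetical and exist only for this illustration.

```python
import pandas as pd

# Hypothetical toy frame; the column names are illustrative only.
df = pd.DataFrame({
    "flag": ["a", "b", "a"],
    "src1": [1, 2, 3],
    "src2": [10, 20, 30],
})
df["dst1"] = 0
df["dst2"] = 0

condition = df["flag"] == "a"

# .to_numpy() drops the column labels, so values are copied by position
# (src1 -> dst1, src2 -> dst2) instead of being aligned by name.
df.loc[condition, ["dst1", "dst2"]] = df.loc[condition, ["src1", "src2"]].to_numpy()

print(df)
```

Only the rows where `condition` is true are touched; the remaining rows keep whatever the destination columns held before.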

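More specifically, for your use case, this can be done in two steps, e.g.:

# .to_numpy() copies the values by position rather than by column label
aust19.loc[aust19["can"] == "Group Total", ["pv1", "pvs1"]] = aust19.loc[aust19["can"] == "Group Total", ["cv1", "cvs1"]].to_numpy()

aust19.loc[aust19["pty_n"] == "Independent", ["pv1", "pvs1"]] = aust19.loc[aust19["pty_n"] == "Independent", ["cv1", "cvs1"]].to_numpy()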
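It may also help to see why the loop in the question ends up with one value on every row. `aust19.loc[idx].pv1 = ...` is chained assignment, which is not guaranteed to write back to `aust19`, and the indented `aust19['pv1'] = real_pv1` assigns a single scalar to the entire column on every iteration. The following is a sketch of a repaired version of that loop, assuming a reduced copy of the posted data (first eight rows, three columns) and assuming the 'Group Total' row itself should also receive the value. Independent rows are not handled here.

```python
import pandas as pd

# Reduced copy of the question's data, values kept as strings as posted.
aust19 = pd.DataFrame({
    "can": ["Ticket Votes", "SESELJA, Zed", "GUNNING, Robert", "Group Total",
            "Ticket Votes", "KYBURZ, Penny", "DAVIDSON, Emma", "Group Total"],
    "cv1": ["21,209", "2,142", "1,001", "24,352",
            "14,637", "5,719", "875", "21,231"],
    "cvs1": ["24.15", "2.44", "1.14", "27.73",
             "16.67", "6.51", "1.00", "24.17"],
})

# Create the destination columns up front so single cells can be written.
aust19["pv1"] = None
aust19["pvs1"] = None

real_pv1 = None
real_pvs1 = None

# Walk from the bottom up so each "Group Total" is seen before the rows above it.
for idx in aust19.index[::-1]:
    if aust19.at[idx, "can"] == "Group Total":
        real_pv1 = aust19.at[idx, "cv1"]
        real_pvs1 = aust19.at[idx, "cvs1"]
    # Write one cell at a time; aust19["pv1"] = ... would overwrite the whole column.
    aust19.at[idx, "pv1"] = real_pv1
    aust19.at[idx, "pvs1"] = real_pvs1

print(aust19)
```

A Python-level loop like this is easy to follow, but it is slow on large frames.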

Edit:

I was able to do this in multiple passes over the DataFrame to satisfy the conditions:

import pandas as pd

aust19 = pd.DataFrame({'rg': {0: 'Oceania', 1: 'Oceania', 2: 'Oceania', 3: 'Oceania', 4: 'Oceania', 5: 'Oceania', 6: 'Oceania', 7: 'Oceania', 8: 'Oceania', 9: 'Oceania'}, 'ctr_n': {0: 'Australia', 1: 'Australia', 2: 'Australia', 3: 'Australia', 4: 'Australia', 5: 'Australia', 6: 'Australia', 7: 'Australia', 8: 'Australia', 9: 'Australia'}, 'ctr': {0: '', 1: '', 2: '', 3: '', 4: '', 5: '', 6: '', 7: '', 8: '', 9: ''}, 'yr': {0: '2019', 1: '2019', 2: '2019', 3: '2019', 4: '2019', 5: '2019', 6: '2019', 7: '2019', 8: '2019', 9: '2019'}, 'mn': {0: '06', 1: '06', 2: '06', 3: '06', 4: '06', 5: '06', 6: '06', 7: '06', 8: '06', 9: '06'}, 'sub': {0: '-990', 1: '-990', 2: '-990', 3: '-990', 4: '-990', 5: '-990', 6: '-990', 7: '-990', 8: '-990', 9: '-990'}, 'cst_n': {0: 'Canberra, ACT', 1: 'Canberra, ACT', 2: 'Canberra, ACT', 3: 'Canberra, ACT', 4: 'Canberra, ACT', 5: 'Canberra, ACT', 6: 'Canberra, ACT', 7: 'Canberra, ACT', 8: 'Canberra, ACT', 9: 'Canberra, ACT'}, 'cst': {0: '', 1: '', 2: '', 3: '', 4: '', 5: '', 6: '', 7: '', 8: '', 9: ''}, 'can': {0: 'Ticket Votes', 1: 'SESELJA, Zed', 2: 'GUNNING, Robert', 3: 'Group Total', 4: 'Ticket Votes', 5: 'KYBURZ, Penny', 6: 'DAVIDSON, Emma', 7: 'Group Total', 8: 'Ticket Votes', 9: 'PESEC, Anthony'}, 'pty_n': {0: 'Liberal', 1: 'Liberal', 2: 'Liberal', 3: 'Liberal', 4: 'The Greens', 5: 'The Greens', 6: 'The Greens', 7: 'The Greens', 8: '\xa0', 9: '\xa0'}, 'cv1': {0: '21,209', 1: '2,142', 2: '1,001', 3: '24,352', 4: '14,637', 5: '5,719', 6: '875', 7: '21,231', 8: '1,404', 9: '3,225'}, 'cvs1': {0: '24.15', 1: '2.44', 2: '1.14', 3: '27.73', 4: '16.67', 5: '6.51', 6: '1.00', 7: '24.17', 8: '1.60', 9: '3.67'}, 'vv1': {0: '87,828', 1: '87,828', 2: '87,828', 3: '87,828', 4: '87,828', 5: '87,828', 6: '87,828', 7: '87,828', 8: '87,828', 9: '87,828'}, 'pv1': {0: '24,352', 1: '24,352', 2: '24,352', 3: '24,352', 4: '24,352', 5: '24,352', 6: '24,352', 7: '24,352', 8: '24,352', 9: '24,352'}, 'pvs1': {0: '27.73', 1: '27.73', 2: '27.73', 3: '27.73', 
4: '27.73', 5: '27.73', 6: '27.73', 7: '27.73', 8: '27.73', 9: '27.73'}})

# convert from string
aust19['cv1'] = aust19['cv1'].str.replace(",","").astype(int)
aust19['cvs1'] = aust19['cvs1'].str.replace(",","").astype(float)
aust19['pv1'] = aust19['pv1'].str.replace(",","").astype(int)
aust19['pvs1'] = aust19['pvs1'].str.replace(",","").astype(float)

# set the running totals to zero
cv1_sum = 0
cvs1_sum = 0.
group = 0
aust19['group_n'] = -1  # -1 marks rows that belong to no group (e.g. Independent)

for i in aust19.index:

    # Ignore rows that are Independent
    if aust19.loc[i, "pty_n"] != "Independent":

        # For group totals, write the running totals and reset them to zero
        if aust19.loc[i, "can"] == "Group Total":
            aust19.loc[i, "pv1"] = cv1_sum
            aust19.loc[i, "pvs1"] = cvs1_sum
            aust19.loc[i, "group_n"] = group
            cv1_sum = 0
            cvs1_sum = 0.  # fixed: the original reset a misspelled name (cv1s_sum)
            group += 1 # increment group

        # For non group totals, add the current row to the running totals
        else:
            cv1_sum += aust19.loc[i, "cv1"]
            cvs1_sum += aust19.loc[i, "cvs1"]
            aust19.loc[i, "group_n"] = group

## Second pass: copy each group's total onto the other rows of that group
is_total = aust19["can"] == "Group Total"
for g in range(group):
    in_group = aust19["group_n"] == g
    # .iloc[0] extracts the single total value for this group
    aust19.loc[in_group & ~is_total, "pv1"] = aust19.loc[in_group & is_total, "pv1"].iloc[0]
    aust19.loc[in_group & ~is_total, "pvs1"] = aust19.loc[in_group & is_total, "pvs1"].iloc[0]

# handle independents (.to_numpy() copies by position, not by column label)
is_independent = aust19["pty_n"] == "Independent"
aust19.loc[is_independent, ["pv1", "pvs1"]] = aust19.loc[is_independent, ["cv1", "cvs1"]].to_numpy()
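For more than 20,000 rows, a loop-free variant may be worth considering. The sketch below is not the approach above but a swapped-in technique: keep only the 'Group Total' values with `Series.where`, back-fill them upward with `Series.bfill`, then restore the Independent rows with `Series.mask`. It assumes the vote columns are already numeric, and it labels the last two rows 'Independent' purely to exercise that branch (in the posted data their `pty_n` is a non-breaking space).

```python
import pandas as pd

# Trimmed version of the posted data with numeric vote columns.
# Assumption: the last two rows are labelled "Independent" for illustration.
aust19 = pd.DataFrame({
    "can": ["Ticket Votes", "SESELJA, Zed", "GUNNING, Robert", "Group Total",
            "Ticket Votes", "KYBURZ, Penny", "DAVIDSON, Emma", "Group Total",
            "Ticket Votes", "PESEC, Anthony"],
    "pty_n": ["Liberal"] * 4 + ["The Greens"] * 4 + ["Independent"] * 2,
    "cv1": [21209, 2142, 1001, 24352, 14637, 5719, 875, 21231, 1404, 3225],
    "cvs1": [24.15, 2.44, 1.14, 27.73, 16.67, 6.51, 1.00, 24.17, 1.60, 3.67],
})

is_total = aust19["can"] == "Group Total"
is_independent = aust19["pty_n"] == "Independent"

for src, dst in [("cv1", "pv1"), ("cvs1", "pvs1")]:
    # Blank out everything except the Group Total rows, then pull each
    # total upward onto the rows that precede it.
    filled = aust19[src].where(is_total).bfill()
    # Independent rows keep their own value instead of a group total.
    aust19[dst] = filled.mask(is_independent, aust19[src])

print(aust19[["can", "pty_n", "cv1", "pv1", "cvs1", "pvs1"]])
```

Two caveats with this sketch: `pv1` comes out as a float column because of the intermediate NaN values, and the back-fill runs over the whole frame, so non-Independent rows with no 'Group Total' below them stay NaN. If groups must never cross constituency boundaries, doing the back-fill per `cst_n` group would be one way to enforce that.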
