
Count occurrences of a character in a column of dataframe in pandas

I have a dataframe with the following structure:

Debtor_ID    | Loan_ID    | Pattern_of_payments
Uncle Sam      Loan1        11111AAA11555
Uncle Sam      Loan2        11222A339999
Uncle Joe      Loan3        1111111111111
Uncle Joe      Loan4        111222222233333
Aunt Annie     Loan5        1
Aunt Chloe     Loan6        555555555
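To make the snippets on this page reproducible, the sample data can be rebuilt as a DataFrame. This is a sketch: the values are copied from the table above, and storing every pattern as a string (including the single "1") is an assumption about how the data was loaded.

```python
import pandas as pd

# Sample data copied from the table above; patterns are kept as strings
df = pd.DataFrame({
    "Debtor_ID": ["Uncle Sam", "Uncle Sam", "Uncle Joe", "Uncle Joe",
                  "Aunt Annie", "Aunt Chloe"],
    "Loan_ID": ["Loan1", "Loan2", "Loan3", "Loan4", "Loan5", "Loan6"],
    "Pattern_of_payments": ["11111AAA11555", "11222A339999", "1111111111111",
                            "111222222233333", "1", "555555555"],
})
print(df)
```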

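Each character in the Pattern_of_payments column marks either an on-time payment (1) or a delay (every other character). For each row, I want to count the occurrences of each character in Pattern_of_payments and store that count in the corresponding column of the dataframe, like this: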

Debtor_ID    | Loan_ID    | On_time_payment    | 1_29_days_delay    | 30_59_days_delay    | 60_89_days_delay    | 90_119_days_delay    | Over_120_days_delay    | Bailiff_prosecution
Uncle Sam      Loan1        7                    3                    0                     0                     0                      3                        0
Uncle Sam      Loan2        2                    1                    3                     2                     0                      0                        4
Uncle Joe      Loan3        13                   0                    0                     0                     0                      0                        0
Uncle Joe      Loan4        3                    0                    7                     5                     0                      0                        0
Aunt Annie     Loan5        1                    0                    0                     0                     0                      0                        0
Aunt Chloe     Loan6        0                    0                    0                     0                     0                      9                        0
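The counts for a single row can be checked with collections.Counter, which tallies every character of a string in one pass instead of calling str.count once per character. A minimal sketch, assuming the character-to-column mapping used in the question's code; CATEGORY_NAMES and count_row are hypothetical names chosen for illustration:

```python
from collections import Counter

# Status character -> output column, taken from the question's code
CATEGORY_NAMES = {
    "1": "On_time_payment",
    "A": "1_29_days_delay",
    "2": "30_59_days_delay",
    "3": "60_89_days_delay",
    "4": "90_119_days_delay",
    "5": "Over_120_days_delay",
    "9": "Bailiff_prosecution",
}


def count_row(pattern):
    """Count every status character of one pattern string in a single pass."""
    counts = Counter(str(pattern))
    # A Counter returns 0 for missing keys; characters outside the mapping are ignored
    return {name: counts[char] for char, name in CATEGORY_NAMES.items()}


print(count_row("11222A339999"))
# {'On_time_payment': 2, '1_29_days_delay': 1, '30_59_days_delay': 3,
#  '60_89_days_delay': 2, '90_119_days_delay': 0, 'Over_120_days_delay': 0,
#  'Bailiff_prosecution': 4}
```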

My code does it like this:

# One list of counts per status character
list_of_counts_of_1 = []
list_of_counts_of_A = []
list_of_counts_of_2 = []
list_of_counts_of_3 = []
list_of_counts_of_4 = []
list_of_counts_of_5 = []
list_of_counts_of_9 = []
for value in df_account.Pattern_of_payments.values:
    iter_string = str(value)
    count1 = iter_string.count("1")
    countA = iter_string.count("A")
    count2 = iter_string.count("2")
    count3 = iter_string.count("3")
    count4 = iter_string.count("4")
    count5 = iter_string.count("5")
    count9 = iter_string.count("9")
    list_of_counts_of_1.append(count1)
    list_of_counts_of_A.append(countA)
    list_of_counts_of_2.append(count2)
    list_of_counts_of_3.append(count3)
    list_of_counts_of_4.append(count4)
    list_of_counts_of_5.append(count5)
    list_of_counts_of_9.append(count9)
# Assign each list to its output column
df_account["On_time_payment"] = list_of_counts_of_1
df_account["1_29_days_delay"] = list_of_counts_of_A
df_account["30_59_days_delay"] = list_of_counts_of_2
df_account["60_89_days_delay"] = list_of_counts_of_3
df_account["90_119_days_delay"] = list_of_counts_of_4
df_account["Over_120_days_delay"] = list_of_counts_of_5
df_account["Bailiff_prosecution"] = list_of_counts_of_9

I realize my code is not "pythonic" at all. There must be a more concise way to express this (maybe even some fancy one-liner). What would be the best coding practice here?

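First build a DataFrame from Counter objects in a list comprehension, then use reindex to add the missing categories and set the column order, rename the columns with a dict, and attach the result to the original DataFrame with join: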

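import pandas as pd
from collections import Counter

# One Counter per row: maps each character to its number of occurrences
df1 = pd.DataFrame([Counter(str(x)) for x in df['Pattern_of_payments']], index=df.index)
# Output order of the status characters
order = list('1A23459')

# Status character -> output column name
d = {'1': "On_time_payment",
     'A': "1_29_days_delay",
     '2': "30_59_days_delay",
     '3': "60_89_days_delay",
     '4': "90_119_days_delay",
     '5': "Over_120_days_delay",
     '9': "Bailiff_prosecution"}

# Characters absent from a row become NaN, so fill with 0 and cast back to int;
# reindex adds categories that never occur (here '4') and fixes the column order
df2 = df1.fillna(0).astype(int).reindex(columns=order, fill_value=0).rename(columns=d)
df = df.join(df2)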

print (df)
    Debtor_ID Loan_ID Pattern_of_payments  On_time_payment  1_29_days_delay  \
0   Uncle Sam   Loan1       11111AAA11555                7                3   
1   Uncle Sam   Loan2        11222A339999                2                1   
2   Uncle Joe   Loan3       1111111111111               13                0   
3   Uncle Joe   Loan4     111222222233333                3                0   
4  Aunt Annie   Loan5                   1                1                0   
5  Aunt Chloe   Loan6           555555555                0                0   

   30_59_days_delay  60_89_days_delay  90_119_days_delay  Over_120_days_delay  \
0                 0                 0                  0                    3   
1                 3                 2                  0                    0   
2                 0                 0                  0                    0   
3                 7                 5                  0                    0   
4                 0                 0                  0                    0   
5                 0                 0                  0                    9   

   Bailiff_prosecution  
0                    0  
1                    4  
2                    0  
3                    0  
4                    0  
5                    0  
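Another option, offered as a sketch rather than as part of the answer above, is to skip Counter and call the pandas string method Series.str.count once per category. It scans the column once per character (seven times here) instead of once in total, which is usually acceptable for a handful of categories. The sample data is rebuilt so the block runs on its own, and the mapping dict is copied from the question; the names mapping, patterns and counts are illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Debtor_ID": ["Uncle Sam", "Uncle Sam", "Uncle Joe", "Uncle Joe",
                  "Aunt Annie", "Aunt Chloe"],
    "Loan_ID": ["Loan1", "Loan2", "Loan3", "Loan4", "Loan5", "Loan6"],
    "Pattern_of_payments": ["11111AAA11555", "11222A339999", "1111111111111",
                            "111222222233333", "1", "555555555"],
})

# Status character -> output column name, as in the question
mapping = {"1": "On_time_payment",
           "A": "1_29_days_delay",
           "2": "30_59_days_delay",
           "3": "60_89_days_delay",
           "4": "90_119_days_delay",
           "5": "Over_120_days_delay",
           "9": "Bailiff_prosecution"}

patterns = df["Pattern_of_payments"].astype(str)
# Series.str.count treats its argument as a regular expression; these
# characters are all literal, but others (".", "+", ...) would need re.escape
counts = pd.DataFrame({name: patterns.str.count(char)
                       for char, name in mapping.items()})
df = df.join(counts)
print(df)
```

Because the dict keeps insertion order, the new columns come out in the order of mapping, so no reindex step is needed; a character that never occurs (such as "4") simply yields a column of zeros.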

