Count occurrences of a character in a column of a dataframe in pandas

I have a dataframe with the following structure:

Debtor_ID    | Loan_ID    | Pattern_of_payments
Uncle Sam      Loan1        11111AAA11555
Uncle Sam      Loan2        11222A339999
Uncle Joe      Loan3        1111111111111
Uncle Joe      Loan4        111222222233333
Aunt Annie     Loan5        1
Aunt Chloe     Loan6        555555555
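
For reproducibility, the sample above can be built roughly as follows. This is only a sketch: the frame name df_account is taken from the code further down, the values are copied from the table, and how the real data is loaded is not shown in the question.

```python
import pandas as pd

# Sample rows copied from the table above. The name df_account follows the
# question's own code; the loading step itself is an assumption.
df_account = pd.DataFrame({
    "Debtor_ID": ["Uncle Sam", "Uncle Sam", "Uncle Joe",
                  "Uncle Joe", "Aunt Annie", "Aunt Chloe"],
    "Loan_ID": ["Loan1", "Loan2", "Loan3", "Loan4", "Loan5", "Loan6"],
    # Patterns are kept as strings so that all-digit values such as "1"
    # are not turned into numbers.
    "Pattern_of_payments": ["11111AAA11555", "11222A339999", "1111111111111",
                            "111222222233333", "1", "555555555"],
})
```

If the data comes from a CSV file, passing dtype=str for this column to pd.read_csv keeps all-digit patterns from being parsed as integers.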

Each character in the "Pattern_of_payments" column marks either an on-time payment (1) or a delay (any other character). I want to count the occurrences of each character in each row of "Pattern_of_payments" and store each count in its own column of the dataframe, like this:

Debtor_ID    | Loan_ID    | On_time_payment    | 1_29_days_delay    | 30_59_days_delay    | 60_89_days_delay    | 90_119_days_delay    | Over_120_days_delay    | Bailiff_prosecution
Uncle Sam      Loan1        7                    3                    0                     0                     0                      3                        0
Uncle Sam      Loan2        2                    1                    3                     2                     0                      0                        4
Uncle Joe      Loan3        13                   0                    0                     0                     0                      0                        0
Uncle Joe      Loan4        3                    0                    7                     5                     0                      0                        0
Aunt Annie     Loan5        1                    0                    0                     0                     0                      0                        0
Aunt Chloe     Loan6        0                    0                    0                     0                     0                      9                        0

My code currently accomplishes the task like this:

list_of_counts_of_1 = []
list_of_counts_of_A = []
list_of_counts_of_2 = []
list_of_counts_of_3 = []
list_of_counts_of_4 = []
list_of_counts_of_5 = []
list_of_counts_of_9 = []
for value in df_account.Pattern_of_payments.values:
    iter_string = str(value)
    count1 = iter_string.count("1")
    countA = iter_string.count("A")
    count2 = iter_string.count("2")
    count3 = iter_string.count("3")
    count4 = iter_string.count("4")
    count5 = iter_string.count("5")
    count9 = iter_string.count("9")
    list_of_counts_of_1.append(count1)
    list_of_counts_of_A.append(countA)
    list_of_counts_of_2.append(count2)
    list_of_counts_of_3.append(count3)
    list_of_counts_of_4.append(count4)
    list_of_counts_of_5.append(count5)
    list_of_counts_of_9.append(count9)
df_account["On_time_payment"] = list_of_counts_of_1
df_account["1_29_days_delay"] = list_of_counts_of_A
df_account["30_59_days_delay"] = list_of_counts_of_2
df_account["60_89_days_delay"] = list_of_counts_of_3
df_account["90_119_days_delay"] = list_of_counts_of_4
df_account["Over_120_days_delay"] = list_of_counts_of_5
df_account["Bailiff_prosecution"] = list_of_counts_of_9

I realize that my code isn't "pythonic" at all. There has to be a much more succinct way to express this (maybe even some fancy one-liner). What would best practice look like here?

First, create a DataFrame of per-row character counts with Counter in a list comprehension. Then use reindex to add the missing categories and set the column order, rename the columns with a dict, and add the result to the original DataFrame with join:

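from collections import Counter

import pandas as pd

# One Counter per row; a character absent from a row becomes NaN.
# str(x) guards against patterns that were parsed as numbers.
df1 = pd.DataFrame([Counter(str(x)) for x in df['Pattern_of_payments']], index=df.index)

# Status codes in the desired column order
order = list('1A23459')

# Map each status code to its column name
d = {'1': "On_time_payment",
     'A': "1_29_days_delay",
     '2': "30_59_days_delay",
     '3': "60_89_days_delay",
     '4': "90_119_days_delay",
     '5': "Over_120_days_delay",
     '9': "Bailiff_prosecution"}

# Replace NaN with 0, add codes that never occur (here '4'), reorder, rename
df2 = df1.fillna(0).astype(int).reindex(columns=order, fill_value=0).rename(columns=d)
df = df.join(df2)

print(df)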
    Debtor_ID Loan_ID Pattern_of_payments  On_time_payment  1_29_days_delay  \
0   Uncle Sam   Loan1       11111AAA11555                7                3   
1   Uncle Sam   Loan2        11222A339999                2                1   
2   Uncle Joe   Loan3       1111111111111               13                0   
3   Uncle Joe   Loan4     111222222233333                3                0   
4  Aunt Annie   Loan5                   1                1                0   
5  Aunt Chloe   Loan6           555555555                0                0   

   30_59_days_delay  60_89_days_delay  90_119_days_delay  Over_120_days_delay  \
0                 0                 0                  0                    3   
1                 3                 2                  0                    0   
2                 0                 0                  0                    0   
3                 7                 5                  0                    0   
4                 0                 0                  0                    0   
5                 0                 0                  0                    9   

   Bailiff_prosecution  
0                    0  
1                    4  
2                    0  
3                    0  
4                    0  
5                    0  
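
As a possible alternative to the Counter approach, the same columns can be built with the vectorized Series.str.count method, one call per status code. The sketch below is self-contained and rebuilds the sample frame from the question; the names codes, counts and result are introduced here for illustration only.

```python
import pandas as pd

# Sample frame rebuilt from the question's table (illustrative only).
df = pd.DataFrame({
    "Debtor_ID": ["Uncle Sam", "Uncle Sam", "Uncle Joe",
                  "Uncle Joe", "Aunt Annie", "Aunt Chloe"],
    "Loan_ID": ["Loan1", "Loan2", "Loan3", "Loan4", "Loan5", "Loan6"],
    "Pattern_of_payments": ["11111AAA11555", "11222A339999", "1111111111111",
                            "111222222233333", "1", "555555555"],
})

# Status code -> column name, in the desired column order.
codes = {"1": "On_time_payment",
         "A": "1_29_days_delay",
         "2": "30_59_days_delay",
         "3": "60_89_days_delay",
         "4": "90_119_days_delay",
         "5": "Over_120_days_delay",
         "9": "Bailiff_prosecution"}

# Count each code across the whole column at once; a code that never
# occurs simply yields a column of zeros, so no reindex step is needed.
patterns = df["Pattern_of_payments"].astype(str)
counts = pd.DataFrame({name: patterns.str.count(code)
                       for code, name in codes.items()})

result = df.join(counts)
print(result)
```

Note that str.count treats its argument as a regular expression. The codes here are plain characters, but a code with special regex meaning (such as "." or "+") would have to be escaped first, for example with re.escape.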
