Count occurrences of a character in a column of dataframe in pandas
I have a dataframe with the following structure:
Debtor_ID | Loan_ID | Pattern_of_payments
Uncle Sam Loan1 11111AAA11555
Uncle Sam Loan2 11222A339999
Uncle Joe Loan3 1111111111111
Uncle Joe Loan4 111222222233333
Aunt Annie Loan5 1
Aunt Chloe Loan6 555555555
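For anyone who wants to reproduce this, the sample above could be built roughly like this. It is only a sketch: the values are copied from the table, and the name df_account is taken from the code further down.

```python
import pandas as pd

# Sample rows copied from the table above; patterns are kept as strings
# so that each character can be counted individually.
df_account = pd.DataFrame({
    "Debtor_ID": ["Uncle Sam", "Uncle Sam", "Uncle Joe",
                  "Uncle Joe", "Aunt Annie", "Aunt Chloe"],
    "Loan_ID": ["Loan1", "Loan2", "Loan3", "Loan4", "Loan5", "Loan6"],
    "Pattern_of_payments": ["11111AAA11555", "11222A339999", "1111111111111",
                            "111222222233333", "1", "555555555"],
})
```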
Each character in the column "Pattern_of_payments" marks either an on-time payment (1) or a delay (any other character). What I want to do is count the number of occurrences of each character in each row of the "Pattern_of_payments" column and assign that number to the respective column of the dataframe, like this:
Debtor_ID | Loan_ID | On_time_payment | 1_29_days_delay | 30_59_days_delay | 60_89_days_delay | 90_119_days_delay | Over_120_days_delay | Bailiff_prosecution
Uncle Sam Loan1 7 3 0 0 0 3 0
Uncle Sam Loan2 2 1 3 2 0 0 4
Uncle Joe Loan3 13 0 0 0 0 0 0
Uncle Joe Loan4 3 0 7 5 0 0 0
Aunt Annie Loan5 1 0 0 0 0 0 0
Aunt Chloe Loan6 0 0 0 0 0 9 0
My code accomplishes the task in this manner:
# One list of per-row counts for each payment code
list_of_counts_of_1 = []
list_of_counts_of_A = []
list_of_counts_of_2 = []
list_of_counts_of_3 = []
list_of_counts_of_4 = []
list_of_counts_of_5 = []
list_of_counts_of_9 = []
for value in df_account.Pattern_of_payments.values:
    iter_string = str(value)
    # Count each code in the current row's pattern
    count1 = iter_string.count("1")
    countA = iter_string.count("A")
    count2 = iter_string.count("2")
    count3 = iter_string.count("3")
    count4 = iter_string.count("4")
    count5 = iter_string.count("5")
    count9 = iter_string.count("9")
    list_of_counts_of_1.append(count1)
    list_of_counts_of_A.append(countA)
    list_of_counts_of_2.append(count2)
    list_of_counts_of_3.append(count3)
    list_of_counts_of_4.append(count4)
    list_of_counts_of_5.append(count5)
    list_of_counts_of_9.append(count9)
# Attach each list of counts as a new column
df_account["On_time_payment"] = list_of_counts_of_1
df_account["1_29_days_delay"] = list_of_counts_of_A
df_account["30_59_days_delay"] = list_of_counts_of_2
df_account["60_89_days_delay"] = list_of_counts_of_3
df_account["90_119_days_delay"] = list_of_counts_of_4
df_account["Over_120_days_delay"] = list_of_counts_of_5
df_account["Bailiff_prosecution"] = list_of_counts_of_9
I realize that my code isn't "pythonic" at all. There has to be a way to express this much more succinctly (maybe even as some fancy one-liner). Please advise: what would best practice look like here?
The first step is to create a DataFrame from Counter objects in a list comprehension, then use reindex to add missing categories and change the order of the columns, rename the columns with a dict, and add the result to the original DataFrame with join:
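from collections import Counter

import pandas as pd

# One Counter per row: maps each character to its number of occurrences
df1 = pd.DataFrame([Counter(x) for x in df['Pattern_of_payments']], index=df.index)
# Desired column order, one entry per payment code
order = list('1A23459')
# Map each payment code to its descriptive column name
d = {'1': "On_time_payment",
     'A': "1_29_days_delay",
     '2': "30_59_days_delay",
     '3': "60_89_days_delay",
     '4': "90_119_days_delay",
     '5': "Over_120_days_delay",
     '9': "Bailiff_prosecution"}
# Codes absent from a row come out as NaN, so fill with 0 and cast back to int;
# reindex adds codes that never occur at all (here '4') and fixes the order
df2 = df1.fillna(0).astype(int).reindex(columns=order, fill_value=0).rename(columns=d)
df = df.join(df2)
print(df)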
Debtor_ID Loan_ID Pattern_of_payments On_time_payment 1_29_days_delay \
0 Uncle Sam Loan1 11111AAA11555 7 3
1 Uncle Sam Loan2 11222A339999 2 1
2 Uncle Joe Loan3 1111111111111 13 0
3 Uncle Joe Loan4 111222222233333 3 0
4 Aunt Annie Loan5 1 1 0
5 Aunt Chloe Loan6 555555555 0 0
30_59_days_delay 60_89_days_delay 90_119_days_delay Over_120_days_delay \
0 0 0 0 3
1 3 2 0 0
2 0 0 0 0
3 7 5 0 0
4 0 0 0 0
5 0 0 0 9
Bailiff_prosecution
0 0
1 4
2 0
3 0
4 0
5 0
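As an alternative, the same counts can probably be obtained without Counter by calling Series.str.count once per code. This is only a sketch and not part of the answer above: the names codes, counts and result are made up for illustration, and the sample frame is rebuilt from the question so the snippet runs on its own.

```python
import re

import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Debtor_ID": ["Uncle Sam", "Uncle Sam", "Uncle Joe",
                  "Uncle Joe", "Aunt Annie", "Aunt Chloe"],
    "Loan_ID": ["Loan1", "Loan2", "Loan3", "Loan4", "Loan5", "Loan6"],
    "Pattern_of_payments": ["11111AAA11555", "11222A339999", "1111111111111",
                            "111222222233333", "1", "555555555"],
})

# Payment code -> column name; dict order determines column order
codes = {"1": "On_time_payment",
         "A": "1_29_days_delay",
         "2": "30_59_days_delay",
         "3": "60_89_days_delay",
         "4": "90_119_days_delay",
         "5": "Over_120_days_delay",
         "9": "Bailiff_prosecution"}

# str.count treats its argument as a regex, so escape each code in case
# a special character is ever used as a payment code
patterns = df["Pattern_of_payments"].astype(str)
counts = pd.DataFrame({name: patterns.str.count(re.escape(ch))
                       for ch, name in codes.items()})

result = df.join(counts)
print(result)
```

This makes one pass over the column per code rather than one pass in total, so the Counter approach may well be faster when there are many distinct codes. The upside is that codes that never occur need no fillna or reindex step.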