How do I append multiple strings to a column in a pandas DataFrame based on per-row conditions?

Question

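I have a dataframe that contains each customer's charges alongside the charges from their contract. I want to compare the corresponding charges for each customer and flag any that don't match. Here is what the df looks like: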

Resident	Tcode	MoveIn	1xdisc	1xdisc_doc	rent	rent_doc
Marcus	t0011009	3/16/2021	0.0	-500.0	0	1632
Joshua	t0011124	3/20/2021	0.0	0.0	1642	1642
Yvonne	t0010940	3/17/2021	-500.0	-500.0	1655	1655
Mirabeau	t0011005	3/19/2021	-500.0	-500.0	1931	1990
Keyonna	t0011084	3/18/2021	0.0	0.0	1600	1600
Ariel	t0010954	3/22/2021	-300.0	0.0	1300	1320

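I want to add a column that holds all of the problems for each row as a single string. This is my desired output, where the "Problem" column lists every problem found in that row: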

Resident	Tcode	MoveIn	1xdisc	1xdisc_doc	rent	rent_doc	Problem
Marcus	t0011009	3/16/2021	0.0	-500.0	0	1632	rent doesn't match. 1xdisc doesn't match
Joshua	t0011124	3/20/2021	0.0	0.0	1642	1642
Yvonne	t0010940	3/17/2021	-500.0	-500.0	1655	1655
Mirabeau	t0011005	3/19/2021	-500.0	-500.0	1931	1990	rent doesn't match.
Keyonna	t0011084	3/18/2021	0.0	0.0	1600	1600
Ariel	t0010954	3/22/2021	-300.0	0.0	1300	1320	rent doesn't match. 1xdisc doesn't match

So far I have tried:

nonmatch["Problem"] = np.where(nonmatch['rent'] != nonmatch['rent_doc'],  "rent doesn't match", nonmatch["Problem"] + "")
nonmatch["Problem"] = np.where(nonmatch['1xdisc'] != nonmatch['1xdisc_doc'], " 1xdisc doesn't match.", "")
print(nonmatch[['Resident','Problem']])

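But then any errors already in the cell get overwritten. How can I append a string to the cell's existing content when a condition is met?

I also have a hunch that there must be a cleaner way to do this with apply and a lambda, but I'm not sure how. I have about ten conditions to check; this is just a minimal example.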

Answer 1

You can also try concat with groupby + agg. As piR said, this may be over-engineered:

c1 = df['rent'].ne(df['rent_doc'])
c2 = df['1xdisc'].ne(df['1xdisc_doc'])
choices= ["rent doesn't match"," 1xdisc doesn't match."]

s = pd.concat((c1,c2),keys=choices).swaplevel()
out = (df.assign(Problem=
      pd.DataFrame.from_records(s[s].index).groupby(0)[1].agg(" ".join)))

print(out)

   Resident     Tcode     MoveIn  1xdisc  1xdisc_doc  conpark  rent  rent_doc  \
0    Marcus  t0011009  3/16/2021     0.0      -500.0      0.0     0      1632   
1    Joshua  t0011124  3/20/2021     0.0         0.0      0.0  1642      1642   
2    Yvonne  t0010940  3/17/2021  -500.0      -500.0      0.0  1655      1655   
3  Mirabeau  t0011005  3/19/2021  -500.0      -500.0      0.0  1931      1990   
4   Keyonna  t0011084  3/18/2021     0.0         0.0      0.0  1600      1600   
5     Ariel  t0010954  3/22/2021  -300.0         0.0      0.0  1300      1320   

                                     Problem  
0  rent doesn't match  1xdisc doesn't match.  
1                                        NaN  
2                                        NaN  
3                         rent doesn't match  
4                                        NaN  
5  rent doesn't match  1xdisc doesn't match.
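
To make the concat and groupby steps above easier to follow, here is a minimal sketch that spells out the intermediate objects. It is a reading of the same idea and not the answerer's exact code: the sample frame is rebuilt from the question's table, the names `stacked`, `hits`, and `problem` are made up for illustration, and `get_level_values` stands in for the `swaplevel` / `from_records` combination.

```python
import pandas as pd

# Sample data rebuilt from the question's table (only the compared columns).
df = pd.DataFrame({
    "Resident": ["Marcus", "Joshua", "Yvonne", "Mirabeau", "Keyonna", "Ariel"],
    "1xdisc": [0.0, 0.0, -500.0, -500.0, 0.0, -300.0],
    "1xdisc_doc": [-500.0, 0.0, -500.0, -500.0, 0.0, 0.0],
    "rent": [0, 1642, 1655, 1931, 1600, 1300],
    "rent_doc": [1632, 1642, 1655, 1990, 1600, 1320],
})

c1 = df["rent"].ne(df["rent_doc"])
c2 = df["1xdisc"].ne(df["1xdisc_doc"])
messages = ["rent doesn't match.", "1xdisc doesn't match."]

# Stack both boolean masks under a MultiIndex of (message, row label).
stacked = pd.concat([c1, c2], keys=messages)

# Boolean indexing keeps only the (message, row label) pairs that are True.
hits = stacked[stacked]

# Re-key the surviving messages by row label, then join them per row.
problem = (
    pd.Series(hits.index.get_level_values(0), index=hits.index.get_level_values(1))
    .groupby(level=0)
    .agg(" ".join)
)

# assign aligns on the index, so rows without any hit end up as NaN.
out = df.assign(Problem=problem)
print(out[["Resident", "Problem"]])
```

Rows with no mismatch come out as NaN here, as in the output above; appending `.fillna("")` to the `Problem` column would give empty strings instead.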

Answer 2

My take on this:

def get_match(c):
    def match(x):
        return f'{c} doesn\'t match.' if x else ''
    return match

onex = (df['1xdisc'] != df['1xdisc_doc']).map(get_match('1xdisc'))
rent = (df['rent']   != df['rent_doc']  ).map(get_match('rent'))

df.assign(Problem=(['  '.join(filter(bool, tup)) for tup in zip(rent, onex)]))

   Resident     Tcode     MoveIn  1xdisc  1xdisc_doc  conpark  rent  rent_doc                                     Problem
0    Marcus  t0011009  3/16/2021     0.0      -500.0      0.0     0      1632  rent doesn't match.  1xdisc doesn't match.
1    Joshua  t0011124  3/20/2021     0.0         0.0      0.0  1642      1642                                            
2    Yvonne  t0010940  3/17/2021  -500.0      -500.0      0.0  1655      1655                                            
3  Mirabeau  t0011005  3/19/2021  -500.0      -500.0      0.0  1931      1990                         rent doesn't match.
4   Keyonna  t0011084  3/18/2021     0.0         0.0      0.0  1600      1600                                            
5     Ariel  t0010954  3/22/2021  -300.0         0.0      0.0  1300      1320  rent doesn't match.  1xdisc doesn't match.

Generalized

docs = [s for s in [*df] if s.endswith('_doc')]
refs = [s.rsplit('_', 1)[0] for s in docs]

def col_match(c):
    return [f"{c.name} doesn't match" if x else "" for x in c]

problem_df = (df[refs] != df[docs].to_numpy()).apply(col_match)
problem = ['  '.join(filter(bool, tup)) for tup in zip(*map(problem_df.get, refs))]
df.assign(Problem=problem)
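
For the roughly ten conditions mentioned in the question, another option is to stay close to the original attempt but accumulate into one Series instead of reassigning the column on every check. The following is a minimal sketch under stated assumptions: the sample frame is rebuilt from the question's table, and the `checks` dict and the `problem` name are hypothetical. Each check appends its message only where its mask is True, so earlier messages are never overwritten.

```python
import pandas as pd

# Sample data rebuilt from the question's table (only the compared columns).
df = pd.DataFrame({
    "Resident": ["Marcus", "Joshua", "Yvonne", "Mirabeau", "Keyonna", "Ariel"],
    "1xdisc": [0.0, 0.0, -500.0, -500.0, 0.0, -300.0],
    "1xdisc_doc": [-500.0, 0.0, -500.0, -500.0, 0.0, 0.0],
    "rent": [0, 1642, 1655, 1931, 1600, 1300],
    "rent_doc": [1632, 1642, 1655, 1990, 1600, 1320],
})

# Hypothetical rule table: message -> boolean mask. Add one entry per check.
checks = {
    "rent doesn't match.": df["rent"].ne(df["rent_doc"]),
    "1xdisc doesn't match.": df["1xdisc"].ne(df["1xdisc_doc"]),
}

# Start from empty strings so every check can append to what is already there.
problem = pd.Series("", index=df.index)
for message, mask in checks.items():
    # Where the mask is True, extend the existing text; other rows are untouched.
    problem = problem.mask(mask, problem + message + " ")

# Drop the trailing separator left by the last appended message.
df["Problem"] = problem.str.rstrip()
print(df[["Resident", "Problem"]])
```

Because the masks are ordinary boolean Series, a check does not have to be an equality test between a column and its `_doc` counterpart; any condition that evaluates to one boolean per row fits into the same dict.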
