Alternative option to combine row data without "For Loop & If else" in Python

Question

I have to combine the rows based on the last word in each row, like:

Answer:

They should combine like below:

I have written the code below and it works as expected; however, it becomes very slow when I have a lot of data (10K+ rows).

# Split the string & take the last word

df["last_Word"] = df["Donor"].str.split().str[-1].str.lower()
df["Match_end"] = df["last_Word"].isin(align["KeyWords_end"].str.lower())

Add two new columns to the data frame

df["Cleaned"]= ""
df["Mark"]= ""

Align the text based on the last word & mark the rows to delete as "delete"

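mark_col = df.columns.get_loc("Mark")
cleaned_col = df.columns.get_loc("Cleaned")
# Stop two rows early so that i+1 and i+2 stay in bounds
for i in range(len(df) - 2):
    if df["Match_end"].iloc[i] and df["Match_end"].iloc[i+1]:
        df.iloc[i+1, mark_col] = "delete"
        df.iloc[i+2, mark_col] = "delete"
        df.iloc[i, cleaned_col] = df["Donor"].iloc[i] + " " + df["Donor"].iloc[i+1] + " " + df["Donor"].iloc[i+2]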

Delete the marked rows

df = df[~df['Mark'].str.contains("delete")]

Update the newly created column

cleaned_col = df.columns.get_loc("Cleaned")
for i in range(len(df)):
    if len(df["Cleaned"].iloc[i]) == 0:
        df.iloc[i, cleaned_col] = df["Donor"].iloc[i]

# Drop the unwanted columns

df.drop(["Donor","Mark","last_Word","Match_end"], axis = 1, inplace = True)

# Rename the newly created column

df.rename(columns= {"Cleaned": "Donor"},inplace = True)

Answer 1

Assuming you want to combine the strings ending in "and" or "&", use a regex to identify those strings, then groupby.agg:

m = ~df['donor'].str.contains(r'(?:\band|&)\s*$').shift(fill_value=False)

df.groupby(m.cumsum(), as_index=False).agg({'donor': ' '.join})

Example output:

                      donor
0            ABC, DEF & GHI
1  JKL MNO and  PQR and STU

Used input:

          donor
0    ABC, DEF &
1           GHI
2  JKL MNO and 
3       PQR and
4           STU
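
To see why the mask works, it can help to look at the intermediate values. Below is a minimal, self-contained sketch that rebuilds the input above and names each step. The names `continues`, `starts_group`, `group_id` and `keywords_end` are made up for illustration, and `keywords_end` is only a hypothetical stand-in for the `align["KeyWords_end"]` column from the question. The groupby is spelled slightly differently from the one-liner above (the column is selected first, then aggregated), which is an editorial variation and should give the same combined strings.

```python
import pandas as pd

# Sample input copied from this answer (note the trailing space in row 2).
df = pd.DataFrame(
    {"donor": ["ABC, DEF &", "GHI", "JKL MNO and ", "PQR and", "STU"]}
)

# True where a row ends in "and" or "&", i.e. the next row continues it.
continues = df["donor"].str.contains(r"(?:\band|&)\s*$")

# Shift by one so the flag lands on the continuation row, then invert:
# True now marks the first row of each group.
starts_group = ~continues.shift(fill_value=False)

# A running count of group starts gives every row its group id.
group_id = starts_group.cumsum()
print(group_id.tolist())  # [1, 1, 2, 2, 2]

# Join the strings within each group.
combined = (
    df.groupby(group_id.to_numpy())["donor"]
    .agg(" ".join)
    .reset_index(drop=True)
    .to_frame()
)
print(combined)

# Variant: match the last word against a keyword list, as the question does.
# `keywords_end` is a hypothetical stand-in for align["KeyWords_end"].
keywords_end = pd.Series(["and", "&"])
last_word = df["donor"].str.split().str[-1].str.lower()
continues_kw = last_word.isin(keywords_end.str.lower())
group_id_kw = (~continues_kw.shift(fill_value=False)).cumsum()
combined_kw = (
    df.groupby(group_id_kw.to_numpy())["donor"]
    .agg(" ".join)
    .reset_index(drop=True)
    .to_frame()
)
```

Both variants avoid the Python-level loop, so they handle chains of any length (not only three rows) and should scale much better than row-by-row `iloc` assignment on 10K+ rows.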

Answer 1: accepted, 2022-08-02 06:04:59
