Python pandas, nested loops: creating different lists from rows based on a value in another column

Question

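I have an Excel file with 3 columns. The first column is the original text, the second is the corrected text, and the third marks the start of each sentence.

It looks roughly like this (sorry, I don't know how to format it properly):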

    A        B         C 
1  She                 x 
2  is
3  the 
4  besst     best
5  i         I         x
6  like
7  here      her

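Some cells in column B were merged, but I have dealt with that and unmerged them. Column B only has a value where something needed correcting; otherwise it is empty. The end result I need is a file in which the incorrect and the corrected sentences are complete and sit next to each other, like this: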

She is the besst.    She is the best.
i like here.         I like her.

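I tried nesting two loops so that, when a cell in column C is not empty, the code collects all values until the next cell in column C has a value (so to speak). This works for the incorrect sentences (the values from column A), but I cannot get it to work with column B.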

for i in range(len(df)):
    print(df.loc[i, "A"], df.loc[i, "B"])
    if i in value_in_columnB:
        print(df.loc[i, "B"])
        o = df.loc[i, "B"]
        correctsentence.append(o)
    else:
        print(df.loc[i, "A"])
        m = df.loc[i, "A"]
        correctsentence.append(m)
print(correctsentence)
correctsentence = [y for y in correctsentence if str(y) != 'nan']
print(correctsentence)

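The code above works in the sense that I get all the correct words (a mix of columns A and B) in one long list, but the list is not split into individual sentences. Likewise, I can get it right for the first column alone: I iterate over the rows and check each index against a list of integers recording where column C contains an x, which marks the start of a new sentence.

But somehow I cannot put the two together, and combining them is all I need. What can I try? I have tried for and while loops, but nothing seems to help.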

Answer 1

So, given input data like this:

import pandas as pd

a = 'She is the besst i like here'
b = ['', '', '', 'best', 'I', '', 'her']
c = ['x', '', '' , '', 'x', '', '']

df = pd.DataFrame({'A':a.split(), 'B':b, 'C': c})
print(df)

       A     B  C
0    She        x
1     is         
2    the         
3  besst  best   
4      i     I  x
5   like         
6   here   her

Then this script:

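# Where B has no correction, copy the original word from A.
df.loc[df['B'] == '', 'B'] = df[df['B'] == '']['A']
# Mark each sentence start with 1, then turn the markers into a running sentence number.
df.loc[df['C'] == 'x', 'C'] = 1
df['C'] = pd.to_numeric(df['C']).cumsum().ffill()

# Collect the words of each sentence into lists. The columns are selected with a list,
# because recent pandas versions no longer accept the tuple form ['A', 'B'].
data = df.groupby('C')[['A', 'B']].agg(list).to_dict('list')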

with open('file.txt', 'w') as f_out:
    for incorrect, correct in zip(*data.values()):
        print('{}. {}.'.format(' '.join(incorrect), ' '.join(correct)), file=f_out)

will create file.txt with the following contents:

She is the besst. She is the best.
i like here. I like her.
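
To see why the grouping works, it can help to look at the intermediate values. The following is only a minimal sketch on the same sample data, using a slightly more compact formulation than the script above; the names `sentence_id` and `corrected` are introduced here for illustration and do not appear in the original script.

```python
import pandas as pd

df = pd.DataFrame({
    'A': 'She is the besst i like here'.split(),
    'B': ['', '', '', 'best', 'I', '', 'her'],
    'C': ['x', '', '', '', 'x', '', ''],
})

# True where a sentence starts; the running total of those flags
# gives every row the number of the sentence it belongs to.
sentence_id = (df['C'] == 'x').cumsum()
print(sentence_id.tolist())  # [1, 1, 1, 1, 2, 2, 2]

# Keep the correction where there is one, otherwise fall back to the original word.
corrected = df['B'].where(df['B'] != '', df['A'])
print(corrected.tolist())  # ['She', 'is', 'the', 'best', 'I', 'like', 'her']
```

Grouping either column by `sentence_id` then yields one list of words per sentence, which is what the `groupby` call in the script relies on.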

Edit: a version with NaN values:

import numpy as np
import pandas as pd

a = 'She is the besst i like here'
b = [np.nan, np.nan, np.nan, 'best', 'I', np.nan, 'her']
c = ['x', np.nan, np.nan , np.nan, 'x', np.nan, np.nan]

df = pd.DataFrame({'A':a.split(), 'B':b, 'C': c})

df.loc[df['B'].isna(), 'B'] = df[df['B'].isna()]['A']
df.loc[df['C'] == 'x', 'C'] = 1
df['C'] = pd.to_numeric(df['C']).cumsum().ffill()

# Select the columns with a list; recent pandas versions reject the tuple form ['A', 'B'].
data = df.groupby('C')[['A', 'B']].agg(list).to_dict('list')

with open('file.txt', 'w') as f_out:
    for incorrect, correct in zip(*data.values()):
        print('{}. {}.'.format(' '.join(map(str, incorrect)), ' '.join(map(str, correct))), file=f_out)
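
For comparison with the loop-based attempt in the question, the same pairing can also be sketched with a single plain loop and no `groupby`. This is an illustrative alternative, not part of the original answer; `pairs` and `lines` are hypothetical names, and the sketch assumes empty cells arrive as NaN, as in the version above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': 'She is the besst i like here'.split(),
    'B': [np.nan, np.nan, np.nan, 'best', 'I', np.nan, 'her'],
    'C': ['x', np.nan, np.nan, np.nan, 'x', np.nan, np.nan],
})

pairs = []  # one (original_words, corrected_words) tuple per sentence
for original, fixed, marker in zip(df['A'], df['B'], df['C']):
    # An 'x' in column C opens a new sentence; the `not pairs` check also
    # covers a first row that happens to lack the marker.
    if marker == 'x' or not pairs:
        pairs.append(([], []))
    pairs[-1][0].append(original)
    # Use the correction from B when present, otherwise the word from A.
    pairs[-1][1].append(original if pd.isna(fixed) else fixed)

lines = ['{}. {}.'.format(' '.join(wrong), ' '.join(right)) for wrong, right in pairs]
print('\n'.join(lines))
```

Because both word lists are filled in the same pass, the incorrect and corrected sentences stay aligned without a second loop.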

Solution 1
0 2020-01-19 14:34:01
