How to merge several rows into one row

Question

Sorry, I should have deleted my old question and created a new one. I have a dataframe with two columns. The df looks like this:

                     Word   Tag
0                    Asam   O
1               instruksi   O
2                       -   O
3               instruksi   X
4                  bahasa   Y
5               Instruksi   P
6                       -   O
7               instruksi   O
8                  sebuah   Q
9                  satuan   K
10                      -   L
11                 satuan   O
12                   meja   W
13                   Tiap   Q
14                      -   O
15                   tiap   O
16               karakter   P
17                      -   O
18                     ke   O
19                      -   O
20               karakter   O
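
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the frame above. The values are copied from the table; the variable names `words` and `tags` are only illustrative.

```python
import pandas as pd

# Data copied row by row from the table above (21 rows, indices 0..20)
words = ["Asam", "instruksi", "-", "instruksi", "bahasa", "Instruksi", "-",
         "instruksi", "sebuah", "satuan", "-", "satuan", "meja", "Tiap", "-",
         "tiap", "karakter", "-", "ke", "-", "karakter"]
tags = list("OOOXYPOOQKLOWQOOPOOOO")  # one tag letter per row

df = pd.DataFrame({"Word": words, "Tag": tags})
```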

I want to merge the rows that are joined by a "-" row into a single row. So the output should look like this:

                     Word   Tag
0                    Asam   O
1     instruksi-instruksi   O
2                  bahasa   Y
3     Instruksi-instruksi   P
4                  sebuah   Q
5           satuan-satuan   K
6                    meja   W
7               Tiap-tiap   Q
8    karakter-ke-karakter   P

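Any ideas? Thanks in advance. I tried the answer from Jacob K and it works, but then I found that in my dataset there can be more than one "-" row within a single group of words. I have added such a case to the expected output; see index 8.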

Jacob K's solution:

# Import packages
import pandas as pd
import numpy as np

# Get 'Word' and 'Tag' columns as numpy arrays (for easy indexing)
words = df.Word.to_numpy()
tags = df.Tag.to_numpy()

# Create empty lists for new columns in output dataframe
newWords = []
newTags = []

# Use while (rather than for loop) since index i can change dynamically
i = 0                             # To not cause any issues with i-1 index
while (i < words.shape[0] - 1):
    if (words[i] == "-"):
        # Concatenate the strings above and below the "-"
        newWords.append(words[i-1] + "-" + words[i+1])
        newTags.append(tags[i-1])
        i += 2                         # Don't repeat any concatenated values
    else:
        if (words[i+1] != "-"):
            # If there is no "-" next, append the regular word and tag values
            newWords.append(words[i])
            newTags.append(tags[i])
        i += 1                         # Increment normally
        
# Create output dataframe output_df        
d2 = {'Word': newWords, 'Tag': newTags}
output_df = pd.DataFrame(data=d2)
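
To illustrate the problem, here is a small sketch that runs the same loop on the last five rows only. `merge_single_hyphen` is a hypothetical wrapper added just for this demonstration, and it takes plain lists instead of numpy arrays.

```python
def merge_single_hyphen(words, tags):
    """Same loop as above, wrapped in a helper so it can be called on a sample."""
    new_words, new_tags = [], []
    i = 0
    while i < len(words) - 1:
        if words[i] == "-":
            # Joins only the immediate neighbours of this "-"
            new_words.append(words[i - 1] + "-" + words[i + 1])
            new_tags.append(tags[i - 1])
            i += 2
        else:
            if words[i + 1] != "-":
                new_words.append(words[i])
                new_tags.append(tags[i])
            i += 1
    return new_words, new_tags

# Rows 16..20 of the data: two "-" rows in one chain
words = ["karakter", "-", "ke", "-", "karakter"]
tags = ["P", "O", "O", "O", "O"]

new_words, new_tags = merge_single_hyphen(words, tags)
print(new_words)  # ['karakter-ke', 'ke-karakter']
```

The chain is split into two overlapping pairs, whereas I need a single 'karakter-ke-karakter' row with tag P.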

Answer 1

My approach, using GroupBy.agg:

#df['Word'] = df['Word'].str.replace(' ', '') #if necessary
blocks = df['Word'].shift().ne('-').mul(df['Word'].ne('-')).cumsum()
new_df = df.groupby(blocks, as_index=False).agg({'Word' : ''.join, 'Tag' : 'first'})
print(new_df)

Output

                   Word Tag
0                  Asam   O
1   instruksi-instruksi   O
2                bahasa   Y
3   Instruksi-instruksi   P
4                sebuah   Q
5         satuan-satuan   K
6                  meja   W
7             Tiap-tiap   Q
8  karakter-ke-karakter   P

Blocks (detail)

print(blocks)
0     1
1     2
2     2
3     2
4     3
5     4
6     4
7     4
8     5
9     6
10    6
11    6
12    7
13    8
14    8
15    8
16    9
17    9
18    9
19    9
20    9
Name: Word, dtype: int64
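
To see why the group ids come out this way, the one-line mask can be split into named steps. This is only a sketch on a six-row sample; the names `is_dash`, `after_dash` and `starts_block` are illustrative, and `reset_index(drop=True)` is used here in place of `as_index=False` simply to discard the group ids.

```python
import pandas as pd

# Small sample: one plain word followed by the chained case from the question
df = pd.DataFrame({
    "Word": ["meja", "karakter", "-", "ke", "-", "karakter"],
    "Tag":  ["W", "P", "O", "O", "O", "O"],
})

is_dash = df["Word"].eq("-")             # the row itself is "-"
after_dash = df["Word"].shift().eq("-")  # the previous row is "-"
starts_block = ~is_dash & ~after_dash    # only these rows open a new group
blocks = starts_block.cumsum().rename("block")  # running count = group id

# Join the words of each group and keep the tag of the group's first row
new_df = (
    df.groupby(blocks)
      .agg({"Word": "".join, "Tag": "first"})
      .reset_index(drop=True)
)
print(new_df)
```

A row starts a new group only if it is not "-" and does not directly follow a "-", so every row in a chain such as karakter, -, ke, -, karakter keeps the id of the word that opened it.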

Answer 2

Here is a loop version:

import pandas as pd
# import data (assumes the default 0..n-1 index)
DF = pd.read_csv("table.csv")
# collect the output rows in a list (DataFrame.append was removed in pandas 2.0)
rows = []
# iterate through all rows, including the last one
for i in range(len(DF)):
    word = DF.loc[i, 'Word']
    tag = DF.loc[i, 'Tag']
    # a row continues the previous output row if it is '-' or directly follows a '-',
    # so chains such as karakter-ke-karakter end up in a single row
    if rows and (word == '-' or DF.loc[i-1, 'Word'] == '-'):
        rows[-1]['Word'] += word
    # otherwise the row starts a new output row and keeps its own tag
    else:
        rows.append({'Word': word, 'Tag': tag})
# creates the new DF
newDF = pd.DataFrame(rows, columns=['Word', 'Tag'])

2 answers

Answer 1  1  2020-09-04 07:16:27

Answer 2  0  2020-09-04 07:33:34
