Python Pandas Dataframe，在跳过重叠行时合并df行

Question

The title doesn't make a clear sense, but it was the best I could describe. 标题没有明确的意义，但这是我能描述的最好的。

I have DataFrames that look like these: 我有这样的DataFrames：

data1 = np.matrix([[4,75,2,5,84,2,6,5,554],[4,6,67,6,4,5,8,5,8]]).T
data2 = np.matrix([[3,46,4,555,556,557,558,559,560],[1,2,4,1,3,5,3,1,5]]).T

data1 = pd.DataFrame(data1)
data2 = pd.DataFrame(data2)

>>> data1
     0   1
0    4   4
1   75   6
2    2  67
3    5   6
4   84   4
5    2   5
6    6   8
7    5   5
8  554   8


>>> data2
     0  1
0    3  1
1   46  2
2    4  4
3  555  1
4  556  3
5  557  5
6  558  3
7  559  1
8  560  5

I want to append data2 to the bottom of data1 . 我想将data2追加到data1的底部。 However, I want to append rows of data2 whose column1 values are greater than or equal to the 554 , which is the last row of column1 of data1 但是，我想追加其行1的值大于或等于554的data2行，这是data1的column1的最后一行

Here is the output I want: 这是我想要的输出：

>>> merged_df
         0   1
    0    4   4
    1   75   6
    2    2  67
    3    5   6
    4   84   4
    5    2   5
    6    6   8
    7    5   5
    8  554   8
    9  555   1
   10  556   3
   11  557   5
   12  558   3
   13  559   1
   14  560   5

So, some of the first rows of data2 will be skipped when appending to data1 . 因此，在追加到data1时，将跳过某些第一行data2 。

It is assumed that the last row of data1 is its maximum, and the rows of data2 are sorted after the last value of data1 , which is 554 假设data1的最后一行是其最大值， data2的行在data1的最后一个值之后排序，即554

Is there any elegant way to do this job using Pandas toolbox? 使用Pandas工具箱有没有优雅的方法来完成这项工作？

Answer 1

Use concat with filter second DataFrame by boolean indexing : 通过boolean indexing使用concat和filter第二个DataFrame ：

print (data1.iloc[-1, 0])
554

df = pd.concat([data1, data2[data2[0] > data1.iloc[-1, 0]]], ignore_index=True)
print (df)
      0   1
0     4   4
1    75   6
2     2  67
3     5   6
4    84   4
5     2   5
6     6   8
7     5   5
8   554   8
9   555   1
10  556   3
11  557   5
12  558   3
13  559   1
14  560   5

Also for general solution compare by max value: 也用于通用解决方案比较max ：

df = pd.concat([data1, data2[data2[0] > data1[0].max()]], ignore_index=True)

Solution if custom columns names: 自定义列名称的解决方案：

data1 = pd.DataFrame(data1, columns=list('ab'))
data2 = pd.DataFrame(data2, columns=list('ab'))

print (data1.iloc[-1, data1.columns.get_loc('a')])
554

data22 = data2[data2['a'] > data1.iloc[-1, data1.columns.get_loc('a')]]
df = pd.concat([data1, data22], ignore_index=True)
print (df)
      a   b
0     4   4
1    75   6
2     2  67
3     5   6
4    84   4
5     2   5
6     6   8
7     5   5
8   554   8
9   555   1
10  556   3
11  557   5
12  558   3
13  559   1
14  560   5

Python Pandas Dataframe，在跳过重叠行时合并df行

问题描述

1 个解决方案

解决方案1
3 已采纳 2018-04-07 09:17:28

Python Pandas Dataframe，在跳过重叠行时合并df行

问题描述

1 个解决方案

解决方案1 3 已采纳 2018-04-07 09:17:28

解决方案1
3 已采纳 2018-04-07 09:17:28