Python Pandas DataFrame, merge df rows while skipping overlapping rows

The title may not be very clear, but it was the best description I could come up with.

I have DataFrames that look like these:

import numpy as np
import pandas as pd

data1 = np.array([[4,75,2,5,84,2,6,5,554],[4,6,67,6,4,5,8,5,8]]).T
data2 = np.array([[3,46,4,555,556,557,558,559,560],[1,2,4,1,3,5,3,1,5]]).T

data1 = pd.DataFrame(data1)
data2 = pd.DataFrame(data2)

>>> data1
     0   1
0    4   4
1   75   6
2    2  67
3    5   6
4   84   4
5    2   5
6    6   8
7    5   5
8  554   8


>>> data2
     0  1
0    3  1
1   46  2
2    4  4
3  555  1
4  556  3
5  557  5
6  558  3
7  559  1
8  560  5

I want to append data2 to the bottom of data1. However, I only want to append the rows of data2 whose value in the first column (label 0) is greater than or equal to 554, which is the value in the last row of that column in data1.

Here is the output I want:

>>> merged_df
         0   1
    0    4   4
    1   75   6
    2    2  67
    3    5   6
    4   84   4
    5    2   5
    6    6   8
    7    5   5
    8  554   8
    9  555   1
   10  556   3
   11  557   5
   12  558   3
   13  559   1
   14  560   5

So, some of the first rows of data2 will be skipped when appending to data1.

Assume that the last row of data1 holds its maximum, and that the rows of data2 are sorted once their values pass the last value of data1, which is 554.

Is there an elegant way to do this with the pandas toolbox?

Use concat, and filter the second DataFrame with boolean indexing:

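# Value in the first column of the last row of data1
print(data1.iloc[-1, 0])
554

# Keep only the rows of data2 above that value, then concatenate
df = pd.concat([data1, data2[data2[0] > data1.iloc[-1, 0]]], ignore_index=True)
print(df)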
      0   1
0     4   4
1    75   6
2     2  67
3     5   6
4    84   4
5     2   5
6     6   8
7     5   5
8   554   8
9   555   1
10  556   3
11  557   5
12  558   3
13  559   1
14  560   5
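
The question asks for values "greater than or equal to" 554, while the filter above uses a strict `>`. With the data shown, both give the same result because data2 contains no 554. The sketch below illustrates where they would differ; `data2_overlap` is a hypothetical variant of data2 that is not part of the original post.

```python
import numpy as np
import pandas as pd

data1 = pd.DataFrame(np.array([[4, 75, 2, 5, 84, 2, 6, 5, 554],
                               [4, 6, 67, 6, 4, 5, 8, 5, 8]]).T)

# Hypothetical variant of data2 that also contains a row with 554.
data2_overlap = pd.DataFrame(np.array([[3, 46, 554, 555, 556],
                                       [1, 2, 9, 1, 3]]).T)

last = data1.iloc[-1, 0]  # last value of the first column: 554

# Strict comparison: the 554 row of data2_overlap is skipped.
strict = pd.concat([data1, data2_overlap[data2_overlap[0] > last]],
                   ignore_index=True)

# Inclusive comparison: the 554 row is kept, so 554 appears twice.
inclusive = pd.concat([data1, data2_overlap[data2_overlap[0] >= last]],
                      ignore_index=True)

print(len(strict))     # 11
print(len(inclusive))  # 12
```

Which operator is appropriate depends on whether a row equal to the boundary value counts as an overlap to be skipped.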

For a more general solution, compare against the maximum value instead:

df = pd.concat([data1, data2[data2[0] > data1[0].max()]], ignore_index=True)
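
The two thresholds coincide only when the last row of data1 holds the column maximum, as the question assumes. A small sketch of the difference, using hypothetical frames `left` and `right` that do not appear in the original post:

```python
import pandas as pd

# Hypothetical frames in which the last row of column 0 is NOT its maximum.
left = pd.DataFrame({0: [4, 600, 554], 1: [1, 2, 3]})
right = pd.DataFrame({0: [555, 560, 601], 1: [7, 8, 9]})

# Threshold taken from the last row (554).
by_last = right[right[0] > left.iloc[-1, 0]]

# Threshold taken from the column maximum (600).
by_max = right[right[0] > left[0].max()]

print(by_last[0].tolist())  # [555, 560, 601]
print(by_max[0].tolist())   # [601]
```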

A solution for custom column names:

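# Rename the columns of the existing DataFrames.
# (Passing a DataFrame plus new column names to pd.DataFrame would select
# those names from the frame and produce all-NaN columns.)
data1.columns = list('ab')
data2.columns = list('ab')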

# Value of column 'a' in the last row of data1
print(data1.iloc[-1, data1.columns.get_loc('a')])
554

# Filter data2 on column 'a', then concatenate
data22 = data2[data2['a'] > data1.iloc[-1, data1.columns.get_loc('a')]]
df = pd.concat([data1, data22], ignore_index=True)
print(df)
      a   b
0     4   4
1    75   6
2     2  67
3     5   6
4    84   4
5     2   5
6     6   8
7     5   5
8   554   8
9   555   1
10  556   3
11  557   5
12  558   3
13  559   1
14  560   5
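
If this pattern is needed in several places, it could be wrapped in a small helper. The function name `append_rows_after` and its parameters are hypothetical, chosen only for illustration; the sketch selects the column by label first, which avoids the `get_loc` lookup.

```python
import pandas as pd

def append_rows_after(first, second, col):
    """Append the rows of `second` whose `col` value is strictly greater
    than the value of `col` in the last row of `first`."""
    threshold = first[col].iloc[-1]
    return pd.concat([first, second[second[col] > threshold]],
                     ignore_index=True)

data1 = pd.DataFrame({'a': [4, 75, 2, 5, 84, 2, 6, 5, 554],
                      'b': [4, 6, 67, 6, 4, 5, 8, 5, 8]})
data2 = pd.DataFrame({'a': [3, 46, 4, 555, 556, 557, 558, 559, 560],
                      'b': [1, 2, 4, 1, 3, 5, 3, 1, 5]})

merged = append_rows_after(data1, data2, 'a')
print(merged.shape)  # (15, 2)
```

Note that this sketch assumes `first` is non-empty; on an empty frame, `iloc[-1]` raises an IndexError.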
