Python Pandas DataFrame: merge df rows while skipping overlapping rows
The title may not be entirely clear, but it was the best description I could come up with.
I have DataFrames that look like these:
import numpy as np
import pandas as pd
data1 = np.matrix([[4,75,2,5,84,2,6,5,554],[4,6,67,6,4,5,8,5,8]]).T
data2 = np.matrix([[3,46,4,555,556,557,558,559,560],[1,2,4,1,3,5,3,1,5]]).T
data1 = pd.DataFrame(data1)
data2 = pd.DataFrame(data2)
>>> data1
0 1
0 4 4
1 75 6
2 2 67
3 5 6
4 84 4
5 2 5
6 6 8
7 5 5
8 554 8
>>> data2
0 1
0 3 1
1 46 2
2 4 4
3 555 1
4 556 3
5 557 5
6 558 3
7 559 1
8 560 5
I want to append data2 to the bottom of data1. However, I only want to append the rows of data2 whose values in the first column (labeled 0) are greater than or equal to 554, which is the value in the last row of that column in data1.
Here is the output I want:
>>> merged_df
0 1
0 4 4
1 75 6
2 2 67
3 5 6
4 84 4
5 2 5
6 6 8
7 5 5
8 554 8
9 555 1
10 556 3
11 557 5
12 558 3
13 559 1
14 560 5
So, some of the first rows of data2 will be skipped when appending to data1.
Assume that the last row of data1 holds its maximum, and that the rows of data2 beyond the last value of data1 (554) are sorted.
Is there an elegant way to do this with the Pandas toolbox?
Use concat, filtering the second DataFrame by boolean indexing:
print (data1.iloc[-1, 0])
554
df = pd.concat([data1, data2[data2[0] > data1.iloc[-1, 0]]], ignore_index=True)
print (df)
0 1
0 4 4
1 75 6
2 2 67
3 5 6
4 84 4
5 2 5
6 6 8
7 5 5
8 554 8
9 555 1
10 556 3
11 557 5
12 558 3
13 559 1
14 560 5
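To make the filtering step easier to follow, here is a minimal, self-contained sketch that splits the one-liner above into its parts. The intermediate names last_value, mask and kept are illustrative only, and the sample data is rebuilt with np.array rather than np.matrix purely for convenience.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question, built from plain arrays.
data1 = pd.DataFrame(np.array([[4, 75, 2, 5, 84, 2, 6, 5, 554],
                               [4, 6, 67, 6, 4, 5, 8, 5, 8]]).T)
data2 = pd.DataFrame(np.array([[3, 46, 4, 555, 556, 557, 558, 559, 560],
                               [1, 2, 4, 1, 3, 5, 3, 1, 5]]).T)

last_value = data1.iloc[-1, 0]   # value in the last row of column 0 (554)
mask = data2[0] > last_value     # boolean Series, one entry per row of data2
kept = data2[mask]               # only the rows where the mask is True

# ignore_index=True renumbers the combined rows from 0 upward.
merged_df = pd.concat([data1, kept], ignore_index=True)

print(mask.tolist())     # [False, False, False, True, True, True, True, True, True]
print(merged_df.shape)   # (15, 2)
```

Without ignore_index=True, the appended rows would keep their original labels from data2 (3 to 8), so the result would contain duplicate index labels.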
For a more general solution, compare against the max value instead:
df = pd.concat([data1, data2[data2[0] > data1[0].max()]], ignore_index=True)
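The two thresholds only differ when the last row of the first frame does not hold its maximum. The sketch below uses small hypothetical frames (left and right are made-up names and values, not from the question) to show that difference.

```python
import pandas as pd

# Hypothetical data: the maximum of column 0 (90) is not in the last row (20).
left = pd.DataFrame({0: [10, 90, 20], 1: [1, 2, 3]})
right = pd.DataFrame({0: [15, 50, 95], 1: [7, 8, 9]})

by_last = right[right[0] > left.iloc[-1, 0]]   # threshold is the last value, 20
by_max = right[right[0] > left[0].max()]       # threshold is the maximum, 90

print(by_last[0].tolist())   # [50, 95]
print(by_max[0].tolist())    # [95]
```

When the assumption in the question holds (the last row is the maximum), both expressions select the same rows.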
Solution for custom column names (data1 and data2 here are the original NumPy matrices, not the DataFrames built earlier):
data1 = pd.DataFrame(data1, columns=list('ab'))
data2 = pd.DataFrame(data2, columns=list('ab'))
print (data1.iloc[-1, data1.columns.get_loc('a')])
554
data22 = data2[data2['a'] > data1.iloc[-1, data1.columns.get_loc('a')]]
df = pd.concat([data1, data22], ignore_index=True)
print (df)
a b
0 4 4
1 75 6
2 2 67
3 5 6
4 84 4
5 2 5
6 6 8
7 5 5
8 554 8
9 555 1
10 556 3
11 557 5
12 558 3
13 559 1
14 560 5
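If this pattern is needed in several places, it could be wrapped in a small helper. The following is only a sketch: the function name append_rows_after and its inclusive flag are hypothetical, not part of pandas. The flag is there because the question says "greater than or equal to" while the code above uses a strict comparison.

```python
import pandas as pd

def append_rows_after(df1, df2, key, inclusive=False):
    """Append to df1 the rows of df2 whose `key` value is past df1's last `key` value."""
    threshold = df1[key].iloc[-1]   # last value of the key column in df1
    mask = df2[key] >= threshold if inclusive else df2[key] > threshold
    return pd.concat([df1, df2[mask]], ignore_index=True)

data1 = pd.DataFrame({'a': [4, 75, 2, 5, 84, 2, 6, 5, 554],
                      'b': [4, 6, 67, 6, 4, 5, 8, 5, 8]})
data2 = pd.DataFrame({'a': [3, 46, 4, 555, 556, 557, 558, 559, 560],
                      'b': [1, 2, 4, 1, 3, 5, 3, 1, 5]})

df = append_rows_after(data1, data2, 'a')
print(len(df))   # 15
```

With inclusive=True, a row of df2 whose key equals the threshold would also be appended, which duplicates that value in the result, so the strict comparison is probably the safer default for the output shown in the question.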