Merge rows dataframe based on two columns - Python
I have a DataFrame `df` that contains the following information:
| Project | class | x |y | z |
| --- | --- | --- | --- | --- |
| Project_A | c.java |a1 | a2 | |
| Project_A | c.java | | | a3 |
| Project_b | t.java |b1 |b2 | |
If the values of `Project` and `class` are equal, I need to merge those rows.
Based on the example above, the expected output in this case is:
| Project | class | x |y | z |
| --- | --- | --- | --- | --- |
| Project_A | c.java |a1 |a2 | a3 |
| Project_b | t.java |b1 |b2 | |
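It is also important to note that, because of the way the dataset is built, there is no risk of overwriting values; in other words, you will never have a situation like this: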
| Project | class | x |y | z |
| --- | --- | --- | ---| --- |
| Project_A | c.java |a1 | a2 | |
| Project_A | c.java |a_x | a_y|a_z |
| Project_b | t.java |b1 |b2 | |
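How can I do this?

This groups by `Project` and `class` and then takes the first value in each group for every other column. I'm not sure whether your data allows this: if a Project/class combination ever has a second value in the same column, that value will be silently dropped, which could mess up some of your data.

    df.groupby(['Project', 'class'], as_index=False).agg('first')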
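
For reference, here is a minimal runnable sketch of the `first` approach on the sample data from the question. It assumes the blank cells are empty strings (the question does not say how they are stored), so they are converted to `NaN` first; `first` only skips missing values, not empty strings. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question; blank cells are ASSUMED to be empty strings.
df = pd.DataFrame({
    "Project": ["Project_A", "Project_A", "Project_b"],
    "class": ["c.java", "c.java", "t.java"],
    "x": ["a1", "", "b1"],
    "y": ["a2", "", "b2"],
    "z": ["", "a3", ""],
})

# Turn empty strings into NaN so that 'first' can skip over them.
df = df.replace("", np.nan)

# For each (Project, class) pair, keep the first non-missing value per column.
merged = df.groupby(["Project", "class"], as_index=False).first()
print(merged)
```

If the blanks are already `None` or `NaN`, the `replace` step is unnecessary.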
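
You can use `groupby()` on the two columns and sum the other columns. Summing strings concatenates them, so this only works because each group holds at most one value per column:

    import pandas as pd

    # Sample data from the question; missing cells are None
    X = pd.DataFrame({
        'Project': ["Project_A", "Project_A", "Project_b"],
        'class': ["c.java", "c.java", "t.java"],
        'x': ["a1", None, "b1"],
        'y': ["a2", None, "b2"],
        'z': [None, "a3", None]})

    # Group on both key columns and sum the remaining columns
    Y = X.groupby(['Project', 'class']).sum()
    print(Y)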
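
Output:

                       x   y   z
    Project   class
    Project_A c.java  a1  a2  a3
    Project_b t.java  b1  b2   0

Note that `z` for `Project_b`, which had no value at all, comes out as `0` rather than staying empty.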
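
Both answers rely on the question's guarantee that no Project/class group holds two values for the same column. If that guarantee is in doubt, a quick check before merging might look like the sketch below. The helper `conflicting_groups` and the two sample frames are hypothetical names introduced here for illustration, and missing cells are assumed to be `None`/`NaN`.

```python
import pandas as pd

def conflicting_groups(df, keys):
    """Return non-null counts for groups with more than one value in any column."""
    # count() gives the number of non-missing values per column in each group.
    counts = df.groupby(keys).count()
    return counts[(counts > 1).any(axis=1)]

keys = ["Project", "class"]

# The question's data: safe to merge.
safe = pd.DataFrame({
    "Project": ["Project_A", "Project_A", "Project_b"],
    "class": ["c.java", "c.java", "t.java"],
    "x": ["a1", None, "b1"],
    "y": ["a2", None, "b2"],
    "z": [None, "a3", None],
})

# The situation the question says never happens: x and y are set twice.
unsafe = pd.DataFrame({
    "Project": ["Project_A", "Project_A", "Project_b"],
    "class": ["c.java", "c.java", "t.java"],
    "x": ["a1", "a_x", "b1"],
    "y": ["a2", "a_y", "b2"],
    "z": [None, "a_z", None],
})

print(conflicting_groups(safe, keys))    # no rows: nothing would be lost
print(conflicting_groups(unsafe, keys))  # one row, for (Project_A, c.java)
```

An empty result means the merge cannot lose or concatenate any values.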