[英]Pandas- Add a new row based on a specific condition
I have a specific dataframe which looks like this:我有一个特定的 dataframe,它看起来像这样:
owner![]() |
name![]() |
col_name ![]() |
test_col1 ![]() |
test_col2 ![]() |
---|---|---|---|---|
svc![]() |
dmn_dmn ![]() |
A![]() |
1 ![]() |
"String1" ![]() |
svc![]() |
dmn_dmn ![]() |
B![]() |
2 ![]() |
"String12" ![]() |
svc![]() |
dmn_dmn ![]() |
C ![]() |
remain_constant_3 ![]() |
"String13" ![]() |
svc![]() |
dmn_dmn ![]() |
D![]() |
remain_constant_4 ![]() |
"String14" ![]() |
svc![]() |
time1![]() |
E![]() |
5 ![]() |
"String1123" ![]() |
svc![]() |
time1![]() |
F ![]() |
6 ![]() |
"String123223" ![]() |
svc![]() |
sap![]() |
J![]() |
1 ![]() |
"String11" ![]() |
svc![]() |
sap![]() |
K![]() |
2 ![]() |
"String12" ![]() |
svc![]() |
sap![]() |
D![]() |
4 ![]() |
"String14" ![]() |
If the values "C" and "D" are not present in the column col_name then add "C" and "D" to its col_name.如果列 col_name 中不存在值“C”和“D”,则将“C”和“D”添加到其 col_name。 The final dataframe should look like this:
最终的 dataframe 应该是这样的:
owner![]() |
name![]() |
col_name ![]() |
test_col1 ![]() |
test_col2 ![]() |
---|---|---|---|---|
svc![]() |
dmn_dmn ![]() |
A![]() |
1 ![]() |
"String1" ![]() |
svc![]() |
dmn_dmn ![]() |
B![]() |
2 ![]() |
"String12" ![]() |
svc![]() |
dmn_dmn ![]() |
C ![]() |
remain_constant_3 ![]() |
"String13" ![]() |
svc![]() |
dmn_dmn ![]() |
D![]() |
remain_constant_4 ![]() |
"String14" ![]() |
svc![]() |
time1![]() |
E![]() |
5 ![]() |
"String1123" ![]() |
svc![]() |
time1![]() |
F ![]() |
6 ![]() |
"String123223" ![]() |
svc![]() |
time1![]() |
C ![]() |
remain_constant_3 ![]() |
"String13" ![]() |
svc![]() |
time1![]() |
D![]() |
remain_constant_4 ![]() |
"String14" ![]() |
svc![]() |
sap![]() |
J![]() |
1 ![]() |
"String11" ![]() |
svc![]() |
sap![]() |
K![]() |
2 ![]() |
"String12" ![]() |
svc![]() |
sap![]() |
C ![]() |
remain_constant_3 ![]() |
"String13" ![]() |
svc![]() |
sap![]() |
D![]() |
remain_constant_4 ![]() |
"String14" ![]() |
Edited: Please also note that there could be more columns in this dataframe. I didnt add the other columns as i thought it wouldnt matter with the code but then i saw there was some confusion已编辑:另请注意,此 dataframe 中可能会有更多列。我没有添加其他列,因为我认为这与代码无关,但后来我发现有些混乱
You could use groupby to check if 'C' and 'D' are in the 'col_name' column and add them if not.您可以使用 groupby 检查“C”和“D”是否在“col_name”列中,如果不在则添加它们。
df = pd.DataFrame([{'owner':'svc','name':'dmn_dmn','col_name':'A','test_col1':1,'test_col2':'String1'},{'owner':'svc','name':'dmn_dmn','col_name':'B','test_col1':2,'test_col2':'String12'},{'owner':'svc','name':'dmn_dmn','col_name':'C','test_col1':'remain_constant_3','test_col2':'String13'},{'owner':'svc','name':'dmn_dmn','col_name':'D','test_col1':'remain_constant_3','test_col2':'String14'},{'owner':'svc','name':'time1','col_name':'E','test_col1':5,'test_col2':'String1123'}])
for g,g_hold in df.groupby('name'):
if 'C' not in g_hold['col_name'].tolist():
df = df.append({'owner':'svc','name':g,'col_name':'C','test_col1':'remain_constant_3','test_col2':'String13'},ignore_index=True)
if 'D' not in g_hold['col_name'].tolist():
df = df.append({'owner':'svc','name':g,'col_name':'D','test_col1':'remain_constant_3','test_col2':'String14'},ignore_index=True)
print(df.sort_values(['name','col_name']))
The code would end up looking something like this.代码最终看起来像这样。
here are a straight forward approach这是一个直接的方法
import pandas as pd
import numpy as np
df = pd.DataFrame({"owner": ["svc"] * 9,
"name": ["dmn_dmn", "dmn_dmn", "dmn_dmn", "dmn_dmn", "time1", "time1", "sap", "sap", "sap"],
"col_name": ["A", "B", "C", "D", "E", "F", "J", "K", "D"],
"test_col1": ["1", "2", "remain_constant_3", "remain_constant_4", 5, 6, 1, 2, 4],
"test_col2": ["String1", "String12", "String13", "String14", "String1123", "String123223",
"String11", "String12", "String14"]})
list_of_element = ["C", "D"]
for owner in df.owner.unique():
for name in df.name.unique():
filtred = df[(df.owner == owner) & (df.name == name)]
differance = np.setdiff1d(list_of_element, filtred.col_name)
for diff in differance:
if diff == 'C':
df2 = {'owner': owner, 'name': name, 'col_name': diff, 'test_col1': 'remain_constant_3 ',
'test_col2': "String13"}
if diff == 'D':
df2 = {'owner': owner, 'name': name, 'col_name': diff, 'test_col1': 'remain_constant_4',
'test_col2': "String14"}
df = df.append(df2, ignore_index=True)
print(df)
A better way is to use更好的方法是使用
import pandas as pd
df = pd.DataFrame({"owner": ["owner"] * 9,
"name": ["dmn_dmn", "dmn_dmn", "dmn_dmn", "dmn_dmn", "time1", "time1", "sap", "sap", "sap"],
"col_name": ["A", "B", "C", "D", "A", "B", "A", "B", "D"]})
index = pd.MultiIndex.from_product([df.owner.unique(), df.name.unique(), df.col_name.unique()])
result = df.set_index(['owner', 'name', "col_name"]).reindex(index).reset_index()
print(result)
You could use the columns with which you want to perform the combination as index and craft a custom index to reindex.您可以使用要执行组合的列作为索引,并制作自定义索引以重新编制索引。 Then
groupby
and ffill
/ bfill
.然后
groupby
和ffill
/ bfill
。
df2 = df.set_index(['owner', 'name', 'col_name'])
idx = pd.MultiIndex.from_product([df['owner'].unique(),
df['name'].unique(),
['C', 'D'],
], names=['owner', 'name', 'col_name'])
(df2.reindex(df2.index.union(idx))
.groupby(level='col_name').ffill()
.groupby(level='col_name').bfill()
.reset_index()
)
output: output:
owner name col_name test_col1 test_col2
0 svc dmn_dmn A 1 "String1"
1 svc dmn_dmn B 2 "String12"
2 svc dmn_dmn C remain_constant_3 "String13"
3 svc dmn_dmn D remain_constant_4 "String14"
4 svc sap C remain_constant_3 "String13"
5 svc sap D 4 "String14"
6 svc sap J 1 "String11"
7 svc sap K 2 "String12"
8 svc time1 C remain_constant_3 "String13"
9 svc time1 D 4 "String14"
10 svc time1 E 5 "String1123"
11 svc time1 F 6 "String123223"
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.