
Pandas - Add a new row based on a specific condition

I have a dataframe which looks like this:

owner  name     col_name  test_col1          test_col2
svc    dmn_dmn  A         1                  "String1"
svc    dmn_dmn  B         2                  "String12"
svc    dmn_dmn  C         remain_constant_3  "String13"
svc    dmn_dmn  D         remain_constant_4  "String14"
svc    time1    E         5                  "String1123"
svc    time1    F         6                  "String123223"
svc    sap      J         1                  "String11"
svc    sap      K         2                  "String12"
svc    sap      D         4                  "String14"

If the values "C" and "D" are not present in the column col_name for a given name, then rows for "C" and "D" should be added to that name. The final dataframe should look like this:

owner  name     col_name  test_col1          test_col2
svc    dmn_dmn  A         1                  "String1"
svc    dmn_dmn  B         2                  "String12"
svc    dmn_dmn  C         remain_constant_3  "String13"
svc    dmn_dmn  D         remain_constant_4  "String14"
svc    time1    E         5                  "String1123"
svc    time1    F         6                  "String123223"
svc    time1    C         remain_constant_3  "String13"
svc    time1    D         remain_constant_4  "String14"
svc    sap      J         1                  "String11"
svc    sap      K         2                  "String12"
svc    sap      C         remain_constant_3  "String13"
svc    sap      D         remain_constant_4  "String14"

Edit: Please also note that there could be more columns in this dataframe. I didn't add the other columns because I thought they wouldn't matter for the code, but then I saw there was some confusion.

You could use groupby to check whether 'C' and 'D' are in the 'col_name' column of each group and add them if not.

import pandas as pd

df = pd.DataFrame([
    {'owner':'svc','name':'dmn_dmn','col_name':'A','test_col1':1,'test_col2':'String1'},
    {'owner':'svc','name':'dmn_dmn','col_name':'B','test_col1':2,'test_col2':'String12'},
    {'owner':'svc','name':'dmn_dmn','col_name':'C','test_col1':'remain_constant_3','test_col2':'String13'},
    {'owner':'svc','name':'dmn_dmn','col_name':'D','test_col1':'remain_constant_4','test_col2':'String14'},
    {'owner':'svc','name':'time1','col_name':'E','test_col1':5,'test_col2':'String1123'},
])

# Collect the missing rows first; DataFrame.append was removed in pandas 2.0.
rows = []
for g, g_hold in df.groupby('name'):
    if 'C' not in g_hold['col_name'].tolist():
        rows.append({'owner':'svc','name':g,'col_name':'C','test_col1':'remain_constant_3','test_col2':'String13'})
    if 'D' not in g_hold['col_name'].tolist():
        rows.append({'owner':'svc','name':g,'col_name':'D','test_col1':'remain_constant_4','test_col2':'String14'})
df = pd.concat([df, pd.DataFrame(rows)], ignore_index=True)

print(df.sort_values(['name','col_name']))

The code would end up looking something like the above.
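Since the question mentions that the real dataframe may have more columns, here is a loop-free sketch of the same idea as the groupby code above. It is only one possible variant, not part of the original answer: the `template` frame is a hypothetical helper holding one row per required col_name with whatever value columns you need, and `how="cross"` assumes pandas 1.2 or newer.

```python
import pandas as pd

df = pd.DataFrame({
    "owner": ["svc"] * 9,
    "name": ["dmn_dmn"] * 4 + ["time1"] * 2 + ["sap"] * 3,
    "col_name": ["A", "B", "C", "D", "E", "F", "J", "K", "D"],
    "test_col1": [1, 2, "remain_constant_3", "remain_constant_4", 5, 6, 1, 2, 4],
    "test_col2": ["String1", "String12", "String13", "String14", "String1123",
                  "String123223", "String11", "String12", "String14"],
})

# Hypothetical template: one row per required col_name, with every value column.
template = pd.DataFrame({
    "col_name": ["C", "D"],
    "test_col1": ["remain_constant_3", "remain_constant_4"],
    "test_col2": ["String13", "String14"],
})

keys = ["owner", "name", "col_name"]

# Cross every (owner, name) pair with every template row.
groups = df[["owner", "name"]].drop_duplicates()
candidates = groups.merge(template, how="cross")

# Anti-join: keep only the candidates whose key is not already present in df.
marked = candidates.merge(df[keys].drop_duplicates(), on=keys,
                          how="left", indicator=True)
missing = marked.loc[marked["_merge"] == "left_only"].drop(columns="_merge")

result = pd.concat([df, missing], ignore_index=True)
print(result.sort_values(["name", "col_name"]))
```

Existing rows are left untouched, so the sap / D row keeps its value 4 here; only combinations that are absent get the template values.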

Here is a straightforward approach:

import pandas as pd
import numpy as np

df = pd.DataFrame({"owner": ["svc"] * 9,
                   "name": ["dmn_dmn", "dmn_dmn", "dmn_dmn", "dmn_dmn", "time1", "time1", "sap", "sap", "sap"],
                   "col_name": ["A", "B", "C", "D", "E", "F", "J", "K", "D"],
                   "test_col1": [1, 2, "remain_constant_3", "remain_constant_4", 5, 6, 1, 2, 4],
                   "test_col2": ["String1", "String12", "String13", "String14", "String1123", "String123223",
                                 "String11", "String12", "String14"]})

# Values to use for a row that has to be added, keyed by col_name.
defaults = {"C": {"test_col1": "remain_constant_3", "test_col2": "String13"},
            "D": {"test_col1": "remain_constant_4", "test_col2": "String14"}}

new_rows = []
for owner in df["owner"].unique():
    for name in df["name"].unique():
        filtered = df[(df["owner"] == owner) & (df["name"] == name)]
        # Required col_name values that this (owner, name) group lacks.
        difference = np.setdiff1d(list(defaults), filtered["col_name"])
        for diff in difference:
            new_rows.append({"owner": owner, "name": name, "col_name": str(diff), **defaults[str(diff)]})

# DataFrame.append was removed in pandas 2.0, so concatenate once at the end.
df = pd.concat([df, pd.DataFrame(new_rows)], ignore_index=True)
print(df)

A better way is to use reindex with a MultiIndex built from the unique values of each key column:

import pandas as pd

df = pd.DataFrame({"owner": ["owner"] * 9,
                   "name": ["dmn_dmn", "dmn_dmn", "dmn_dmn", "dmn_dmn", "time1", "time1", "sap", "sap", "sap"],
                   "col_name": ["A", "B", "C", "D", "A", "B", "A", "B", "D"]})

# Name the levels so that reset_index() restores the original column names.
index = pd.MultiIndex.from_product([df.owner.unique(), df.name.unique(), df.col_name.unique()],
                                   names=["owner", "name", "col_name"])
result = df.set_index(["owner", "name", "col_name"]).reindex(index).reset_index()
print(result)
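To see what the reindex does once value columns are present, here is a tiny illustration on made-up data (the `small` frame and its `value` column are hypothetical). Note that the product covers every col_name that occurs anywhere in the frame, and that combinations which did not exist before come back with NaN in the value columns, so they still need to be filled afterwards.

```python
import pandas as pd

small = pd.DataFrame({
    "owner": ["svc", "svc", "svc"],
    "name": ["dmn_dmn", "dmn_dmn", "time1"],
    "col_name": ["C", "D", "C"],
    "value": [3, 4, 5],  # hypothetical value column
})

levels = ["owner", "name", "col_name"]

# Cartesian product of the unique values of each key column.
full_index = pd.MultiIndex.from_product(
    [small[c].unique() for c in levels], names=levels
)

# Rows for combinations missing from `small` are created with NaN values.
expanded = small.set_index(levels).reindex(full_index).reset_index()
print(expanded)
```

Here the (svc, time1, D) combination is the only new row, and its `value` is NaN.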

You could use the columns with which you want to perform the combination as the index and craft a custom index to reindex with. Then groupby and ffill / bfill.

df2 = df.set_index(['owner', 'name', 'col_name'])

idx = pd.MultiIndex.from_product([df['owner'].unique(),
                                  df['name'].unique(),
                                  ['C', 'D'],
                                 ], names=['owner', 'name', 'col_name'])

(df2.reindex(df2.index.union(idx))
    .groupby(level='col_name').ffill()
    .groupby(level='col_name').bfill()
    .reset_index()
)

output:

   owner     name col_name          test_col1       test_col2
0    svc  dmn_dmn        A                  1       "String1"
1    svc  dmn_dmn        B                  2      "String12"
2    svc  dmn_dmn        C  remain_constant_3      "String13"
3    svc  dmn_dmn        D  remain_constant_4      "String14"
4    svc      sap        C  remain_constant_3      "String13"
5    svc      sap        D                  4      "String14"
6    svc      sap        J                  1      "String11"
7    svc      sap        K                  2      "String12"
8    svc    time1        C  remain_constant_3      "String13"
9    svc    time1        D                  4      "String14"
10   svc    time1        E                  5    "String1123"
11   svc    time1        F                  6  "String123223"
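Whichever approach you pick, a quick check that every group ended up with the required values can be handy. The helper below is a hypothetical sketch (the name `groups_missing` and its parameters are not from any answer above). It checks presence only and says nothing about the filled values, which are worth a separate look: with ffill / bfill a new row copies whichever neighbouring row shares its col_name, which is why the time1 / D row in the output above shows 4.

```python
import pandas as pd

def groups_missing(df, required=("C", "D"), by=("owner", "name")):
    """Map each group key to the required col_name values it lacks."""
    required = set(required)
    lacking = {}
    for key, grp in df.groupby(list(by)):
        gap = required - set(grp["col_name"])
        if gap:
            lacking[key] = sorted(gap)
    return lacking

# Small made-up frame: dmn_dmn is complete, sap lacks C, time1 lacks both.
before = pd.DataFrame({
    "owner": ["svc"] * 5,
    "name": ["dmn_dmn", "dmn_dmn", "time1", "sap", "sap"],
    "col_name": ["C", "D", "E", "J", "D"],
})
print(groups_missing(before))
```

An empty dict means every (owner, name) group already contains both "C" and "D".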
