Pandas: add new rows based on a specific condition

Question

I have a specific dataframe which looks like this:

owner  name     col_name  test_col1          test_col2
svc    dmn_dmn  A         1                  "String1"
svc    dmn_dmn  B         2                  "String12"
svc    dmn_dmn  C         remain_constant_3  "String13"
svc    dmn_dmn  D         remain_constant_4  "String14"
svc    time1    E         5                  "String1123"
svc    time1    F         6                  "String123223"
svc    sap      J         1                  "String11"
svc    sap      K         2                  "String12"
svc    sap      D         4                  "String14"

If the values "C" and "D" are not present in the column col_name for a given name, then rows for "C" and "D" should be added for that name. The final dataframe should look like this:

owner  name     col_name  test_col1          test_col2
svc    dmn_dmn  A         1                  "String1"
svc    dmn_dmn  B         2                  "String12"
svc    dmn_dmn  C         remain_constant_3  "String13"
svc    dmn_dmn  D         remain_constant_4  "String14"
svc    time1    E         5                  "String1123"
svc    time1    F         6                  "String123223"
svc    time1    C         remain_constant_3  "String13"
svc    time1    D         remain_constant_4  "String14"
svc    sap      J         1                  "String11"
svc    sap      K         2                  "String12"
svc    sap      C         remain_constant_3  "String13"
svc    sap      D         remain_constant_4  "String14"

Edited: Please also note that there could be more columns in this dataframe. I didn't add the other columns as I thought it wouldn't matter for the code, but then I saw there was some confusion.

Answer 1

You could use groupby to check if 'C' and 'D' are in the 'col_name' column and add them if not.

import pandas as pd

df = pd.DataFrame([
    {'owner': 'svc', 'name': 'dmn_dmn', 'col_name': 'A', 'test_col1': 1, 'test_col2': 'String1'},
    {'owner': 'svc', 'name': 'dmn_dmn', 'col_name': 'B', 'test_col1': 2, 'test_col2': 'String12'},
    {'owner': 'svc', 'name': 'dmn_dmn', 'col_name': 'C', 'test_col1': 'remain_constant_3', 'test_col2': 'String13'},
    {'owner': 'svc', 'name': 'dmn_dmn', 'col_name': 'D', 'test_col1': 'remain_constant_4', 'test_col2': 'String14'},
    {'owner': 'svc', 'name': 'time1', 'col_name': 'E', 'test_col1': 5, 'test_col2': 'String1123'},
])

new_rows = []  # rows for groups that lack 'C' or 'D'
for g, g_hold in df.groupby('name'):
    if 'C' not in g_hold['col_name'].tolist():
        new_rows.append({'owner': 'svc', 'name': g, 'col_name': 'C', 'test_col1': 'remain_constant_3', 'test_col2': 'String13'})
    if 'D' not in g_hold['col_name'].tolist():
        new_rows.append({'owner': 'svc', 'name': g, 'col_name': 'D', 'test_col1': 'remain_constant_4', 'test_col2': 'String14'})
# DataFrame.append was removed in pandas 2.0, so concatenate the new rows instead
df = pd.concat([df, pd.DataFrame(new_rows)], ignore_index=True)

print(df.sort_values(['name', 'col_name']))

The code would end up looking something like the above.
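The question's edit mentions that the real dataframe may have more columns. As a rough sketch of how the same groupby idea might be generalized, the helper below collects the missing rows per group and concatenates them once at the end. The function name `add_missing_rows` and its `templates` argument are hypothetical, introduced here only for illustration; they are not part of pandas. Columns that a template does not mention are simply left as NaN in the new rows.

```python
import pandas as pd


def add_missing_rows(df, group_cols, key_col, templates):
    """Append one row per group for each template key that the group lacks.

    `templates` maps a key_col value to the column values of the row to add.
    Columns not named in a template are left as NaN in the new rows.
    """
    new_rows = []
    for group_key, group in df.groupby(group_cols):
        # Older pandas versions may return a scalar key for a single column
        if not isinstance(group_key, tuple):
            group_key = (group_key,)
        present = set(group[key_col])
        for key, values in templates.items():
            if key not in present:
                row = dict(zip(group_cols, group_key))
                row[key_col] = key
                row.update(values)
                new_rows.append(row)
    if not new_rows:
        return df.copy()
    return pd.concat([df, pd.DataFrame(new_rows)], ignore_index=True)


# Small example: dmn_dmn has C and D, time1 has neither, sap has only D
df = pd.DataFrame({
    "owner": ["svc"] * 5,
    "name": ["dmn_dmn", "dmn_dmn", "time1", "time1", "sap"],
    "col_name": ["C", "D", "E", "F", "D"],
    "test_col1": ["remain_constant_3", "remain_constant_4", 5, 6, 4],
    "test_col2": ["String13", "String14", "String1123", "String123223", "String14"],
})
templates = {
    "C": {"test_col1": "remain_constant_3", "test_col2": "String13"},
    "D": {"test_col1": "remain_constant_4", "test_col2": "String14"},
}
result = add_missing_rows(df, ["owner", "name"], "col_name", templates)
print(result.sort_values(["name", "col_name"]))
```

Building the list of rows first and calling `pd.concat` once avoids copying the dataframe on every added row, which is what repeated appends inside the loop would do.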

Answer 2

Here is a straightforward approach:

import pandas as pd
import numpy as np

df = pd.DataFrame({"owner": ["svc"] * 9,
                   "name": ["dmn_dmn", "dmn_dmn", "dmn_dmn", "dmn_dmn", "time1", "time1", "sap", "sap", "sap"],
                   "col_name": ["A", "B", "C", "D", "E", "F", "J", "K", "D"],
                   "test_col1": [1, 2, "remain_constant_3", "remain_constant_4", 5, 6, 1, 2, 4],
                   "test_col2": ["String1", "String12", "String13", "String14", "String1123", "String123223",
                                 "String11", "String12", "String14"]})
list_of_element = ["C", "D"]
new_rows = []
for owner in df.owner.unique():
    for name in df.name.unique():
        filtered = df[(df.owner == owner) & (df.name == name)]
        if filtered.empty:
            continue  # this owner/name combination does not occur in the data
        # Required values that this owner/name group does not have yet
        difference = np.setdiff1d(list_of_element, filtered.col_name)
        for diff in difference:
            if diff == 'C':
                row = {'owner': owner, 'name': name, 'col_name': diff, 'test_col1': 'remain_constant_3',
                       'test_col2': "String13"}
            elif diff == 'D':
                row = {'owner': owner, 'name': name, 'col_name': diff, 'test_col1': 'remain_constant_4',
                       'test_col2': "String14"}
            new_rows.append(row)
# DataFrame.append was removed in pandas 2.0, so concatenate the new rows instead
df = pd.concat([df, pd.DataFrame(new_rows)], ignore_index=True)
print(df)

Answer 3

A better way is to use MultiIndex.from_product together with reindex:

import pandas as pd
df = pd.DataFrame({"owner": ["owner"] * 9,
                   "name": ["dmn_dmn", "dmn_dmn", "dmn_dmn", "dmn_dmn", "time1", "time1", "sap", "sap", "sap"],
                   "col_name": ["A", "B", "C", "D", "A", "B", "A", "B", "D"]})
# Name the levels so that reset_index restores the original column names
index = pd.MultiIndex.from_product([df.owner.unique(), df.name.unique(), df.col_name.unique()],
                                   names=['owner', 'name', 'col_name'])
result = df.set_index(['owner', 'name', 'col_name']).reindex(index).reset_index()
print(result)
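To make the behaviour of this approach more concrete, here is a small sketch that adds one value column to the same data. The column name `value` is an assumption made for the example. It illustrates two things worth knowing before using the full product: every combination of the key columns is created (not only "C" and "D"), and the newly created rows have NaN in the non-key columns, so they still need to be filled in.

```python
import pandas as pd

df = pd.DataFrame({
    "owner": ["owner"] * 9,
    "name": ["dmn_dmn"] * 4 + ["time1"] * 2 + ["sap"] * 3,
    "col_name": ["A", "B", "C", "D", "A", "B", "A", "B", "D"],
    "value": range(9),  # hypothetical extra column, not in the original answer
})

keys = ["owner", "name", "col_name"]
# Every combination of the unique values of the three key columns
index = pd.MultiIndex.from_product([df[k].unique() for k in keys], names=keys)
result = df.set_index(keys).reindex(index).reset_index()

# Rows that did not exist before the reindex have NaN in the value column
added = result[result["value"].isna()]
print(added[["name", "col_name"]])
```

Here the product has 1 x 3 x 4 = 12 combinations, so three rows are new: time1/C, time1/D and sap/C.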

Answer 4

You could use the columns with which you want to perform the combination as index and craft a custom index to reindex. Then groupby and ffill/bfill.

df2 = df.set_index(['owner', 'name', 'col_name'])

idx = pd.MultiIndex.from_product([df['owner'].unique(),
                                  df['name'].unique(),
                                  ['C', 'D'],
                                 ], names=['owner', 'name', 'col_name'])

(df2.reindex(df2.index.union(idx))
    .groupby(level='col_name').ffill()
    .groupby(level='col_name').bfill()
    .reset_index()
)

Output:

   owner     name col_name          test_col1       test_col2
0    svc  dmn_dmn        A                  1       "String1"
1    svc  dmn_dmn        B                  2      "String12"
2    svc  dmn_dmn        C  remain_constant_3      "String13"
3    svc  dmn_dmn        D  remain_constant_4      "String14"
4    svc      sap        C  remain_constant_3      "String13"
5    svc      sap        D                  4      "String14"
6    svc      sap        J                  1      "String11"
7    svc      sap        K                  2      "String12"
8    svc    time1        C  remain_constant_3      "String13"
9    svc    time1        D                  4      "String14"
10   svc    time1        E                  5    "String1123"
11   svc    time1        F                  6  "String123223"
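Note that in the output above the time1/D row carries the value 4, because the forward fill within the col_name group picks up the preceding sap/D row. If the constant values are wanted for the added rows instead, one possible alternative is an anti-join: build every required (owner, name, col_name) row from a small template table, keep only those not already present, and concatenate them. This is a sketch under assumptions, not the answer's method; the `templates` frame and the variable names are made up for the example, and `how="cross"` needs pandas 1.2 or later.

```python
import pandas as pd

df = pd.DataFrame({
    "owner": ["svc"] * 9,
    "name": ["dmn_dmn"] * 4 + ["time1"] * 2 + ["sap"] * 3,
    "col_name": ["A", "B", "C", "D", "E", "F", "J", "K", "D"],
    "test_col1": [1, 2, "remain_constant_3", "remain_constant_4", 5, 6, 1, 2, 4],
    "test_col2": ["String1", "String12", "String13", "String14", "String1123",
                  "String123223", "String11", "String12", "String14"],
})

# Hypothetical template table: the rows every (owner, name) group should have
templates = pd.DataFrame({
    "col_name": ["C", "D"],
    "test_col1": ["remain_constant_3", "remain_constant_4"],
    "test_col2": ["String13", "String14"],
})

keys = ["owner", "name", "col_name"]
groups = df[["owner", "name"]].drop_duplicates()
required = groups.merge(templates, how="cross")  # one C and one D row per group

# Anti-join: keep the required rows whose key is not already in df
flagged = required.merge(df[keys], on=keys, how="left", indicator=True)
missing = flagged.loc[flagged["_merge"] == "left_only"].drop(columns="_merge")

result = (pd.concat([df, missing], ignore_index=True)
            .sort_values(["name", "col_name"])
            .reset_index(drop=True))
print(result)
```

Rows that already exist are left untouched, so the existing sap/D row keeps its value 4; overwriting it with the constant would be a separate step.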

4 answers

Answer 1
1 Accepted 2021-10-06 16:26:52

Answer 2
0 2021-10-06 15:05:45

Answer 3
0 2021-10-06 15:36:41

Answer 4
0 2021-10-06 16:13:37
