
Pandas: Conditionally insert rows into DataFrame while iterating through rows

While iterating over the values of a specific column in a Pandas DataFrame, I would like to insert a new row directly below the current row whenever the cell in that row meets a certain condition.

For example:

import pandas as pd
df = pd.DataFrame(data={'A': [0.15, 0.15, 0.7], 'B': [1500, 1500, 7000]})

DataFrame:

      A     B
0  0.15  1500
1  0.15  1500
2  0.70  7000

Attempt:

y = 100                             #An example scalar

i = 1

for x in df['A']:
    if x is not None:               #Values in 'A' are filled atm, but not necessarily.
        df.loc[i] = [None, x*y]     #Should insert None into 'A', and product into 'B'.
        df.index = df.index + 1     #Shift index? According to this S/O answer: https://stackoverflow.com/a/24284680/4909923
    i = i + 1

df.sort_index(inplace=True)         #Sort index?

I haven't been able to get this working so far. The resulting index is shifted and no longer starts at 0, and the rows do not seem to be inserted in order:

      A     B
3  0.15  1500
4   NaN    70
5  0.70  7000

I tried several variants of this, including applymap with a lambda function, but could not get any of them to work.

Desired result:

      A     B
0  0.15  1500
1  None    15
2  0.15  1500
3  None    15
4  0.70  7000
5  None    70

I believe you can use:

import pandas as pd

df = pd.DataFrame(data={'A': [0.15, 0.15, 0.7],
                        'B': [1500, 1500, 7000],
                        'C': [100, 200, 400]})

v = 100
L = []
for i, x in df.to_dict('index').items():
    # append the original row as a dictionary
    L.append(x)
    # append a new dictionary; for the missing keys ('B', 'C')
    # the DataFrame constructor fills in NaN
    L.append({'A': x['A'] * v})

df = pd.DataFrame(L)
print(df)
       A       B      C
0   0.15  1500.0  100.0
1  15.00     NaN    NaN
2   0.15  1500.0  200.0
3  15.00     NaN    NaN
4   0.70  7000.0  400.0
5  70.00     NaN    NaN
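
The output above puts the product in column 'A', whereas the question asks for an empty 'A' and the product in 'B'. A minimal sketch of the same list-of-dictionaries idea adapted to that layout follows. The names `rows` and `result` are illustrative, and treating "filled" as "not NaN/None" via `pd.notna` is an assumption about the intended condition.

```python
import pandas as pd

df = pd.DataFrame({'A': [0.15, 0.15, 0.7], 'B': [1500, 1500, 7000]})
y = 100  # example scalar from the question

rows = []
for record in df.to_dict('records'):
    # keep the original row
    rows.append(record)
    # assumption: a cell counts as "filled" when it is not NaN/None
    if pd.notna(record['A']):
        # new row: empty 'A', product in 'B' (rounded to guard against float error)
        rows.append({'A': None, 'B': int(round(record['A'] * y))})

# building the frame once at the end avoids mutating df while iterating over it
result = pd.DataFrame(rows)
print(result)
```

Collecting rows in a plain list and constructing the DataFrame once sidesteps the index shifting that the original attempt ran into. pandas stores the missing 'A' values as NaN, so they print as NaN rather than None.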

It doesn't seem you need a manual loop here:

import numpy as np
import pandas as pd

df = pd.DataFrame(data={'A': [0.15, 0.15, 0.7], 'B': [1500, 1500, 7000]})

y = 100

# copy the rows of the dataframe where 'A' is filled
df_extra = df.loc[df['A'].notnull()].copy()

# assign the new A and B series values
df_extra = df_extra.assign(A=np.nan, B=(df_extra['A']*y).astype(int))

# increment the index by a fraction, required for sorting afterwards
df_extra.index += 0.5

# concatenate, sort index, drop index
# (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead)
res = pd.concat([df, df_extra]).sort_index().reset_index(drop=True)

print(res)

      A     B
0  0.15  1500
1   NaN    15
2  0.15  1500
3   NaN    15
4  0.70  7000
5   NaN    70
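
As a variation on the approach above, the fractional index offset can be avoided by keeping the original index labels on the extra rows and relying on a stable sort, which keeps each original row ahead of its inserted row. This is only a sketch: the names `mask` and `extra` are illustrative, and `.round()` is added as a precaution against floating-point error before the integer cast.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [0.15, 0.15, 0.7], 'B': [1500, 1500, 7000]})
y = 100

# rows whose 'A' cell is filled
mask = df['A'].notna()

# extra rows reuse the index labels of the rows they belong under
extra = pd.DataFrame(
    {'A': np.nan, 'B': (df.loc[mask, 'A'] * y).round().astype(int)}
)

# mergesort is stable, so for equal labels the row from df (listed first)
# stays ahead of the row from extra
res = (
    pd.concat([df, extra])
    .sort_index(kind='mergesort')
    .reset_index(drop=True)
)
print(res)
```

The ordering depends on `df` being listed before `extra` in `pd.concat` and on the sort being stable. With the default sort kind the relative order of duplicate labels is not guaranteed.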
