Insert rows in a Python dataframe using a condition

Question

I have a large data file as shown below.

Edited to include an updated example:

I wanted to add two new columns (E and F) next to column D, and move the suite # (when applicable) and the City/State data in cells D3 and D4 to E2 and F2, respectively. The challenge is that not every entry has a suite number. I would first need to insert a row for the entries that don't have a suite number, and only for them, not for those that already have the suite information.

I know how to write loops, but I am having trouble defining the conditions. One way is to count the length of the string. How should I get started? I much appreciate your help!

Answer 1

This is how I would do it. I don't recommend looping when using pandas; it has so many tools that a loop is often not needed. One caution: your spreadsheet has NaN cells, which I think are actually the equivalent of numpy's np.nan. You also have blank cells, which I am assuming are the equivalent of an empty string ("").

import pandas as pd
import numpy as np

# dictionary of your data
companies = {
    'Comp ID': ['C1', '', np.nan, 'C2', '', np.nan, 'C3',np.nan],
    'Address': ['10 foo', 'Suite A','foo city', '11 spam','STE 100','spam town', '12 ham', 'Myhammy'],
    'phone': ['888-321-4567', '', np.nan, '888-321-4567', '', np.nan, '888-321-4567',np.nan],
    'Type': ['W_sale', '', np.nan, 'W_sale', '', np.nan, 'W_sale',np.nan],
}
# Build the frames we need.
df = pd.DataFrame(companies)
df1 = pd.DataFrame()  # empty frame for the Suite and City/State columns

# Check the data types first, to tell blank strings apart from NaN.
for r in range(0, 5):
    v = df['Comp ID'].values[r]
    print(f'this "{v}" is a ', type(v))

# Knowing the data types, we can construct the np.where() conditions.

# np.where() works like an IF() function in Excel.
# A blank Comp ID marks a suite row; a NaN Comp ID marks a city/state row.
df1['Suite'] = np.where(df['Comp ID'] == '', df['Address'], np.nan)
df1['City/State'] = np.where(df['Comp ID'].isna(), df['Address'], np.nan)
# Copy values up to the rows above (bfill() replaces the deprecated backfill()).
df1 = df1[['Suite', 'City/State']].bfill()
# Join the frames together on the index.
df = df.join(df1)
# Drop the repeated rows, keeping the first row for each City/State value.
df.drop_duplicates(subset=['City/State'], keep='first', inplace=True)
# Set the column order to what you want.
df = df[['Comp ID', 'Type', 'Address', 'Suite', 'City/State', 'phone']]

Output

Comp ID	Type	Address	Suite	City/State	phone
C1	W_sale	10 foo	Suite A	foo city	888-321-4567
C2	W_sale	11 spam	STE 100	spam town	888-321-4567
C3	W_sale	12 ham		Myhammy	888-321-4567

Edit: the numpy where statement:

numpy is brought in by the line import numpy as np at the top. We are creating a calculated column that is based on the 'Comp ID' column. numpy does this without loops. Think of where as an Excel IF() function.

result = np.where(condition, value_if_true, value_if_false)
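To make the pattern concrete, here is a minimal sketch on a three-row frame. The `sample` frame and the `suite` and `city` names are made up for illustration; the two conditions are the same ones used in the answer's code.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row sample: a company row, a blank-ID row, a NaN-ID row.
sample = pd.DataFrame({
    'Comp ID': ['C1', '', np.nan],
    'Address': ['10 foo', 'Suite A', 'foo city'],
})

# Where Comp ID is an empty string, take Address; otherwise NaN.
suite = np.where(sample['Comp ID'] == '', sample['Address'], np.nan)
# Where Comp ID is missing (NaN), take Address; otherwise NaN.
city = np.where(sample['Comp ID'].isna(), sample['Address'], np.nan)

print(suite)
print(city)
```

Each call tests the whole column at once and returns an array with one entry per row, which is why no loop is needed.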

The pandas backfill: sometimes you have a value in a cell below and you want to copy it into the blank cell above it. So you backfill: df1 = df1[['Suite','City/State']].backfill() (spelled bfill() in current pandas, where backfill() is deprecated).
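One caveat, as far as I can tell: a plain backfill copies values upward across company boundaries, so a company with no suite row that sits above a company with one would pick up its neighbour's suite. Dropping duplicates on 'City/State' would also merge two companies in the same city. The sketch below is one possible way around both, not part of the original answer. It assumes every company starts at a row whose 'Comp ID' is neither blank nor NaN, and names such as `is_head` and `group` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: C3 has no suite row and comes BEFORE C1, which has one.
df = pd.DataFrame({
    'Comp ID': ['C3', np.nan, 'C1', '', np.nan],
    'Address': ['12 ham', 'Myhammy', '10 foo', 'Suite A', 'foo city'],
})

comp = df['Comp ID']
is_suite = comp.eq('')            # blank ID marks a suite row (assumption)
is_city = comp.isna()             # NaN ID marks a city/state row (assumption)
is_head = ~(is_suite | is_city)   # any other value starts a new company

# Number the companies: every head row bumps the counter.
group = is_head.cumsum().rename('group')

# Pull the address into the new columns only on the matching rows.
df['Suite'] = df['Address'].where(is_suite)
df['City/State'] = df['Address'].where(is_city)

# Back-fill inside each company only, so values never cross a boundary.
filled = df.groupby(group)[['Suite', 'City/State']].bfill()
df[['Suite', 'City/State']] = filled

# Keep just the head row of each company.
result = df[is_head].reset_index(drop=True)
print(result)
```

Filtering on `is_head` instead of dropping duplicates keeps one row per company even when two companies share a city.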

Solution 1 (accepted, 2022-12-02 22:14:58)
