
Insert rows in Python dataframe with conditions

I have a large data file as shown below. [image: sample data file]

Edited to include an updated example:

[image: updated example]

I want to add two new columns (E and F) next to column D, then move the suite number (when there is one) and the City/State data from cells D3 and D4 to E2 and F2, respectively. The challenge is that not every entry has a suite number. For those entries only, I would first need to insert a row; entries that already have the suite information should be left alone.

I know how to write loops, but I am having trouble defining the conditions. One way is to count the length of the string. How should I get started? I would much appreciate your help!

This is how I would do it. I don't recommend looping when using pandas; it has a lot of tools that often make loops unnecessary. A word of caution, though: your spreadsheet has NaN cells, which I think are actually the equivalent of numpy's np.nan. You also have blank cells, which I am assuming are the equivalent of an empty string "".
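To make the NaN versus blank distinction concrete, here is a minimal sketch. The Series `s` is made up for illustration and is not the asker's data; it only shows that an empty string and NaN are picked up by different tests, which is what the conditions further down rely on.

```python
import numpy as np
import pandas as pd

# Made-up column mixing a real ID, an empty string, and a missing value
s = pd.Series(['C1', '', np.nan], name='Comp ID')

is_blank = s.eq('')      # True only for the empty string
is_missing = s.isna()    # True only for NaN

print(pd.DataFrame({'value': s, 'is_blank': is_blank, 'is_missing': is_missing}))
```

If the real file turns out to load blanks as NaN as well, the two tests would no longer tell suite rows and city rows apart, so it is worth checking this first.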

import pandas as pd
import numpy as np

# dictionary of your data
companies = {
    'Comp ID': ['C1', '', np.nan, 'C2', '', np.nan, 'C3',np.nan],
    'Address': ['10 foo', 'Suite A','foo city', '11 spam','STE 100','spam town', '12 ham', 'Myhammy'],
    'phone': ['888-321-4567', '', np.nan, '888-321-4567', '', np.nan, '888-321-4567',np.nan],
    'Type': ['W_sale', '', np.nan, 'W_sale', '', np.nan, 'W_sale',np.nan],
}
# make the frames needed
df = pd.DataFrame(companies)
df1 = pd.DataFrame()  # blank frame for the suite and town columns

# Edit: test the data types of the first few 'Comp ID' values
for r in range(0, 5):
    v = df['Comp ID'].values[r]
    print(f'this "{v}" is a ', type(v))

# This tells us the data types, so we can construct our where(). Back to the prior answer...

# Need a where clause; it is similar to an IF() function in Excel
df1['Suite'] = np.where(df['Comp ID'] == '', df['Address'], np.nan)
df1['City/State'] = np.where(df['Comp ID'].isna(), df['Address'], np.nan)
# copy values up to the rows above (bfill() is the current name for backfill(), which is deprecated in pandas 2.x)
df1 = df1[['Suite', 'City/State']].bfill()
# join the frames together on the index
df = df.join(df1)
df.drop_duplicates(subset=['City/State'], keep='first', inplace=True)
# set the column order to what you want
df = df[['Comp ID', 'Type', 'Address', 'Suite', 'City/State', 'phone' ]]

Output:

Comp ID  Type    Address  Suite    City/State  phone
C1       W_sale  10 foo   Suite A  foo city    888-321-4567
C2       W_sale  11 spam  STE 100  spam town   888-321-4567
C3       W_sale  12 ham   NaN      Myhammy     888-321-4567
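One possible caveat with this approach: filling upwards over the whole column can pull a suite from the next company into a company that has none, and dropping duplicates on 'City/State' removes a second company located in the same city. The sketch below is one way to avoid both, not the answerer's method. It numbers each company block with a cumulative sum and takes the first non-null value per block. The sample data is made up, and it assumes the first row of the frame is always a company row.

```python
import numpy as np
import pandas as pd

# Made-up data: C1 has no suite row, C2 does, and both are in the same city
df = pd.DataFrame({
    'Comp ID': ['C1', np.nan, 'C2', '', np.nan],
    'Address': ['10 foo', 'foo city', '11 spam', 'STE 100', 'foo city'],
    'phone':   ['888-321-4567', np.nan, '888-321-0000', '', np.nan],
})

comp_id = df['Comp ID']
is_head = comp_id.notna() & comp_id.ne('')   # first row of each company
group = is_head.cumsum()                     # block number per company (assumes row 0 is a company row)

# Same row tests as the answer: '' marks a suite row, NaN marks a city row
suite = df['Address'].where(comp_id.eq(''))
city = df['Address'].where(comp_id.isna())

result = df[is_head].copy()
# first() returns the first non-null value in each block, or a null if there is none
result['Suite'] = suite.groupby(group).first().to_numpy()
result['City/State'] = city.groupby(group).first().to_numpy()
result = result.reset_index(drop=True)
print(result)
```

Here C1 keeps an empty Suite instead of inheriting 'STE 100', and both companies survive even though they share a city.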

Edit: the numpy where statement:

numpy is brought in by the line import numpy as np at the top. We are creating calculated columns based on the 'Comp ID' column, and numpy does this without loops. Think of where as being like an Excel IF() function.

df1['new_col'] = np.where(df['test_col'] > threshold, value_if_true, value_if_false)
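As a small runnable illustration of that pattern, here is a sketch with a made-up 'score' column and a made-up threshold of 60; neither comes from the question.

```python
import numpy as np
import pandas as pd

# Made-up scores, only to illustrate the IF()-style behaviour
df = pd.DataFrame({'score': [40, 75, 90]})

# Like Excel's =IF(score > 60, "pass", "fail"), applied to every row at once
df['result'] = np.where(df['score'] > 60, 'pass', 'fail')
print(df['result'].tolist())  # ['fail', 'pass', 'pass']
```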

The pandas backfill: sometimes you have a value in a cell below and you want to duplicate it into the blank cell above it. So you backfill: df1 = df1[['Suite','City/State']].backfill() (bfill() is the same operation under its current name).
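A minimal sketch of what backfilling does, using a made-up Series rather than the frame above:

```python
import numpy as np
import pandas as pd

# Made-up column where each value sits below the blank cells it belongs to
s = pd.Series([np.nan, 'Suite A', np.nan, np.nan, 'STE 100'])

filled = s.bfill()   # each NaN takes the next valid value below it
print(filled.tolist())  # ['Suite A', 'Suite A', 'STE 100', 'STE 100', 'STE 100']
```

Note that the fill does not stop at any boundary: a NaN simply takes whatever valid value comes next, however far down it is.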
