Need to assign values to a dataframe if any value >0 exists in another column
I'm working with a transaction database query set, and I wasn't able to pull specific dates for payments, so I'm trying to write some code in Python to assign the dates for me. My first thought was to do it in Excel, but the dataset is 800,000+ rows x 100+ columns, so doing it there is not practical. The dataset has values in some of the rows in the payments column, so I need to add a date column with dates only in the rows that contain a payment amount.
I have created all of the columns to store the dates, and my idea was to loop through the rows and assign a date if that row contains a value greater than zero (the columns contain 0s as well as NULL values).
df['Payment Date'] = ''
for value in df:
    if value > 0:
        df['Payment Date'] = '06/01/2019'
I expect the output to have dates in the Payment Date column only for the rows that have actual payment values.
If I understand correctly, you are trying to (1) identify rows in your DataFrame with values that are greater than zero, and (2) assign a specific date to a new column for all of those rows.
First, for reproducibility and clarity, let's generate some random data that is representative of your dataset:
import numpy as np
import pandas as pd
# Generate a random 5x4 DataFrame
df = pd.DataFrame(np.random.randn(5,4), columns=list('ABCD'))
# Set the negative values to zero, leaving a mix of zeros and positive values
df[df < 0] = 0
Now, we want to create a new column to store the desired dates:
df['Payment Date'] = ''
And finally, set that column to the desired date for all rows that contain any value greater than zero. Note that the condition tested below is that the sum across each row, skipping N/As, is greater than zero; this matches "any value greater than zero" only when the data contains no negative values:
# numeric_only=True keeps the string 'Payment Date' column out of the sum
row_inds = df.sum(axis=1, skipna=True, numeric_only=True) > 0
df.loc[row_inds, 'Payment Date'] = '06/01/2019'
This gives you the desired result.
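If the payment columns could ever hold negative amounts (refunds, for example), the row sum may hide a positive value. A minimal sketch of an alternative that tests each value directly is shown here; the column names `Payment A` and `Payment B` and the sample values are hypothetical, chosen only to illustrate zeros and NULLs, and should be replaced with your own payment columns.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: two payment columns containing zeros and NaN (NULL).
df = pd.DataFrame({
    "Payment A": [0.0, 25.0, np.nan, 0.0],
    "Payment B": [np.nan, 0.0, 0.0, 10.0],
})

# Assumed names of the columns that hold payment amounts.
payment_cols = ["Payment A", "Payment B"]

# True for rows where at least one payment column is strictly positive.
# A comparison against NaN evaluates to False, so NULLs are ignored.
has_payment = (df[payment_cols] > 0).any(axis=1)

# Fill the date only on matching rows; all other rows get an empty string.
df["Payment Date"] = np.where(has_payment, "06/01/2019", "")

print(df)
```

Because the mask is built with vectorized comparisons rather than a Python loop, this approach should scale reasonably to a dataset of 800,000+ rows.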