Add a column to a pandas data frame using a pandas operation

If I have a data frame and need to perform some operation on a given column to produce a new column, is there a better way than the function below?

I do NOT want to alter the original column. I want to keep appending new columns for this and any similar operations.

But the code below seems to take far too many lines for what it does. The rank() function in pandas is super convenient, so it seems to me there should be some parameter somewhere that tells the data frame, "Hey, apply this function you already know about, but instead of applying it to the original column itself, add the result as a new column at the end of the data frame."

Is there such a way? Or is there any way to make the code below more concise and elegant while achieving the same result? What I have just seems verbose. I do this for other things too; for example, I have the same type of function for cut(), and I will be doing it for a few other operations. This seems so common that it should be easier.

Thanks!

import pandas as pd

def rank(pdfAll, nOldColIndex, sNewColName, sMethod, bAsc):
    """Appends a ranked column to a DataFrame based on an existing column.

    nOldColIndex is the index of the column with the original data.
    sNewColName is the name of the new column.
    sMethod goes to the pandas rank function to influence ranking behavior.
    bAsc goes to the pandas rank function to influence ranking behavior.
    The column at nOldColIndex must have numeric contents.
    """
    # .ix has been removed from pandas; .iloc is used here on the
    # assumption that nOldColIndex is a positional column index.
    serOldCol = pdfAll.iloc[:, nOldColIndex]
    serOldCol.name = sNewColName

    serNewCol = serOldCol.rank(method=sMethod, ascending=bAsc)
    pdfNewCol = pd.DataFrame(serNewCol)

    # Join the ranked column back onto the original frame by index.
    pdfAll = pd.merge(pdfAll, pdfNewCol, left_index=True, right_index=True)

    return pdfAll

I'm not sure what this generalization is all about, but are you by any chance trying to do something like this?

df['newColumn'] = df.oldColumn.rank()
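
As a minimal sketch of how this one-line pattern covers the parameters from the question: the data, the column names (score, score_rank, score_bin) and the bin edges below are made up for illustration. Both rank() and pd.cut() return a new Series, so assigning the result under a new name appends a column and leaves the original column untouched.

```python
import pandas as pd

# Hypothetical example data; the column names are illustrative only.
df = pd.DataFrame({"score": [10.0, 30.0, 20.0, 30.0]})

# rank() returns a new Series; assigning it under a new name appends a column.
# method and ascending are the same arguments the question's wrapper forwards.
df["score_rank"] = df["score"].rank(method="min", ascending=False)

# The same pattern works for pd.cut(), the other operation the question mentions.
# The bin edges and labels here are arbitrary.
df["score_bin"] = pd.cut(df["score"], bins=[0, 15, 25, 35], labels=["low", "mid", "high"])

print(df)
```

If you would rather not modify df in place, df.assign(score_rank=df["score"].rank()) returns a new DataFrame with the extra column instead.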

More generally, if you want to compute something row by row, you can use apply with axis=1:

df.apply(lambda x: x.oldColumn * x.otherOldColumn, axis=1)
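
Note that apply only returns the result; to keep it, assign it to a new column. A small sketch with made-up data, reusing the column names from the line above (the new column names product and product_vec are hypothetical). For simple arithmetic like this, operating on the columns directly should give the same values and is usually faster than a row-wise apply.

```python
import pandas as pd

# Hypothetical data using the column names from the answer.
df = pd.DataFrame({"oldColumn": [1, 2, 3], "otherOldColumn": [10, 20, 30]})

# Row-wise: with axis=1 the lambda receives each row as a Series.
df["product"] = df.apply(lambda x: x.oldColumn * x.otherOldColumn, axis=1)

# Vectorized equivalent: multiply the two columns element by element.
df["product_vec"] = df["oldColumn"] * df["otherOldColumn"]

print(df)
```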
