How to apply a function to each row of one column in a pandas dataframe?
I have a dataframe df of stock prices of length ~600k, which I downloaded from here.
I have renamed the last column from 'Name' to 'Ticker', and created a new blank column called 'Name':
df = df.rename(columns={'Name': 'Ticker'})
df['Name'] = ''
I have written the following function to return the company name for a given ticker symbol:
! pip3 install yfinance
import yfinance as yf
def return_company_name(ticker):
    # Fetch the company's long name via yfinance (makes network requests on every call)
    return yf.Ticker(ticker).info['longName']
return_company_name('MSFT')
>>> 'Microsoft Corporation'
Now, I want to populate the column 'Name' with the company name of the corresponding ticker symbol. For that, I have written the following lambda function:
df.Name = df.Ticker.apply(lambda x: return_company_name(x))
But this last line of code just keeps on running. Is something going wrong? If so, how do I fix it?
I tried the same with map instead of apply, but got the same result.
First, you don't need a lambda or apply.
df.Name = df.Ticker.map(return_company_name)
This is better. Second, as pointed out by others, this is grotesquely inefficient. You are making the call 600,000 times, even though your number of distinct tickers is much smaller. The following sledgehammer approach will work:
class my_return():
    def __init__(self):
        # Cache of ticker -> company name
        self.tickdict = {}

    def __call__(self, ticker):
        ans = self.tickdict.get(ticker, None)
        if ans is not None:
            return ans
        else:
            # First time this ticker is seen: look it up once and cache the result
            self.tickdict[ticker] = return_company_name(ticker)
            return self.tickdict[ticker]
Then map an instance of my_return over your ticker column, for example df.Name = df.Ticker.map(my_return()).
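The caching idea above can also be sketched with functools.lru_cache from the standard library instead of a hand-written class. This is only an illustration: fake_names and cached_company_name are hypothetical names, and the hard-coded dictionary stands in for return_company_name so the example runs without network access.

```python
from functools import lru_cache

# Hypothetical stand-in data; a real version would call return_company_name(ticker).
fake_names = {"MSFT": "Microsoft Corporation", "AAPL": "Apple Inc."}

@lru_cache(maxsize=None)
def cached_company_name(ticker):
    # The body runs only on a cache miss, i.e. once per distinct ticker.
    return fake_names[ticker]

# Repeated tickers, as in a long price history.
tickers = ["MSFT", "AAPL", "MSFT", "MSFT", "AAPL"]
names = [cached_company_name(t) for t in tickers]

# cache_info() reports how many calls were served from the cache.
info = cached_company_name.cache_info()
```

With a dataframe, the cached function would be passed to map in the same way as the class instance, for example df.Ticker.map(cached_company_name).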
Looking at the yfinance source, you can see here that the get_info method calls _get_fundamentals, which in turn seems to make quite a few API calls to different sites to get the information it needs.
Since this is executed for every row, you run into trouble because the sites might rate-limit you. Maybe you could add a preliminary step that collects all the unique tickers, looks each one up once, and saves the results in some kind of lookup table, such as a CSV file.
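A minimal sketch of that preliminary step, under stated assumptions: lookup_name is a hypothetical offline stand-in for return_company_name, the small dataframe stands in for the real 600k-row one, and the CSV file name and location are arbitrary choices for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for return_company_name so the sketch runs offline.
def lookup_name(ticker):
    return {"MSFT": "Microsoft Corporation", "AAPL": "Apple Inc."}[ticker]

# Small example frame; the real one has many rows but few distinct tickers.
df = pd.DataFrame({"Ticker": ["MSFT", "AAPL", "MSFT", "AAPL", "MSFT"]})

# Look up each distinct ticker exactly once.
unique_tickers = df["Ticker"].unique()
lookup = pd.DataFrame({
    "Ticker": unique_tickers,
    "Name": [lookup_name(t) for t in unique_tickers],
})

# Save the lookup table so later runs can skip the API calls (path is arbitrary).
csv_path = os.path.join(tempfile.gettempdir(), "ticker_names.csv")
lookup.to_csv(csv_path, index=False)

# Reload the table and map the names back onto every row.
name_by_ticker = pd.read_csv(csv_path).set_index("Ticker")["Name"]
df["Name"] = df["Ticker"].map(name_by_ticker)
```

The number of lookups then equals the number of distinct tickers rather than the number of rows, which should also reduce the risk of being rate-limited.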
You can use DataFrame.apply() to apply a function to each row or column of a DataFrame.
You can also apply a lambda function to each column. For example:
modDfObj = dfObj.apply(lambda x: x + 10)
Another example (here, the function is applied only to the column z):
modDfObj = dfObj.apply(lambda x: np.square(x) if x.name == 'z' else x)
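For completeness, here is a self-contained version of the last example. The contents of dfObj are made up for illustration, since the answer does not define it. It also shows that when only one column is involved, working on that Series directly is usually simpler than going through DataFrame.apply.

```python
import numpy as np
import pandas as pd

# Made-up example data; the original answer does not define dfObj.
dfObj = pd.DataFrame({"x": [1, 2, 3], "z": [4, 5, 6]})

# By default (axis=0) DataFrame.apply passes each column to the function as a Series.
modDfObj = dfObj.apply(lambda col: np.square(col) if col.name == "z" else col)

# For a single column, mapping over that Series directly gives the same values.
squared_z = dfObj["z"].map(lambda v: v ** 2)
```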