How to iterate through a dataframe based on two conditions?

I have a sample of companies with financial figures which I would like to compare. My data looks like this:


    Cusip9      Issuer                       IPO Year   Total Assets    Long-Term Debt  Sales      SIC-Code
1   783755101   Ryerson Tull Inc                1996    9322000.0        2632000.0      633000.0   3661
2   826170102   Siebel Sys Inc                  1996    995010.0         0.0            50250.0    2456
3   894363100   Travis Boats & Motors Inc       1996    313500.0         43340.0        23830.0    3661
4   159186105   Channell Commercial Corp        1996    426580.0         3380.0         111100.0   7483
5   742580103   Printware Inc                   1996    145750.0         0.0            23830.0    8473

For every company I want to calculate a "similarity score". This score should indicate how comparable the company is to other companies, so I want to compare them across several financial figures. The comparability should be expressed as the Euclidean distance to the "closest company", that is, the square root of the sum of the squared differences between the financial figures. So I need to calculate the distance to every company that fits the conditions described below, but I only need the smallest distance. With two figures, for example, that is (Assets of Company 1 minus Assets of Company 2) squared plus (Debt of Company 1 minus Debt of Company 2) squared, under a square root:

√((x_1 - y_1)^2 + (x_2 - y_2)^2)
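
As a rough illustration of this formula, the sketch below computes the distance between two rows of the table above (Ryerson Tull Inc and Travis Boats & Motors Inc, which share SIC-Code 3661), using Total Assets and Long-Term Debt as the two figures. The helper name `euclidean_distance` is made up for this example and is not part of any library.

```python
import math

def euclidean_distance(x, y):
    """Square root of the sum of squared differences between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# (Total Assets, Long-Term Debt) taken from the table in the question.
ryerson = (9322000.0, 2632000.0)
travis = (313500.0, 43340.0)

distance = euclidean_distance(ryerson, travis)
print(distance)
```

Because the raw figures differ a lot in magnitude, Total Assets will tend to dominate the result; whether to rescale the columns first is a separate choice that the question leaves open.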

This should only be computed for companies with the same SIC-Code, and the IPO Year of the comparable companies should be earlier than that of the company for which the similarity score is computed. I only want to compare each company with companies that were already listed.

Hopefully my point is clear. Does anyone have an idea where I can start? I am just starting with programming and am completely lost with this.

Thanks in advance.

I would first split the data into separate dataframes by SIC-Code, so that every new dataframe only contains companies with the same SIC-Code. Then, for each of those dataframes, just run a double loop over the companies, compute the scores, and store them in a matrix. (So you'll end up with a symmetric matrix of scores.)
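
The approach described above could be sketched roughly as follows. This is only one possible reading of it: the dataframe is rebuilt from the question's table (only the columns needed here), the choice of Total Assets and Long-Term Debt as the compared figures is an assumption, and `pairwise_distances` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Rows copied from the table in the question (only the columns used here).
df = pd.DataFrame({
    "Issuer": ["Ryerson Tull Inc", "Siebel Sys Inc", "Travis Boats & Motors Inc",
               "Channell Commercial Corp", "Printware Inc"],
    "IPO Year": [1996, 1996, 1996, 1996, 1996],
    "Total Assets": [9322000.0, 995010.0, 313500.0, 426580.0, 145750.0],
    "Long-Term Debt": [2632000.0, 0.0, 43340.0, 3380.0, 0.0],
    "SIC-Code": [3661, 2456, 3661, 7483, 8473],
})

features = ["Total Assets", "Long-Term Debt"]  # assumed choice of figures

def pairwise_distances(group):
    """Return a symmetric table of Euclidean distances within one SIC-Code group."""
    values = group[features].to_numpy()
    n = len(values)
    dist = np.zeros((n, n))
    for i in range(n):        # the double loop over companies
        for j in range(n):
            dist[i, j] = np.sqrt(((values[i] - values[j]) ** 2).sum())
    return pd.DataFrame(dist, index=group["Issuer"], columns=group["Issuer"])

# One distance matrix per SIC-Code.
matrices = {sic: pairwise_distances(g) for sic, g in df.groupby("SIC-Code")}
print(matrices[3661])
```

Note that this matrix ignores the IPO Year condition; to get the closest score for a given company, the entries for companies that listed later would still have to be masked out before taking the row minimum.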

Try this. Here I compare each company with companies whose IPO Year is equal to or smaller than its own (since you didn't give any company record with a smaller IPO year). You can change it to strictly smaller than (<) in the statement Group=df[...]

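def closestCompany(companyRecord):
    # Candidates: same SIC-Code, IPO Year not later than this company's, excluding the company itself
    Group = df[(df['SIC-Code'] == companyRecord['SIC-Code'])
               & (df['IPO Year'] <= companyRecord['IPO Year'])
               & (df['Issuer'] != companyRecord['Issuer'])]
    # Euclidean distance over Total Assets and Long-Term Debt; min() is NaN when there is no candidate
    return (((Group['Total Assets'] - companyRecord['Total Assets'])**2
             + (Group['Long-Term Debt'] - companyRecord['Long-Term Debt'])**2)**0.5).min()

df['Closest Company Similarity Score'] = df.apply(closestCompany, axis=1)
df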
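
For anyone who wants to run this end to end, here is a self-contained sketch of the same idea using the rows from the question. It is a variation, not the exact code above: the function takes the dataframe as an argument instead of relying on a global `df`, and the `strict` flag (a name invented for this example) switches between `<=` and `<` for the IPO Year condition.

```python
import pandas as pd

# Rows copied from the table in the question (only the columns used here).
df = pd.DataFrame({
    "Issuer": ["Ryerson Tull Inc", "Siebel Sys Inc", "Travis Boats & Motors Inc",
               "Channell Commercial Corp", "Printware Inc"],
    "IPO Year": [1996, 1996, 1996, 1996, 1996],
    "Total Assets": [9322000.0, 995010.0, 313500.0, 426580.0, 145750.0],
    "Long-Term Debt": [2632000.0, 0.0, 43340.0, 3380.0, 0.0],
    "SIC-Code": [3661, 2456, 3661, 7483, 8473],
})

def closest_score(record, data, strict=False):
    """Smallest Euclidean distance from `record` to any eligible company in `data`."""
    if strict:
        year_ok = data["IPO Year"] < record["IPO Year"]
    else:
        year_ok = data["IPO Year"] <= record["IPO Year"]
    group = data[(data["SIC-Code"] == record["SIC-Code"])
                 & year_ok
                 & (data["Issuer"] != record["Issuer"])]
    dist = ((group["Total Assets"] - record["Total Assets"]) ** 2
            + (group["Long-Term Debt"] - record["Long-Term Debt"]) ** 2) ** 0.5
    return dist.min()  # NaN when no eligible company exists

scores = df.apply(lambda r: closest_score(r, df), axis=1)
strict_scores = df.apply(lambda r: closest_score(r, df, strict=True), axis=1)
print(scores)
print(strict_scores)
```

With the five sample rows, only the two companies with SIC-Code 3661 have a match under `<=`; under the strict `<` every score is NaN, because all sample companies share the same IPO Year.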
