Pandas Dataframe: Multiplying Two Columns
I am trying to multiply two columns (ActualSalary * FTE) within the dataframe (OPR) to create a new column (FTESalary), but the new column stops being filled in at row 21357. I don't understand what went wrong or how to fix it. The two columns came from importing a csv file using the line: OPR = pd.read_csv('OPR.csv', encoding='latin1')
[In] OPR
[out]
ActualSalary FTE
44600 1
58,000.00 1
70,000.00 1
17550 1
34693 1
15674 0.4
[In] OPR["FTESalary"] = OPR["ActualSalary"].str.replace(",", "").astype("float")*OPR["FTE"]
[In] OPR
[out]
ActualSalary FTE FTESalary
44600 1 44600
58,000.00 1 58000
70,000.00 1 70000
17550 1 NaN
34693 1 NaN
15674 0.4 NaN
I am not expecting any NULL values in the output at all, and I am really struggling with this. I would really appreciate the help. Many thanks in advance! (I am new to both coding and this site, so please let me know via message if I have made mistakes or can improve the way I post questions here.)
Sharing the data @oppresiveslayer
[In] OPR[0:6].to_dict()
[out]
{'ActualSalary': {0: '44600',
1: '58,000.00',
2: '70,000.00',
3: '39,780.00',
4: '0.00',
5: '78,850.00'},
'FTE': {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0}}
For more information on the two columns @charlesreid1
[in] OPR['ActualSalary'].astype
[out]
Name: ActualSalary, Length: 21567, dtype: object>
[in] OPR['FTE'].astype
[out]
Name: FTE, Length: 21567, dtype: float64>
The versions I am using: python: 3.7.3, pandas: 0.25.1 on jupyter Notebook 6.0.0
I believe that your ActualSalary column is a mix of strings and integers. That is the only way I've been able to recreate your error:
df = pd.DataFrame(
{'ActualSalary': ['44600', '58,000.00', '70,000.00', 17550, 34693, 15674],
'FTE': [1, 1, 1, 1, 1, 0.4]})
>>> df['ActualSalary'].str.replace(',', '').astype(float) * df['FTE']
0 44600.0
1 58000.0
2 70000.0
3 NaN
4 NaN
5 NaN
dtype: float64
The issue arises when you try to remove the commas. The .str accessor returns NaN for any element that is not a string:
>>> df['ActualSalary'].str.replace(',', '')
0 44600
1 58000.00
2 70000.00
3 NaN
4 NaN
5 NaN
Name: ActualSalary, dtype: object
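To confirm that the column really does hold mixed types, one option is to count the Python type of each element. This is only a sketch on the small reproduction frame from this answer, not on the real OPR data; the names type_counts and not_str are made up for illustration.

```python
import pandas as pd

# Small reproduction frame: three string salaries followed by three integers.
df = pd.DataFrame({
    'ActualSalary': ['44600', '58,000.00', '70,000.00', 17550, 34693, 15674],
    'FTE': [1, 1, 1, 1, 1, 0.4],
})

# Count how many elements of the object column are of each Python type.
type_counts = df['ActualSalary'].map(lambda v: type(v).__name__).value_counts()
print(type_counts)

# Boolean mask selecting the rows whose salary is not a string.
not_str = ~df['ActualSalary'].map(lambda v: isinstance(v, str))
print(df[not_str])
```

If more than one type shows up in the counts, the rows selected by the mask are the ones that .str.replace turns into NaN.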
First convert every value to a string, then strip the commas and convert the result to floats.
fte_salary = (
df['ActualSalary'].astype(str).str.replace(',', '') # Remove commas in string, e.g. '55,000.00' -> '55000.00'
.astype(float) # Convert string column to floats.
.mul(df['FTE']) # Multiply the cleaned salary column by the Full-Time-Equivalent (FTE) column.
)
>>> df.assign(FTESalary=fte_salary) # Assign new column to dataframe.
ActualSalary FTE FTESalary
0 44600 1.0 44600.0
1 58,000.00 1.0 58000.0
2 70,000.00 1.0 70000.0
3 17550 1.0 17550.0
4 34693 1.0 34693.0
5 15674 0.4 6269.6
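Another place to apply the same idea is at import time, by asking read_csv to keep ActualSalary as text so that every element is a string before the commas are stripped. The sketch below uses an in-memory CSV (csv_text is invented sample data, and opr is a stand-in for the real OPR frame); whether the real file's mixed types come from type inference during reading is an assumption the question does not confirm.

```python
import io
import pandas as pd

# Invented sample standing in for OPR.csv; values with commas are quoted.
csv_text = 'ActualSalary,FTE\n44600,1\n"58,000.00",1\n17550,1\n15674,0.4\n'

# dtype forces the salary column to be read as strings for every row.
opr = pd.read_csv(io.StringIO(csv_text), dtype={'ActualSalary': str})

# Every element is now a string, so .str.replace never produces NaN here.
opr['FTESalary'] = (
    opr['ActualSalary'].str.replace(',', '').astype(float) * opr['FTE']
)
print(opr)
```

With the real file, the same dtype argument would be added to the existing pd.read_csv('OPR.csv', encoding='latin1') call.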
This should work:
OPR['FTESalary'] = OPR.apply(lambda x: pd.to_numeric(x['ActualSalary'].replace(",", ""), errors='coerce') * x['FTE'], axis=1)
Output:
ActualSalary FTE FTESalary
0 44600 1.0 44600.0
1 58,000.00 1.0 58000.0
2 70,000.00 1.0 70000.0
3 17550 1.0 17550.0
4 34693 1.0 34693.0
5 15674 0.4 6269.6
OK, I think you need to do this:
OPR['FTESalary'] = OPR.reset_index().apply(lambda x: pd.to_numeric(x['ActualSalary'].replace(",", ""), errors='coerce') * x['FTE'], axis=1).to_numpy().tolist()
I was able to do it in a couple of steps, but with a list comprehension, which might be less readable for a beginner. It makes an intermediate column that does the float conversion, since your ActualSalary column is full of strings at the start.
OPR["X"] = [float(x.replace(",","")) for x in OPR["ActualSalary"]]
OPR["FTESalary"] = OPR["X"]*OPR["FTE"]
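Note that x.replace raises an AttributeError if any element of the column is not a string, which the first answer suggests may be the case here. A more defensive variant, sketched below, casts to str first and uses pd.to_numeric with errors='coerce' so that anything unparseable becomes NaN and can be inspected. The frame and the 'n/a' value are invented for illustration, and the names cleaned, salary and bad_rows are hypothetical.

```python
import pandas as pd

# Invented sample: strings, a plain integer, and one unparseable entry.
df = pd.DataFrame({
    'ActualSalary': ['44600', '58,000.00', 17550, 'n/a'],
    'FTE': [1, 1, 1, 0.4],
})

# Cast every element to str so that the comma removal applies to all rows.
cleaned = df['ActualSalary'].astype(str).str.replace(',', '')

# Unparseable strings become NaN instead of raising an error.
salary = pd.to_numeric(cleaned, errors='coerce')

# Rows that could not be parsed, kept for manual inspection.
bad_rows = df[salary.isna()]
print(bad_rows)

df['FTESalary'] = salary * df['FTE']
print(df)
```

Any NaN left in FTESalary after this then points to a genuinely bad source value rather than a type mix-up.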