I have two columns I want to loop through, 'Volume_hedge' and 'Unit_hedge'. For each row where 'Unit_hedge' says "Thousands of Barrels per Day", I want to divide the number in 'Volume_hedge' in that same row by 1000.
I've tried enumerating over both columns with an if statement inside the loop. It works for the first 2 rows but not for the rest.
df2 = DataFrame(x)
columns_to_select = ['Volume_hedge', 'Unit_hedge']
for i, row in enumerate(columns_to_select):
    if df2['Unit_hedge'].loc[i] == 'Thousands of Barrels per Day':
        new_row = df2['Volume_hedge'].loc[i] / 1000
    else:
        none
    df2['Volume_hedge'].loc[i] = new_row
print(df2[columns_to_select].loc[0:8])
Expected results:
Volume_hedge Unit_hedge
0 0.03 Thousands of Barrels per Day
1 0.024 Thousands of Barrels per Day
2 0.024 Thousands of Barrels per Day
3 0.024 Thousands of Barrels per Day
4 0.024 Thousands of Barrels per Day
5 0.024 Thousands of Barrels per Day
6 0.024 Thousands of Barrels per Day
7 32850000 (MMBtu/Bbl)
8 4404000 (MMBtu/Bbl)
Actual Results:
Volume_hedge Unit_hedge
0 0.03 Thousands of Barrels per Day
1 0.024 Thousands of Barrels per Day
2 24 Thousands of Barrels per Day
3 24 Thousands of Barrels per Day
4 24 Thousands of Barrels per Day
5 24 Thousands of Barrels per Day
6 24 Thousands of Barrels per Day
7 32850000 (MMBtu/Bbl)
8 4404000 (MMBtu/Bbl)
You should use np.select here:
import numpy as np
df2["Volume_hedge"] = np.select(
    [df2["Unit_hedge"].eq("Thousands of Barrels per Day")],
    [df2["Volume_hedge"].div(1000)],
    df2["Volume_hedge"]
)
This divides Volume_hedge by 1000 in every row where Unit_hedge equals "Thousands of Barrels per Day" and leaves all other rows unchanged.
It is also vectorized rather than iterative, which is faster when using pandas and numpy.
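As a quick check of the np.select approach, here is a minimal, self-contained sketch. The sample values are made up to resemble the question's data (they are an assumption, not the asker's real DataFrame), and the name is_kbpd is only illustrative.

```python
import numpy as np
import pandas as pd

# Illustrative data modelled on the question; not the asker's real DataFrame.
df2 = pd.DataFrame({
    "Volume_hedge": [30.0, 24.0, 24.0, 32850000.0, 4404000.0],
    "Unit_hedge": ["Thousands of Barrels per Day"] * 3 + ["(MMBtu/Bbl)"] * 2,
})

# Boolean condition: True where the unit is "Thousands of Barrels per Day".
is_kbpd = df2["Unit_hedge"].eq("Thousands of Barrels per Day")

# Where the condition holds take the scaled value, otherwise keep the original.
df2["Volume_hedge"] = np.select(
    [is_kbpd],
    [df2["Volume_hedge"].div(1000)],
    default=df2["Volume_hedge"],
)

converted = df2["Volume_hedge"].tolist()
print(df2)
```

With a single condition like this, np.where(is_kbpd, scaled, original) would likely do the same job; np.select mainly pays off once there are several units to convert.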
columns_to_select is a two-element list. When you enumerate it, i only takes the values 0 and 1, so the loop body only ever touches the first two rows.
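To see this without pandas at all, here is a small sketch that just records what enumerate yields for that list. The variable name visited is only for illustration.

```python
columns_to_select = ['Volume_hedge', 'Unit_hedge']

# enumerate walks over the list of column NAMES, not over the DataFrame rows,
# so the index i can only ever be 0 or 1.
visited = [(i, name) for i, name in enumerate(columns_to_select)]
print(visited)
```

The loop in the question therefore runs exactly twice, regardless of how many rows df2 has.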
If you want to iterate through the rows, use the iterrows method instead. Do something like,
for i, row in df2.iterrows():
    if row['Unit_hedge'] == 'Thousands of Barrels per Day':
        # i is the index label, so write back with .loc in a single step
        df2.loc[i, 'Volume_hedge'] = row['Volume_hedge'] / 1000
However, using apply rather than looping through each row is a better bet because iterating is very slow. Also, setting column values while iterating through a DataFrame is not recommended.
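A minimal sketch of the apply variant might look like the following. The sample data and the helper name scale_volume are assumptions made for illustration. Note that apply with axis=1 still calls a Python function once per row, so a fully vectorized approach is usually faster on large frames.

```python
import pandas as pd

# Illustrative data modelled on the question; not the asker's real DataFrame.
df2 = pd.DataFrame({
    "Volume_hedge": [30.0, 24.0, 32850000.0],
    "Unit_hedge": [
        "Thousands of Barrels per Day",
        "Thousands of Barrels per Day",
        "(MMBtu/Bbl)",
    ],
})

def scale_volume(row):
    """Hypothetical helper: return the row's volume, divided by 1000 when needed."""
    if row["Unit_hedge"] == "Thousands of Barrels per Day":
        return row["Volume_hedge"] / 1000
    return row["Volume_hedge"]

# axis=1 passes each row to scale_volume; the result replaces the whole column
# at once, so nothing is modified while the frame is being iterated.
df2["Volume_hedge"] = df2.apply(scale_volume, axis=1)

scaled = df2["Volume_hedge"].tolist()
print(df2)
```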
mask = df2['Unit_hedge'] == 'Thousands of Barrels per Day'
df2.loc[mask, 'Volume_hedge'] = df2.loc[mask, 'Volume_hedge'] / 1000