I have multiple column data frame with columns ['Country', 'Energy Supply', 'Energy Supply per Capita', '% Renewable'] .
In the energy supply column, I want to convert the unit of the column to Peta from Giga. But in the process energy['Energy Supply']*= energy['Energy Supply']
, when the value is like "...." (missing value is denoted by this), is also getting multiplied or say duplicated. Also, the string value in the column is also getting multiplied. (For eg original: Peta, after operation: PetaPetaPetaPeta...).
To stop this from happening, I am running this:
energy = pd.read_excel("Energy Indicators.xls",skiprows = 16, skip_footer = 38)
energy.drop(['Unnamed: 0','Unnamed: 1'],axis = 1, inplace = True)
energy.columns = ['Country', 'Energy Supply', 'Energy Supply per Capita', '% Renewable']
for i in energy['Energy Supply']:
if (isinstance(energy[i],int) == True):
energy['Energy Supply'][i]=energy['Energy Supply'][i]*1000000
return (energy)
But I am not getting the result ie to change the value of integer type variables only, and nothing is changing.
Where I think the problem lies in, the first two rows will give the false condition, as first rows are "String" and based on that, the program is not modifying the values, whereas I want to individually check if the value is of integer type and if it is, Multiplies the number by 1,000,000.
Input:
Country Energy Supply Energy Supply per Capita % Renewable
0 NaN Petajoules Gigajoules %
1 Afghanistan 321 10 78.6693
2 Albania 102 35 100
3 Algeria 1959 51 0.55101
4 American Samoa ... ... 0.641026
Expected Output:
Country Energy Supply Energy Supply per Capita % Renewable
0 NaN Petajoules Gigajoules %
1 Afghanistan 3210000 10 78.6693
2 Albania 1020000 35 100
3 Algeria 19590000 51 0.55101
4 American Samoa ... ... 0.641026
Current Output:
Country Energy Supply Energy Supply per Capita % Renewable
0 NaN PetajoulesPeta. Gigajoules %
1 Afghanistan 3210000 10 78.6693
2 Albania 1020000 35 100
3 Algeria 19590000 51 0.55101
4 American Samoa ........ ... 0.641026
This worked for me with a million values:
import pandas as pd
import numpy as np
data = {"Energy Supply":[1,30,"Petajoules",5,70]*2000000}
energy = pd.DataFrame(data)
input:
Energy Supply
0 1
1 30
2 Petajoules
3 5
4 70
5 1
6 30
7 Petajoules
8 5
9 70
10 1
11 30
12 Petajoules
13 5
14 70
15 1
16 30
17 Petajoules
18 5
19 70
20 1
21 30
22 Petajoules
23 5
24 70
25 1
26 30
27 Petajoules
28 5
29 70
...
[10000000 rows x 1 columns]
Then i transform the Series into an array and set the values:
arr = energy["Energy Supply"].values
for i in range(len(arr)):
if isinstance(arr[i],int):
arr[i] = arr[i]*1000000
else:
pass
The output looks like this:
Energy Supply
0 1000000
1 30000000
2 Petajoules
3 5000000
4 70000000
5 1000000
6 30000000
7 Petajoules
8 5000000
9 70000000
10 1000000
11 30000000
12 Petajoules
13 5000000
14 70000000
15 1000000
16 30000000
17 Petajoules
18 5000000
19 70000000
20 1000000
21 30000000
22 Petajoules
23 5000000
24 70000000
25 1000000
26 30000000
27 Petajoules
28 5000000
29 70000000
...
[10000000 rows x 1 columns]
This solution is about twice as fast as an apply:
Looping through an array:
loop: 100%|██████████| 10000000/10000000 [00:07<00:00, 1376439.75it/s]
Using Apply:
apply: 100%|██████████| 10000000/10000000 [00:14<00:00, 687420.00it/s]
If you convert the series to numeric then the string values become nan values. Using np.where you need about 5 seconds for both converting the series to numeric and multiplying the values:
import pandas as pd
import numpy as np
import time
data = {"Energy Supply":[1,30,"Petajoules",5,70]*2000000}
energy = pd.DataFrame(data)
t = time.time()
energy["Energy Supply"] = pd.to_numeric(energy["Energy Supply"],errors="coerce")
energy["Energy_Supply"] = np.where((energy["Energy Supply"]%1==0),energy["Energy Supply"]*100,energy["Energy Supply"])
t1 = time.time()
print(t1-t)
5.275099515914917
But you could also simply do this after using pd.to_numeric():
energy["Energy Supply"] = energy["Energy Supply"]*1000000
You can use str.isnumeric
to check if a string is numeric and then multiply.
energy['Energy Supply'] = energy['Energy Supply'].apply(lambda x: int(x) * 1000000 if str(x).isnumeric() else x)
print (energy)
Country Energy Supply Energy Supply per Capita % Renewable
0 NaN Petajoules Gigajoules %
1 Afghanistan 321000000 10 78.6693
2 Albania 102000000 35 100
3 Algeria 1959000000 51 0.55101
4 American Samoa ... .. 0.641026
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.