How to iterate over each individual column's values in a multi-column DataFrame?
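I have a multi-column DataFrame with the columns ['Country', 'Energy Supply', 'Energy Supply per Capita', '% Renewable'].
In the Energy Supply column, I want to convert the unit of the column from giga to peta. But when I run energy['Energy Supply']*= energy['Energy Supply'], values like "..." (which denote missing values) also get multiplied, or rather repeated. The string values in the column get multiplied as well (e.g. original: Peta, after the operation: PetaPetaPetaPeta...).
To stop this from happening, I am running this: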
energy = pd.read_excel("Energy Indicators.xls", skiprows=16, skipfooter=38)
energy.drop(['Unnamed: 0','Unnamed: 1'],axis = 1, inplace = True)
energy.columns = ['Country', 'Energy Supply', 'Energy Supply per Capita', '% Renewable']
for i in energy['Energy Supply']:
    if (isinstance(energy[i],int) == True):
        energy['Energy Supply'][i]=energy['Energy Supply'][i]*1000000
return (energy)
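But I am not getting the result I want, which is to change only the values of integer type. Nothing changes.
I think the problem is that the first two rows make the condition false: the first row is a string, and because of that the program does not modify any values. What I want is to check each value individually and, if it is an integer, multiply it by 1,000,000.
Input: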
Country Energy Supply Energy Supply per Capita % Renewable
0 NaN Petajoules Gigajoules %
1 Afghanistan 321 10 78.6693
2 Albania 102 35 100
3 Algeria 1959 51 0.55101
4 American Samoa ... ... 0.641026
Expected output:
Country Energy Supply Energy Supply per Capita % Renewable
0 NaN Petajoules Gigajoules %
1 Afghanistan 3210000 10 78.6693
2 Albania 1020000 35 100
3 Algeria 19590000 51 0.55101
4 American Samoa ... ... 0.641026
Current output:
Country Energy Supply Energy Supply per Capita % Renewable
0 NaN PetajoulesPeta. Gigajoules %
1 Afghanistan 3210000 10 78.6693
2 Albania 1020000 35 100
3 Algeria 19590000 51 0.55101
4 American Samoa ........ ... 0.641026
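One way to do the per-value check described in the question is to loop over row positions instead of over the values themselves. In the loop above, i takes each cell value in turn (321, 102, ...), so energy[i] is a column lookup by that value, not a test of the cell. The following is only a minimal sketch on a toy frame that mirrors the sample input (not the real Excel file); the helper name scale_ints is hypothetical.

```python
import pandas as pd

# Toy frame mirroring the sample input above (an assumption, not the real file).
energy = pd.DataFrame({
    "Country": [None, "Afghanistan", "Albania", "Algeria", "American Samoa"],
    "Energy Supply": ["Petajoules", 321, 102, 1959, "..."],
})

def scale_ints(value, factor=1_000_000):
    """Multiply true integers by factor; return anything else unchanged."""
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(value, int) and not isinstance(value, bool):
        return value * factor
    return value

# Iterate over row positions, then read and write each cell by position.
col = energy.columns.get_loc("Energy Supply")
for row in range(len(energy)):
    energy.iloc[row, col] = scale_ints(energy.iloc[row, col])

result = energy["Energy Supply"].tolist()
print(result)
```

The header string and the "..." marker pass through untouched because they fail the isinstance check.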
This works for me with 10 million values:
import pandas as pd
import numpy as np
data = {"Energy Supply":[1,30,"Petajoules",5,70]*2000000}
energy = pd.DataFrame(data)
Input:
Energy Supply
0 1
1 30
2 Petajoules
3 5
4 70
5 1
6 30
7 Petajoules
8 5
9 70
10 1
11 30
12 Petajoules
13 5
14 70
15 1
16 30
17 Petajoules
18 5
19 70
20 1
21 30
22 Petajoules
23 5
24 70
25 1
26 30
27 Petajoules
28 5
29 70
...
[10000000 rows x 1 columns]
Then I convert the Series to an array and set the values:
arr = energy["Energy Supply"].values
for i in range(len(arr)):
    if isinstance(arr[i],int):
        arr[i] = arr[i]*1000000
    else:
        pass
The output looks like this:
Energy Supply
0 1000000
1 30000000
2 Petajoules
3 5000000
4 70000000
5 1000000
6 30000000
7 Petajoules
8 5000000
9 70000000
10 1000000
11 30000000
12 Petajoules
13 5000000
14 70000000
15 1000000
16 30000000
17 Petajoules
18 5000000
19 70000000
20 1000000
21 30000000
22 Petajoules
23 5000000
24 70000000
25 1000000
26 30000000
27 Petajoules
28 5000000
29 70000000
...
[10000000 rows x 1 columns]
This solution is about twice as fast as apply:
Looping over the array:
loop: 100%|██████████| 10000000/10000000 [00:07<00:00, 1376439.75it/s]
Using apply:
apply: 100%|██████████| 10000000/10000000 [00:14<00:00, 687420.00it/s]
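The array loop changes the DataFrame only if the array returned by .values is a writable view of the column, which may not hold in every pandas version or mode (for example with Copy-on-Write enabled). A variant that does not depend on that, sketched here on a small toy column, builds the new values in a list and assigns the column back:

```python
import pandas as pd

# Small stand-in for the 10-million-row frame used in the timing above.
energy = pd.DataFrame({"Energy Supply": [1, 30, "Petajoules", 5, 70] * 2})

# Build the scaled values in a plain list, then assign the column back,
# instead of mutating the array returned by .values in place.
scaled = [
    v * 1_000_000 if isinstance(v, int) else v
    for v in energy["Energy Supply"].tolist()
]
energy["Energy Supply"] = scaled

result = energy["Energy Supply"].tolist()
print(result)
```

This keeps the same isinstance test as the loop; it was not timed against the figures above.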
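If you convert the Series to numeric, the string values become NaN. With np.where, converting the Series to numeric and multiplying the values takes about 5 seconds: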
import pandas as pd
import numpy as np
import time
data = {"Energy Supply":[1,30,"Petajoules",5,70]*2000000}
energy = pd.DataFrame(data)
t = time.time()
energy["Energy Supply"] = pd.to_numeric(energy["Energy Supply"],errors="coerce")
energy["Energy_Supply"] = np.where((energy["Energy Supply"]%1==0),energy["Energy Supply"]*1000000,energy["Energy Supply"])
t1 = time.time()
print(t1-t)
5.275099515914917
But after using pd.to_numeric() you can also simply do this:
energy["Energy Supply"] = energy["Energy Supply"]*1000000
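If the non-numeric markers such as "Petajoules" and "..." need to survive, one option is to use the coerced Series only to decide which entries are numbers, and to keep the original entry everywhere else. This is a minimal sketch on toy data, not part of the timed code above; note that the scaled values come back as floats.

```python
import pandas as pd

# Toy column mirroring the sample input (an assumption for illustration).
original = pd.Series(["Petajoules", 321, 102, 1959, "..."], name="Energy Supply")

# Non-numeric entries become NaN in the coerced copy.
numeric = pd.to_numeric(original, errors="coerce")

# Keep the original entry where conversion failed,
# otherwise take the scaled number.
restored = original.where(numeric.isna(), numeric * 1_000_000)

result = restored.tolist()
print(result)
```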
You can use str.isnumeric to check whether a string is numeric and then multiply.
energy['Energy Supply'] = energy['Energy Supply'].apply(lambda x: int(x) * 1000000 if str(x).isnumeric() else x)
print (energy)
Country Energy Supply Energy Supply per Capita % Renewable
0 NaN Petajoules Gigajoules %
1 Afghanistan 321000000 10 78.6693
2 Albania 102000000 35 100
3 Algeria 1959000000 51 0.55101
4 American Samoa ... .. 0.641026
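One caveat with the str.isnumeric check: it is False for negative numbers and for decimals, because "-" and "." are not numeric characters, so such cells would be left unscaled. If the column can contain them, a try/except conversion is one alternative. The helper name scale_if_number below is hypothetical, and this is only a sketch:

```python
def scale_if_number(value, factor=1_000_000):
    """Scale anything that parses as a number; return other values unchanged."""
    # Real ints and floats are scaled directly (bool is excluded on purpose).
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return value * factor
    try:
        # Strings such as "2.5" or "-5" parse; "..." and "Petajoules" do not.
        # Note that float() also accepts "nan" and "inf".
        return float(value) * factor
    except (TypeError, ValueError):
        return value

# str.isnumeric rejects signs and decimal points:
print(str(-5).isnumeric(), "78.6693".isnumeric())

values = [321, -5, "2.5", "...", "Petajoules", None]
scaled = [scale_if_number(v) for v in values]
print(scaled)
```

It can be dropped into the same apply call, e.g. energy['Energy Supply'].apply(scale_if_number).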