
How to iterate over each individual column value in a multi-column dataframe?

I have a multi-column data frame with the columns ['Country', 'Energy Supply', 'Energy Supply per Capita', '% Renewable'].

In the Energy Supply column, I want to convert the unit from Peta to Giga. But when I run energy['Energy Supply']*= energy['Energy Supply'] , cells holding "..." (which denotes a missing value) also get multiplied, that is, repeated. The string values in the column get repeated as well (e.g. original: Peta, after the operation: PetaPetaPetaPeta...).

To stop this from happening, I am running this:

energy = pd.read_excel("Energy Indicators.xls", skiprows=16, skipfooter=38)
energy.drop(['Unnamed: 0','Unnamed: 1'],axis = 1, inplace = True)
energy.columns = ['Country', 'Energy Supply', 'Energy Supply per Capita', '% Renewable']
for i in energy['Energy Supply']:
    if (isinstance(energy[i],int) == True):
        energy['Energy Supply'][i]=energy['Energy Supply'][i]*1000000
return (energy)

But I am not getting the expected result, which is to change only the integer values; nothing changes at all.

I think the problem is that the first two rows fail the condition because they hold strings, so the program does not modify any values. What I want is to check each value individually and, if it is an integer, multiply it by 1,000,000.

Input:

    Country        Energy Supply    Energy Supply per Capita    % Renewable
0   NaN             Petajoules            Gigajoules                 %
1   Afghanistan        321                   10                  78.6693
2   Albania            102                   35                    100
3   Algeria            1959                  51                  0.55101
4   American Samoa      ...                 ...                  0.641026

Expected Output:

    Country        Energy Supply    Energy Supply per Capita    % Renewable
0   NaN             Petajoules            Gigajoules                 %
1   Afghanistan        3210000                10                     78.6693
2   Albania            1020000                35                      100
3   Algeria            19590000               51                     0.55101
4   American Samoa      ...                 ...                    0.641026

Current Output:

    Country        Energy Supply    Energy Supply per Capita    % Renewable
0   NaN             PetajoulesPeta.         Gigajoules               %
1   Afghanistan        3210000                10                   78.6693
2   Albania            1020000                35                    100
3   Algeria            19590000               51                   0.55101
4   American Samoa      ........                ...                0.641026
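
A possible explanation for why the loop in the question changes nothing: `for i in energy['Energy Supply']` yields the cell values (321, 102, ...), not row labels, so `energy[i]` is a column lookup rather than a cell lookup. The sketch below is only an illustration on a hand-built frame whose values are copied from the input table; it assumes the numeric cells are plain Python ints, which may not hold for every way of loading the spreadsheet.

```python
import pandas as pd

# Small stand-in for the spreadsheet; values copied from the question's input table.
energy = pd.DataFrame({
    "Country": [None, "Afghanistan", "Albania", "Algeria", "American Samoa"],
    "Energy Supply": ["Petajoules", 321, 102, 1959, "..."],
})

# Iterating a Series yields its values, not its row labels.
values_seen = list(energy["Energy Supply"])

# Boolean mask marking the cells that hold integers (bool is excluded on purpose).
is_int = energy["Energy Supply"].map(
    lambda v: isinstance(v, int) and not isinstance(v, bool)
)

# Multiply only the masked cells; strings such as "..." stay untouched.
energy.loc[is_int, "Energy Supply"] = energy.loc[is_int, "Energy Supply"] * 1_000_000

print(energy)
```

Selecting with a mask and `.loc` also avoids the chained assignment `energy['Energy Supply'][i] = ...` used in the question, which pandas may apply to a temporary copy.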

This worked for me with ten million values:

import pandas as pd
import numpy as np 

data = {"Energy Supply":[1,30,"Petajoules",5,70]*2000000}

energy = pd.DataFrame(data)

Input:

Energy Supply
0                   1
1                  30
2          Petajoules
3                   5
4                  70
5                   1
6                  30
7          Petajoules
8                   5
9                  70
10                  1
11                 30
12         Petajoules
13                  5
14                 70
15                  1
16                 30
17         Petajoules
18                  5
19                 70
20                  1
21                 30
22         Petajoules
23                  5
24                 70
25                  1
26                 30
27         Petajoules
28                  5
29                 70
              ...
[10000000 rows x 1 columns]

Then I convert the Series into an array and set the values:

arr = energy["Energy Supply"].values

for i in range(len(arr)):
    if isinstance(arr[i],int):
        arr[i] = arr[i]*1000000
    else:
        pass

The output looks like this:

        Energy Supply
0             1000000
1            30000000
2          Petajoules
3             5000000
4            70000000
5             1000000
6            30000000
7          Petajoules
8             5000000
9            70000000
10            1000000
11           30000000
12         Petajoules
13            5000000
14           70000000
15            1000000
16           30000000
17         Petajoules
18            5000000
19           70000000
20            1000000
21           30000000
22         Petajoules
23            5000000
24           70000000
25            1000000
26           30000000
27         Petajoules
28            5000000
29           70000000
              ...
[10000000 rows x 1 columns]

This solution is about twice as fast as using apply:

Looping through an array:

loop: 100%|██████████| 10000000/10000000 [00:07<00:00, 1376439.75it/s]

Using apply:

apply: 100%|██████████| 10000000/10000000 [00:14<00:00, 687420.00it/s]
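
The timing code behind the two progress bars is not shown, so the following is only a sketch of how such a comparison could be reproduced. It uses `time.perf_counter` and a smaller list (100,000 rows, an arbitrary size chosen here); the helper name `scale` is made up for illustration.

```python
import time
import pandas as pd

def scale(value):
    """Hypothetical helper: multiply ints by 1,000,000 and keep everything else."""
    return value * 1_000_000 if isinstance(value, int) else value

data = {"Energy Supply": [1, 30, "Petajoules", 5, 70] * 20_000}

# Approach 1: loop over a copy of the underlying object array.
arr = pd.DataFrame(data)["Energy Supply"].to_numpy().copy()
start = time.perf_counter()
for i in range(len(arr)):
    arr[i] = scale(arr[i])
loop_seconds = time.perf_counter() - start

# Approach 2: Series.apply with the same helper.
series = pd.DataFrame(data)["Energy Supply"]
start = time.perf_counter()
applied = series.apply(scale)
apply_seconds = time.perf_counter() - start

print(f"loop:  {loop_seconds:.3f} s")
print(f"apply: {apply_seconds:.3f} s")
```

The ratio between the two depends on the pandas version and the machine, so the numbers may not match the ones reported above.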

If you convert the Series to numeric, the string values become NaN. Using np.where, you need about 5 seconds in total to convert the Series to numeric and multiply the values:

import pandas as pd
import numpy as np 
import time

data = {"Energy Supply":[1,30,"Petajoules",5,70]*2000000}

energy = pd.DataFrame(data)
t = time.time()

energy["Energy Supply"] = pd.to_numeric(energy["Energy Supply"],errors="coerce")

energy["Energy_Supply"] = np.where((energy["Energy Supply"]%1==0),energy["Energy Supply"]*1000000,energy["Energy Supply"])
t1 = time.time()
print(t1-t)
5.275099515914917

But you could also simply do this after using pd.to_numeric():

energy["Energy Supply"] = energy["Energy Supply"]*1000000
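
Note that after pd.to_numeric() with errors="coerce", the unit row and the "..." placeholders are NaN, whereas the expected output in the question keeps them. One way to keep them, sketched here on a tiny hand-built column rather than the real spreadsheet, is to compute the scaled numbers separately and write them back only where the conversion succeeded:

```python
import pandas as pd

# Tiny stand-in column; values copied from the question's input table.
energy = pd.DataFrame({"Energy Supply": ["Petajoules", 321, 102, 1959, "..."]})

# Strings that cannot be parsed become NaN.
numeric = pd.to_numeric(energy["Energy Supply"], errors="coerce")
scaled = numeric * 1_000_000

# Replace only the cells that parsed as numbers; the rest keep their original text.
# The replaced cells are floats (e.g. 321000000.0) because `numeric` is float64.
energy["Energy Supply"] = energy["Energy Supply"].mask(numeric.notna(), scaled)

print(energy)
```

The column stays object dtype because it still mixes numbers and strings, so later arithmetic on it is not vectorized.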

You can use str.isnumeric to check whether a value is numeric and then multiply.

energy['Energy Supply'] = energy['Energy Supply'].apply(lambda x: int(x) * 1000000 if str(x).isnumeric() else x)

print (energy)

    Country         Energy Supply   Energy Supply per Capita    % Renewable
0             NaN    Petajoules           Gigajoules                     %
1     Afghanistan    321000000                10                   78.6693
2         Albania    102000000                35                       100
3         Algeria    1959000000               51                   0.55101 
4  American Samoa        ...                  ...                 0.641026
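
One caveat with this check: str.isnumeric is True only when every character is a numeric character, so values with a decimal point or a minus sign are left unchanged. That is fine for the integer Energy Supply values shown here. If decimals or negatives had to be scaled as well, a try/except around float() is one alternative; the helper name `scale_if_number` below is hypothetical.

```python
samples = [321, "321", 78.6693, "-5", "...", "Petajoules"]

# isnumeric() rejects anything containing '.', '-' or letters.
flags = [str(x).isnumeric() for x in samples]
print(flags)  # [True, True, False, False, False, False]

def scale_if_number(value, factor=1_000_000):
    """Hypothetical helper: scale anything float() can parse, keep the rest as-is."""
    try:
        return float(value) * factor
    except (TypeError, ValueError):
        return value

results = [scale_if_number(x) for x in samples]
print(results)
```

Unlike the lambda above, this helper returns floats for the scaled values, and it would also treat strings such as "nan" or "inf" as numbers.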
