Converting column values in pandas

Question

I have a CSV in the following format:

Used CPU    Used Memory Hard CPU    Hard Memory
    1       4Gi         50          24Gi
    0       0           0           0
    2       4Gi         4           8Gi
    2       4Gi         4           8Gi
    0       0           100m        128Mi
    51550m  39528Mi     56          47Gi

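These are string values. In this table, 51550m is in millicores, which I need to convert to cores. 39528Mi is in mebibytes, which I need to convert to gibibytes (or thereabouts). I would like to know how to read each value column by column and, whenever I find an m suffix (as in 51550m), convert it to cores. Then I want to convert all values in the column to integers so that I can add them all up.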

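I want to use pandas, but I am very new to it. I know I can try df["col_name"].astype("int") to convert to integers, but I also need to account for the millicore values when converting them.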

Any help is much appreciated.

Expected output: all values must be floats. I got the following from the conversions:

100 millicore = 1/10 cores
1 Mebibyte = 0.00104858 GB


  Used CPU  Used Memory Hard CPU    Hard Memory
        1       4.296       50          25.7698 
        2       4.296       4           8.592
        2       4.296       4           8.592
        0       0           .1          0.134218 
        51.550  41.448112   56          50.4659
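
As a rough check of the arithmetic behind the conversions above, here is a minimal sketch in plain Python. The constant names are made up for illustration, the memory factor is the rounded value quoted in the question, and treating 1 Gi as 1024 Mi is an assumption.

```python
# Conversion factors quoted in the question (the memory factor is rounded).
MILLICORES_PER_CORE = 1000      # 100 millicores = 1/10 core
MIB_TO_GB = 0.00104858          # 1 mebibyte expressed in GB
GIB_TO_GB = 1024 * MIB_TO_GB    # assumption: 1 Gi = 1024 Mi

used_cpu_cores = 51550 / MILLICORES_PER_CORE   # "51550m"
used_memory_gb = 39528 * MIB_TO_GB             # "39528Mi"
hard_memory_gb = 47 * GIB_TO_GB                # "47Gi"

print(used_cpu_cores, round(used_memory_gb, 4), round(hard_memory_gb, 4))
```

The results land close to the last row of the expected table; small differences in the trailing digits come from how the factor is rounded.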

Answer 1

You can do something like this.

Update:

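import numpy as np
import pandas as pd

df = pd.read_csv("your_csv_file")

# Alternatively, build the sample frame by hand:
'''df = pd.DataFrame({'Used CPU':['1','0','2','2','0','51550m'], \
                   'Used Memory':['4Gi','0','4Gi','4Gi','0', '39528Mi'], \
                   'Hard CPU':['50','0','4','4','100m','56'], \
                   'Hard Memory':['24Gi','0','8Gi', '8Gi', '128Mi', '47Gi']})'''

# Multipliers: m -> cores, Mi/Gi -> GB (the memory factors are rounded).
units = {'m': 0.001, 'Mi': 0.00104858, 'Gi': 1.0737425}

def conversion(x):
    x = str(x)
    for key, factor in units.items():
        if x.endswith(key):
            # Strip the unit suffix and scale the remaining number.
            return float(x[:-len(key)]) * factor
    # No known suffix: treat the value as a plain number.
    return float(x)

df = df.applymap(conversion)  # newer pandas (2.1+) prefers df.map(conversion)
df = df.astype(np.float64)
print(df)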

INPUT:

   Hard CPU  Hard Memory  Used CPU  Used Memory
0  50        24Gi         1         4Gi
1  0         0            0         0
2  4         8Gi          2         4Gi
3  4         8Gi          2         4Gi
4  100m      128Mi        0         0
5  56        47Gi         51550m    39528Mi

OUTPUT:

    Hard CPU  Hard Memory  Used CPU  Used Memory
0   50.0      25.76980     1.00      4.29497
1   0.0       0.000000     0.00      0.00000
2   4.0       8.589940     2.00      4.29497
3   4.0       8.589940     2.00      4.29497
4   0.1       0.134218     0.00      0.00000
5   56.0      50.465898    51.55     41.44827

They are now float64, so you can use df['Hard Memory'] + df['Used Memory'].
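
To illustrate what adding the converted columns looks like, here is a small sketch. The frame is a hypothetical two-row sample built by hand from the last two rows of the OUTPUT table, and the variable names are made up for the example.

```python
import pandas as pd

# Hypothetical frame holding already-converted float values
# (the last two rows of the OUTPUT table above).
df = pd.DataFrame({
    "Hard Memory": [0.134218, 50.465898],
    "Used Memory": [0.0, 41.44827],
})

# Element-wise sum of two columns, row by row.
total_memory = df["Hard Memory"] + df["Used Memory"]

# Sum of all values within each column.
column_totals = df.sum()

print(total_memory)
print(column_totals)
```

The first form gives one total per row; `df.sum()` gives one total per column, which is probably what "add them all up" in the question refers to.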

Answer 2

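I did not find any simple way, so here is a dirty one. Basically, your columns contain different suffixes (Gi and Mi) that need to be calculated separately, so you can do something like this. I have left out the calculation for the Hard CPU column, but the idea is the same: you can use the same pattern as for the Used CPU column.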

df['Used CPU'] = np.where(df['Used CPU'].str.contains('m'),
                          pd.to_numeric(df['Used CPU'].map(lambda x:str(x)[:-1])) /1000,
                          df['Used CPU'])

df['Used Memory'] = np.where(df['Used Memory'].str.contains('Mi'),
                          pd.to_numeric(df['Used Memory'].map(lambda x:str(x)[:-2])) * 0.00104858,
                          df['Used Memory'])

df['Hard Memory'] = np.where(df['Hard Memory'].str.contains('Gi'),
                          pd.to_numeric(df['Hard Memory'].map(lambda x:str(x)[:-2])) *(use math conversion here),
                          df['Hard Memory'])

Now, the second column also has Gi values, so you can repeat the same pattern like this:

df['Used Memory'] = np.where(df['Used Memory'].str.contains('Gi'),
                          pd.to_numeric(df['Used Memory'].map(lambda x:str(x)[:-2])) * (do math conversion here),
                          df['Used Memory'])

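The string checks are needed because each item in a column may require a different math conversion. This is the simplest solution I can think of; sorry about that.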
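
One possible variation on the np.where pattern, sketched here for the Used Memory column only: pick a multiplier per row and apply it in a single pass, so the column never holds a mix of strings and floats. The constant names are hypothetical, the Mi factor is the one quoted in the question, and 1 Gi = 1024 Mi is an assumption.

```python
import numpy as np
import pandas as pd

# Sample values from the "Used Memory" column in the question.
used_memory = pd.Series(["4Gi", "0", "4Gi", "4Gi", "0", "39528Mi"])

MI_TO_GB = 0.00104858        # factor quoted in the question
GI_TO_GB = 1024 * MI_TO_GB   # assumption: 1 Gi = 1024 Mi

# Numeric part with a trailing Mi/Gi suffix removed; bare numbers pass through.
number = pd.to_numeric(used_memory.str.replace(r"(Mi|Gi)$", "", regex=True))

# Multiplier chosen per row from the suffix, defaulting to 1 for bare numbers.
factor = np.where(used_memory.str.endswith("Mi"), MI_TO_GB,
                  np.where(used_memory.str.endswith("Gi"), GI_TO_GB, 1.0))

used_memory_gb = number * factor
print(used_memory_gb)
```

Computing the number and the factor separately avoids calling `.str` methods on a column that has already been partly converted to floats.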

Answer 3

Writing custom functions for pandas is very easy. Maybe you can try these:

# import
import pandas as pd
# reading file
df = pd.read_csv("PATH_TO_CSV_FILE")

def func_CPU(x):
    """Convert a CPU value to cores; a trailing "m" means millicores."""
    x = str(x)
    if x.endswith("m"):
        return float(x[:-1]) / 1000
    return float(x)

def func_Memory(x):
    """Convert a memory value to GB; handles the "Gi" and "Mi" suffixes."""
    x = str(x)
    if x.endswith("Gi"):
        return float(x[:-2]) * 1024 * 0.00104858
    elif x.endswith("Mi"):
        return float(x[:-2]) * 0.00104858
    return float(x)

# Column names follow the header shown in the question.
df["Used CPU"] = df["Used CPU"].apply(func_CPU)
df["Used Memory"] = df["Used Memory"].apply(func_Memory)
df["Hard CPU"] = df["Hard CPU"].apply(func_CPU)
df["Hard Memory"] = df["Hard Memory"].apply(func_Memory)
print(df)
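
For trying this per-column approach without a CSV file, here is a self-contained sketch. The helper names `to_cores` and `to_gb` and the `converters` mapping are hypothetical, the sample rows are taken from the question, and 1 Gi = 1024 Mi is an assumption.

```python
import pandas as pd

def to_cores(value):
    """Convert a CPU value to cores; a trailing "m" means millicores."""
    text = str(value)
    return float(text[:-1]) / 1000 if text.endswith("m") else float(text)

def to_gb(value):
    """Convert a memory value to GB using the factor quoted in the question."""
    text = str(value)
    if text.endswith("Gi"):
        return float(text[:-2]) * 1024 * 0.00104858
    if text.endswith("Mi"):
        return float(text[:-2]) * 0.00104858
    return float(text)

# Three sample rows from the question, built in memory instead of read from a file.
df = pd.DataFrame({
    "Used CPU": ["1", "0", "51550m"],
    "Used Memory": ["4Gi", "0", "39528Mi"],
    "Hard CPU": ["50", "100m", "56"],
    "Hard Memory": ["24Gi", "128Mi", "47Gi"],
})

# Hypothetical mapping from column name to the converter it needs.
converters = {"Used CPU": to_cores, "Hard CPU": to_cores,
              "Used Memory": to_gb, "Hard Memory": to_gb}

for column, convert in converters.items():
    df[column] = df[column].apply(convert)

print(df.dtypes)
print(df.sum())
```

Keeping the column-to-converter mapping in one dictionary means a new column only needs one extra entry rather than another assignment line.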

Answer 1
2 2018-06-20 13:24:00

Answer 2
1 2018-06-20 12:51:00

Answer 3
1 Accepted 2018-06-20 13:20:10
