How can I replace values in a CSV column from a range?

I am trying to change the values in two columns of my dataset from specific numeric values (2, 10, 25, etc.) to a single bucket value (1, 2, 3 or 4), based on the percentile range each value falls into within the dataset.

Using the pandas quantile() function I have the boundaries of the ranges I want to replace between, but I haven't found a working way to do the replacement.

age1 = datasetNB.Age.quantile(0.25)
age2 = datasetNB.Age.quantile(0.5)
age3 = datasetNB.Age.quantile(0.75)

fare1 = datasetNB.Fare.quantile(0.25)
fare2 = datasetNB.Fare.quantile(0.5)
fare3 = datasetNB.Fare.quantile(0.75)

My current attempt at a solution is as follows:

for elem in datasetNB['Age']:
    if elem <= age1:
        datasetNB[elem].replace(to_replace = elem, value = 1)
        print("set to 1")
    elif (elem > age1) & (elem <= age2):
        datasetNB[elem].replace(to_replace = elem, value = 2)
        print("set to 2")
    elif (elem > age2) & (elem <= age3):
        datasetNB[elem].replace(to_replace = elem, value = 3)
        print("set to 3")
    elif elem > age3:
        datasetNB[elem].replace(to_replace = elem, value = 4)
        print("set to 4")
    else:
        pass

for elem in datasetNB['Fare']:
    if elem <= fare1:
        datasetNB[elem] = 1
    elif (elem > fare1) & (elem <= fare2):
        datasetNB[elem] = 2
    elif (elem > fare2) & (elem <= fare3):
        datasetNB[elem] = 3
    elif elem > fare3:
        datasetNB[elem] = 4
    else:
        pass

What should I do to get this working?

pandas already has a function that does this: pandas.qcut.

You can simply do

q_list = [0, 0.25, 0.5, 0.75, 1]
labels = range(1, 5)

df['Age'] = pd.qcut(df['Age'], q_list, labels=labels) 
df['Fare'] = pd.qcut(df['Fare'], q_list, labels=labels) 

Input

import numpy as np
import pandas as pd

# Generate fake data for the sake of example 
df = pd.DataFrame({
    'Age': np.random.randint(10, size=6),
    'Fare': np.random.randint(10, size=6)
})

>>> df 

   Age  Fare
0    1     6
1    8     2
2    0     0
3    1     9
4    9     6
5    2     2

Output

DataFrame after running the qcut code above

>>> df

  Age Fare
0   1    3
1   4    1
2   1    1
3   1    4
4   4    3
5   3    1
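As for why the loop in the question does not work: datasetNB[elem] looks up a column whose label is the cell value rather than the cell itself, and Series.replace returns a new object instead of modifying the data in place. If you would rather keep the explicit thresholds from quantile(), a vectorized version could look like the sketch below. The small frame and the helper name to_quartile_codes are assumptions for illustration, not part of the original post.

```python
import pandas as pd

# Hypothetical stand-in for datasetNB, using the sample values from this answer
datasetNB = pd.DataFrame({
    "Age": [1, 8, 0, 1, 9, 2],
    "Fare": [6, 2, 0, 9, 6, 2],
})

def to_quartile_codes(col):
    """Map each value to 1..4 using the column's own quartile thresholds."""
    # Same thresholds as age1..age3 / fare1..fare3 in the question
    q1, q2, q3 = col.quantile([0.25, 0.5, 0.75])
    # pd.cut bins are right-closed by default, matching the <= tests in the loop
    bins = [-float("inf"), q1, q2, q3, float("inf")]
    # Assumes q1 < q2 < q3; pd.cut rejects repeated bin edges
    return pd.cut(col, bins=bins, labels=[1, 2, 3, 4])

for name in ["Age", "Fare"]:
    # Assign the result back; nothing is modified in place
    datasetNB[name] = to_quartile_codes(datasetNB[name])

print(datasetNB)
```

On this sample the result matches the qcut output shown above. The columns come back as categoricals, and missing values stay missing.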

Note that in your specific case, since you want quartiles, you can just assign q_list = 4.
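A minimal sketch of that shortcut, reusing the sample values from this answer. The final astype(int) step is an addition of mine, in case you want plain integers rather than a categorical column; it assumes the column has no missing values.

```python
import pandas as pd

# Same sample values as in the Input section above
df = pd.DataFrame({
    "Age": [1, 8, 0, 1, 9, 2],
    "Fare": [6, 2, 0, 9, 6, 2],
})

for col in ["Age", "Fare"]:
    # q=4 asks qcut for quartiles, equivalent to [0, 0.25, 0.5, 0.75, 1]
    binned = pd.qcut(df[col], q=4, labels=[1, 2, 3, 4])
    # qcut returns a categorical; cast to int (would fail if NaN were present)
    df[col] = binned.astype(int)

print(df)
```

Be aware that if several quantiles coincide (for example, a column dominated by one repeated value), qcut raises a ValueError about non-unique bin edges by default.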
