How do I drop * values in a data frame in pandas?

I am working with a data frame in pandas, and some of the values in one column are the string *. When I try to plot that column with Seaborn, I get the following error:

ValueError: could not convert string to float: '*'

I know which column contains the * values:

0      347
1      332
2      324
3      310
4      347
      ... 
163      *
164      *
165      *
166    310
167    319
Name: MeanScore, Length: 168, dtype: object

You can do the following. In my example, the data frame df has two columns, y and z, either of which might contain a *. The filter keeps only the rows where neither column equals '*'.

import pandas as pd

df = pd.DataFrame({
    'x': [1, 2, 3, 4],
    'y': [1, 4, '*', 16],
    'z': [2, 3, 5, '*'],
})

df[(df['y'] != '*') & (df['z'] != '*')].head()
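Note that after this filter the columns still have dtype object, so a plotting library may still not treat them as numeric. A minimal sketch of one way to handle that, assuming every non-numeric entry should be discarded: convert with pd.to_numeric and errors="coerce", then drop the resulting NaN rows. The names cols and numeric are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "x": [1, 2, 3, 4],
    "y": [1, 4, "*", 16],
    "z": [2, 3, 5, "*"],
})

cols = ["y", "z"]      # columns assumed to contain stray '*' values
numeric = df.copy()    # leave the original frame untouched
for col in cols:
    # errors="coerce" turns anything that cannot be parsed as a number into NaN
    numeric[col] = pd.to_numeric(numeric[col], errors="coerce")

# Drop the rows where either column was not numeric
numeric = numeric.dropna(subset=cols)
print(numeric.dtypes)
```

Unlike the equality filter, this also discards any other non-numeric strings, which may or may not be what you want.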

You can use the replace function. As the argument, pass a dictionary of the form {value_to_replace: replacement_value}.

Example:

df = pd.DataFrame({"column1": ["a", "b", "a"]})
print(df)

  column1
0       a
1       b
2       a

# Assign the result back; inplace=True on a selected column may not update df
df["column1"] = df["column1"].replace({"a": "x", "b": "y"})
print(df)

  column1
0       x
1       y
2       x
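Applied to the question, one possible use of replace is to map '*' to NaN and then drop the affected rows. This is only a sketch: the frame with columns y and z is borrowed from the earlier answer as an assumption, and cleaned is an illustrative name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "x": [1, 2, 3, 4],
    "y": [1, 4, "*", 16],
    "z": [2, 3, 5, "*"],
})

# Map the unwanted value to NaN, then drop rows that have NaN in y or z
cleaned = df.replace({"*": np.nan}).dropna(subset=["y", "z"])
print(cleaned)
```

The surviving columns may still need an explicit conversion to a numeric dtype before plotting, depending on your pandas version.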

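Calling to_numpy on each column before the comparison can give better performance, because the comparison then runs on plain NumPy arrays instead of pandas Series.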

# Example to reproduce the solution
# We know that col_a and col_b contain '*'
df = pd.DataFrame(
    {
        "col_a": [1, 5, "*", 6, 8, "*"],
        "col_b": ["*", 6, 8, "*", 2, 4],
        "col_c": [1, 6, 8, 10, 2, 4],
    }
)
df = df[(df["col_a"].to_numpy() != "*") & (df["col_b"].to_numpy() != "*")]
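If you want to check the performance claim on your own data, a rough timing sketch with timeit could look like this. The frame big and its size are made up for illustration, and the timings will vary by machine and library version, so no numbers are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 100,000 integers with every tenth entry set to '*'
rng = np.random.default_rng(0)
values = rng.integers(0, 100, size=100_000).astype(object)
values[::10] = "*"
big = pd.DataFrame({"col_a": values})

# Both expressions build the same boolean mask
mask_series = big["col_a"] != "*"
mask_numpy = big["col_a"].to_numpy() != "*"

# Time each variant over 20 repetitions
t_series = timeit.timeit(lambda: big["col_a"] != "*", number=20)
t_numpy = timeit.timeit(lambda: big["col_a"].to_numpy() != "*", number=20)
print(f"Series comparison: {t_series:.4f}s, to_numpy comparison: {t_numpy:.4f}s")
```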

If you don't mind using numpy and the number of columns is large, you can use:

import numpy as np


def clean_asterisk(df, cols):
    """
    Drop rows with an asterisk in known columns
    
    Parameters:
    -----------
    df : pd.DataFrame
    DataFrame we want to clean
    
    cols : str or list of strings
    List of known columns with asterisks
    
    Returns:
    --------
    df : pd.DataFrame
    DataFrame cleaned without asterisk
    """
    # Accept a single column name as well as a list of names
    if isinstance(cols, str):
        cols = [cols]
    if len(cols) == 0:
        raise ValueError("Pass at least one column name")
    # Fail loudly instead of printing and silently returning None
    missing = [col for col in cols if col not in df.columns]
    if missing:
        raise KeyError(f"Columns {missing} must be in the DataFrame")
    # Keep only the rows where none of the given columns equals "*"
    mask = np.bitwise_and.reduce([df[col].to_numpy() != "*" for col in cols])
    return df[mask]

df = clean_asterisk(df, ["col_a", "col_b"])

The last approach scales better to many columns but is more complex than small examples need.
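For comparison, a pandas-only sketch that also handles many columns is to compare the selected columns with DataFrame.ne and keep the rows where all comparisons hold. The names cols, keep and result are illustrative, and this is an alternative to the helper function, not a replacement endorsed by the answer.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "col_a": [1, 5, "*", 6, 8, "*"],
        "col_b": ["*", 6, 8, "*", 2, 4],
        "col_c": [1, 6, 8, 10, 2, 4],
    }
)

cols = ["col_a", "col_b"]  # columns assumed to contain '*'
# True for rows where none of the selected columns equals '*'
keep = df[cols].ne("*").all(axis=1)
result = df[keep]
print(result)
```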
