How do I drop * values in a data frame in pandas?
I am working with a data frame in pandas, and some of the values in a certain column are *. When I try to plot that column with Seaborn, I get the following error:
ValueError: could not convert string to float: '*'
I know which columns have * values in them:
0 347
1 332
2 324
3 310
4 347
...
163 *
164 *
165 *
166 310
167 319
Name: MeanScore, Length: 168, dtype: object
You can do the following. In my example, the data frame df has two columns, y and z, which might contain a *.
import pandas as pd
df = pd.DataFrame({
    'x': [1, 2, 3, 4],
    'y': [1, 4, '*', 16],
    'z': [2, 3, 5, '*'],
})
df[(df['y'] != '*') & (df['z'] != '*')].head()
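As a rough sketch, the same filter can be written so that it checks a whole list of columns at once. The names cols, mask and filtered are illustrative and do not come from the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'x': [1, 2, 3, 4],
    'y': [1, 4, '*', 16],
    'z': [2, 3, 5, '*'],
})

# Columns to check; an illustrative choice for this example
cols = ['y', 'z']

# True for every row that has a '*' in at least one of the checked columns
mask = df[cols].eq('*').any(axis=1)

# Keep only the rows without a '*'
filtered = df[~mask]
print(filtered)
```

Here only the first two rows survive, because row 2 has a * in y and row 3 has a * in z.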
You can use the replace function. As its argument, pass a dictionary that follows the scheme {value to replace: replacement value}.
Example:
import pandas as pd
df = pd.DataFrame({"column1": ["a", "b", "a"]})
print(df)
column1
0 a
1 b
2 a
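# Assign the result back; calling replace with inplace=True on a selected column may not update df
df["column1"] = df["column1"].replace({"a": "x", "b": "y"})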
print(df)
column1
0 x
1 y
2 x
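Applied to the question, one possible sketch is to replace * with a missing value, drop those rows, and then convert the column to a numeric dtype so that a plotting library can handle it. The sample values below are made up; only the column name MeanScore is taken from the question, and it is an assumption that the remaining values are numeric strings.

```python
import numpy as np
import pandas as pd

# Made-up sample data standing in for the asker's column
df = pd.DataFrame({"MeanScore": ["347", "332", "*", "310"]})

# Turn every '*' into NaN so pandas treats it as missing
df["MeanScore"] = df["MeanScore"].replace("*", np.nan)

# Drop the rows where MeanScore is missing
cleaned = df.dropna(subset=["MeanScore"]).copy()

# The remaining values are still strings, so convert them to floats
cleaned["MeanScore"] = cleaned["MeanScore"].astype(float)
print(cleaned)
```

After the conversion the column no longer has dtype object, which is what caused the "could not convert string to float" error in the first place.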
Adding to_numpy, so that the comparison runs on the underlying NumPy array instead of the pandas Series, can give better performance.
# Example to reproduce solution
# We know that col_a and col_b contain '*'
import pandas as pd
df = pd.DataFrame(
    {
        "col_a": [1, 5, "*", 6, 8, "*"],
        "col_b": ["*", 6, 8, "*", 2, 4],
        "col_c": [1, 6, 8, 10, 2, 4],
    }
)
df = df[(df["col_a"].to_numpy() != "*") & (df["col_b"].to_numpy() != "*")]
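To check the performance claim on your own data, a small timing sketch like the one below may help. The data size, the helper names mask_series and mask_numpy, and the number of repetitions are all arbitrary choices for illustration, and the timings will vary with the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Made-up data: an object column with a '*' in every tenth row
values = [i if i % 10 else "*" for i in range(100_000)]
df = pd.DataFrame({"col_a": values})


def mask_series():
    # Comparison on the pandas Series
    return df["col_a"] != "*"


def mask_numpy():
    # Comparison on the underlying NumPy array
    return df["col_a"].to_numpy() != "*"


# Both masks select the same rows
assert np.array_equal(mask_series().to_numpy(), mask_numpy())

print("Series:", timeit.timeit(mask_series, number=20))
print("NumPy: ", timeit.timeit(mask_numpy, number=20))
```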
And if you don't mind using numpy and the number of columns is large, you can use:
import numpy as np


def clean_asterisk(df, cols):
    """
    Drop rows that contain an asterisk in known columns

    Parameters:
    -----------
    df : pd.DataFrame
        DataFrame we want to clean
    cols : str or list of strings
        Known column(s) with asterisks

    Returns:
    --------
    df : pd.DataFrame
        DataFrame without the rows that held an asterisk
    """
    # Accept a single column name as well as a list of names
    if isinstance(cols, str):
        cols = [cols]
    if len(cols) == 0:
        raise ValueError("Pass a column name or a list with at least one column name")
    missing = [col for col in cols if col not in df.columns]
    if missing:
        raise KeyError(f"Columns {missing} must be in the DataFrame")
    # One boolean mask per column, combined with a bitwise AND
    keep = np.bitwise_and.reduce([(df[col] != "*").to_numpy() for col in cols])
    return df[keep]


df = clean_asterisk(df, ["col_a", "col_b"])
The last approach scales much better to many columns but is overly complex for small examples.