How to remove all string values from numerical data except "." and "-" using pandas

I have a column with numerical data, and it has some string characters like $, #, etc. attached to every value. My numerical data looks like this:

SIZE = [ 10 OZ, 20 OZ, 2.5. OZ, #30.1 OZ, ,2 OZ, 1-8 OZ, 1-7OZ, 20 OZ]

But when I delete all the string characters, it also removes the "." and "-" characters, which I want to keep. How can I remove string values from a numerical column, except characters such as the decimal point and "-", using pandas?

My desired output is:

SIZE = [ 10, 20, 2.5, 30.1, 2, 1-8, 1-7, 20]

This is simplified sample data; my actual data has around 600 values.
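The question does not show how the characters were deleted. Assuming (hypothetically) that the "remove every non-digit" step used the `\D` regex class, this sketch reproduces the symptom: `\D` matches "." and "-" as well, so decimals and ranges collapse.

```python
import pandas as pd

# Hypothetical reproduction: assume non-digits were stripped with \D,
# which matches every non-digit character, including "." and "-".
sizes = pd.Series(["2.5 OZ", "#30.1 OZ", "1-8 OZ"])
digits_only = sizes.str.replace(r"\D", "", regex=True)
print(digits_only.tolist())  # ['25', '301', '18']
```

The fix in every answer is the same idea: use a negated character class that lists the characters to keep (digits, "." and "-") instead of `\D`.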

I haven't worked with pandas, but you can use this regex to get the required result.

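import re
# Keep only digits, "." and "-" (a "^" after the first one is literal, so it is not repeated).
print(re.sub(r"[^0-9.-]", "", "sdkjh987978asd098.as0980-a98sd"))  # 987978098.0980-98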
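A note on the character class: inside `[...]`, `^` negates only when it is the first character. Any later `^` is a literal caret, so a pattern such as `[^0-9^.^-]` also keeps `^` characters. A small check with a made-up input string:

```python
import re

text = "a^1.5-2b"  # made-up input containing a caret

# Later carets in the class are literals, so "^" survives here.
kept_with_carets = re.sub(r"[^0-9^.^-]", "", text)
# Listing only the wanted characters removes the caret too.
kept_clean = re.sub(r"[^0-9.-]", "", text)

print(kept_with_carets)  # ^1.5-2
print(kept_clean)        # 1.5-2
```

A trailing `-` right before `]` is treated literally, which is why `[^0-9.-]` needs no escaping.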

Try this:

import re

# Matches every run of characters that is not a digit, comma, dot or hyphen.
full_pattern = re.compile(r"[^\d,.-]+")

def re_replace(data_list):
    new_data = []
    for data in data_list:
        # Delete the unwanted characters from each string.
        new_data.append(full_pattern.sub('', data))
    return new_data

data = ["10 OZ", "20 OZ", "2.5. OZ", "#30.1 OZ", "!2 O Z", "1-8 OZ", "1-7OZ", "20 OZ"]
st = re_replace(data)

print(st)

Output:

['10', '20', '2.5.', '30.1', '2', '1-8', '1-7', '20']
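The deletion approach leaves the stray dot in `'2.5.'`. An alternative (a different technique from the answer above, shown as a sketch) is to extract the number instead of deleting around it, using pandas' `Series.str.extract`. The pattern assumes each value holds one number, optionally followed by `-` and a second number:

```python
import pandas as pd

SIZE = ["10 OZ", "20 OZ", "2.5. OZ", "#30.1 OZ", "!2 O Z", "1-8 OZ", "1-7OZ", "20 OZ"]
df = pd.DataFrame({"SIZE": SIZE})

# Assumed shape: a number, optionally "-" plus a second number (a range).
# Each number may have a decimal part; a dot not followed by a digit
# (the extra "." in "2.5. OZ") is left out of the match.
number_or_range = r"(\d+(?:\.\d+)?(?:-\d+(?:\.\d+)?)?)"

# With one capture group and expand=False, str.extract returns a Series;
# values with no match would become NaN.
df["SIZE"] = df["SIZE"].str.extract(number_or_range, expand=False)
print(df["SIZE"].tolist())
# ['10', '20', '2.5', '30.1', '2', '1-8', '1-7', '20']
```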

Pandas' .str.replace() accepts regular expressions, too.

import pandas as pd

SIZE = ["10 OZ", "20 OZ", "2.5. OZ", "#30.1 OZ", "!2 O Z", "1-8 OZ", "1-7OZ", "20 OZ"]

df = pd.DataFrame({"SIZE": SIZE})

# remove everything that's not a digit, dot or hyphen, then strip leading/trailing dots
df["SIZE"] = (df["SIZE"]
              .str.replace(r"[^0-9.-]+", "", regex=True)
              .str.strip("."))

Result:

>>> df
SIZE
0    10
1    20
2   2.5
3  30.1
4     2
5   1-8
6   1-7
7    20
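The cleaned column still holds strings, and a range like `1-8` cannot be parsed as a single number. If numeric values are needed, one option is a sketch like the following, which assumes `1-8` means "from 1 to 8" (an interpretation, not stated in the question) and splits it into two float columns. `SIZE_MIN` and `SIZE_MAX` are hypothetical column names:

```python
import pandas as pd

# Cleaned strings as in the result above (dtype object, not numbers).
df = pd.DataFrame({"SIZE": ["10", "20", "2.5", "30.1", "2", "1-8", "1-7", "20"]})

# Split on the first hyphen; rows without one get a missing second part.
parts = df["SIZE"].str.split("-", n=1, expand=True)

# Single values reuse the same number as both the minimum and the maximum.
df["SIZE_MIN"] = pd.to_numeric(parts[0])
df["SIZE_MAX"] = pd.to_numeric(parts[1].fillna(parts[0]))
print(df)
```

Keeping the range as two columns avoids silently turning `1-8` into NaN, which is what `pd.to_numeric(df["SIZE"], errors="coerce")` would do.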


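Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.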
