Search and replace dots and commas in pandas dataframe

This is my DataFrame:

import pandas as pd

d = {'col1': ['sku 1.1', 'sku 1.2', 'sku 1.3'], 'col2': ['9.876.543,21', 654, '321,01']}
df = pd.DataFrame(data=d)
df

       col1           col2
0   sku 1.1   9.876.543,21
1   sku 1.2            654
2   sku 1.3         321,01

The values in col2 are numbers in a local format (dots as thousands separators, a comma as the decimal separator), which I would like to convert into:

      col2
9876543.21
       654
    321.01

I tried df['col2'] = pd.to_numeric(df['col2'], downcast='float'), which raises ValueError: Unable to parse string "9.876.543,21" at position 0.

I also tried df = df.apply(lambda x: x.str.replace(',', '.')), which results in ValueError: could not convert string to float: '5.023.654.46'

Thanks for your help!

If possible, the best option is to use the thousands and decimal parameters of read_csv:

df = pd.read_csv(file, thousands='.', decimal=',')
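A self-contained sketch of the read_csv route, using the sample data from the question. It assumes the file is semicolon-separated, since a comma cannot serve as the field separator when it is also the decimal mark; io.StringIO stands in for the real file.

```python
import io

import pandas as pd

# Stand-in for a real file. ';' separates the fields because ',' is the decimal mark.
raw = io.StringIO(
    "col1;col2\n"
    "sku 1.1;9.876.543,21\n"
    "sku 1.2;654\n"
    "sku 1.3;321,01\n"
)

# thousands='.' drops the grouping dots; decimal=',' reads the comma as the decimal point
df = pd.read_csv(raw, sep=";", thousands=".", decimal=",")
print(df)
print(df.dtypes)
```

Non-numeric columns such as col1 are left as text, so "sku 1.1" is not affected by the thousands setting.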

If that is not possible, replace should help: remove the thousands dots first, then turn the decimal comma into a dot:

# Raw string for the regex; non-string cells such as 654 are left as they are
df['col2'] = (df['col2'].replace(r'\.', '', regex=True)
                        .replace(',', '.', regex=True)
                        .astype(float))
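The same idea can be wrapped in a small reusable function. This is a sketch, not part of the answer: parse_local_numbers is a hypothetical name, and it assumes each text value uses one thousands character and one decimal character. Passing regex=False to Series.str.replace matches the dot literally, so no escaping is needed, and cells that are already numbers (like 654) are left alone instead of being turned into text and back.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def parse_local_numbers(s: pd.Series, thousands: str = ".", decimal: str = ",") -> pd.Series:
    """Hypothetical helper: convert locale-formatted number strings to floats.

    Assumes each text value uses one thousands character and one decimal character.
    """
    if is_numeric_dtype(s):
        return s.astype(float)  # nothing to clean up
    is_text = s.map(lambda v: isinstance(v, str))
    # regex=False matches '.' literally; remove grouping first, then fix the decimal mark
    normalized = (s.str.replace(thousands, "", regex=False)
                   .str.replace(decimal, ".", regex=False))
    # Non-string cells (such as the int 654 in col2) are kept unchanged
    return pd.to_numeric(s.mask(is_text, normalized))


df = pd.DataFrame({"col1": ["sku 1.1", "sku 1.2", "sku 1.3"],
                   "col2": ["9.876.543,21", 654, "321,01"]})
df["col2"] = parse_local_numbers(df["col2"])
print(df)
```

Other conventions fit the same parameters, for example thousands="'" and decimal="." for values written as 1'234.5.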

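You can try swapping the two separators through a temporary placeholder character. The result is US-style formatting such as 9,876,543.21, so the commas still have to be removed before converting to float: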

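# Series.replace only matches whole cell values; .str.replace replaces substrings
s = df['col2'].astype(str)
s = s.str.replace(',', '&', regex=False)   # decimal comma -> placeholder
s = s.str.replace('.', ',', regex=False)   # thousands dots -> commas
df['col2'] = s.str.replace('&', '.', regex=False)  # placeholder -> decimal dot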
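The placeholder is needed because the replacements run one after another: without it, the second step would turn the freshly created decimal point back into a comma. As an alternative technique (not the one in this answer), Python's str.translate applies all substitutions in a single pass, so no placeholder is needed; and because a translation table can also delete characters, the dots can be dropped outright and the result converted straight to float. This sketch assumes col2 holds only strings or plain numbers.

```python
import pandas as pd

# Applied in a single pass: '.' is deleted and ',' becomes '.'
TO_PLAIN = str.maketrans({".": None, ",": "."})

df = pd.DataFrame({"col1": ["sku 1.1", "sku 1.2", "sku 1.3"],
                   "col2": ["9.876.543,21", 654, "321,01"]})

# Translate only real strings; numbers such as 654 pass through untouched
df["col2"] = df["col2"].map(
    lambda v: v.translate(TO_PLAIN) if isinstance(v, str) else v
).astype(float)
print(df)
```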

You are always better off using standard system facilities where they exist. Since some locales swap the roles of the comma and the decimal point, I could not believe that pandas had no way to use the locale's number format.

Sure enough, a quick search turned up this gist, which explains how to use locales to convert strings to numbers. In essence, you import locale and, after building the DataFrame, call locale.setlocale to select a locale that uses commas as decimal points and periods as thousands separators, then convert the values with a function such as locale.atof through the DataFrame's applymap method (deprecated since pandas 2.1 in favor of DataFrame.map).
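A minimal sketch of the locale approach. It assumes the German locale de_DE.UTF-8 is installed; locale names and availability vary between operating systems, and locale.setlocale raises locale.Error when the requested one is missing. parse_with_locale is a hypothetical helper name. It converts only col2 with Series.map rather than calling applymap on the whole DataFrame, because locale.atof would fail on col1 values such as 'sku 1.1'. Since setlocale changes process-wide state, the previous LC_NUMERIC setting is restored afterwards.

```python
import locale

import pandas as pd


def parse_with_locale(series: pd.Series, loc: str = "de_DE.UTF-8") -> pd.Series:
    """Hypothetical helper: parse locale-formatted numbers with locale.atof.

    Assumes `loc` is installed; locale.setlocale raises locale.Error otherwise.
    setlocale changes process-wide state, so the old LC_NUMERIC value is restored.
    """
    previous = locale.setlocale(locale.LC_NUMERIC)  # no second argument: just query
    locale.setlocale(locale.LC_NUMERIC, loc)
    try:
        # atof strips the locale's thousands separator and maps its decimal mark to '.';
        # values that are already numbers (like 654) are simply converted to float
        return series.map(lambda v: locale.atof(v) if isinstance(v, str) else float(v))
    finally:
        locale.setlocale(locale.LC_NUMERIC, previous)


df = pd.DataFrame({"col1": ["sku 1.1", "sku 1.2", "sku 1.3"],
                   "col2": ["9.876.543,21", 654, "321,01"]})
try:
    df["col2"] = parse_with_locale(df["col2"])  # only col2 holds numbers
except locale.Error:
    print("The de_DE.UTF-8 locale is not installed on this system")
print(df)
```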
