Remove trailing .0 from strings of entire DataFrame

Hi, I would like to remove every ".0" at the end of a string across an entire DataFrame, and it needs to be an exact match.

Let's make an example df:

a      b        c
20     39.0     17-50
34.0   .016.0   001-6784532

The desired output:

a      b        c
20     39       17-50
34     .016     001-6784532

I tried using replace, but it didn't work for some reason (I read that this may be because replace only replaces entire strings, not substrings?). Either way, if there is a way to make it work I'd like to hear about it, because it would work for my DataFrame. Still, I feel it's less correct in case I have values like .016.0, because then it would also replace the first two characters.

Then I tried sub and rtrim with the regex r'\.0$', but I couldn't get that to work either. I'm not sure whether it's because of the regex or because these methods don't work on an entire DataFrame. Using rtrim with .0 didn't work either, because it also removes zeros that aren't preceded by a dot, so 20 becomes 2. When trying sub and rtrim with the regex, I got an error saying that DataFrame has no attribute str. How is that possible?
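The AttributeError comes from the fact that the .str accessor is defined on Series, not on DataFrame. A minimal sketch of the difference, re-creating the example frame so the snippet runs on its own; note that DataFrame.apply still visits each column internally, it just hides the loop:

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['20', '34.0'],
    'b': ['39.0', '.016.0'],
    'c': ['17-50', '001-6784532'],
})

# The .str accessor exists only on Series, so df.str raises AttributeError.
try:
    df.str
except AttributeError as exc:
    print('AttributeError:', exc)

# Applying Series.str.replace to each column works; the anchored regex
# r'\.0$' only matches a literal ".0" at the very end of each string.
cleaned = df.apply(lambda col: col.str.replace(r'\.0$', '', regex=True))
print(cleaned)
```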

Is there any way to do this without looping over all columns?

Thank you!

Let's try DataFrame.replace:

import pandas as pd

df = pd.DataFrame({
    'a': ['20', '34.0'],
    'b': ['39.0', '.016.0'],
    'c': ['17-50', '001-6784532']
})

df = df.replace(r'\.0$', '', regex=True)

print(df)

Optionally, call DataFrame.astype first if the columns are not already str:

df = df.astype(str).replace(r'\.0$', '', regex=True)
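A small sketch of why astype(str) can matter, assuming a hypothetical frame where column a holds real floats rather than strings. The regex is only applied to string values, so a float such as 20.0 has to be turned into the string '20.0' before r'\.0$' can match it:

```python
import pandas as pd

# Hypothetical mixed-type frame: 'a' holds floats, 'b' holds strings.
df = pd.DataFrame({
    'a': [20.0, 34.5],
    'b': ['39.0', '.016.0'],
})

# astype(str) converts 20.0 to '20.0' so the anchored regex can strip '.0'.
out = df.astype(str).replace(r'\.0$', '', regex=True)
print(out)
```

Values like 34.5 are left alone because they do not end in an exact ".0".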

Before:

      a       b            c
0    20    39.0        17-50
1  34.0  .016.0  001-6784532

After:

    a     b            c
0  20    39        17-50
1  34  .016  001-6784532

rtrim / rstrip will not work here, because they don't parse regex; instead they take a set of characters to remove. They keep stripping any trailing character that appears in that set, so every trailing 0 is removed because 0 is in the set (for example, '20'.rstrip('.0') gives '2').
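To make the difference concrete, here is a sketch comparing Series.str.rstrip('.0') with the anchored regex on a few hypothetical values:

```python
import pandas as pd

s = pd.Series(['20', '39.0', '100', '3.00', '.016.0'])

# rstrip('.0') removes ANY run of trailing '.' and '0' characters.
stripped = s.str.rstrip('.0')

# The anchored regex removes only a literal ".0" at the very end.
regexed = s.str.replace(r'\.0$', '', regex=True)

print(stripped.tolist())
print(regexed.tolist())
```

Under rstrip, '20' and '100' lose their zeros and '3.00' collapses to '3'; the regex leaves all three untouched, since none of them ends in an exact ".0".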

Alternatively, replace conditionally using np.where():

import numpy as np
df['b'] = np.where(df['b'].str.contains(r'\.\d+\.'), df['b'].str.replace(r'\.\d+$', '', regex=True), df['b'])



    a     b            c
0  20.0  39.0        17-50
1  34.0  .016  001-6784532

That is, where a value contains a .digit(s). sequence, remove the final .digit(s) at the end of the string.
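A self-contained version of the np.where approach, assuming column a holds floats (which is what the 20.0 / 34.0 in the output above suggests):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [20.0, 34.0],
    'b': ['39.0', '.016.0'],
    'c': ['17-50', '001-6784532'],
})

# Rows whose value contains a ".digits." sequence (i.e. at least two dots).
has_double_dot = df['b'].str.contains(r'\.\d+\.')

# For those rows, drop the final ".digits" suffix; other rows keep their value.
df['b'] = np.where(has_double_dot,
                   df['b'].str.replace(r'\.\d+$', '', regex=True),
                   df['b'])
print(df)
```

Only column b is handled here, and '39.0' is kept because it has no second dot, which matches the output shown above.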

For those who are going to export the DataFrame to a CSV (or another format), you can use the pandas float_format parameter to eliminate trailing zeros from every float value in the DataFrame. This only affects float columns; strings such as '39.0' are written unchanged.

df.to_csv('path_to_file.csv', float_format='%g')

Explanation of '%g' and other formats.
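A sketch of the float_format route, writing to a string instead of a file so the result can be inspected (calling to_csv without a path returns the CSV text). The frame is hypothetical, with a and b stored as floats; note that '%g' keeps only 6 significant digits by default, so large values switch to exponent notation:

```python
import pandas as pd

# Hypothetical frame: 'a' and 'b' are floats, 'c' is text.
df = pd.DataFrame({
    'a': [20.0, 34.0],
    'b': [39.0, 0.016],
    'c': ['17-50', '001-6784532'],
})

# With no path argument, to_csv returns the CSV text as a string.
csv_text = df.to_csv(index=False, float_format='%g')
print(csv_text)

# Caveat: '%g' defaults to 6 significant digits.
big = '%g' % 1234567.0
print(big)  # 1.23457e+06
```

If more precision is needed, a format such as '%.10g' still drops trailing zeros while keeping more digits.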
