Removing the trailing "\xa0" from DataFrame headers

Question

I have a data set in which some headers end with a non-breaking space (hex code \xa0). Below is my attempt to get rid of it, but it is still there.

Input:

files=[file1,file2,file3]
for f in files:
    for col in f.columns:
        col = col.replace("\xc2\xa0", "")
        col = col.replace(u'\xa0', u' ')
    print(f.columns.values)

Output:

'Name' 'Date' 'rep_cur' 'Passenger Revenue\xa0' 'Cargo Revenue\xa0'
 'Other Revenue\xa0' 'Total Cargo & Other Revenue' 'Total Revenue\xa0'
 '% inc / (dec) to previous period' 'Employee Costs\xa0' 'Fuel and oil\xa0'

Answer 1

Use str.strip:

l = ['Name','Date','rep_cur','Passenger Revenue\xa0','Cargo Revenue\xa0',
 'Other Revenue\xa0','Total Cargo & Other Revenue','Total Revenue\xa0',
 '% inc / (dec) to previous period','Employee Costs\xa0','Fuel and oil\xa0']
new_l = [i.strip() for i in l]

Output:

['Name',
 'Date',
 'rep_cur',
 'Passenger Revenue',
 'Cargo Revenue',
 'Other Revenue',
 'Total Cargo & Other Revenue',
 'Total Revenue',
 '% inc / (dec) to previous period',
 'Employee Costs',
 'Fuel and oil']
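As a small check of why this works, here is a sketch assuming Python 3 str objects (not byte strings): the no-break space counts as whitespace, so strip() with no arguments removes it. The variable names are illustrative only.

```python
header = 'Passenger Revenue\xa0'

# U+00A0 (no-break space) is whitespace for Python 3 str objects.
nbsp_is_space = '\xa0'.isspace()

# strip() with no arguments removes any leading/trailing whitespace.
stripped = header.strip()

# rstrip('\xa0') removes only trailing no-break spaces and
# leaves other whitespace (here the leading space) alone.
trailing_only = ' Total Revenue\xa0'.rstrip('\xa0')

print(nbsp_is_space, repr(stripped), repr(trailing_only))
```

If ordinary leading or trailing spaces in a header are meaningful, the rstrip('\xa0') form is the more conservative choice.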

Answer 2

for col in f.columns:
    col = col.replace("\xc2\xa0", "")
    col = col.replace(u'\xa0', u' ')

That does nothing to the actual column names: assigning to col only rebinds the loop variable, and the DataFrame never sees the new string. It is pretty much equivalent to:

li = [1, 2, 3]
for n in li:
    n = n + 1
print(li)
# [1, 2, 3]

A decent IDE should show you a warning along the lines of "n (or col in your example) is redefined with no usage".

Instead you should use the tools pandas provides, for example df.rename.
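For contrast, a minimal sketch of two ways that do change the values: writing back through the index, or building a new list. The name bumped is illustrative.

```python
li = [1, 2, 3]

# Write back through the index so the list itself is updated.
for i, n in enumerate(li):
    li[i] = n + 1

# Or build a new list with a comprehension and keep the result.
bumped = [n + 1 for n in li]

print(li)
print(bumped)
```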

import pandas as pd

df = pd.DataFrame({'a\xa0': []})

print(df.rename(lambda col: col.replace('\xa0', ''), axis='columns'))

Note that .rename returns a new dataframe. You can use inplace=True to change the original dataframe:

df.rename(lambda col: col.replace('\xa0', ''), axis='columns', inplace=True)

If you don't want to be so fancy, you could replace the column names yourself (which is similar to what your original code tried to do):

df.columns = [col.replace('\xa0', '') for col in df.columns]
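Applied to the loop from the question, this could look like the sketch below. The two small frames are hypothetical stand-ins for the question's file1, file2 and file3, whose contents are not shown.

```python
import pandas as pd

# Hypothetical stand-ins for the DataFrames in the question.
file1 = pd.DataFrame({'Name': [], 'Passenger Revenue\xa0': []})
file2 = pd.DataFrame({'Date': [], 'Cargo Revenue\xa0': []})
files = [file1, file2]

for f in files:
    # Assigning to f.columns changes the DataFrame object itself,
    # unlike rebinding the loop variable col.
    f.columns = [col.replace('\xa0', '') for col in f.columns]

print(file1.columns.tolist())
print(file2.columns.tolist())
```

Because each f refers to the same object as the entry in files, the cleaned headers are visible through file1 and file2 after the loop.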
