Get rid of "\xa0" tag at the end of header in Dataframe
I have a data set in which some headers end with a non-breaking space (the hex code \xa0). Below is my attempt to get rid of it, but it is still there.
Input:
files=[file1,file2,file3]
for f in files:
    for col in f.columns:
        col = col.replace("\xc2\xa0", "")
        col = col.replace(u'\xa0', u' ')
    print(f.columns.values)
Output:
'Name' 'Date' 'rep_cur' 'Passenger Revenue\xa0' 'Cargo Revenue\xa0'
'Other Revenue\xa0' 'Total Cargo & Other Revenue' 'Total Revenue\xa0'
'% inc / (dec) to previous period' 'Employee Costs\xa0' 'Fuel and oil\xa0'
Use str.strip:
l = ['Name','Date','rep_cur','Passenger Revenue\xa0','Cargo Revenue\xa0',
'Other Revenue\xa0','Total Cargo & Other Revenue','Total Revenue\xa0',
'% inc / (dec) to previous period','Employee Costs\xa0','Fuel and oil\xa0']
new_l = [i.strip() for i in l]
Output:
['Name',
'Date',
'rep_cur',
'Passenger Revenue',
'Cargo Revenue',
'Other Revenue',
'Total Cargo & Other Revenue',
'Total Revenue',
'% inc / (dec) to previous period',
'Employee Costs',
'Fuel and oil']
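The same idea can be applied directly to a DataFrame's headers. The following is a minimal sketch, assuming a small made-up frame (the column names are borrowed from the question, the frame itself is hypothetical); it uses the .str accessor on the column index to strip every label. Note that strip only removes leading and trailing whitespace, so a \xa0 in the middle of a header would stay.

```python
import pandas as pd

# Hypothetical frame whose headers carry a trailing non-breaking space.
df = pd.DataFrame(columns=['Name', 'Passenger Revenue\xa0', 'Cargo Revenue\xa0'])

# Index.str.strip() applies str.strip to every column label;
# with no argument it removes surrounding whitespace, including \xa0.
df.columns = df.columns.str.strip()

print(list(df.columns))
```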
for col in f.columns:
    col = col.replace("\xc2\xa0", "")
    col = col.replace(u'\xa0', u' ')
That does nothing to the actual column names: assigning to col only rebinds the loop variable, and f.columns is never touched. It is pretty much equivalent to:
li = [1, 2, 3]
for n in li:
    n = n + 1
print(li)
# [1, 2, 3]
A decent IDE should show you a warning along the lines of "n (or col in your example) is redefined with no usage".
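To make the difference concrete, here is a small sketch in plain Python (the list cols is a made-up stand-in for the column names). It contrasts rebinding the loop variable, which changes nothing, with writing the cleaned value back into the list by index.

```python
cols = ['Name', 'Total Revenue\xa0']

# Rebinding the loop variable leaves the list untouched.
for col in cols:
    col = col.replace('\xa0', '')
unchanged = list(cols)  # snapshot taken after the first loop

# Writing back by index changes the list itself.
for i, col in enumerate(cols):
    cols[i] = col.replace('\xa0', '')

print(unchanged)
print(cols)
```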
Instead, you should use the tools pandas provides, for example df.rename.
import pandas as pd
df = pd.DataFrame({'a\xa0': []})
print(df.rename(lambda col: col.replace('\xa0', ''), axis='columns'))
Note that .rename returns a new dataframe. You can use inplace=True to change the original dataframe:
df.rename(lambda col: col.replace('\xa0', ''), axis='columns', inplace=True)
If you don't want to be so fancy you could replace the columns' names yourself (which is similar to what your original code tried to do):
df.columns = [col.replace('\xa0', '') for col in df.columns]
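Applied to the list of frames from the question, the list-comprehension approach could look like the sketch below. The frames file1 and file2 are hypothetical stand-ins built only for illustration; the point is that assigning to f.columns modifies each DataFrame object itself, so the change is visible through the original variables.

```python
import pandas as pd

# Hypothetical stand-ins for the question's file1, file2, ...
file1 = pd.DataFrame(columns=['Name', 'Passenger Revenue\xa0'])
file2 = pd.DataFrame(columns=['Date', 'Cargo Revenue\xa0'])
files = [file1, file2]

for f in files:
    # Assigning to f.columns replaces the labels on the DataFrame object,
    # unlike rebinding a loop variable that holds a single label.
    f.columns = [col.replace('\xa0', '') for col in f.columns]

print([list(f.columns) for f in files])
```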