Python - CSVs - How to delete certain characters only from the header of CSVs?

I have CSVs in a folder, and I want to strip special characters from the headers (and only the headers), then save the updated CSVs in a new folder.

The problem is that the special characters are removed not only from the headers but also from the rows below them.

My code looks like this:

from pathlib import Path
import pandas as pd
import os

parent_dir = input("Enter CSV directory path:")
newdir = "Processed"
directory = os.path.join(parent_dir, newdir)
os.mkdir(directory)
csv_files = [f for f in Path(parent_dir).glob('*.csv')]

for csv in csv_files:
    data = pd.read_csv(csv, encoding = 'ISO-8859-1', engine='python', delimiter = ',')
    data.columns = data.columns.str.replace('[",@]','')
    data.to_csv(parent_dir + "/Processed/" + csv.name, index=False)

Any suggestions on how to correct this?

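Try this:

df.columns = df.columns.str.replace(r"[^a-zA-Z\d_]+", "", regex=True)

It removes every character except letters of the English alphabet, digits, and underscores, so spaces and tabs are removed too. Pass regex=True explicitly: since pandas 2.0, str.replace treats the pattern as a literal string by default.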
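A quick way to see what this pattern keeps and what it drops is to run it through Python's `re` module on a few strings. The header names below are made up for illustration; they are not from the question's files.

```python
import re

# Same character class as in the answer: keep letters, digits, underscores.
pattern = re.compile(r"[^a-zA-Z\d_]+")

# Hypothetical header names, chosen only to exercise the pattern.
headers = ['"user name"', "e-mail@", "order_id", "total\t(2024)"]

cleaned = [pattern.sub("", h) for h in headers]
print(cleaned)  # ['username', 'email', 'order_id', 'total2024']
```

Note that the space in "user name" and the tab in the last header are both removed, while the underscore in "order_id" survives.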

Just replace the characters one by one, like this:

import pandas as pd

# generate sample df
foo = pd.DataFrame(columns=['a@', 'b[', 'c]'])

# select characters to drop
chars_to_drop = ['@', '[', ']']

for char in chars_to_drop:
    # regex=False treats each character literally, so '[' is not parsed as a regex
    foo.columns = foo.columns.str.replace(char, '', regex=False)

print(foo.columns)
# Index(['a', 'b', 'c'], dtype='object')
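To confirm that this approach touches only the header, here is a small sketch on a frame that has the same characters in its data. The column names and values are invented for the example.

```python
import pandas as pd

# Hypothetical sample frame: special characters in both headers and values.
df = pd.DataFrame({"na@me": ["a@b.com", "c@d.org"], "[id]": ["[1]", "[2]"]})

chars_to_drop = ["@", "[", "]"]
for char in chars_to_drop:
    # Assigning to df.columns renames the columns only; cell values are not touched.
    df.columns = df.columns.str.replace(char, "", regex=False)

print(list(df.columns))     # ['name', 'id']
print(df.iloc[0].tolist())  # ['a@b.com', '[1]']
```

The first row still contains "@" and the brackets, which suggests that if the data rows are changing, the cause lies somewhere other than the assignment to `columns`.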

You can try the following.

To remove the characters from the header only:

data.columns = data.columns.str.replace(r'[^\w\s]|_', '', regex=True)

To remove them from the values in all rows instead:

data = data.replace(r'[^\w\s]|_', '', regex=True)
