Python - CSVs - How to delete certain characters only from the header of CSVs?

Question

I have CSVs in a folder. I want to strip special characters from the headers (and only the headers), then save the updated CSVs in a new folder.

The issue I'm having is that the special characters are removed not only from the headers but also from the rows below them.

My code looks like this:

from pathlib import Path
import pandas as pd
import os

parent_dir = input("Enter CSV directory path:")
newdir = "Processed"
directory = os.path.join(parent_dir, newdir)
os.mkdir(directory)
csv_files = [f for f in Path(parent_dir).glob('*.csv')]

for csv in csv_files:
    data = pd.read_csv(csv, encoding = 'ISO-8859-1', engine='python', delimiter = ',')
    data.columns = data.columns.str.replace('[",@]','')
    data.to_csv(parent_dir + "/Processed/" + csv.name, index=False)

Any suggestions on correcting this?

Answer 1

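Try this:

df.columns = df.columns.str.replace(r"[^a-zA-Z\d_]+", "", regex=True)

It removes every character except ASCII letters, digits and underscores, so spaces and tabs are stripped as well. Pass regex=True explicitly: from pandas 2.0 onward, str.replace treats the pattern as a literal string by default.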
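To see what that pattern keeps and drops, here is a small sketch on made-up column names (the names are illustrative, not taken from the question's files):

```python
import pandas as pd

# Hypothetical column names containing a quote, an @, a space and an underscore.
df = pd.DataFrame(columns=['na"me', 'e@mail', 'zip code', 'id_1'])

# Keep only ASCII letters, digits and underscores; everything else is removed.
df.columns = df.columns.str.replace(r"[^a-zA-Z\d_]+", "", regex=True)

cleaned = list(df.columns)
print(cleaned)  # ['name', 'email', 'zipcode', 'id_1']
```

Note that the space in 'zip code' is removed too, which may or may not be what you want.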

Answer 2

Just replace the characters one by one, like this:

import pandas as pd

# generate sample df
foo = pd.DataFrame(columns=['a@', 'b[', 'c]'])

# select characters to drop
chars_to_drop = ['@', '[', ']']

for char in chars_to_drop:
    # regex=False treats each character literally, so '[' is not read as a pattern
    foo.columns = foo.columns.str.replace(char, '', regex=False)

print(foo.columns)
# Index(['a', 'b', 'c'], dtype='object')
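If the list of characters grows, the loop can be collapsed into a single pass with Python's built-in str.translate. This is only an alternative sketch; the variable names are made up for illustration:

```python
# Sample column names, same as in the example above.
columns = ['a@', 'b[', 'c]']

# One translation table that deletes every unwanted character.
drop_table = str.maketrans('', '', '@[]')

# Apply the table to each column name; no regex escaping is needed.
cleaned = [name.translate(drop_table) for name in columns]
print(cleaned)  # ['a', 'b', 'c']
```

The resulting list can be assigned back with foo.columns = cleaned.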

Answer 3

You can try the following.

To remove the characters from the header only:

data.columns = data.columns.str.replace(r'[^\w\s]|_', '', regex=True)

To remove them from every row instead:

data = data.replace(r'[^\w\s]|_', '', regex=True)
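A pandas round trip re-serializes every row: read_csv consumes the quoting and to_csv only writes quotes where they are needed, which may be why the data rows look changed as well. If the rows must stay exactly as they are, one option is to rewrite only the first line and copy the rest verbatim. The sketch below uses only the standard library. The function names clean_header_only and process_folder are hypothetical, and it assumes the header occupies exactly one physical line (no line break inside a quoted column name).

```python
import csv
import re
from pathlib import Path

# Characters to strip from column names (same set as in the question).
HEADER_JUNK = re.compile(r'[",@]')


def clean_header_only(src: Path, dst: Path, encoding: str = "ISO-8859-1") -> None:
    """Copy src to dst, rewriting only the first line (the header)."""
    # newline="" preserves the file's own line endings on read and write.
    with open(src, newline="", encoding=encoding) as fin, \
            open(dst, "w", newline="", encoding=encoding) as fout:
        header_line = fin.readline()
        if not header_line:
            return  # empty input: leave an empty output file
        # Reuse the header's line ending when writing the cleaned header.
        eol = "\r\n" if header_line.endswith("\r\n") else "\n"
        # Parse the header as CSV so quoted names are split correctly.
        names = next(csv.reader([header_line]))
        cleaned = [HEADER_JUNK.sub("", name) for name in names]
        csv.writer(fout, lineterminator=eol).writerow(cleaned)
        # All remaining lines are copied verbatim, without parsing.
        for line in fin:
            fout.write(line)


def process_folder(parent_dir: str, newdir: str = "Processed") -> Path:
    """Clean the header of every *.csv in parent_dir into parent_dir/newdir."""
    parent = Path(parent_dir)
    out_dir = parent / newdir
    out_dir.mkdir(exist_ok=True)  # no error if the folder already exists
    for csv_path in parent.glob("*.csv"):
        clean_header_only(csv_path, out_dir / csv_path.name)
    return out_dir
```

Calling process_folder(parent_dir) with the path entered by the user would then write the cleaned copies into the Processed subfolder.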
