I have written a function that cleans up a CSV file using regular expressions and exports the result to CSV. The function takes fn_file, a file named file.csv in folder 'x', processes it, and writes the result to the same folder as 'file_processed.csv'. When I convert the parsed data to a DataFrame for export, I get the error below. How can I use the header as the column names of the output file?
Function
--------
process_file
Use regex to create a file with title, date and header
Parameters
----------
fn_file : str
Name of file scraped from WDET
fn_csv_out : str
Name of new, reformatted, file
Returns
-------
Nothing returned. Just creates output file.
Example
-------
process_file('E:/data.csv' , 'E:/data_processed.csv')
The error is:
raise ValueError('DataFrame constructor not properly called!')
ValueError: DataFrame constructor not properly called!
It is raised by this line:
s_df = pd.DataFrame(data = fn_file, columns = [header])
My code is given below:
import re
import pandas as pd

def process_file(fn_file , fn_csv_out):
    s = re.compile(r'.*?word.+?\(\d{1,2}[ap]m-\d{1,2}[ap]m\)\s+$')
    date = re.compile(r'(Sunday)\s+(\w+\s+\d+,\s+(2010))')
    he = re.compile(r'\t\w+.\t\w+\t\w+\t\w+\s\(\w+\)\t\w+$')
    son = re.compile(r'^.*\t\d+\t.+\t')
    # Initialize counters
    num_lines = 0
    num_s = 0
    num_date = 0
    num_he = 0
    num_son = 0
    num_unmatched = 0
    # Initialize empty list to store output lines
    sonlines = []
    # Initialize string vars to use for the show title and date
    title = ''
    date = ''
    with open(fn_file) as f:
        # Loop over the lines in the file
        for line in f:
            num_lines +=1
            line = line.rstrip('\n')
            m_s = re.match(s, line)
            m_date = re.match(date, line)
            m_he = re.match(he, line)
            m_son = re.match(son, line)
            if m_s:
                num_s += 1
                # Get the show title
                ti = m_s.group()
            elif m_date:
                # it's a date line
                num_date += 1
                show_day = m_date.group(1)
                s_date = m_date.group(2)
            elif m_he:
                # it's a header line
                num_he += 1
                heline = m_he.group()
            elif m_son:
                num_son += 1
                son_group = m_son.group()
                son = re.split(r'\t+', son_group)
                son.insert(0,ti)
                son.insert(1,s_date)
                sonlines.append(son)
    header = re.split(r'\t+',heline.rstrip('\t'))
    header[0] = 'b'
    header.insert(0,'ti')
    header.insert(1,'s_date')
    # Create pandas dataframe and export to csv
    # The next two lines throw the error
    s_df = pd.DataFrame(data = fn_file, columns = [header])
    s_df.to_csv(fn_csv_out, sep='\t', index= False)
The last two lines throw the error. Can you please help me with it? Thanks in advance.
The problem was solved in the comments. The wrong variable was being passed to the DataFrame constructor: fn_file is the input file name (a string), not the parsed rows. The fix is to change the call to
s_df = pd.DataFrame(data = sonlines, columns = [header])
instead of
s_df = pd.DataFrame(data = fn_file, columns = [header])
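To see the difference in isolation, here is a minimal sketch, not the original function. The header and sonlines values are made-up stand-ins for what process_file builds, and the output goes to an in-memory buffer rather than a real file. It also passes header directly instead of [header]: header is already a list, and as far as I know pandas reads a list wrapped in another list as a request for multi-level column labels, which is probably not what is wanted here.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the values built inside process_file.
header = ["ti", "s_date", "b", "col_a", "col_b"]
sonlines = [
    ["Show A", "January 3, 2010", "1", "x", "y"],
    ["Show A", "January 3, 2010", "2", "z", "w"],
]

# Passing a plain string (such as a file path) as data reproduces the error.
try:
    pd.DataFrame(data="E:/data.csv", columns=header)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Passing the list of rows works. header is already a flat list of names,
# so it is used as-is for the column labels.
s_df = pd.DataFrame(data=sonlines, columns=header)

# Write tab-separated output to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
s_df.to_csv(buffer, sep="\t", index=False)
first_line = buffer.getvalue().splitlines()[0]
```

Separately, the posted function reuses the names date and son: date is rebound to an empty string before the loop and son is rebound to a list inside it, so re.match no longer receives the compiled patterns. Giving the compiled patterns their own names (for example date_re and son_re, both hypothetical) would avoid that.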