Have an issue converting a dataframe to csv

I have written a function that cleans up a CSV file using regular expressions and exports the result to CSV. The fn_file argument is a file named file.csv in a folder 'x'; the function processes it and writes the result to the same folder as 'file_processed.csv'. When I convert the DataFrame to CSV, I get the error below. How can I add the header as the column names of the output file?

Function
--------
process_file

Use regex to create a file with title, date and header

Parameters
----------
fn_file : str
    Name of file scraped from WDET

fn_out : str
    Name of new, reformatted, file

Returns
-------
Nothing returned. Just creates output file.

Example
-------
process_file('E:/data.csv' , 'E:/data_processed.csv')

The error is raised by this line:

s_df = pd.DataFrame(data = fn_file, columns = [header])

and the traceback ends with:

raise ValueError('DataFrame constructor not properly called!')
ValueError: DataFrame constructor not properly called!

My code is given below:


import re
import pandas as pd


def process_file(fn_file, fn_csv_out):

    s = re.compile(r'.*?word.+?\(\d{1,2}[ap]m-\d{1,2}[ap]m\)\s+$')

    date = re.compile(r'(Sunday)\s+(\w+\s+\d+,\s+(2010))')

    he = re.compile(r'\t\w+.\t\w+\t\w+\t\w+\s\(\w+\)\t\w+$')

    son = re.compile(r'^.*\t\d+\t.+\t')

    # Initialize counters
    num_lines = 0
    num_s = 0
    num_date = 0
    num_he = 0
    num_son = 0
    num_unmatched = 0

    # Initialize empty list to store output lines
    sonlines = []

    # Initialize string vars to use for the show title and date.
    # (Named ti / s_date so they do not overwrite the compiled date regex above.)
    ti = ''
    s_date = ''

    with open(fn_file) as f:

        # Loop over the lines in the file
        for line in f:

            num_lines +=1


            line = line.rstrip('\n')

            m_s = re.match(s, line)
            m_date = re.match(date, line)
            m_he = re.match(he, line)
            m_son = re.match(son, line)


            if m_s:


                num_s += 1

                # Get the show title
                ti =  m_s.group()

            elif m_date:
                # it's a date line
                num_date += 1
                show_day = m_date.group(1)
                s_date = m_date.group(2)

            elif m_he:
                # it's a header line
                num_he += 1
                heline = m_he.group()

            elif m_son:

                num_son += 1
                son_group = m_son.group()
                # Use a separate name so the compiled son regex is not overwritten
                son_fields = re.split(r'\t+', son_group)
                son_fields.insert(0, ti)
                son_fields.insert(1, s_date)
                sonlines.append(son_fields)


    header = re.split(r'\t+',heline.rstrip('\t'))           
    header[0] = 'b'               
    header.insert(0,'ti')       
    header.insert(1,'s_date')    

    # Create pandas dataframe and export to csv

    # The next two lines throw the error
    s_df = pd.DataFrame(data = fn_file, columns = [header])
    s_df.to_csv(fn_csv_out, sep='\t', index= False)

The last two lines throw the error. Can you please help me fix it? Thanks in advance.

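The problem was solved in the comments. The wrong variable was being passed to the DataFrame constructor: fn_file is the input filename (a string), not the parsed rows. The fix is to pass the sonlines list instead, so change the line to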

s_df = pd.DataFrame(data = sonlines, columns = [header])

instead of

s_df = pd.DataFrame(data = fn_file, columns = [header])
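To illustrate why the original call fails and the corrected one works, here is a minimal sketch. The header names and the two rows in sonlines are made-up sample values, not data from the question, and the output is written to an in-memory buffer rather than to a real file. It also passes header as a flat list (columns=header); as far as I can tell the extra brackets in [header] are not needed for plain column names.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for what process_file() collects.
header = ['ti', 's_date', 'b', 'artist']
sonlines = [
    ['Show A', 'January 3, 2010', '1', 'Artist X'],
    ['Show A', 'January 3, 2010', '2', 'Artist Y'],
]

# Passing a plain string (here a filename) as data, without an index,
# is rejected by the DataFrame constructor with a ValueError.
try:
    pd.DataFrame(data='E:/data.csv', columns=header)
    string_error = None
except ValueError as exc:
    string_error = str(exc)

# Passing the list of rows works; each inner list becomes one row.
s_df = pd.DataFrame(data=sonlines, columns=header)

# Write tab-separated output to a buffer instead of a file on disk.
buf = io.StringIO()
s_df.to_csv(buf, sep='\t', index=False)
csv_text = buf.getvalue()

print(string_error)
print(csv_text)
```

The first line of csv_text is the header row, so the column names end up at the top of the exported file.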
