How to write the header only once when using to_csv based on conditions in pandas?

Question

I am trying to create a new txt file from other txt files based on conditions. All the txt files have the same headers. But after using 'to_csv', the output contains the header more than once. I need the header ONLY once.

Code:

import pandas as pd
import glob

big_files = glob.glob('*.txt')

for small_file in big_files:
    df = pd.read_csv(small_file, sep='\t')

    df[df['grade'].isin(['Good']) & df['area'].str.contains('Texas')].to_csv('out.txt', sep='\t', index=False, mode='a')
    print('ok')

Output:

grade   area
Good    Texas
Good    Texas
Good    Texas
grade   area
Good    Texas
Good    Texas
Good    Texas

Expected Output:

grade   area
Good    Texas
Good    Texas
Good    Texas
Good    Texas
Good    Texas
Good    Texas

Answer 1

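You can use the header parameter of the to_csv method: pass True for the first file and False for every file after it, so the header row is written only once: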

import pandas as pd
import glob

big_files = glob.glob('*.txt')

header = True  # write the column names for the first file only
for small_file in big_files:
    df = pd.read_csv(small_file, sep='\t')

    (df[df['grade'].isin(['Good']) & df['area'].str.contains('Texas')]
          .to_csv('out.txt', sep='\t',
                  index=False, mode='a',
                  header=header))
    header = False  # every later iteration appends rows without a header
    print('ok')
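
One caveat with appending: mode='a' keeps adding rows to out.txt across runs, and on a second run glob('*.txt') will also match out.txt if it sits in the same directory as the inputs. Below is a minimal sketch of one way to avoid both, assuming it is acceptable to rebuild the output from scratch each time. The function name write_filtered_once and its parameters are hypothetical (not part of pandas); the filter itself is the one from the question.

```python
import glob
import os

import pandas as pd


def write_filtered_once(paths, out_path, sep="\t"):
    """Hypothetical helper: write the Good/Texas rows of every input file
    to out_path, with the header row written a single time."""
    out_abs = os.path.abspath(out_path)
    write_header = True
    # Opening with "w" truncates the file, so old rows from earlier runs are dropped.
    with open(out_path, "w", newline="") as out:
        for path in paths:
            if os.path.abspath(path) == out_abs:
                continue  # skip the output file if the glob pattern matched it
            df = pd.read_csv(path, sep=sep)
            mask = df["grade"].isin(["Good"]) & df["area"].str.contains("Texas")
            # to_csv accepts an open file handle as well as a path
            df[mask].to_csv(out, sep=sep, index=False, header=write_header)
            write_header = False  # only the first file written gets the header
```

It could be called as write_filtered_once(glob.glob('*.txt'), 'out.txt'). Because the file is opened once and truncated, running the script twice should give the same output instead of a longer one.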

Answer 2

Another way to solve this is to concatenate the separate dataframes and write the result out only once:

import pandas as pd
import glob

big_files = glob.glob('*.txt')

dfs = [pd.read_csv(file, sep='\t') for file in big_files]

df = pd.concat(dfs)

df[df['grade'].isin(['Good']) & df['area'].str.contains('Texas')].to_csv('out.txt', sep='\t', index=False)
print('ok')
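
If the input files are large, a variation on the same idea is to filter each dataframe before concatenating, so only the matching rows are held together in memory. This is a sketch under that assumption, not a change to the answer's approach; filter_and_concat is a hypothetical helper name. Note that pd.concat raises a ValueError when given an empty list, so the sketch assumes at least one input file.

```python
import pandas as pd


def filter_and_concat(paths, sep="\t"):
    """Hypothetical helper: keep only the Good/Texas rows of each file,
    then combine them into a single dataframe."""
    parts = []
    for path in paths:
        df = pd.read_csv(path, sep=sep)
        mask = df["grade"].isin(["Good"]) & df["area"].str.contains("Texas")
        parts.append(df[mask])
    # ignore_index=True renumbers the rows instead of repeating each file's index
    return pd.concat(parts, ignore_index=True)
```

The result can then be written with a single call such as filter_and_concat(big_files).to_csv('out.txt', sep='\t', index=False), which produces one header because to_csv runs only once.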

2 answers

solution1
1 2020-06-23 14:37:53

solution2
0 2020-06-23 14:41:48
