
How to write the header only once when using to_csv with conditions in pandas?

I am trying to create a new txt file from other txt files based on conditions. All the files have the same headers. But after using 'to_csv', the output contains the header more than once. I need the header ONLY once.

Code:

import pandas as pd
import glob

big_files = glob.glob('*.txt')

for small_file in big_files:
    df = pd.read_csv(small_file, sep='\t')

    df[df['grade'].isin(['Good']) & df['area'].str.contains('Texas')].to_csv('out.txt', sep='\t', index=False, mode='a')
    print('ok')

Output:

grade   area
Good    Texas
Good    Texas
Good    Texas
grade   area
Good    Texas
Good    Texas
Good    Texas

Expected Output:

grade   area
Good    Texas
Good    Texas
Good    Texas
Good    Texas
Good    Texas
Good    Texas

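You can use the header parameter of the to_csv method. Each call in append mode writes the header by default, so pass header=True for the first file only and header=False for every later one: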

import pandas as pd
import glob

big_files = glob.glob('*.txt')

# Write the header for the first file only; every later append omits it.
header = True
for small_file in big_files:
    df = pd.read_csv(small_file, sep='\t')

    # Keep rows graded 'Good' whose area contains 'Texas', then append them.
    (df[df['grade'].isin(['Good']) & df['area'].str.contains('Texas')]
          .to_csv('out.txt', sep='\t',
                  index=False, mode='a',
                  header=header))
    header = False
    print('ok')
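Because mode='a' also appends to an out.txt left over from an earlier run, a variation on the flag approach is to decide the header from the state of the output file instead. The following is only a sketch under assumptions: the helper name append_filtered and the na=False argument (which treats missing 'area' values as non-matches) are not part of the original answer.

```python
import os

import pandas as pd


def append_filtered(in_path, out_path):
    """Append the matching rows of in_path to out_path (hypothetical helper)."""
    df = pd.read_csv(in_path, sep='\t')
    # na=False is an assumption: rows with a missing 'area' count as non-matches.
    mask = df['grade'].isin(['Good']) & df['area'].str.contains('Texas', na=False)
    # Write the header only when the output file is missing or still empty.
    write_header = not os.path.exists(out_path) or os.path.getsize(out_path) == 0
    df[mask].to_csv(out_path, sep='\t', index=False, mode='a', header=write_header)
```

Calling append_filtered(small_file, 'out.txt') inside the loop would replace the header flag. Note that with this version a second run keeps appending rows to the existing file, so delete out.txt first if a fresh result is wanted.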

Another way to solve this is to concatenate the separate dataframes and only write out once:

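import pandas as pd
import glob

big_files = glob.glob('*.txt')

# Read every input file into its own dataframe.
dfs = [pd.read_csv(file, sep='\t') for file in big_files]

# Stack them into one dataframe so the header is written exactly once.
df = pd.concat(dfs)

# Default mode 'w' overwrites out.txt instead of appending to it.
df[df['grade'].isin(['Good']) & df['area'].str.contains('Texas')].to_csv('out.txt', sep='\t', index=False)
print('ok')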
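One thing to watch with either approach: glob.glob('*.txt') will also match out.txt once it exists, so a rerun reads its own earlier output. A possible sketch of the concatenation approach that skips the output file and filters each file before concatenating is shown here. The helper name combine_filtered, its return value, and the na=False argument are assumptions made for illustration, not part of the original answer.

```python
import glob
import os

import pandas as pd


def combine_filtered(pattern, out_path):
    """Filter every file matching pattern and write the result once (hypothetical helper)."""
    out_abs = os.path.abspath(out_path)
    # Skip the output file so a rerun does not read its own earlier result.
    paths = sorted(p for p in glob.glob(pattern) if os.path.abspath(p) != out_abs)
    frames = []
    for path in paths:
        df = pd.read_csv(path, sep='\t')
        # na=False is an assumption: rows with a missing 'area' count as non-matches.
        mask = df['grade'].isin(['Good']) & df['area'].str.contains('Texas', na=False)
        frames.append(df[mask])
    if not frames:
        return 0  # pd.concat raises ValueError on an empty list
    result = pd.concat(frames, ignore_index=True)
    # A single write in the default mode 'w' produces exactly one header.
    result.to_csv(out_path, sep='\t', index=False)
    return len(result)
```

Usage would be combine_filtered('*.txt', 'out.txt'). Filtering before concatenating keeps only the matching rows in memory, which may matter when the input files are large.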
