I am trying to create a new txt file from other txt files based on conditions. All of the txt files have the same headers. But after using 'to_csv', the output contains the header more than once. I need the header ONLY once.
Code:
import pandas as pd
import glob
big_files = glob.glob('*.txt')
for small_file in big_files:
    df = pd.read_csv(small_file, sep= '\t')
    df[df['grade'].isin(['Good']) & df['area'].str.contains('Texas')].to_csv('out.txt',sep= '\t',index=False, mode = 'a')
print('ok')
Output:
grade area
Good Texas
Good Texas
Good Texas
grade area
Good Texas
Good Texas
Good Texas
Expected Output:
grade area
Good Texas
Good Texas
Good Texas
Good Texas
Good Texas
Good Texas
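Every call to to_csv writes the header by default, so appending in a loop repeats it once per file. You can use the header parameter of the to_csv method to write it for the first file only: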
import pandas as pd
import glob
big_files = glob.glob('*.txt')
header = True  # write the header only for the first file
for small_file in big_files:
    df = pd.read_csv(small_file, sep= '\t')
    (df[df['grade'].isin(['Good']) & df['area'].str.contains('Texas')]
       .to_csv('out.txt', sep= '\t',
               index=False, mode = 'a',
               header=header))
    header = False  # every later append skips the header
print('ok')
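To try this without touching your real data, here is a minimal self-contained sketch of the same header-flag idea. The function name append_filtered, the file names a.txt and b.txt, the sample rows and the out.tsv output name are all made up for the demo and are not from the question.

```python
import glob
import os
import tempfile

import pandas as pd


def append_filtered(paths, out_path):
    """Append the matching rows of each file to out_path, writing the header once."""
    write_header = True
    for path in paths:
        df = pd.read_csv(path, sep='\t')
        mask = df['grade'].isin(['Good']) & df['area'].str.contains('Texas')
        df[mask].to_csv(out_path, sep='\t', index=False,
                        mode='a', header=write_header)
        write_header = False  # only the first append gets the header


# Demo: two small, hypothetical input files in a temporary directory.
tmp_dir = tempfile.mkdtemp()
for name in ('a.txt', 'b.txt'):
    sample = pd.DataFrame({'grade': ['Good', 'Bad'], 'area': ['Texas', 'Texas']})
    sample.to_csv(os.path.join(tmp_dir, name), sep='\t', index=False)

inputs = sorted(glob.glob(os.path.join(tmp_dir, '*.txt')))
# Assumption: the output uses another extension so the '*.txt' glob never picks it up.
out_path = os.path.join(tmp_dir, 'out.tsv')
append_filtered(inputs, out_path)

with open(out_path) as fh:
    lines = fh.read().splitlines()
print(lines)
```

One thing to watch in the original loop: out.txt itself matches '*.txt', and mode='a' appends to whatever an earlier run left behind. If the script is run twice in the same folder, the old output is read as an input and the new rows are added to the old ones, so it is probably worth deleting out.txt first or writing it somewhere the glob does not look.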
Another way to solve this is to concatenate the separate dataframes and only write out once:
import pandas as pd
import glob
big_files = glob.glob('*.txt')
dfs = [pd.read_csv(file, sep= '\t') for file in big_files]
df = pd.concat(dfs)
df[df['grade'].isin(['Good']) & df['area'].str.contains('Texas')].to_csv('out.txt',sep= '\t',index=False)
print('ok')
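A possible variation, sketched under a few assumptions: filter each file before concatenating, so only the matching rows are kept in memory, and skip the output file when globbing so a rerun does not read its own result. The helper names filter_good_texas and combine_filtered, the sample data and the na=False choice (rows with a missing area are treated as non-matching) are mine, not part of the question.

```python
import glob
import os
import tempfile

import pandas as pd


def filter_good_texas(df):
    """Keep rows whose grade is 'Good' and whose area contains 'Texas'."""
    # na=False: a missing area counts as "does not contain Texas".
    mask = df['grade'].isin(['Good']) & df['area'].str.contains('Texas', na=False)
    return df[mask]


def combine_filtered(paths, out_path):
    """Filter every input file, concatenate the results and write them once."""
    parts = [filter_good_texas(pd.read_csv(p, sep='\t')) for p in paths]
    combined = pd.concat(parts, ignore_index=True)  # raises ValueError if paths is empty
    combined.to_csv(out_path, sep='\t', index=False)  # default mode 'w' overwrites
    return combined


# Demo: two small, hypothetical input files in a temporary directory.
tmp_dir = tempfile.mkdtemp()
for name in ('a.txt', 'b.txt'):
    sample = pd.DataFrame({'grade': ['Good', 'Bad', 'Good'],
                           'area': ['Texas', 'Texas', None]})
    sample.to_csv(os.path.join(tmp_dir, name), sep='\t', index=False)

out_path = os.path.join(tmp_dir, 'out.txt')

# Run twice: the output file is excluded from the inputs, so the result is stable.
for _ in range(2):
    inputs = [p for p in sorted(glob.glob(os.path.join(tmp_dir, '*.txt')))
              if os.path.basename(p) != 'out.txt']
    result = combine_filtered(inputs, out_path)

with open(out_path) as fh:
    out_lines = fh.read().splitlines()
print(out_lines)
```

Because the file is written in a single call with the default write mode, the header appears once and reruns overwrite the old output instead of appending to it.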