How to efficiently strip tab characters from a txt file with Python

Question

The objective is to strip the tab characters that exist between two strings.

Specifically, I would like to remove the tab characters between *Generic and h_two, which are highlighted in yellow as depicted below.

The expected output, as viewed in a Microsoft Office application with Show Paragraph Marks enabled, is as below.

The input is a plain txt file.

One naive approach is:

f_output.write(line.replace('*Generic \t \t', ','))

However, this did not work as intended.

So, there are two issues:

The code below replaces all the tab characters instead of only those between the *Generic and h_two strings.

How can I efficiently replace only the tab characters between the sub-strings?

The full code to replicate this issue is:

import pandas as pd

fname = 'endnote_csv_help'
'''
Step 1) Create mock df and save to csv
'''
my_list = ['col_one', 'col_two', 'col_three']
combine_list = [{'h_one', 'h_two', 'h_three'}, my_list, my_list]
df = pd.DataFrame(combine_list)
df.to_csv(f'{fname}.csv', index=False, header=False)

'''
Step 2) Read the csv and convert to txt format
'''

df_shifted = pd.read_csv(f'{fname}.csv', header=None).shift(1, axis=0)
df_shifted.at[0, 0] = '*Generic'
df_shifted.fillna('').to_csv(f'{fname}.txt', sep='\t', index=False, header=False)

'''
Step 3) Read the txt and replace the tab character
'''



with open('endnote_csv_help.txt') as f_input, open('new_endnote_csv_help.txt', 'w') as f_output:
    for line in f_input:
        f_output.write(line.replace('*Generic \t \t', ','))

Note: The question has been updated slightly in response to the answer by @Kuldeep.

Answer 1

Input: endnote_csv_help.txt

*Generic        
h_one   h_three h_two
col_one col_two col_three

Output: new_endnote_csv_help.txt

*Generic,,
h_one,h_three,h_two
col_one,col_two,col_three

Read each line from the input, replace the tabs with commas, then write it to the output:

with open('endnote_csv_help.txt') as f_input, open('new_endnote_csv_help.txt', 'w') as f_output:
    for line in f_input:
        f_output.write(line.replace('\t', ','))
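A plain replace('\t', ',') works for the sample data, but it would produce ambiguous rows if a field ever contained a comma itself. As a hedged alternative, here is a minimal sketch using the standard library csv module, which quotes such fields on output. The helper name tsv_to_csv and the in-memory io.StringIO sample are assumptions for illustration, not part of the original answer.

```python
import csv
import io


def tsv_to_csv(src, dst):
    """Copy tab-delimited rows from src to dst as comma-delimited rows."""
    reader = csv.reader(src, delimiter='\t')
    # lineterminator='\n' keeps the output line endings simple for this demo
    writer = csv.writer(dst, lineterminator='\n')
    for row in reader:
        writer.writerow(row)


# In-memory stand-in for endnote_csv_help.txt (hypothetical sample data)
sample = io.StringIO(
    '*Generic\t\t\n'
    'h_one\th_three\th_two\n'
    'col_one\tcol_two\tcol_three\n'
)
result = io.StringIO()
tsv_to_csv(sample, result)
print(result.getvalue())
```

With real files, the csv documentation recommends opening both files with newline='' and passing the file objects in place of the StringIO objects.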

Answer 2

As it turns out, there are two consecutive tab characters between *Generic and h_two.

Hence, they can be removed simply with:

replace('\t\t', '')

The complete code is then as below:

with open('endnote_csv_help.txt') as f_input, open('new_endnote_csv_help.txt', 'w') as f_output:
    for line in f_input:
        f_output.write(line.replace('\t\t', ''))

Note that there must be no spaces between the two tab characters in '\t\t'; str.replace only matches the exact substring.

Thanks to @Kuldeep for the suggestion, which provided a major hint. As a result, his answer will be accepted.
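One caveat: replace('\t\t', '') also removes any pair of tabs elsewhere in the file, for example where an empty field sits between two values. If only the tabs trailing the *Generic line should go, a more targeted approach is possible. The following is a minimal sketch under that assumption; the helper name strip_generic_tabs and the sample lines are hypothetical.

```python
def strip_generic_tabs(line, marker='*Generic'):
    """Remove trailing tabs only from lines that start with marker."""
    if line.startswith(marker):
        body = line.rstrip('\n')       # line content without the newline
        newline = line[len(body):]     # '' or '\n', preserved as-is
        return body.rstrip('\t') + newline
    return line


# The second line has an empty middle field that must be left alone
lines = ['*Generic\t\t\n', 'h_one\t\th_two\n']
cleaned = [strip_generic_tabs(line) for line in lines]
print(cleaned)  # ['*Generic\n', 'h_one\t\th_two\n']
```

Inside the file loop above, this would be used as f_output.write(strip_generic_tabs(line)).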

Answer 3

As the other answer notes, your error occurs because you are reading from a file that you have opened for writing. If you want to remove multiple consecutive tabs, use a regular expression. This expression matches two or more consecutive tabs and replaces them with an empty string:

import re
data = '*Generic\t\t\nh_three\th_one\th_two\ncol_one\tcol_two\tcol_three\n'
re.sub("([\t][\t]+)", "", data)

Output:

'*Generic\nh_three\th_one\th_two\ncol_one\tcol_two\tcol_three\n'
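The pattern above removes every run of two or more tabs, wherever it occurs. If the intent is to drop only tabs at the end of a line, the pattern can be anchored with $ in multiline mode. This is a sketch of that variant, not the original answer's method; the sample string deliberately includes an empty middle field to show that it survives.

```python
import re

# Hypothetical sample: the second line has an empty field (two tabs mid-line)
data = '*Generic\t\t\nh_three\t\th_two\ncol_one\tcol_two\tcol_three\n'

# With re.MULTILINE, $ matches just before each newline, so only
# tabs at the end of a line are removed
trailing_tabs = re.compile(r'\t+$', flags=re.MULTILINE)
cleaned = trailing_tabs.sub('', data)
print(repr(cleaned))
# '*Generic\nh_three\t\th_two\ncol_one\tcol_two\tcol_three\n'
```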

To avoid the exception, read from a file opened for reading, then write to a file opened for writing:

import pandas as pd
import re

fname = 'endnote_csv_help'
'''
Create mock df and save to csv
'''
my_list = ['col_one', 'col_two', 'col_three']
combine_list = [{'h_one', 'h_two', 'h_three'}, my_list, my_list]
df = pd.DataFrame(combine_list)
df.to_csv(f'{fname}.csv', index=False, header=False)

'''
Read the csv and convert to txt format
'''

df_shifted = pd.read_csv(f'{fname}.csv', header=None).shift(1, axis=0)
df_shifted.at[0, 0] = '*Generic'
df_shifted.fillna('').to_csv(f'{fname}.txt', sep='\t', index=False, header=False)

'''
Read the txt and replace the tab character
'''

with open(f'{fname}.txt', 'r') as file:
    data = re.sub("([\t][\t]+)", "", file.read())
with open(f'{fname}.txt', 'w') as file:
    file.write(data)

solution1
1 2020-07-09 04:40:26

solution2
0 2020-07-09 05:32:16

solution3
-1 ACCEPTED 2020-07-09 04:50:06
