The objective is to strip the tab characters that sit between two strings.
Specifically, I would like to remove the tab characters between *Generic
and h_two,
which were highlighted in yellow in the original screenshot.
The expected output was shown as viewed in a Microsoft Office application with Show Paragraph Marks enabled.
The file is a plain txt file.
One naive approach is:
f_output.write(line.replace('*Generic \t \t', ','))
However, this did not work as intended.
So, there are two issues.
The full code to reproduce the issue is:
import pandas as pd
fname = 'endnote_csv_help'
'''
Step 1) Create mock df and save to csv
'''
my_list = ['col_one', 'col_two', 'col_three']
# Note: the first row is a set, so the order of h_one/h_two/h_three is not guaranteed
combine_list = [{'h_one', 'h_two', 'h_three'}, my_list, my_list]
df = pd.DataFrame(combine_list)
df.to_csv(f'{fname}.csv', index=False, header=False)
'''
Step 2) Read the csv and convert to txt format
'''
df_shifted = pd.read_csv(f'{fname}.csv', header=None).shift(1, axis=0)
df_shifted.at[0, 0] = '*Generic'
df_shifted.fillna('').to_csv(f'{fname}.txt', sep='\t', index=False, header=False)
'''
Step 3) Read the txt and replace the tab character
'''
with open('endnote_csv_help.txt') as f_input, open('new_endnote_csv_help.txt', 'w') as f_output:
    for line in f_input:
        f_output.write(line.replace('*Generic \t \t', ','))
Note: The question has been updated slightly in response to the answer by @Kuldeep.
Input: endnote_csv_help.txt
*Generic
h_one h_three h_two
col_one col_two col_three
Output: new_endnote_csv_help.txt
*Generic,,
h_one,h_three,h_two
col_one,col_two,col_three
Read each line from the input, replace the tabs with commas, then write it to the output:
with open('endnote_csv_help.txt') as f_input, open('new_endnote_csv_help.txt', 'w') as f_output:
    for line in f_input:
        f_output.write(line.replace('\t', ','))
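As an alternative to replacing the delimiter character directly, the same conversion can be sketched with the standard csv module, which also quotes any field that happens to contain a comma. The helper name tabs_to_commas and the in-memory sample are assumptions for illustration, not part of the original answer.

```python
import csv
import io

def tabs_to_commas(text):
    """Convert tab-separated text to comma-separated text (hypothetical helper)."""
    reader = csv.reader(io.StringIO(text), delimiter='\t')
    out = io.StringIO()
    # lineterminator='\n' keeps plain newlines instead of the csv default '\r\n'
    writer = csv.writer(out, lineterminator='\n')
    writer.writerows(reader)
    return out.getvalue()

# Sample assumed to mirror the contents of endnote_csv_help.txt
sample = '*Generic\t\t\nh_one\th_three\th_two\ncol_one\tcol_two\tcol_three\n'
converted = tabs_to_commas(sample)
print(converted)
```

For files on disk, the same reader and writer can be pointed at the two file objects opened in the loop above.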
As it appears, there are two tab characters between *Generic and h_two.
Hence, they can be removed simply with
replace('\t\t', '')
The complete code is then as below:
with open('endnote_csv_help.txt') as f_input, open('new_endnote_csv_help.txt', 'w') as f_output:
    for line in f_input:
        f_output.write(line.replace('\t\t', ''))
Note that there must be no spaces between the two tab escapes in '\t\t'.
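A minimal check of why the spacing matters, assuming the first line of the txt file is '*Generic' followed by two tabs and a newline (an assumption based on how the file is written with sep='\t' and empty cells):

```python
# Assumed first line of endnote_csv_help.txt
line = '*Generic\t\t\n'

# The pattern with spaces never occurs in the line, so replace() changes nothing
with_spaces = line.replace('*Generic \t \t', ',')

# Two adjacent tab escapes match the actual content and are removed
without_spaces = line.replace('\t\t', '')

print(repr(with_spaces))
print(repr(without_spaces))
```

str.replace only substitutes exact, literal matches, so any extra space in the search string makes it silently return the line unchanged.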
Thanks to @Kuldeep for the suggestion, which provided the major hint. As a result, his answer will be accepted.
As the other answer says, your error occurs because you are reading from a file that you opened for writing. To remove runs of multiple tabs, use a regular expression. The expression below matches two or more consecutive tabs and replaces them with an empty string:
import re
data = '*Generic\t\t\nh_three\th_one\th_two\ncol_one\tcol_two\tcol_three\n'
re.sub("([\t][\t]+)", "", data)
Output:
'*Generic\nh_three\th_one\th_two\ncol_one\tcol_two\tcol_three\n'
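One caveat worth sketching: a pattern that matches any run of two or more tabs will also collapse empty cells in the middle of a row. If only the tabs at the end of a line should go, an end-of-line anchor may be a safer variant. The names any_run, trailing_only, and gappy below are illustrative assumptions.

```python
import re

data = '*Generic\t\t\nh_three\th_one\th_two\ncol_one\tcol_two\tcol_three\n'

# Equivalent to "([\t][\t]+)": removes any run of two or more tabs
any_run = re.sub(r'\t{2,}', '', data)

# Variant: removes tabs only when they sit at the end of a line
trailing_only = re.sub(r'\t+$', '', data, flags=re.MULTILINE)

# A row with an empty middle cell shows the difference between the two
gappy = 'a\t\tc\n'
gappy_any_run = re.sub(r'\t{2,}', '', gappy)
gappy_trailing = re.sub(r'\t+$', '', gappy, flags=re.MULTILINE)

print(repr(any_run))
print(repr(gappy_any_run), repr(gappy_trailing))
```

For the sample data both patterns give the same result; they differ only when a row has an empty cell between two filled ones.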
To avoid the exception, read from a file that is opened for reading and write to a file that is opened for writing.
import pandas as pd
import re
fname = 'endnote_csv_help'
'''
Create mock df and save to csv
'''
my_list = ['col_one', 'col_two', 'col_three']
combine_list = [{'h_one', 'h_two', 'h_three'}, my_list, my_list]
df = pd.DataFrame(combine_list)
df.to_csv(f'{fname}.csv', index=False, header=False)
'''
# Read the csv and convert to txt format
'''
df_shifted = pd.read_csv(f'{fname}.csv', header=None).shift(1, axis=0)
df_shifted.at[0, 0] = '*Generic'
df_shifted.fillna('').to_csv(f'{fname}.txt', sep='\t', index=False, header=False)
'''
Read the txt and replace the tab character
'''
with open(f'{fname}.txt', 'r') as file:
    # Read the whole file, then drop every run of two or more tabs
    data = re.sub("([\t][\t]+)", "", file.read())
with open(f'{fname}.txt', 'w') as file:
    file.write(data)