What I have: A huge text file (.txt) whose fields are separated by tabs.
What I want: Convert the text file (.txt) to CSV (.csv) with Python, placing each tab-separated field in its own column.
// Start Time: 10
// Update Rate: 2
// Scenario: 367.3
// Firmware Version: 1.1.1
Count Temp V_X V_Y V_Z
25 0 0.28 0.43 -0.07
23 4 0.34 0.33 -0.03
22 3 0.34 0.23 -0.04
21 2 0.35 0.43 -0.03
27 3 0.33 0.33 -0.12
The first problem is that I want to remove the metadata lines starting with // from the top of the text file. The second problem is that I want to get all the tab-separated data into separate CSV columns.
Here is what I am doing at the moment:
infile = open('/Users/parth_To_File/myData.txt','r').readlines()
with open('/Users/parth_To_File/out_myData.txt','w') as outfile:
    for index,line in enumerate(infile):
        if index != 0:
            outfile.write(line)
I run the above code 4 times to get rid of the redundant information at the top of the data. Then I use the code below to convert the data into a CSV file.
import os
import pandas as pd

save_path = "/Users/parth_To_File/"
in_filename = os.path.join(save_path,'myData.txt')
out_filename = os.path.join(save_path,'out_myData.csv')
df = pd.read_csv(in_filename, sep=";")
df.to_csv(out_filename, index=False)
The problems with the methods I am using are:
- The code is not optimised to delete specific lines from the txt data.
- The code does not produce proper tabular data with individual columns.
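For the first issue, the four separate runs could probably be collapsed into a single pass. A minimal sketch, assuming the metadata block is always exactly the first four lines; `strip_header_lines` is a hypothetical helper name, not an existing function:

```python
from itertools import islice


def strip_header_lines(in_path, out_path, n_skip=4):
    """Copy in_path to out_path, dropping the first n_skip lines.

    Hypothetical helper: assumes the metadata block is always exactly
    n_skip lines long (4 in the sample above).
    """
    with open(in_path, "r") as infile, open(out_path, "w") as outfile:
        # islice skips the first n_skip lines lazily, so the file is
        # streamed instead of loaded whole with readlines().
        outfile.writelines(islice(infile, n_skip, None))
```

Usage would be something like `strip_header_lines('/Users/parth_To_File/myData.txt', '/Users/parth_To_File/out_myData.txt')`.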
I would appreciate it if someone could help me understand the correct way to do this txt to CSV conversion as described above.
A file separated by tabs is in TSV format ( https://en.wikipedia.org/wiki/Tab-separated_values ), which pandas supports. You can do:
import pandas as pd

# skiprows=4 skips the four // metadata lines, so line 5 is the header
df = pd.read_csv('input.tsv', sep='\t', skiprows=4)
df.to_csv('input.csv', index=False, sep=",")
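To check what `skiprows=4` does without touching real files, here is a small sketch that reads an in-memory copy of the first rows of the sample data. `io.StringIO` stands in for 'input.tsv' here; that substitution is only for illustration.

```python
import io

import pandas as pd

# In-memory stand-in for the tab-separated input (4 metadata lines first).
sample = (
    "// Start Time: 10\n"
    "// Update Rate: 2\n"
    "// Scenario: 367.3\n"
    "// Firmware Version: 1.1.1\n"
    "Count\tTemp\tV_X\tV_Y\tV_Z\n"
    "25\t0\t0.28\t0.43\t-0.07\n"
    "23\t4\t0.34\t0.33\t-0.03\n"
)

# skiprows=4 drops the metadata lines, so "Count..." becomes the header.
df = pd.read_csv(io.StringIO(sample), sep="\t", skiprows=4)

# Write to a StringIO instead of a file, just to inspect the CSV text.
out = io.StringIO()
df.to_csv(out, index=False)
csv_text = out.getvalue()
print(csv_text)
```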
Everything is provided by pandas; there is no need to read the file line by line yourself. You can use read_csv and set the separator to '\t'. With the comment parameter, everything after the given character on a line is ignored, and lines that start with it are skipped entirely:
df = pd.read_csv('myData.txt', sep = '\t', comment = '/')
Output:
Count Temp V_X V_Y V_Z
0 25 0 0.28 0.43 -0.07
1 23 4 0.34 0.33 -0.03
2 22 3 0.34 0.23 -0.04
3 21 2 0.35 0.43 -0.03
4 27 3 0.33 0.33 -0.12
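Since the file is described as huge, the same `read_csv` options can probably be combined with `chunksize` so the whole file never has to fit in memory at once. A sketch under that assumption; `tsv_to_csv_chunked` and the default chunk size of 100,000 rows are arbitrary choices, not part of the answer above:

```python
import pandas as pd


def tsv_to_csv_chunked(in_path, out_path, chunksize=100_000):
    """Convert a tab-separated file to CSV in fixed-size chunks.

    Hypothetical helper: uses the same options as above (tab separator,
    comment='/'), but reads at most `chunksize` rows at a time.
    """
    with pd.read_csv(in_path, sep="\t", comment="/", chunksize=chunksize) as reader:
        first = True
        for chunk in reader:
            # Overwrite and write the header for the first chunk only,
            # then append the following chunks without a header.
            chunk.to_csv(out_path, mode="w" if first else "a",
                          header=first, index=False)
            first = False
```

Writing the header only with the first chunk keeps the result a single valid CSV file.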
sed '1,4d; s/\t/,/g' myData.txt > myData.csv
or
sed '/^\//d; s/\t/,/g' myData.txt > myData.csv
The former deletes the first four lines and converts the tabs to commas from line 5 onward, whereas the latter converts all lines that do not start with a /.
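If your file is as huge as you say, this might be faster than first loading it into a pandas DataFrame. Note that \t in a sed pattern is a GNU extension; the BSD sed that ships with macOS may not recognise it, in which case GNU sed or a literal tab character is needed.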
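If you would rather stay in Python but still stream the file line by line like sed does, the standard csv module can do it. A rough sketch, with `tsv_to_csv_stream` as a hypothetical name; it mirrors the variant that drops lines starting with / and writes Unix line endings like sed would:

```python
import csv


def tsv_to_csv_stream(in_path, out_path):
    """Drop lines starting with '/' and rewrite tab-separated rows as CSV.

    Hypothetical helper: processes one line at a time, so memory use
    does not grow with the file size.
    """
    with open(in_path, newline="") as infile, \
            open(out_path, "w", newline="") as outfile:
        # lineterminator="\n" matches sed's output; csv defaults to "\r\n".
        writer = csv.writer(outfile, lineterminator="\n")
        # Filter lazily so the whole file never sits in memory.
        data_lines = (line for line in infile if not line.startswith("/"))
        writer.writerows(csv.reader(data_lines, delimiter="\t"))
```

Unlike a plain s/\t/,/g, csv.writer quotes any field that itself contains a comma, so the output stays valid CSV.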