
After deleting specific lines, convert .txt to .csv with tab-separated columns in Python

What I have: Huge text data (.txt) with values separated by tabs.

What I want: Convert the text file (.txt) to CSV (.csv) using Python, placing each tab-separated value in its own column.

// Start Time: 10
// Update Rate: 2
// Scenario: 367.3
// Firmware Version: 1.1.1
Count   Temp    V_X V_Y V_Z
25  0   0.28    0.43    -0.07
23  4   0.34    0.33    -0.03
22  3   0.34    0.23    -0.04
21  2   0.35    0.43    -0.03
27  3   0.33    0.33    -0.12

The first problem is that I want to remove all the header lines (the ones starting with //) from the text file. The second problem is that I want to get all the tab-separated data into CSV columns.

Here is what I am doing at the moment:

infile = open('/Users/parth_To_File/myData.txt','r').readlines()
with open('/Users/parth_To_File/out_myData.txt','w') as outfile:
    for index,line in enumerate(infile):
        if index != 0:
            outfile.write(line)

I am running the above code 4 times to get rid of the redundant information in the data. Then I use the code below to convert the data to a CSV file.

save_path = "/Users/parth_To_File/"
in_filename = os.path.join(save_path,'myData.txt')
out_filename = os.path.join(save_path,'out_myData.csv')
df = pd.read_csv(in_filename, sep=";")
df.to_csv(out_filename, index=False)

The problems with the methods I am using are:

- The code is not optimised to delete specific lines from the txt data.
- The code does not produce proper tabular data with individual columns.

I would appreciate it if someone could help me understand the correct way to perform the txt-to-csv conversion described above.

A file separated by tabs is in TSV format (https://en.wikipedia.org/wiki/Tab-separated_values). Pandas supports this. You can do:

import pandas as pd
df = pd.read_csv('input.tsv', sep='\t', skiprows=4)  # skip the four "//" header lines
df.to_csv('input.csv', index=False, sep=",")
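
A minimal, self-contained sketch of the same idea, assuming the four `//` header lines always come first. The sample text is copied from the question, an in-memory buffer stands in for the real file, and the helper name `tsv_to_csv` is made up for illustration:

```python
import io

import pandas as pd

# Sample copied from the question; real data would be read from a file path.
SAMPLE = (
    "// Start Time: 10\n"
    "// Update Rate: 2\n"
    "// Scenario: 367.3\n"
    "// Firmware Version: 1.1.1\n"
    "Count\tTemp\tV_X\tV_Y\tV_Z\n"
    "25\t0\t0.28\t0.43\t-0.07\n"
    "23\t4\t0.34\t0.33\t-0.03\n"
)


def tsv_to_csv(source, header_lines=4):
    """Hypothetical helper: skip the leading header lines and return CSV text."""
    df = pd.read_csv(source, sep="\t", skiprows=header_lines)
    # With no path argument, to_csv returns the CSV as a string.
    return df.to_csv(index=False)


csv_text = tsv_to_csv(io.StringIO(SAMPLE))
print(csv_text)
```

Passing a file path instead of the buffer should work the same way, since `read_csv` accepts both.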

Everything is provided by pandas, so there is no need to read the file line by line yourself. You can use read_csv and set the separator to '\t'. Lines starting with the character given as comment are skipped (pandas also ignores the rest of any line after that character, so this assumes '/' never occurs in the data):

df = pd.read_csv('myData.txt', sep = '\t', comment = '/')

Output:

   Count  Temp   V_X   V_Y   V_Z
0     25     0  0.28  0.43 -0.07
1     23     4  0.34  0.33 -0.03
2     22     3  0.34  0.23 -0.04
3     21     2  0.35  0.43 -0.03
4     27     3  0.33  0.33 -0.12
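
Since the question mentions a huge file, the same `read_csv` call can probably be run in chunks so that the whole table never has to sit in memory at once. The sketch below is only an illustration under that assumption: the helper name `convert_in_chunks` and the chunk size are hypothetical, and in-memory buffers stand in for the real input and output files.

```python
import io

import pandas as pd

# Sample copied from the question; a real run would pass file paths or handles.
SAMPLE = (
    "// Start Time: 10\n"
    "// Update Rate: 2\n"
    "// Scenario: 367.3\n"
    "// Firmware Version: 1.1.1\n"
    "Count\tTemp\tV_X\tV_Y\tV_Z\n"
    "25\t0\t0.28\t0.43\t-0.07\n"
    "23\t4\t0.34\t0.33\t-0.03\n"
    "22\t3\t0.34\t0.23\t-0.04\n"
    "21\t2\t0.35\t0.43\t-0.03\n"
    "27\t3\t0.33\t0.33\t-0.12\n"
)


def convert_in_chunks(source, target, rows_per_chunk=100_000):
    """Hypothetical helper: stream tab-separated rows into CSV, chunk by chunk."""
    reader = pd.read_csv(source, sep="\t", comment="/", chunksize=rows_per_chunk)
    total = 0
    for i, chunk in enumerate(reader):
        # Write the column names only once, with the first chunk.
        chunk.to_csv(target, index=False, header=(i == 0))
        total += len(chunk)
    return total


out = io.StringIO()
rows = convert_in_chunks(io.StringIO(SAMPLE), out, rows_per_chunk=2)
print(rows)
print(out.getvalue())
```

When writing to a real file, opening the target once with `open(path, "w", newline="")` and passing the handle keeps the chunks appended in order.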


If all you need is to convert the TSV file to a CSV, you can also do it without any programming, using sed:

sed '1,4d; s/\t/,/g' myData.txt > myData.csv

or

sed '/^\//d; s/\t/,/g' myData.txt > myData.csv

The former deletes the first four lines and converts the tabs to commas from line 5 onward, whereas the latter does the conversion for every line that does not start with a /. Both rely on GNU sed understanding \t; BSD/macOS sed needs a literal tab instead.
If your file is as huge as you said, this might be faster than first loading it into a pandas DataFrame.
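
For readers who want to stay in Python but avoid building a DataFrame, the line-filtering idea behind the sed commands can be sketched with the standard library alone. This is an illustrative sketch, not a benchmarked solution: the helper name `convert_lines` is hypothetical, and it assumes header lines start with `//` and that fields never contain tabs.

```python
import csv
import io

# Shortened sample from the question; a real run would use open file handles.
SAMPLE = (
    "// Start Time: 10\n"
    "// Firmware Version: 1.1.1\n"
    "Count\tTemp\tV_X\tV_Y\tV_Z\n"
    "25\t0\t0.28\t0.43\t-0.07\n"
    "23\t4\t0.34\t0.33\t-0.03\n"
)


def convert_lines(lines, out, comment_prefix="//"):
    """Hypothetical helper: copy non-comment lines to `out` as CSV rows."""
    writer = csv.writer(out, lineterminator="\n")
    written = 0
    for line in lines:
        # Skip header/comment lines and blank lines.
        if line.startswith(comment_prefix) or not line.strip():
            continue
        writer.writerow(line.rstrip("\r\n").split("\t"))
        written += 1
    return written


out = io.StringIO()
count = convert_lines(io.StringIO(SAMPLE), out)
print(count)
print(out.getvalue())
```

Because the input is consumed one line at a time, memory use stays flat regardless of file size. With real files, both handles would typically be opened with `newline=""`, as the `csv` module documentation recommends.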
