Convert .txt file to .csv with specific columns in Python
I have a text file that I want to load into my Python code, but the format of the file is not suitable.
Here is what it contains:
SEQ MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLASWNY
SS3 CCCHHHHHHHHHHHHCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
95024445656543114678678999999999999999888889998886
SS8 CCHHHHHHHHHHHHHHCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
96134445555554311253378999999999999999999999999987
SA EEEbBBBBBBBBBBbEbEEEeeEeBeEbBEEbbEeBeEbbeebBbBbBbb
41012123422000000103006262214011342311110000030001
TA bhHHHHHHHHHHHHHgIihiHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
00789889988663201010099999999999999999898999998741
CD NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
54433221111112221122124212411342243234323333333333
I want to convert it into a pandas DataFrame that has SEQ, SS3, SA, TA, CD and SS8 as its columns and the lines next to them as its rows.
I tried pd.read_csv, but it doesn't give me the result I want.
Thank you!
Note: this solution works with an arbitrary number (including zero, but of course not too many) of consecutive lines whose value in the first column is omitted.
import pandas as pd

# data (3 characters for the second column only)
file_path = "/mnt/ramdisk/input.txt"
df = pd.read_fwf(file_path, names=["col", "val"])
# fill the blank labels in the first column with the label from the line above
df["col"] = df["col"].ffill()
# get correct row location
df["gp"] = df.groupby("col").cumcount()
# pivot group (0,1) to columns and then transpose.
df_ans = df.pivot(index="col", columns="gp", values="val").transpose()
print(df_ans) # show the first 3 characters only
col   CD   SA  SEQ  SS3  SS8   TA
gp
0    NNN  EEE  MSS  CCC  CCH  bhH
1    544  410  NaN  950  961  007
Then you can save the resulting DataFrame using df_ans.to_csv().
To read a text file with the pandas.read_csv() method using its default settings, the file should contain comma-separated data:
SEQ, SS3, ....
MSSSSWLLLSLVAVTAAQSTIEEQ..., CCCHHHHHHHHHHHHCCCCCCHHHHHHH.....
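As a minimal sketch of reading such a comma-separated layout, the snippet below feeds a shortened, made-up sample through pd.read_csv from an in-memory buffer. The sample text and the names csv_text and df_csv are assumptions for illustration only. skipinitialspace=True is used because the layout above puts a space after each comma.

```python
import io

import pandas as pd

# Hypothetical, shortened sample in the comma-separated layout shown above.
csv_text = "SEQ, SS3\nMSS, CCC\n"

# skipinitialspace=True drops the blank that follows each comma,
# so the second column is named "SS3" rather than " SS3".
df_csv = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)
print(df_csv)
```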
You can use this script to load the .txt file into a DataFrame and save it as a CSV file:
import pandas as pd

data = {}
with open('<your file.txt>', 'r') as f_in:
    for line in f_in:
        line = line.split()
        # keep only "LABEL value" lines; lines without a label are skipped
        if len(line) == 2:
            data[line[0]] = [line[1]]

df = pd.DataFrame(data)
print(df)
df.to_csv('data.csv', index=False)
This saves the DataFrame to data.csv.
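The script above keeps only the lines that start with a label, so the unlabelled lines of digits are dropped. If those lines should be kept as a second row, one possible variant is sketched below with the standard library only. The sample text, the helper parse_labelled_lines and the padding with empty strings are assumptions made for illustration, not part of the original answer.

```python
import csv
import io

# Hypothetical, shortened sample: labelled lines followed by unlabelled ones.
sample = """\
SEQ MSS
SS3 CCC
    950
SA  EEE
    410
"""


def parse_labelled_lines(lines):
    """Collect values per label; an unlabelled line joins the previous label."""
    columns = {}
    current = None
    for line in lines:
        parts = line.split()
        if len(parts) == 2:  # "LABEL value"
            current = parts[0]
            columns.setdefault(current, []).append(parts[1])
        elif len(parts) == 1 and current is not None:  # value without a label
            columns[current].append(parts[0])
    return columns


columns = parse_labelled_lines(io.StringIO(sample))

# Pad shorter columns with empty strings so every CSV row has the same width.
n_rows = max(len(values) for values in columns.values())
rows = [
    [values[i] if i < len(values) else "" for values in columns.values()]
    for i in range(n_rows)
]

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(columns.keys())
writer.writerows(rows)
csv_output = out.getvalue()
print(csv_output)
```

To work on a real file, the io.StringIO(sample) argument could be replaced with an open file handle, and out with a file opened for writing with newline="".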