Reading a txt file (similar to dictionary format) into a pandas DataFrame

I have a txt file that looks like this:

('GTCC', 'ACTB'): 1
('GTCC', 'GAPDH'): 2
('CGAG', 'ACTB'): 1
('CGAG', 'GAPDH'): 4

where the first string is a gRNA name, the second string is a gene name, and the number is the count of times the two occur together.

I want to read this into a pandas DataFrame and reshape it so that it looks like this:

      ACTB GAPDH
GTCC   1     2
CGAG   1     4

How might I do this?

The file will not always be this size. It will often be much larger (200 gRNA names x 20 gene names), but the size varies. Each count always belongs to exactly one gRNA name and one gene name. The row and column labels shown match the real file: a string of letters for each row and a gene name for each column.

This is certainly not the cleanest way to do it, but I figured out a way to get what I wanted:

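import pandas as pd

# Split on "," or ":" and swallow surrounding whitespace, so the space after
# the comma does not end up in the gene names (regex separators need engine='python')
df = pd.read_csv('test.txt', sep=r"\s*[,:]\s*", engine='python', names=['gRNA', 'gene', 'count'])
# Strip the leftover parentheses and single quotes from both name columns
df['gRNA'] = df['gRNA'].str.strip("()'")
df['gene'] = df['gene'].str.strip("()'")
# Reshape: one row per gRNA, one column per gene, counts as values
df = df.pivot(index='gRNA', columns='gene', values='count')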
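Since each line already looks like a Python dict entry, another option is to parse the lines as literals instead of splitting on punctuation. The sketch below is one possible approach and not part of the original answer. It assumes every non-blank line has exactly the form shown in the question, and the function name `read_pair_counts` is made up for illustration.

```python
import ast
import io

import pandas as pd


def read_pair_counts(handle):
    """Parse lines like "('GTCC', 'ACTB'): 1" into a gRNA x gene table.

    Hypothetical helper: assumes each non-blank line is a valid dict entry.
    """
    counts = {}
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Wrapping a single line in braces turns it into a one-entry dict literal
        counts.update(ast.literal_eval("{" + line + "}"))
    # Tuple keys become a two-level index; unstack moves the gene level into columns
    table = pd.Series(counts).unstack()
    table.index.name = "gRNA"
    table.columns.name = "gene"
    return table


# The sample data from the question, read from memory instead of a file
sample = io.StringIO(
    "('GTCC', 'ACTB'): 1\n"
    "('GTCC', 'GAPDH'): 2\n"
    "('CGAG', 'ACTB'): 1\n"
    "('CGAG', 'GAPDH'): 4\n"
)
table = read_pair_counts(sample)
print(table)
```

For a real file, pass an open file handle instead of the in-memory sample, for example `with open('test.txt') as fh: table = read_pair_counts(fh)`. If some gRNA/gene pair is missing from the file, its cell will be NaN, so a `fillna(0)` may be needed depending on how missing counts should be treated.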
