Reading a txt file (similar to dictionary format) into a pandas dataframe
I have a txt file that looks like this:
('GTCC', 'ACTB'): 1
('GTCC', 'GAPDH'): 2
('CGAG', 'ACTB'): 1
('CGAG', 'GAPDH'): 4
where the first string is a gRNA name, the second string is a gene name, and the number is a count of how often those two strings occur together.
I want to read this into a pandas dataframe and reshape it so that it looks like this:
      ACTB  GAPDH
GTCC     1      2
CGAG     1      4
How might I do this?
The file will not always be this size; it will often be much larger (200 gRNA names x 20 gene names), and the size will vary. There will only ever be one gRNA name and one gene name per count. The row and column titles match what the real file will look like (some string of letters for the rows and a gene name for the columns).
This is certainly not the cleanest way to do it, but I figured out a way to get what I wanted:
import pandas as pd

# Split each line on "," or ":" (a regex separator needs the python engine)
df = pd.read_csv('test.txt', sep=",|:", engine='python', names=['gRNA', 'gene', 'count'])
# Strip the leftover parentheses, quotes, and spaces from the two name columns
df["gRNA"] = df["gRNA"].str.strip("(' ")
df["gene"] = df["gene"].str.strip(")' ")
# One row per gRNA, one column per gene
df = df.pivot(index='gRNA', columns='gene', values='count')
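Since each line looks like a dictionary entry, another option is to parse the key as a Python literal instead of splitting on punctuation. The following is only a sketch, assuming every non-blank line is exactly a two-element tuple literal, a colon, and an integer count. The function name read_pair_counts and the in-memory sample are hypothetical and used for illustration only.

```python
import ast
import io

import pandas as pd


def read_pair_counts(lines):
    """Parse lines like "('GTCC', 'ACTB'): 1" into a gRNA x gene table."""
    counts = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last colon: tuple literal on the left, count on the right
        key_text, _, value_text = line.rpartition(":")
        grna, gene = ast.literal_eval(key_text)
        counts[(grna, gene)] = int(value_text)
    # A dict keyed by 2-tuples becomes a Series with a two-level index;
    # unstack() moves the second level (the gene name) into the columns
    return pd.Series(counts).unstack()


# Hypothetical in-memory stand-in for the txt file from the question
sample = io.StringIO(
    "('GTCC', 'ACTB'): 1\n"
    "('GTCC', 'GAPDH'): 2\n"
    "('CGAG', 'ACTB'): 1\n"
    "('CGAG', 'GAPDH'): 4\n"
)
table = read_pair_counts(sample)
print(table)
```

For a real file, an open file handle can be passed in place of the sample, for example inside a with open('test.txt') as f: block. Any gRNA/gene pair that is missing from the file would show up as NaN in the resulting table.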