Iterating over files and adding values to python dictionary

I have a set of 50 text files, each with a header row, gene names in the first column, and values for each gene in the remaining columns. I also have an official gene list text file. I want to use the official gene name list to build a dictionary, then iterate over the files, check whether the gene name on each line matches a key in the dictionary, and if it does, append the additional values from the experimental file to that key's value.

The experimental file looks like this:

GENE    Exp1    Exp2
geneA   12      34
geneB   42      10
geneC   42      10

The official gene list looks like this:

GENE    
geneA   
geneC

I've tried the following code with a plain dictionary (for just one experimental file, but I could later iterate over more):

combo = {}

with open('official_gene_list.txt', 'r') as f:
    f.readline()
    for line in f:
        name = line.split('\n')[0]
        combo[name]={}

with open('expeirmenta1_file.txt', 'r') as g:
    for each in g:
        name2 = each.split('\t')[0]
        data = each.rstrip('\n').split('\t')[1:]
        for name2 in combo:
            combo[name2].append(data)

When I do that, the dictionary is built fine, but I get the following error:

AttributeError: 'dict' object has no attribute 'append'

I've also tried using a defaultdict():

from collections import defaultdict
combo = defaultdict(list)
with open('gene_orf_updated2.txt', 'r') as f:
    f.readline()
    for line in f:
        name = line.split('\n')[0]
        combo[name]={}
with open('GSE139_meanCenter_results.txt', 'r') as g:
    for each in g:
        name2 = each.split('\t')[0]
        data = each.rstrip('\n').split('\t')[1:]
        for name2 in combo:
            combo[name2].append(data)

And I get the same error: 'dict' object has no attribute 'append'.

I've made dictionaries before, but never tried to append new values to existing keys like this. Is this possible? Any help or advice would be greatly appreciated.

If you want to use .append(), the dictionary value you are appending to must be a list. You are setting it with combo[name]={}, so each value is a dict, which is why you get the "'dict' object has no attribute 'append'" error. Change combo[name]={} to combo[name]=[] and the later append calls will work.
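A minimal sketch of the difference, using a single made-up entry rather than the real files. It also shows why the defaultdict attempt failed in the same way: defaultdict(list) only supplies a list for keys that are missing, so an explicit assignment of {} replaces it with a dict again. The variable names `error_message` and `auto` are illustrative only.

```python
from collections import defaultdict

# A dict value has no .append(); this reproduces the error from the question.
combo = {"geneA": {}}
try:
    combo["geneA"].append(["12", "34"])
except AttributeError as err:
    error_message = str(err)

# With a list as the value, .append() works.
combo = {"geneA": []}
combo["geneA"].append(["12", "34"])

# defaultdict(list) creates the list on first access to a missing key,
# so no explicit combo[name] = ... assignment is needed at all.
auto = defaultdict(list)
auto["geneA"].append(["12", "34"])
```

Note that with defaultdict(list), a membership test such as `if name in auto` is still needed if only genes from the official list should be kept, because any key that is accessed gets created.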

New edit fixing the logic:

for each in g:
    fields = each.rstrip('\n').split('\t')
    name2, data = fields[0], fields[1:]
    if name2 in combo:  # only genes that are on the official list
        combo[name2].append(data)  # add this line's data to that gene's list

You are close. Do it like this:

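combo = {}

with open('gene_orf_updated2.txt', 'r') as f:
    f.readline()  # skip the header row
    for line in f:
        name = line.strip()  # drop the newline and any trailing spaces
        combo[name] = []  # a list, so .append() works later
with open('GSE139_meanCenter_results.txt', 'r') as g:
    g.readline()  # skip the header row
    for each in g:
        name2 = each.split('\t')[0]
        data = each.rstrip('\n').split('\t')[1:]
        if name2 in combo:  # only keep genes on the official list
            combo[name2].append(data)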

If you want one flat list per gene instead of nested lists, replace the append call with this:

combo[name2] += data
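To make the nested versus flat distinction concrete, and to extend the idea to several experimental files, here is a rough sketch. The function name `collect`, its `flatten` flag, and the in-memory sample data (including the second file with an `Exp3` column) are assumptions for illustration; real code would pass open file handles instead of `io.StringIO` objects.

```python
import io

def collect(gene_list, experiment_files, flatten=False):
    """Map each official gene name to the values found in the experiment files."""
    next(gene_list, None)  # skip the header row of the gene list
    combo = {line.strip(): [] for line in gene_list if line.strip()}
    for handle in experiment_files:
        next(handle, None)  # skip the header row of each experiment file
        for line in handle:
            name, *data = line.rstrip("\n").split("\t")
            if name in combo:
                if flatten:
                    combo[name] += data       # one flat list per gene
                else:
                    combo[name].append(data)  # one sub-list per file
    return combo

# Hypothetical in-memory stand-ins for the real files.
genes = "GENE\ngeneA\ngeneC\n"
exp1 = "GENE\tExp1\tExp2\ngeneA\t12\t34\ngeneB\t42\t10\ngeneC\t42\t10\n"
exp2 = "GENE\tExp3\ngeneA\t7\ngeneC\t9\n"

nested = collect(io.StringIO(genes), [io.StringIO(exp1), io.StringIO(exp2)])
flat = collect(io.StringIO(genes), [io.StringIO(exp1), io.StringIO(exp2)], flatten=True)
```

With append, each gene ends up with one sub-list per file; with +=, all values for a gene sit in a single list. Either way, geneB is skipped because it is not on the official list.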

As others have pointed out, you can't append to dicts.

d = {}

After you've initialized your dict, you can add new keys like so:

d['new'] = 9

You can overwrite the value of an existing key by doing this:

d['new'] = 10

In your situation you may want to try creating a dict of lists and then appending to those lists.
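One possible way to set up such a dict of lists is a dict comprehension, sketched below with made-up gene names. The second half shows a pitfall worth knowing about: dict.fromkeys with a list as the default reuses the same list object for every key, so an append through one key shows up under all of them.

```python
names = ["geneA", "geneC"]

# One independent list per gene.
combo = {name: [] for name in names}
combo["geneA"].append("12")

# Pitfall: fromkeys stores the very same list object under every key.
shared = dict.fromkeys(names, [])
shared["geneA"].append("12")
```

After this runs, only combo["geneA"] contains "12", while both keys of `shared` appear to contain it.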

import pandas as pd

def print_file(f_name):
    # Show the raw contents of a file under a small banner
    print('\n\n' + f_name)
    print('*' * 10)
    with open(f_name, 'r') as fh:
        print(fh.read())

gene_fname = 'genes.txt'
print_file(gene_fname)
df_final = pd.read_csv(gene_fname)
# Start every gene with an empty list in the 'combined' column
df_final['combined'] = [list() for x in range(len(df_final.index))]

for val in ['values1.txt', 'values2.txt', 'values3.txt', 'values4.txt']:
    print_file(val)
    val_df = pd.read_csv(val, header=0, sep=r'\s+')
    # A left merge keeps only the genes on the official list
    df_final = pd.merge(df_final, val_df, on='GENE', how='left')
    value_cols = df_final.columns.difference(['GENE', 'combined'])
    df_final['new'] = df_final.loc[:, value_cols].values.tolist()
    df_final['combined'] = df_final['new'] + df_final['combined']
    # Keep only the key and the accumulated list before reading the next file
    df_final.drop(columns=df_final.columns.difference(['GENE', 'combined']), inplace=True)

# Drop NaN entries (genes missing from a file) and convert the rest to int
df_final['combined'] = df_final['combined'].apply(lambda x: [int(i) for i in x if str(i) != "nan"])
print('\n\n')
print(df_final)

