
How can I read a Python dict contained in a CSV file and store the data in a pandas dataframe?

I have a CSV file where each row is a dictionary. Inside each row's dict there is a list, and this list contains a sublist and a subdict. Each sublist has 2 elements, and the subdict has 100 keys with one value per key. This is a screenshot of the data:

[screenshot of the data]

Here's a sample of the data in text format:

{"0": [[10.8, 36.0], {"0": 0, "1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "10": 0}]}
{"1": [[10.8, 36.1], {"0": 0, "1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "10": 0}]}
{"2": [[10.8, 36.2], {"0": 0, "1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "10": 0}]}
{"3": [[10.8, 36.300000000000004], {"0": 0, "1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "10": 0}]}
{"4": [[10.8, 36.4], {"0": 0, "1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "10": 0}]}
{"5": [[10.8, 36.5], {"0": 0, "1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "10": 0}]}
{"6": [[10.8, 36.6], {"0": 0, "1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "10": 0}]}
{"7": [[10.8, 36.7], {"0": 0, "1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "10": 0}]}
{"8": [[10.8, 36.800000000000004], {"0": 0, "1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0}]}
{"9": [[10.8, 36.9], {"0": 0, "1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0}]}

What I would like to do is read this into a pandas dataframe that produces output like this (I will type just a single row for the sake of simplicity):

list_elemnt_1   list_elemnt_2  key_0,  key_1,  key_2,  key_3,  key_4,  and so on...
        value           value  value   value   value   value   value   and so on...

For each row in the CSV, I would like to build a dataframe with one column per sublist value (2 columns) and one column for each key in the subdict contained in the row's dict.

How could I do this? Please feel free to ask for more information if needed.

Thank you very much in advance.

EDIT

key_0, key_1, key_2, etc. are the subdict keys, not the master dict keys.

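import ast
import pandas as pd

# Parse each non-empty line into a dict; the with block closes the file.
with open('file_55966371.csv', 'r') as file:
    lines = [ast.literal_eval(line) for line in file if line.strip()]

def clean_lines(line):
    # Each row dict has a single entry whose value is [sublist, subdict].
    value = [v for v in line.values()]

    l1, l2 = value[0][0]

    line_dict = value[0][1]

    # Prefix the subdict keys with 'key_' to get the requested column names.
    line_dict = {f'key_{key}': value for key, value in line_dict.items()}

    # Add the two sublist values as their own columns.
    line_dict['list_element1'] = l1
    line_dict['list_element2'] = l2

    return line_dict

to_read = [clean_lines(line) for line in lines]

df = pd.DataFrame(to_read)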

I agree with @furas that this looks a lot like JSON, and if this data came from someone else, it would be best to ask whether they could send it to you in JSON format.

If not, the code above works.

  • Open the file.

  • Read each line and store the results in a list. ast.literal_eval parses each line as a Python literal, so every line is stored as a dict object rather than a string.

  • I created a helper function, clean_lines. This is the more important part:

    1. Get the values (i.e. the list with a sublist and a subdict).
    2. Unpack the sublist into two variables, l1 and l2.
    3. Rename the keys of the subdict (to your spec of key_X).
    4. Add l1 and l2 as entries in the dictionary, which combines the sublist and subdict into a single dictionary.
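To make these steps concrete, here is a small sketch that applies them to one line copied from the sample data, shortened to three subdict keys for brevity. The names raw, row, sublist, subdict, and flat are illustrative only and do not come from the answer's code.

```python
import ast

# One line of the file, shortened to three subdict keys for brevity.
raw = '{"0": [[10.8, 36.0], {"0": 0, "1": 0, "2": 0}]}'

row = ast.literal_eval(raw)               # str -> dict with a single entry
sublist, subdict = list(row.values())[0]  # step 1: the [sublist, subdict] pair
l1, l2 = sublist                          # step 2: unpack the two numbers

# Steps 3 and 4: rename the subdict keys, then add the sublist values.
flat = {f'key_{k}': v for k, v in subdict.items()}
flat['list_element1'] = l1
flat['list_element2'] = l2

print(flat)
# {'key_0': 0, 'key_1': 0, 'key_2': 0, 'list_element1': 10.8, 'list_element2': 36.0}
```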

Once you have a list of dictionaries, pandas can interpret it directly: pass it to the pd.DataFrame constructor, and each dictionary becomes one row.
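Since the sample lines also happen to be valid JSON, a variant of the same idea can parse them with json.loads instead of ast.literal_eval. The sketch below is only an illustration under that assumption: it uses an in-memory stand-in for the file, a hypothetical helper named flatten, and puts the two list columns first to match the layout requested in the question. The second sample line deliberately lacks one key, like the last two rows of the sample data, to show that pandas fills the gap with NaN.

```python
import io
import json
import pandas as pd

# Two lines from the sample data, shortened; the second lacks key "2",
# mirroring the sample rows that lack key "10".
sample = io.StringIO(
    '{"0": [[10.8, 36.0], {"0": 0, "1": 0, "2": 0}]}\n'
    '{"1": [[10.8, 36.1], {"0": 0, "1": 0}]}\n'
)

def flatten(row):
    # Hypothetical helper: same idea as clean_lines, with the list columns first.
    sublist, subdict = next(iter(row.values()))
    flat = {'list_element1': sublist[0], 'list_element2': sublist[1]}
    flat.update({f'key_{k}': v for k, v in subdict.items()})
    return flat

rows = [flatten(json.loads(line)) for line in sample if line.strip()]

# Columns are the union of all keys; a key missing from a row becomes NaN.
df_json = pd.DataFrame(rows)
print(df_json)
```

With the real file, the io.StringIO stand-in would be replaced by the opened file object.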

Not the best way to do it.

# Edit for reading the csv

import ast
import pandas as pd

# I am assuming the data is in 1 column
df_csv = pd.read_csv('/path/to/your/file/filename.csv')
column_name = 'data'  # placeholder: find the actual column name and insert it here

list_of_dfs = []
for idx, row in df_csv.iterrows():
    d = row[column_name]
    if isinstance(d, str):
        d = ast.literal_eval(d)  # the cell is read as text, so parse it into a dict

    # one-row dataframe indexed by the row's key
    df = pd.DataFrame.from_dict(d, orient='index')
    remove_cols = df.columns

    for i in d.keys():
        df['list_elemnt_1'] = d[i][0][0]
        df['list_elemnt_2'] = d[i][0][1]
        for key in d[i][1].keys():
            df[key] = d[i][1][key]

    # remove the original cols (the raw list and dict)
    df = df.drop(columns=remove_cols)
    list_of_dfs.append(df)


This will give you the df of each line as an element in list_of_dfs, which I assumed is the goal? Let me know if it works.
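If a single dataframe is wanted rather than a list, the per-row dataframes could be stacked with pd.concat. This is a minimal sketch using two stand-in one-row dataframes built by hand from the sample values; the real list_of_dfs would come from the loop above, and the name combined is illustrative.

```python
import pandas as pd

# Stand-ins for the one-row dataframes built in the loop (values from the sample data).
list_of_dfs = [
    pd.DataFrame({'list_elemnt_1': [10.8], 'list_elemnt_2': [36.0], '0': [0], '1': [0]}, index=['0']),
    pd.DataFrame({'list_elemnt_1': [10.8], 'list_elemnt_2': [36.1], '0': [0], '1': [0]}, index=['1']),
]

# Stack the rows; the index keeps each row's outer key ('0', '1', ...).
combined = pd.concat(list_of_dfs)
print(combined)
```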

