How do I read Python dictionaries contained in a csv file and store the data in a pandas dataframe?

Question

I have a csv where each row is a dictionary. Inside each row's dict there is a list, and this list contains a sublist and a subdict. Each sublist has 2 elements, and the subdict has 100 keys, with one value per key.

Here's a sample of the data in text format:

{"0": [[10.8, 36.0], {"0": 0, "1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "10": 0}]}
{"1": [[10.8, 36.1], {"0": 0, "1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "10": 0}]}
{"2": [[10.8, 36.2], {"0": 0, "1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "10": 0}]}
{"3": [[10.8, 36.300000000000004], {"0": 0, "1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "10": 0}]}
{"4": [[10.8, 36.4], {"0": 0, "1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "10": 0}]}
{"5": [[10.8, 36.5], {"0": 0, "1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "10": 0}]}
{"6": [[10.8, 36.6], {"0": 0, "1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "10": 0}]}
{"7": [[10.8, 36.7], {"0": 0, "1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "10": 0}]}
{"8": [[10.8, 36.800000000000004], {"0": 0, "1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0}]}
{"9": [[10.8, 36.9], {"0": 0, "1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0, "7": 0, "8": 0, "9": 0}]}

What I would like to do is read this into a pandas dataframe that produces output like this (I will type just a single row for the sake of simplicity):

list_elemnt_1   list_elemnt_2  key_0,  key_1,  key_2,  key_3,  key_4,  and so on...
        value           value  value   value   value   value   value   and so on...

For each row in the csv, I would like to build a dataframe with one column per sublist value (2), and one column for each key in the subdict contained in the row's dict.

How could I do this? Please feel free to ask for more information if needed.

Thank you very much in advance.

EDIT

key_0, key_1, key_2, etc. are the subdict keys, not the master dict keys.

Answer 1

import ast
import pandas as pd

# Parse every non-empty line of the file into a Python dict.
with open('file_55966371.csv', 'r') as file:
    lines = [ast.literal_eval(line) for line in file if line.strip()]

def clean_lines(line):
    # Each row dict has a single key whose value is [sublist, subdict].
    value = list(line.values())

    # Unpack the two-element sublist.
    l1, l2 = value[0][0]

    # Prefix the subdict keys so the columns are named key_0, key_1, ...
    line_dict = {f'key_{key}': val for key, val in value[0][1].items()}

    line_dict['list_element1'] = l1
    line_dict['list_element2'] = l2

    return line_dict

to_read = [clean_lines(line) for line in lines]

df = pd.DataFrame(to_read)

I agree with @furas: this looks a lot like JSON, and if the data came from someone else, it would be best to ask whether they can send it to you in JSON format.

If not, the code above works.

1. Open the file.
2. Read each line and store the results in a list. ast.literal_eval lets Python recognize each line as a dictionary from the get-go, so the lines are stored as dict objects.
3. I created a helper function, clean_lines, which is the more important part:
   a. Get the values (i.e. the list holding a sublist and a subdict).
   b. Unpack the sublist into two variables, l1 and l2.
   c. Rename the subdict keys (to your key_X spec).
   d. Add l1 and l2 as entries in the dictionary, which combines the sublist and subdict into a single dictionary.
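
To make the clean_lines steps concrete, here is a minimal trace on a single row. It assumes one line from the question's sample, shortened to three subdict keys for brevity; the variable names raw, row and flat are only for illustration.

```python
import ast

# One line from the question's sample, shortened to three subdict keys.
raw = '{"0": [[10.8, 36.0], {"0": 0, "1": 0, "2": 0}]}'

row = ast.literal_eval(raw)        # the text becomes a real dict
value = list(row.values())[0]      # get the values: [sublist, subdict]
l1, l2 = value[0]                  # unpack the two-element sublist

# Rename the subdict keys to the key_X form.
flat = {f"key_{k}": v for k, v in value[1].items()}

# Add the sublist values as ordinary entries.
flat["list_element1"] = l1
flat["list_element2"] = l2

print(flat)
```

The result is one flat dictionary per row, which is the shape pandas needs.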

Once you have a list of dictionaries, pandas can recognize it, and you can pass it straight to the pd.DataFrame constructor.
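
As a small illustration of that last point, the sketch below builds a dataframe from two hypothetical flattened rows whose keys do not all match, much as rows 8 and 9 of the sample lack the key "10". The records list is made up for the example; pandas should fill the missing cell with NaN.

```python
import pandas as pd

# Hypothetical flattened rows; the second one has no key_2,
# much as rows 8 and 9 of the sample have no key "10".
records = [
    {"key_0": 0, "key_1": 0, "key_2": 0, "list_element1": 10.8, "list_element2": 36.7},
    {"key_0": 0, "key_1": 0, "list_element1": 10.8, "list_element2": 36.9},
]

# Each dict becomes a row; the union of the keys becomes the columns.
df = pd.DataFrame(records)
print(df)
```

So rows with fewer subdict keys do not need special handling before the dataframe is built.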

Answer 2

Not the best way to do it.

# Edit for reading the csv
import ast
import pandas as pd

# There are two ways to go about it; I am assuming the data is in 1 column.
df_csv = pd.read_csv('/path/to/your/file/filename.csv')
column_name = 'column_name'  # placeholder: find the column name and insert here

list_of_dfs = []
for idx, row in df_csv.iterrows():
    d = row[column_name]
    if isinstance(d, str):
        d = ast.literal_eval(d)  # cells read from a csv are strings; parse into a dict

    # One-row dataframe indexed by the master key; its raw columns hold the sublist and subdict.
    df = pd.DataFrame.from_dict(d, orient='index')
    remove_cols = list(df.columns)

    for i in d.keys():
        df['list_elemnt_1'] = d[i][0][0]
        df['list_elemnt_2'] = d[i][0][1]
        for key in d[i][1].keys():
            df[key] = d[i][1][key]

    # remove the original cols here
    df = df.drop(columns=remove_cols)
    list_of_dfs.append(df)

This will give you the dataframe for each line as an element of list_of_dfs, which I assumed is the goal? Let me know if it works.
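
If a single dataframe is wanted rather than a list of one-row frames, the pieces can probably be combined with pd.concat. A minimal sketch, in which two hypothetical one-row frames stand in for list_of_dfs and the column values are made up:

```python
import pandas as pd

# Hypothetical stand-ins for the one-row frames collected in list_of_dfs.
list_of_dfs = [
    pd.DataFrame({"list_elemnt_1": [10.8], "list_elemnt_2": [36.0], "key_0": [0]}),
    pd.DataFrame({"list_elemnt_1": [10.8], "list_elemnt_2": [36.1], "key_0": [0]}),
]

# Stack the frames vertically and renumber the rows from 0.
combined = pd.concat(list_of_dfs, ignore_index=True)
print(combined)
```

ignore_index=True discards each frame's own index, so the combined rows are simply numbered in order.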


Answer 1: score 2, accepted, 2019-05-03 09:26:56

Answer 2: score 1, 2019-05-03 09:26:53
