How do I get mixed JSON and TSV data into a data frame?

I have a file that I am trying to read into Pandas, but the main problem is that the file has a mixed format of JSON and tab-separated values (TSV). Here is an image of the file loaded into a dataframe: [image]

So if my understanding is correct, you want to load a TSV file as a pandas DataFrame, right?

Assuming you have a TSV file:

import pandas as pd
df = pd.read_csv("path to the tsv file", sep="\t")

This will load your TSV file as a DataFrame.

Then you can iterate over the column that contains your JSON:

import json

# Parse each JSON string in the column into a Python dict, keeping every result.
parsed = [json.loads(item) for item in df["columnname"]]
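
To make the two steps above concrete, here is a minimal sketch on an in-memory sample. The column names `id` and `payload` and the sample rows are hypothetical, since the real file is not shown. It also assumes every row of the JSON column holds a valid JSON object. `quoting=csv.QUOTE_NONE` is used so the double quotes inside the JSON text are left untouched by the parser.

```python
import csv
import io
import json

import pandas as pd

# Hypothetical sample: a TSV with an "id" column and a JSON column named "payload".
sample = 'id\tpayload\n1\t{"name": "a", "score": 1}\n2\t{"name": "b", "score": 2}\n'

# QUOTE_NONE keeps the double quotes inside the JSON strings as literal characters.
df = pd.read_csv(io.StringIO(sample), sep="\t", quoting=csv.QUOTE_NONE)

# Parse each JSON string into a dict, then expand the dicts into columns.
parsed = [json.loads(item) for item in df["payload"]]
expanded = pd.json_normalize(parsed)

# Put the expanded columns next to the remaining TSV columns.
result = pd.concat([df.drop(columns="payload"), expanded], axis=1)
```

If some rows are not valid JSON, `json.loads` raises `json.JSONDecodeError`, so a mixed column would need a check before parsing.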

This is verbatim a homework question for the UMich Data Science degree, so I won't answer in detail. That said, my overall successful approach was to read the file in as a list and then evaluate each item in the list with a for loop. Since each item came in as a string, if it looked like a JSON object, I used json.loads() on the item to convert it to a Python dictionary. If it did not look like a JSON object, I used .split() and then created a dictionary of key-value pairs from the given keys, taking each element of the split as a value. Clunky, but it worked.
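
The general shape of that line-by-line approach might look like the sketch below. It is not the answerer's actual code: the key names in `TSV_KEYS`, the sample lines, and the "starts with `{`" test for JSON are all assumptions made for illustration.

```python
import json

import pandas as pd

# Hypothetical keys for the tab-separated lines; the real file's keys are not given.
TSV_KEYS = ["name", "score"]


def parse_line(line):
    """Return a dict for one line, whether it is a JSON object or tab-separated."""
    line = line.rstrip("\n")
    if line.lstrip().startswith("{"):
        # Looks like a JSON object: let the json module parse it.
        return json.loads(line)
    # Otherwise treat it as TSV and pair each value with its key.
    return dict(zip(TSV_KEYS, line.split("\t")))


# Hypothetical sample lines, standing in for the lines read from the file.
lines = ['{"name": "a", "score": "1"}\n', "b\t2\n", "\n"]

# Skip blank lines, then build the DataFrame from the list of dicts.
records = [parse_line(line) for line in lines if line.strip()]
df = pd.DataFrame(records)
```

Values from the tab-separated lines arrive as strings, so numeric columns may need converting afterwards, for example with `pd.to_numeric`.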
