
How do I get mixed JSON and TSV data into a data frame?

I have a file that I am trying to read into pandas, but the main problem is that the file mixes JSON and tab-separated values (TSV). Here is an image of the file loaded into a dataframe: [image: enter image description here]

If my understanding is correct, you want to load a TSV file as a pandas dataframe, right?

Assuming you have a TSV file:

import pandas as pd
df = pd.read_csv("path to the tsv file", sep="\t")

This loads your TSV file as a DataFrame.

You can then iterate over the column that contains your JSON.

import json

# Parse each JSON string in the column into a Python object
parsed = [json.loads(item) for item in df["columnname"]]
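
The code above assumes every cell in the column is valid JSON. In a mixed file some cells may be plain text or empty, and json.loads() would raise on those. Here is a minimal sketch of a more tolerant parser; the helper name safe_loads and the sample cells are hypothetical, not taken from the question's file:

```python
import json


def safe_loads(value):
    """Return the parsed JSON value, or None if value is not valid JSON."""
    if not isinstance(value, str):
        # pandas represents empty cells as NaN (a float), not a string
        return None
    try:
        return json.loads(value)
    except json.JSONDecodeError:
        return None


# Hypothetical cells: one JSON object, one plain string, one missing value
cells = ['{"a": 1}', "plain text", float("nan")]
parsed = [safe_loads(c) for c in cells]
```

With a DataFrame, the same helper could probably be applied as df["columnname"].apply(safe_loads), leaving None wherever a cell is not JSON.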

This is verbatim a homework question for the UMich Data Science degree, so I won't answer in detail. That said, the approach that worked for me was to read the file in as a list of lines and then evaluate each item in the list with a for loop. Each item came in as a string. If it looked like a JSON object, I used json.loads() to convert it to a Python dictionary. If it did not, I used .split() and built a dictionary of key-value pairs from the given keys, taking each element of the split as a value. Clunky, but it worked.
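
The line-by-line idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the key names in TSV_KEYS, the "starts with {" test for JSON, and the sample lines are all hypothetical, since the real file layout is not shown.

```python
import json

# Hypothetical column names for the tab-separated lines
TSV_KEYS = ["id", "name", "value"]


def parse_line(line, keys=TSV_KEYS):
    """Parse one line as JSON if it looks like an object, else as TSV."""
    line = line.rstrip("\r\n")
    if line.startswith("{") and line.endswith("}"):
        return json.loads(line)
    # zip pairs each key with the field in the same position
    return dict(zip(keys, line.split("\t")))


# Hypothetical sample: one JSON line and one TSV line
lines = ['{"id": "1", "name": "a", "value": "x"}\n', "2\tb\ty\n"]
records = [parse_line(line) for line in lines]
```

A list of dictionaries like records can then be passed to pd.DataFrame(records) to build the frame. With a real file, lines would come from iterating over the opened file instead of the sample list.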
