简体   繁体   中英

Write json format using pandas Series and DataFrame

I'm working with csvfiles. My goal is to write a json format with csvfile information. Especifically, I want to get a similar format as miserables.json

Example:

{"source": "Napoleon", "target": "Myriel", "value": 1},

According with the information I have the format would be:

[
{
    "source": "Germany",
    "target": "Mexico",
    "value": 1
},
{
    "source": "Germany",
    "target": "USA",
    "value": 2
},
{
    "source": "Brazil",
    "target": "Argentina",
    "value": 3
}
]

However, with the code I used the output looks as follow:

[
{
    "source": "Germany",
    "target": "Mexico",
    "value": 1
},
{
    "source": null,
    "target": "USA",
    "value": 2
}
][
{
    "source": "Brazil",
    "target": "Argentina",
    "value": 3
}
]

Null source must be Germany. This is one of the main problems, because there are more cities with that issue. Besides this, the information is correct. I just want to remove several list inside the format and replace null to correct country.

This is the code I used using pandas and collections .

csvdata = pandas.read_csv('file.csv', low_memory=False, encoding='latin-1')
countries = csvdata['country'].tolist()
newcountries = list(set(countries))
for element in newcountries:
    bills = csvdata['target'][csvdata['country'] == element]
    frquency = Counter(bills)
    sourceTemp = []
    value = []
    country = element
    for k,v in frquency.items():
        sourceTemp.append(k)
        value.append(int(v))
    forceData = {'source': Series(country), 'target': Series(sourceTemp), 'value': Series(value)}
    dfForce = DataFrame(forceData)
    jsondata = dfForce.to_json(orient='records', force_ascii=False, default_handler=callable)
    parsed = json.loads(jsondata)
    newData = json.dumps(parsed, indent=4, ensure_ascii=False, sort_keys=True)
    # since to_json doesn´t have append mode this will be written in txt file
    savetxt = open('data.txt', 'a')
    savetxt.write(newData)
    savetxt.close()

Any suggestion to solve this problem are appreciate!

Thanks

Consider removing the Series() around the scalar value, country. By doing so and then upsizing the dictionaries of series into a dataframe, you force NaN (later converted to null in json) into the series to match the lengths of other series. You can see this by printing out the dfForce dataframe:

from pandas import Series
from pandas import DataFrame

country = 'Germany'    
sourceTemp = ['Mexico', 'USA', 'Argentina']
value = [1, 2, 3]

forceData = {'source': Series(country),
             'target': Series(sourceTemp),
             'value': Series(value)}
dfForce = DataFrame(forceData)

#     source     target  value
# 0  Germany     Mexico      1
# 1      NaN        USA      2
# 2      NaN  Argentina      3

To resolve, simply keep country as scalar in dictionary of series:

forceData = {'source': country,
             'target': Series(sourceTemp),
             'value': Series(value)}
dfForce = DataFrame(forceData)

#     source     target  value
# 0  Germany     Mexico      1
# 1  Germany        USA      2
# 2  Germany  Argentina      3

By the way, you do not need a dataframe object to output to json. Simply use a list of dictionaries. Consider the following using an Ordered Dictionary collection (to maintain the order of keys). In this way the growing list dumps into a text file without appending which would render an invalid json as opposite facing adjacent square brackets ...][... are not allowed.

from collections import OrderedDict
...

data = []

for element in newcountries:
    bills = csvdata['target'][csvdata['country'] == element]
    frquency = Counter(bills)

    for k,v in frquency.items():
        inner = OrderedDict()
        inner['source']  = element
        inner['target'] = k
        inner['value'] = int(v)

        data.append(inner)

newData = json.dumps(data, indent=4)

with open('data.json', 'w') as savetxt:
    savetxt.write(newData)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM