I have a JSON file that I read into my Python script, flatten, and then export as a CSV.
My problem is that the JSON file contains various carriage returns and line feeds, which mess up the whole structure of the CSV.
Updated Current Code:
from pymongo import MongoClient
import pandas as pd
from azure.storage.filedatalake import DataLakeServiceClient
from azure.core._match_conditions import MatchConditions
from azure.storage.filedatalake._models import ContentSettings
from pandas import json_normalize
from datetime import datetime, timedelta
import numpy as np
mongo_client = MongoClient("XXXX")
db = mongo_client.scaling
table = db.planning
document = table.find()
docs = list(document)
docs = json_normalize(docs)
docs['pressure'] = docs['pressure'].str.strip().str.replace(" \r\n","")
docs.to_csv("planning.csv", sep = ",",index=False)
I'm getting the following error:
Traceback (most recent call last):
  File "XXXX\V2.py", line 16, in <module>
    docs['pressureLevels'] = docs['pressureLevels'].str.strip().str.replace(" \r\n","")
  File "XXXX.venv\lib\site-packages\pandas\core\generic.py", line 5456, in __getattr__
    return object.__getattribute__(self, name)
  File "XXXX\.venv\lib\site-packages\pandas\core\accessor.py", line 180, in __get__
    accessor_obj = self._accessor(obj)
  File "XXXX\.venv\lib\site-packages\pandas\core\strings\accessor.py", line 154, in __init__
    self._inferred_dtype = self._validate(data)
  File "XXXX\.venv\lib\site-packages\pandas\core\strings\accessor.py", line 218, in _validate
    raise AttributeError("Can only use .str accessor with string values!")
AttributeError: Can only use .str accessor with string values!
How do I get rid of the carriage returns and line feeds when some of the values in the dictionary are integers?
Any help will be appreciated.
Try pd.json_normalize, followed by str.strip and then str.replace (instead of the other way around).
This lets you take full advantage of the vectorized str methods that pandas provides, so you can skip the explicit for loop:
import pandas as pd

docs = [
    {'isActive': 1, 'description': 'teleconference call.\n\n'},
    {'isActive': 1, 'description': 'calls to review capacity.\n'},
    {'isActive': 1, 'description': 'communications \r\n.'}
]
df = pd.json_normalize(docs)
# Trim surrounding whitespace first, then drop any remaining " \r\n" sequence.
df['description'] = df['description'].str.strip().str.replace(" \r\n","")
print(df)
   isActive                description
0         1       teleconference call.
1         1  calls to review capacity.
2         1            communications.
Now you can save this to csv or change it further.
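If other columns hold integers, as in the question, calling .str on them raises the AttributeError shown in the traceback. One possible way around that is to clean cell by cell and only touch string values. This is a minimal sketch, not the answer author's method: clean_cell is a hypothetical helper name, and unlike the line above it removes every \r and \n rather than only the " \r\n" sequence.

```python
import pandas as pd

docs = [
    {'isActive': 1, 'description': 'teleconference call.\n\n'},
    {'isActive': 1, 'description': 'calls to review capacity.\n'},
    {'isActive': 1, 'description': 'communications \r\n.'},
]
df = pd.json_normalize(docs)


def clean_cell(value):
    # Hypothetical helper: only strings are modified;
    # ints, floats, None and other types pass through unchanged.
    if isinstance(value, str):
        return value.replace("\r", "").replace("\n", "").strip()
    return value


# Apply the helper to every cell, one column at a time.
df = df.apply(lambda col: col.map(clean_cell))
print(df)
```

Because the check happens per value, this should also cope with columns that mix strings and numbers, at the cost of being slower than the vectorized .str methods.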
You are getting the error because you are trying to use strip on an int object.
Try this:
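docs2 = []  # cleaned copies of the dictionaries in docs
for i in docs:
    x = {}
    for k, v in i.items():
        if isinstance(v, str):
            # Only strings have strip/replace; clean the value and the key.
            x[k.strip()] = v.strip().replace("\r\n","")
        else:
            # Leave ints and other non-string values untouched.
            x[k.strip()] = v
    docs2.append(x)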
You could write a little function to do it:
def try_strip(value):
    try:
        return value.strip().replace("\r\n", "")
    except AttributeError:
        return value
docs2 = [{k: try_strip(v) for k, v in d.items()} for d in docs]
# [
# {'isActive': 1, 'description': 'teleconference call.'},
# {'isActive': 1, 'description': 'calls to review capacity.'},
# {'isActive': 1, 'description': 'communications .'}
# ]
The function wouldn't need to use try ... except; you could use a test with hasattr() or isinstance instead.
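For illustration, the isinstance variant mentioned above might look like the sketch below. The function name strip_if_str is made up for this example, and the sample data is the same as in the earlier answer.

```python
def strip_if_str(value):
    # Only strings get cleaned; ints and other types are returned as-is.
    if isinstance(value, str):
        return value.strip().replace("\r\n", "")
    return value


docs = [
    {'isActive': 1, 'description': 'teleconference call.\n\n'},
    {'isActive': 1, 'description': 'calls to review capacity.\n'},
    {'isActive': 1, 'description': 'communications \r\n.'},
]
docs2 = [{k: strip_if_str(v) for k, v in d.items()} for d in docs]
print(docs2)
```

An explicit type check only skips non-strings, whereas the try/except version would also swallow an AttributeError raised for any other reason.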
Finally found a working solution to remove carriage returns and line feeds from a list of dictionaries.
First, use json.dumps, which takes a dictionary as input and returns a string, so that you can use .replace (which only works on strings).
Once the line feeds and carriage returns have been removed from the string, it can be converted back to a dictionary with json.loads, which takes a string as input and returns a dictionary.
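import json
from pandas import json_normalize

docs2 = json.dumps(docs)
# In the JSON text, line breaks appear as the literal two-character escapes \n and \r.
docs2 = docs2.replace(r"\r\n",'').replace(r"\n",'').replace(r"\r",'')
docs2 = json.loads(docs2)
docs2 = json_normalize(docs2)
print(docs2)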
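A small self-contained run of this round trip, on made-up sample data with a nested dictionary, could look like the sketch below. The default=str argument is an addition that is not in the answer above: json.dumps raises a TypeError for values it cannot serialize (a datetime here, and the same would apply to a MongoDB ObjectId), so default=str turns them into strings. That conversion is an assumption about what you want in the CSV.

```python
import json
from datetime import datetime

# Made-up sample data; "meta" and "created" are illustrative fields only.
docs = [
    {
        "isActive": 1,
        "description": "teleconference call.\n\n",
        "meta": {"note": "line one\r\nline two"},
        "created": datetime(2021, 1, 1),
    },
]

# Serialize everything, including nested values, into one JSON string.
text = json.dumps(docs, default=str)

# Remove the escaped line breaks (backslash followed by r or n) from the text.
text = text.replace(r"\r\n", "").replace(r"\n", "").replace(r"\r", "")

cleaned = json.loads(text)
print(cleaned)
```

One caveat: the replacement works on the raw JSON text, so a value that contains a literal backslash followed by n or r (for example a Windows path) would also be altered and may no longer parse. If that can occur in your data, cleaning value by value is the safer route.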