Remove carriage return and newline feeds within a list of dictionaries - Python

Question

I have a JSON file that I read into my Python script, flatten, and then export as a CSV.

My problem is that the JSON contains stray carriage returns and line feeds, which break the structure of the CSV.

Updated Current Code:

from pymongo import MongoClient
import pandas as pd
from azure.storage.filedatalake import DataLakeServiceClient
from azure.core._match_conditions import MatchConditions
from azure.storage.filedatalake._models import ContentSettings
from pandas import json_normalize
from datetime import datetime, timedelta
import numpy as np

mongo_client = MongoClient("XXXX") 
db = mongo_client.scaling
table = db.planning
document = table.find()
docs = list(document)
docs = json_normalize(docs) 
docs['pressureLevels'] = docs['pressureLevels'].str.strip().str.replace(" \r\n","")
docs.to_csv("planning.csv", sep = ",",index=False)

I'm getting the following error:

Traceback (most recent call last):
  File "XXXX\V2.py", line 16, in <module>
    docs['pressureLevels'] = docs['pressureLevels'].str.strip().str.replace(" \r\n","")
  File "XXXX.venv\lib\site-packages\pandas\core\generic.py", line 5456, in __getattr__
    return object.__getattribute__(self, name)
  File "XXXX\.venv\lib\site-packages\pandas\core\accessor.py", line 180, in __get__
    accessor_obj = self._accessor(obj)
  File "XXXX\.venv\lib\site-packages\pandas\core\strings\accessor.py", line 154, in __init__
    self._inferred_dtype = self._validate(data)
  File "XXXX\.venv\lib\site-packages\pandas\core\strings\accessor.py", line 218, in _validate
    raise AttributeError("Can only use .str accessor with string values!")
AttributeError: Can only use .str accessor with string values!

How do I get rid of the carriage returns and line feeds when some of the values in the dictionaries are integers rather than strings?

Any help will be appreciated.

Answer 1

Try pd.json_normalize first, then call str.strip followed by str.replace on the resulting column (instead of cleaning the dictionaries before normalizing).

This lets you take full advantage of the vectorized str methods that pandas provides, so you can skip the explicit for loop:

import pandas as pd

docs = [
    {'isActive': 1, 'description': 'teleconference call.\n\n'},
    {'isActive': 1, 'description': 'calls to review capacity.\n'},
    {'isActive': 1, 'description': 'communications \r\n.'}
]

df = pd.json_normalize(docs)
# strip() trims leading/trailing whitespace; replace() removes the embedded " \r\n"
df['description'] = df['description'].str.strip().str.replace(" \r\n", "")
print(df)

   isActive                description
0         1       teleconference call.
1         1  calls to review capacity.
2         1            communications.

Now you can save this to csv or change it further.
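If some columns hold integers or other non-string values, as in the question, the .str accessor can raise on those columns. One way around that is to clean cell by cell and leave non-strings untouched. The following is only a minimal sketch, not the answerer's code: the helper name clean_cell and the pressureLevels values are made up for illustration, and unlike the snippet above it removes every \r and \n rather than only the sequence " \r\n".

```python
import pandas as pd

# Hypothetical sample data; "pressureLevels" stands in for a numeric column.
docs = [
    {"isActive": 1, "description": "teleconference call.\n\n", "pressureLevels": 3},
    {"isActive": 1, "description": "communications \r\n.", "pressureLevels": 5},
]

df = pd.json_normalize(docs)


def clean_cell(value):
    """Strip strings and drop CR/LF characters; return anything else unchanged."""
    if isinstance(value, str):
        return value.strip().replace("\r", "").replace("\n", "")
    return value


# Apply the helper to every column, so numeric columns pass through as they are.
for col in df.columns:
    df[col] = df[col].map(clean_cell)

print(df)
```

This gives up the vectorized str methods in exchange for not having to know in advance which columns contain strings.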

Answer 2

You are getting the error because you are trying to call strip on an int object.

Try this:

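docs2 = []  # cleaned copies of the documents
for i in docs:
    x = {}
    for k, v in i.items():
        if isinstance(v, str):
            # only strings have strip/replace; other values are copied as-is
            x[k.strip()] = v.strip().replace("\r\n", "")
        else:
            x[k.strip()] = v
    docs2.append(x)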

Answer 3

You could write a little function to do it:

def try_strip(value):
    try:
        return value.strip().replace("\r\n", "")
    except AttributeError:
        return value

docs2 = [{k: try_strip(v) for k, v in d.items()} for d in docs]
# [
#     {'isActive': 1, 'description': 'teleconference call.'}, 
#     {'isActive': 1, 'description': 'calls to review capacity.'}, 
#     {'isActive': 1, 'description': 'communications .'}
# ]

The function doesn't need to use try/except; you could test the value with hasattr() or isinstance() instead.
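As a rough illustration of the isinstance() variant, here is a sketch that also recurses into nested dictionaries and lists. That recursion is an assumption about what the documents might look like, not something the question shows. The name clean and the sample document are hypothetical, and this version removes lone \r and \n characters too, not just the "\r\n" pair.

```python
def clean(value):
    """Return a copy of value with CR/LF removed from every string inside it."""
    if isinstance(value, str):
        return value.strip().replace("\r", "").replace("\n", "")
    if isinstance(value, dict):
        return {k: clean(v) for k, v in value.items()}
    if isinstance(value, list):
        return [clean(v) for v in value]
    # ints, floats, None and other types are returned unchanged
    return value


# Hypothetical document with a nested dictionary and a list
docs = [
    {"isActive": 1, "description": "communications \r\n.", "meta": {"notes": ["a\n", 2]}},
]

docs2 = [clean(d) for d in docs]
print(docs2)
```

Because the check is explicit, integers never reach strip(), so there is no exception to catch.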

Answer 4

I finally found a working solution to remove carriage returns and line feeds from a list of dictionaries.

First, use json.dumps, which takes the list of dictionaries and returns it as a single string. That makes it possible to use .replace, which only works on strings.

Once the line feeds and carriage returns have been removed from the string, convert it back with json.loads, which takes the string and returns the list of dictionaries.

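import json

docs2 = json.dumps(docs)
# json.dumps escapes control characters, so match the literal two-character sequences \r and \n
docs2 = docs2.replace(r"\r\n", '').replace(r"\n", '').replace(r"\r", '')
docs2 = json.loads(docs2)
docs2 = json_normalize(docs2)
print(docs2)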
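To see why the raw-string patterns work, and where this approach can go wrong, here is a small self-contained sketch. The sample data and the variable names (dumped, cleaned, risky, round_trip_ok) are made up for illustration. The second half shows a caveat: a value that contains a literal backslash followed by "n" can be corrupted by the same replacement.

```python
import json

# Hypothetical sample data
docs = [
    {"isActive": 1, "description": "teleconference call.\n\n"},
    {"isActive": 1, "description": "communications \r\n."},
]

dumped = json.dumps(docs)
# In the dumped text a newline is two characters: a backslash followed by "n".
cleaned = dumped.replace(r"\r", "").replace(r"\n", "")
docs2 = json.loads(cleaned)
print(docs2)

# Caveat: a literal backslash followed by "n" in the data also matches r"\n".
risky = json.dumps({"path": "C:\\new"})  # dumped text contains C:\\new
try:
    json.loads(risky.replace(r"\n", ""))
    round_trip_ok = True
except json.JSONDecodeError:
    # the replacement left an invalid escape sequence behind
    round_trip_ok = False
print(round_trip_ok)
```

If the data may contain backslashes, cleaning each string value directly, as in the earlier answers, avoids this problem.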

solution1
1 2021-01-22 14:21:39

solution2
1 2021-01-22 14:22:23

solution3
0 2021-01-22 14:22:17

solution4
0 ACCEPTED 2021-01-25 09:51:04
