
Replace function not working when run directly on a python-requests response

I have some code that processes JSON data received via an API. The JSON is poorly formatted, but I've gotten the response processed so that the columns and corresponding rows load correctly into a pandas dataframe.

However, when processing the column names, it's including some special characters (parentheses and a comma) that I don't want in the column names.

If I process the same JSON data saved to a file on the computer, or if I query my MSSQL server for the data (I'm saving the data from the API to an MSSQL instance for processing in another program), I can run the .replace function and get rid of the extra characters.

But if I process the requests response data directly into the pandas dataframe and then try using .replace on the column names, it doesn't work: either it does nothing, or it returns all the column names as NaN.

I've tested a number of things, including regex, and looked through other posts that seemed relevant, but haven't been able to get anything to work.

I've also tried getting the column names as a list, running replace on the list, and then renaming the columns from that list, but that gives me an error that the lengths don't match (a rough sketch of that attempt is below). I'd rather avoid that second route if possible, to keep the code smaller.
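For reference, the list-rename route looks roughly like this (a hypothetical sketch, not my exact code): clean each name, then assign the list back. pandas raises a "Length mismatch" ValueError if the list doesn't have exactly one name per column, which is the error I keep hitting.

import pandas as pd

##Hypothetical sketch of the list-rename route
df = pd.DataFrame(columns=["(id,)", "(lookupName,)", "(createdTime,)"])
cleaned = [name.replace("(", "").replace(",", "").replace(")", "") for name in df.columns]
##Assigning the list back raises ValueError if the lengths don't match
df.columns = cleaned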

import json
import pandas as pd
from pandas import json_normalize

##This function processes the json file from the desktop
def processJsonFile(jsonFile):
    ##Unpack json file (json.load reads a file object; json.loads expects a string)
    with open(jsonFile) as f:
        data = json.load(f)
    ##unpack rows
    df_rows = json_normalize(data, record_path=['rows'])
    df_rows.columns = data['columnNames']
    return df_rows

##This is the function that processes the json data from the api response
def processJsonResponse(jsonFile):
    ##Unpack json response (jsonFile here is the response text, not a file)
    data = json.loads(jsonFile)
    ##unpack rows and columns into the dataframe
    df_rows = json_normalize(data, record_path=['items', ['rows']])
    ##note: this assigns a whole dataframe (not a flat list) as the column labels
    df_rows.columns = json_normalize(data, record_path=['items', ['columnNames']])
    return df_rows

##process response into dataframe
##df = processJsonFile("incidents.json")
df = pd.DataFrame(columns=["(id,)", "(lookupName,)", "(createdTime,)"])
##Remove extra values (newer pandas needs an explicit regex=True here)
df.columns = df.columns.str.replace(r'[^a-zA-Z ]\s?', '', regex=True)
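As a quick sanity check, here's that replace run against the stray names on their own (expected output shown as a comment):

import pandas as pd

cols = pd.Index(["(id,)", "(lookupName,)", "(createdTime,)"])
print(cols.str.replace(r'[^a-zA-Z ]\s?', '', regex=True))
##Index(['id', 'lookupName', 'createdTime'], dtype='object')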

With the above code, if I create the dataframe with the incorrect column names and then run the replace function, it works and removes the comma and parentheses. If I pass the incidents.json file (which has the same column names as the test dataframe) through processJsonFile and then run the replace function, it also works as expected.

However, if I take the GET response from my API, pass its text or content to the processJsonResponse function, and then try to run .replace, it doesn't work: either it does nothing at all, or it replaces the column names with NaN, which isn't helpful.

Preferably I'd like to just pass the response through and format the column names correctly, without having to save the response to a file and reopen it, and without passing the column data to a list, formatting the list, and setting that as the column names; the list approach has been giving me array-length-mismatch problems. (A sketch of the direct approach I'm after follows the sample data below.)

The JSON data I'm getting back looks similar to this:

    {
        "tableName": "incidents",
        "count": 4,
        "columnNames": [
            "id",
            "lookupName",
            "createdTime",
            "updatedTime"
        ],
        "rows": [
            [
                "100",
                "1",
                "2015-01-01T00:14:42.000Z",
                "2017-05-02T14:01:03.000Z"
            ],
            [
                "101",
                "2",
                "2015-01-01T00:22:56.000Z",
                "2015-01-01T04:34:35.000Z"
            ],
            [
                "102",
                "3",
                "2015-01-01T00:29:09.000Z",
                "2015-01-01T00:29:09.000Z"
            ],
            [
                "103",
                "4",
                "2015-01-01T00:40:35.000Z",
                "2015-01-01T00:40:35.000Z"
            ]
        ]
    }
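For what it's worth, with a payload shaped exactly like the sample above (no wrapping "items" key), a minimal sketch that skips json_normalize entirely and builds the dataframe directly would be:

import json
import pandas as pd
import requests

##Minimal sketch, assuming the flat payload shown above; the real API
##response apparently nests everything under an "items" key instead.
##The URL is a placeholder for the real endpoint.
response = requests.get("https://example.com/api/incidents")
data = json.loads(response.text)  ##or simply response.json()
df = pd.DataFrame(data["rows"], columns=data["columnNames"])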

Any help or insight into why this behaves so strangely when processing the API response data would be greatly appreciated.

The API request is a GET made with the requests library against an Oracle database API; the data comes back as JSON.

I was able to fix it..... The end

lol, jk

I was able to fix this by converting the column dataframe to a list, running a regex over the list to strip the characters I didn't want, and then setting that list as the column names. Below is what fixed it.

import re

##columnNames is the dataframe of column names returned by json_normalize
columnlist = columnNames[0].to_list()
##strip colons, hyphens, parentheses and spaces from each name
columnlist = [re.sub(r"[:\-() ]", "", x) for x in columnlist]
df_rows.columns = columnlist
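My best guess at why .str.replace was coming back as NaN (an assumption on my part, not something I've verified against the pandas internals): assigning the json_normalize result directly left the column labels as tuples rather than strings, and the .str accessor returns NaN for any element that isn't a string. A quick illustration:

import pandas as pd

##Hypothetical reproduction: an Index whose elements are 1-tuples, which
##print like "(id,)" (tupleize_cols=False keeps them as tuple objects
##instead of building a MultiIndex)
cols = pd.Index([("id",), ("lookupName",), ("createdTime",)], tupleize_cols=False)
print(cols.str.replace(r'[^a-zA-Z ]', '', regex=True))
##the .str methods only operate on string elements, so each tuple comes back NaN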

Thank you to anyone who may have looked at this.

An update to this: it turns out I don't need the regex replace at all. Converting the column names to a list and assigning that list directly as the dataframe columns already gets rid of those special characters.
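Putting it all together, the response-handling function ends up looking something like this (a sketch based on the pieces above; the payload is assumed to nest columnNames and rows under an "items" key, as in my actual API response):

import json
from pandas import json_normalize

def processJsonResponse(jsonText):
    ##Unpack json response (jsonText is the .text of the requests response)
    data = json.loads(jsonText)
    ##unpack rows into the dataframe
    df_rows = json_normalize(data, record_path=['items', ['rows']])
    ##unpack column names, then flatten to a plain python list before assigning
    columnNames = json_normalize(data, record_path=['items', ['columnNames']])
    df_rows.columns = columnNames[0].to_list()
    return df_rows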
