
Replace function not working when run directly on a python-requests response

I have some code that processes JSON data received via an API. The JSON is poorly formatted, but I've gotten the response processed so that the columns and their corresponding rows come out correctly in a pandas DataFrame.

However, when processing the column names, it includes some special characters that I don't want in them (parentheses and a comma).

If I process the same JSON data saved to a file on my computer, or query my MSSQL server for it (I save this data from the API to an MSSQL instance for processing in another program), I can run the .replace function and it removes the extra characters.

But if I process the requests response data directly into the pandas DataFrame and then try using .replace on the column names, it doesn't work. Either it does nothing, or it runs but returns all the column names as NaN.

I've tested a number of things, including regex, and looked through other posts that might help, but haven't been able to get anything to work.

I've also tried getting the column names as a list, running the replace function on it, and then renaming the columns from that list, but that gives me an error that the lengths don't match. I'd rather not go that second route if possible, so I can keep the code smaller.
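One plausible source of the length mismatch, offered as a guess rather than a diagnosis: if the column names live in a one-column DataFrame (which is what json_normalize returns for a list of strings, with the column labelled 0), then `list(frame)` iterates over the frame's column labels, not its values, and yields `[0]`. The sketch below uses a hypothetical `column_frame` and a hypothetical one-row `rows` frame to show the difference between `list(column_frame)` and `column_frame[0].to_list()`.

```python
import pandas as pd

# Hypothetical one-column frame, shaped like json_normalize's output for a list of strings.
column_frame = pd.DataFrame({0: ["id", "lookupName", "createdTime", "updatedTime"]})
# Hypothetical data frame with 4 unnamed columns (0..3).
rows = pd.DataFrame([["100", "1", "2015-01-01T00:14:42.000Z", "2017-05-02T14:01:03.000Z"]])

# Iterating a DataFrame yields its column labels, so list() gives [0], not the names.
print(list(column_frame))  # [0]

try:
    rows.columns = list(column_frame)  # 1 label for 4 columns
except ValueError as err:
    print("ValueError:", err)  # pandas reports a length mismatch

# Selecting the column first returns the four names as plain strings.
rows.columns = column_frame[0].to_list()
print(rows.columns.to_list())
```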

import json
import pandas as pd
from pandas import json_normalize

##This function processes the json file from the desktop
def processJsonFile(jsonFile):
    ##Unpack json file (json.load reads a file object; json.loads expects a string)
    with open(jsonFile) as f:
        data = json.load(f)
    ##unpack rows
    df_rows = json_normalize(data, record_path=['rows'])
    df_rows.columns = data['columnNames']
    return df_rows

##This is the function that processes the json data from the api response
def processJsonResponse(jsonFile):
    ##Unpack json response
    data = json.loads(jsonFile)
    ##unpack rows and columns into the dataframe
    df_rows = json_normalize(data, record_path=['items',['rows']])
    df_rows.columns = json_normalize(data, record_path=['items', 
       ['columnNames']])

    return df_rows

##process response into dataframe
##df = processJsonFile("incidents.json")
df = pd.DataFrame(columns=["(id,)", "(lookupName,)", "(createdTime,)"])
##Remove extra values (regex=True: pandas 2.0+ defaults to regex=False)
df.columns = df.columns.str.replace(r'[^a-zA-Z ]\s?', '', regex=True)

With the above code, if I create the DataFrame with the incorrect column names and then run the replace function, it works and removes the commas and parentheses. If I pass in the incidents.json file (which has the same column names as the test DataFrame) and then run the replace function, it also works as expected.
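For reference, a minimal check of why the hand-built case works: the labels passed to the constructor are ordinary strings that merely look like `(id,)`, so the `.str` accessor can run the regex on them. `regex=True` is written out because pandas 2.0 changed the default of `str.replace` to `regex=False`, which would treat the pattern as a literal.

```python
import pandas as pd

# Same hand-built labels as in the question; these are real strings.
df = pd.DataFrame(columns=["(id,)", "(lookupName,)", "(createdTime,)"])

# Strip every character that is not a letter or space (plus any whitespace after it).
df.columns = df.columns.str.replace(r"[^a-zA-Z ]\s?", "", regex=True)
print(df.columns.to_list())  # ['id', 'lookupName', 'createdTime']
```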

However, if I take the GET request response from my API, pass the text or content of that response to the processJsonResponse function, and then run .replace, it doesn't work. Either it does nothing or it replaces the column names with "NaN", which isn't helpful.
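A possible explanation for the NaN result, assuming (not confirmed by the post) that the labels produced by processJsonResponse are 1-tuples such as `('id',)` rather than strings: a tuple prints as `(id,)`, so it looks identical to the hand-built string, but pandas `.str` methods return NaN for any element that is not a string. The sketch uses a hypothetical `tuple_labels` Series to show this, and `.str.get(0)` as one way to pull the string out of each tuple.

```python
import pandas as pd

# Hypothetical labels: 1-tuples print as "(id,)" but are not strings.
tuple_labels = pd.Series([("id",), ("lookupName",), ("createdTime",)])

# .str methods only operate on str elements; anything else becomes NaN.
cleaned = tuple_labels.str.replace(r"[^a-zA-Z ]\s?", "", regex=True)
print(cleaned.isna().all())  # True

# Taking element 0 of each tuple gives usable names with nothing to strip.
names = tuple_labels.str.get(0)
print(names.to_list())  # ['id', 'lookupName', 'createdTime']
```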

Preferably, I'd like to pass the response straight through and format the column names correctly without saving the response as a file and then opening and processing it. I'd also prefer not to pass the column data to a list, format the list, and then set that as the column names; that second option has been causing me problems with array lengths not matching.

The JSON data I'm getting back looks similar to this:

    {
        "items": [
            {
                "tableName": "incidents",
                "count": 4,
                "columnNames": [
                    "id",
                    "lookupName",
                    "createdTime",
                    "updatedTime"
                ],
                "rows": [
                    ["100", "1", "2015-01-01T00:14:42.000Z", "2017-05-02T14:01:03.000Z"],
                    ["101", "2", "2015-01-01T00:22:56.000Z", "2015-01-01T04:34:35.000Z"],
                    ["102", "3", "2015-01-01T00:29:09.000Z", "2015-01-01T00:29:09.000Z"],
                    ["103", "4", "2015-01-01T00:40:35.000Z", "2015-01-01T00:40:35.000Z"]
                ]
            }
        ]
    }

Any help or insight into why this behaves so strangely when processing the API response data would be greatly appreciated.

The API request is made using the requests library and is a GET request to an Oracle database API. The data sent back is in JSON format.
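A minimal sketch of building the DataFrame straight from the response text, with no file round-trip and no regex. It assumes the response has the `{"items": [{"columnNames": [...], "rows": [...]}]}` shape implied by the `record_path=['items', ...]` in processJsonResponse. `frame_from_response_text` is a hypothetical helper, and it swaps json_normalize for the plain `pd.DataFrame(rows, columns=...)` constructor. With requests, `response.text` could be passed in, or `response.json()` could replace the `json.loads` step.

```python
import json
import pandas as pd

def frame_from_response_text(text):
    """Hypothetical helper: build one DataFrame from every item in data['items']."""
    data = json.loads(text)
    frames = [
        # columnNames is already a list of str, so it is used as-is for the labels.
        pd.DataFrame(item["rows"], columns=item["columnNames"])
        for item in data["items"]
    ]
    return pd.concat(frames, ignore_index=True)

# Trimmed sample in the assumed response shape.
sample = '''{"items": [{"tableName": "incidents", "count": 2,
  "columnNames": ["id", "lookupName", "createdTime", "updatedTime"],
  "rows": [["100", "1", "2015-01-01T00:14:42.000Z", "2017-05-02T14:01:03.000Z"],
           ["101", "2", "2015-01-01T00:22:56.000Z", "2015-01-01T04:34:35.000Z"]]}]}'''

df = frame_from_response_text(sample)
print(df.columns.to_list())
```

Note that `pd.concat` raises a ValueError if `items` is empty, so a real caller may want to check for that first.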

I was able to fix it... The end.

lol, jk

I was able to fix this by converting the column-name DataFrame to a list, running a regex on the list to remove the characters I didn't want, and then setting that list as the column names. Below is what fixed it:

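import re
##columnNames is assumed to be the one-column DataFrame built from the response, as in processJsonResponse
columnNames = json_normalize(data, record_path=['items', ['columnNames']])
columnlist = columnNames[0].to_list()
columnlist = [re.sub(r"[:\-() ]", "", x) for x in columnlist]
df_rows.columns = columnlist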

Thank you to anyone who looked at this.

Update: I don't need the regex replace at all. Converting to a list and then assigning it directly as the DataFrame columns already leaves out those special characters.
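A minimal sketch of the regex-free version of the fix, using a trimmed hypothetical `data` dict in the assumed `items` shape. json_normalize on the `columnNames` path returns a one-column DataFrame labelled 0; selecting that column and calling `.to_list()` yields plain strings, so there is nothing left for a regex to strip.

```python
import pandas as pd

# Trimmed sample in the assumed response shape.
data = {"items": [{
    "columnNames": ["id", "lookupName", "createdTime", "updatedTime"],
    "rows": [["100", "1", "2015-01-01T00:14:42.000Z", "2017-05-02T14:01:03.000Z"]],
}]}

df_rows = pd.json_normalize(data, record_path=["items", "rows"])
columnNames = pd.json_normalize(data, record_path=["items", "columnNames"])

# columnNames is a one-column DataFrame (label 0); .to_list() extracts its str values.
columnlist = columnNames[0].to_list()
df_rows.columns = columnlist
print(df_rows.columns.to_list())
```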
