
Pandas concat returns NaN values

I am trying to concatenate two DataFrames in pandas. One comes from a JSON file and the other from an Excel file. I need to add the values from the Excel file to the JSON file, so the output is an updated JSON file. Note: the JSON file has about 20000 lines and more than 10 languages.
Example of the Excel file (the real file has more languages, and the number of translated words may vary from one to many). The key of every translation in the JSON must be the English word.

+--------+----------+----+
| en     | de       | ru |
+--------+----------+----+
| Flower | Blume    | FF |
| Chair  | Stuhl    | BB |
| Snake  | Schlange | CC |
| Monkey | Affe     | DD |
+--------+----------+----+

Here is an example of the JSON input file (the old JSON file, which should be updated with the new values from the Excel file above):

{
    "en": {
        "Ball": "Ball",
        "Snow": "Snow"
    },
    "de": {
        "Ball": "Ball",
        "Snow": "Schnee"
    },
    "ru": {
        "Ball": "AA"
    }
}

You can see that there is no "Snow" under "ru", and that is fine. But if I concatenate the two DataFrames, the output looks like this:

{
    "en": {
        "Ball": "Ball",
        "Snow": "Snow"
    },
    "de": {
        "Ball": "Ball",
        "Snow": "Schnee"
    },
    "ru": {
        "Ball": "AA",
        "Snow": NaN
    }
}

Here is my code:

import json
import pandas as pd

with open(json_filePath, encoding='utf-8') as f:
    old_json = json.load(f)
new_data = pd.read_excel(excel_filePath)
old_data = pd.DataFrame(old_json)
new_json = pd.concat([old_data, new_data.set_index('en', drop=False)]).to_dict()
with open(json_filePath, 'w', encoding='utf-8') as f:
    json.dump(new_json, f, ensure_ascii=False, indent=2, separators=(',', ':'))

This is the output I got:

{
    "en": {
        "Ball": "Ball",
        "Snow": "Snow",
        "Flower": "Flower",
        "Chair": "Chair",
        "Snake": "Snake",
        "Monkey": "Monkey"
    },
    "de": {
        "Ball": "Ball",
        "Snow": "Schnee",
        "Flower": "Blume",
        "Chair": "Stuhl",
        "Snake": "Schlange",
        "Monkey": "Affe"
    },
    "ru": {
        "Ball": "AA",
        "Snow": NaN,
        "Flower": "FF",
        "Chair": "BB",
        "Snake": "CC",
        "Monkey": "DD"    
    }
}

And below is the desired output:

{
    "en": {
        "Ball": "Ball",
        "Snow": "Snow",
        "Flower": "Flower",
        "Chair": "Chair",
        "Snake": "Snake",
        "Monkey": "Monkey"
    },
    "de": {
        "Ball": "Ball",
        "Snow": "Schnee",
        "Flower": "Blume",
        "Chair": "Stuhl",
        "Snake": "Schlange",
        "Monkey": "Affe"
    },
    "ru": {
        "Ball": "AA",
        "Flower": "FF",
        "Chair": "BB",
        "Snake": "CC",
        "Monkey": "DD"    
    }
}

So the only difference is that pandas added a NaN value for "Snow" under "ru". This happens every time a key is missing from one language in the original JSON file. I do not want to change the existing values in the original JSON file, only add the new values from Excel. The key of the new data (from Excel) is set to the English value.

I tried iterating through the JSON and dropping the NaN values, but that dropped ALL values with that key. For example, if "Snow" is NaN under "ru" but has a value under "en", it deletes EVERY value for "Snow". Resetting the index did not help, and I tried inner and outer joins without success. I am a newbie and have searched for solutions, but still without success. Any ideas?

The problem is that a pandas DataFrame is rectangular: all columns share one index, so every key must exist under every language. A key that is missing under one language (here "Snow" under "ru") is filled with NaN as soon as the DataFrame is built from the JSON. You can filter the NaNs out of the resulting dict right before writing it to the file:

...
new_json = pd.concat([old_data, new_data.set_index('en', drop=False)]).to_dict()
filtered_json = {item: {inner_item: inner_value for inner_item, inner_value in value.items()
                if not pd.isna(inner_value)}
                for item, value in new_json.items()}
with open(json_filePath, 'w', encoding='utf-8') as f:
    json.dump(filtered_json, f, ensure_ascii=False, indent=2, separators=(',', ':'))
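
To check the filter without touching any files, here is a minimal sketch that uses in-memory stand-ins for the two inputs. The sample data is copied from the question; the names `combined` and `per_column` are only illustrative. The second variant (dropping NaN per column with `dropna()`) is an alternative that is not part of the code above, and it should give the same result on this data.

```python
import json

import pandas as pd

# In-memory stand-ins for the JSON file and the Excel sheet (sample data from the question).
old_json = {
    "en": {"Ball": "Ball", "Snow": "Snow"},
    "de": {"Ball": "Ball", "Snow": "Schnee"},
    "ru": {"Ball": "AA"},  # "Snow" is missing here
}
new_data = pd.DataFrame(
    {
        "en": ["Flower", "Chair", "Snake", "Monkey"],
        "de": ["Blume", "Stuhl", "Schlange", "Affe"],
        "ru": ["FF", "BB", "CC", "DD"],
    }
)

# Building the DataFrame already yields NaN for ("Snow", "ru"),
# because all columns share the same index.
old_data = pd.DataFrame(old_json)
combined = pd.concat([old_data, new_data.set_index("en", drop=False)])

# Variant 1: the dict comprehension from the answer, with descriptive names.
filtered_json = {
    lang: {key: value for key, value in words.items() if not pd.isna(value)}
    for lang, words in combined.to_dict().items()
}

# Variant 2 (alternative): drop NaN separately in each column before converting,
# so a key is removed only from the language where it is missing.
per_column = {lang: column.dropna().to_dict() for lang, column in combined.items()}

print(json.dumps(filtered_json, ensure_ascii=False, indent=2))
```

Both variants remove a key only from the language where its value is NaN, so "Snow" stays under "en" and "de".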
