Python code works differently on Windows and CentOS

Question

I have Python code that behaves differently when I run it on Windows and when I run it on CentOS. Below is the relevant part of the code, with comments explaining its purpose. It processes a set of CSV files (some of which have different columns from each other) and merges them into a single CSV that has all the columns:

import csv
import json
from glob import glob

import pandas as pd

# Get the names of the CSV files in the current folder:
local_csv_files = glob("*.csv")
# Define the columns and the order they should appear in the final file:
global_csv_columns = ['Timestamp', 'a_country', 'b_country', 'call_setup_time','quality','latency','throughput','test_type']
# List of dataframes:
lista_de_dataframes = []

# Loop over all the CSV files in the current folder.
for ficheiro_csv in local_csv_files:
    df = pd.read_csv(ficheiro_csv)
    # Store the CSV columns in a variable and collect the number of columns:
    colunas_do_csv_aux = df.columns.values
    global_number_of_columns = len(global_csv_columns)
    aux_csv_number_of_columns = len(colunas_do_csv_aux)
    # Normalize each CSV file so that all CSV files have the same columns.
    # search_column is my own helper (not shown); it returns False when the column is missing.
    for coluna_ in global_csv_columns:
        if search_column(colunas_do_csv_aux, coluna_) == False:
            # If the column does not exist in the current CSV, add an empty column with the correct header:
            df.insert(0, coluna_, "")
    # Order the dataframe columns according to the order of the global_csv_columns list:
    df = df[global_csv_columns]
    lista_de_dataframes.append(df)
    del df
big_unified_dataframe = pd.concat(lista_de_dataframes, copy=False).drop_duplicates().reset_index(drop=True)
big_unified_dataframe.to_csv('global_file.csv', index=False)

# Create an additional txt file that presents each row of the CSV in JSON format:
with open('global_file.csv', 'r') as arquivo_csv:
    with open('global_file_c.txt', 'w') as arquivo_txt:
        reader = csv.DictReader(arquivo_csv, global_csv_columns)
        iterreader = iter(reader)
        # Skip the header row, since the field names are passed explicitly:
        next(iterreader)
        for row in iterreader:
            out = json.dumps(row)
            arquivo_txt.write(out)

On both Windows and CentOS, this works well for the final CSV: it has all the columns, ordered as defined in the list:

global_csv_columns = ['Timestamp', 'a_country', 'b_country', 'call_setup_time','quality','latency','throughput','test_type']

This ordering is achieved by this line of code:

# Order the dataframe columns according to the order of the global_csv_columns list:
df = df[global_csv_columns]

But the final 'txt' file is different on CentOS, where the key order is changed. Below is the output of the txt file on both platforms (Windows and CentOS).

Windows:

{"Timestamp": "06/09/2022 10:33", "a_country": "UAE", "b_country": "UAE", "call_setup_time": "7.847", "quality": "", "latency": "", "throughput": "", "test_type": "voice_call"}
{"Timestamp": "06/09/2022 10:30", "a_country": "Saudi_Arabia", "b_country": "Saudi_Arabia", "call_setup_time": "10.038", "quality": "", "latency": "", "throughput": "", "test_type": "voice_call"}
...

CentOS:

{"latency": "", "call_setup_time": "7.847", "Timestamp": "06/09/2022 10:33", "test_type": "voice_call", "throughput": "", "b_country": "UAE", "a_country": "UAE", "quality": ""}
{"latency": "", "call_setup_time": "10.038", "Timestamp": "06/09/2022 10:30", "test_type": "voice_call", "throughput": "", "b_country": "Saudi_Arabia", "a_country": "Saudi_Arabia", "quality": ""}
...

Is there any way to ensure the column order on CentOS?

Answer 1

On CentOS I'm running: Python 2.7.18. On Windows I'm running: Python 3.9.6.

Now the reason is clear: insertion order in ordinary dicts was added in Python 3.6 as an implementation detail and has been guaranteed by the language since Python 3.7. Python 2.7 dicts have no defined order, which is why the keys of the rows returned by csv.DictReader come out shuffled on CentOS.

Read "Are dictionaries ordered in Python 3.6+?" if you want to know more.

If you know which command/version/repository I should use to install a similar version on CentOS, please let me know.

The optimal solution is to run the same Python version on both machines, down to the minor version: if you have 3.9.6 on your Windows machine, install python3.9 on CentOS. If you are unable to install it, python3.7 or python3.8 should do. Be warned, however, that if both python2 and python3 are installed on a single machine, you must invoke python3 explicitly to use the newer version, e.g.

python3 helloworld.py

where helloworld.py is a file with Python code.
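If upgrading Python on CentOS is not possible, the key order can also be made explicit instead of relying on dict behavior. The following is a minimal sketch, not the asker's code: `row_to_ordered_json` is a hypothetical helper name, the column list is shortened, and the sample row is made up to imitate what csv.DictReader might return on Python 2.7. It relies on `collections.OrderedDict`, which is available in both Python 2.7 and Python 3.

```python
import json
from collections import OrderedDict

# Column order taken from the question (shortened here for brevity).
global_csv_columns = ['Timestamp', 'a_country', 'b_country', 'call_setup_time']


def row_to_ordered_json(row, columns):
    """Serialize `row` with its keys in the order given by `columns`.

    Hypothetical helper for illustration; not part of the original code.
    """
    ordered = OrderedDict((col, row[col]) for col in columns)
    return json.dumps(ordered)


# A made-up row with shuffled keys, as a plain dict might hold it on Python 2.7.
shuffled_row = {
    'call_setup_time': '7.847',
    'Timestamp': '06/09/2022 10:33',
    'b_country': 'UAE',
    'a_country': 'UAE',
}
line = row_to_ordered_json(shuffled_row, global_csv_columns)
print(line)
```

In the question's loop, this would replace the `json.dumps(row)` call, for example `arquivo_txt.write(row_to_ordered_json(row, global_csv_columns) + "\n")` if one JSON object per line is wanted.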

Answer 2

Try the pd.DataFrame.to_json function, which writes a dataframe directly to a JSON file, so you don't need to read the data back from a CSV file. I suspect this function will let you write the output without changing the order of the columns.
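A minimal sketch of this suggestion, assuming a pandas version that supports `lines=True` together with `orient="records"`. The small dataframe is made up for illustration and stands in for `big_unified_dataframe` from the question. With `orient="records"`, each row is written as one JSON object whose keys follow the dataframe's column order.

```python
import json

import pandas as pd

# Made-up data standing in for big_unified_dataframe from the question.
columns = ["Timestamp", "a_country", "b_country"]
df = pd.DataFrame({
    "b_country": ["UAE", "Saudi_Arabia"],
    "Timestamp": ["06/09/2022 10:33", "06/09/2022 10:30"],
    "a_country": ["UAE", "Saudi_Arabia"],
})

# Select the columns in the desired order, then emit one JSON object per row.
# With no path argument, to_json returns the result as a string.
json_lines = df[columns].to_json(orient="records", lines=True)

# Parse the first line back to inspect the key order.
first_row = json.loads(json_lines.splitlines()[0])
print(list(first_row.keys()))
```

To write straight to a file, a path can be passed as the first argument, e.g. `df[columns].to_json("global_file_c.txt", orient="records", lines=True)`. Note that pandas formats the JSON somewhat differently from `json.dumps` (for example, no space after separators), so the file will not be byte-for-byte identical to the Windows output shown in the question.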

Answer 3

Your output JSON dictionaries aren't sorted, so the order in which the tags appear could be random. I think in practice the tags usually appear in the order in which they were created in each dictionary, but you can have the dictionaries sorted by tag:

out=json.dumps(row, sort_keys=True)

This will at least make the output consistent across platforms, although the alphabetical order may not be the one you would prefer if some tags matter more to you than others.
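As a small illustration of what `sort_keys=True` does, the sketch below serializes a made-up row (a shortened version of the CentOS output in the question). Keys are sorted by string comparison, so the capitalized `Timestamp` comes before the lowercase names.

```python
import json

# Made-up row with keys in an arbitrary order.
row = {"latency": "", "call_setup_time": "7.847", "Timestamp": "06/09/2022 10:33"}

# sort_keys=True makes the key order deterministic on every Python version.
out = json.dumps(row, sort_keys=True)
print(out)
```

The result is the same on Python 2.7 and Python 3, but it follows sorted order rather than the order of `global_csv_columns`.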
