Python code behaves differently on Windows and CentOS

Question

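I have a Python script that behaves differently when I run it on Windows than when I run it on CentOS. The relevant part of the code is below, with comments explaining what it does. It takes a set of CSV files (some of which have different columns from the others) and merges them into a single CSV containing all the columns: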

import csv
import json
from glob import glob

import pandas as pd

#Get the name of CSV files of the current folder:
local_csv_files = glob("*.csv")
#Define the columns and the order they should appear on the final file:
global_csv_columns = ['Timestamp', 'a_country', 'b_country', 'call_setup_time','quality','latency','throughput','test_type']
#Dataframe list:
lista_de_dataframes=[]

#Loop to be executed for all the CSV files in the current folder.
for ficheiro_csv in local_csv_files:
    df = pd.read_csv(ficheiro_csv)
    #Store the CSV columns on a variable and collect the number of columns:
    colunas_do_csv_aux= df.columns.values
    global_number_of_columns = len(global_csv_columns)
    aux_csv_number_of_columns = len(colunas_do_csv_aux)
    #Normalize each CSV file so that all CSV files have the same columns
    for coluna_ in global_csv_columns:
        #The original called a search_column helper that is not shown; a membership test stands in for it here.
        if coluna_ not in df.columns:
            #If the column does not exist in the current CSV, add an empty column with the correct header:
            df.insert(0, coluna_, "")
    #Order the dataframe columns according to the order of the global_csv_columns list:
    df = df[global_csv_columns]
    lista_de_dataframes.append(df)
    del df
big_unified_dataframe = pd.concat(lista_de_dataframes, copy=False).drop_duplicates().reset_index(drop=True)
big_unified_dataframe.to_csv('global_file.csv', index=False)

#Create an additional txt file to present with each row of the CSV in a JSON format:
with open('global_file.csv', 'r') as arquivo_csv:
   with open('global_file_c.txt', 'w') as arquivo_txt:
      reader = csv.DictReader(arquivo_csv, global_csv_columns)
      iterreader = iter(reader)
      next(iterreader)
      for row in iterreader:
         out=json.dumps(row)
         arquivo_txt.write(out)

On both Windows and CentOS this works well for the final CSV, which has all the columns in the order defined in the list:

global_csv_columns = ['Timestamp', 'a_country', 'b_country', 'call_setup_time','quality','latency','throughput','test_type']

That ordering comes from this line:

#Order the dataframe columns according to the order of the global_csv_columns list:
    df = df[global_csv_columns]

The final txt file, however, is different on CentOS: the order of the keys changes. Below is the output of the txt file on both platforms (Windows and CentOS).

Windows:

{"Timestamp": "06/09/2022 10:33", "a_country": "UAE", "b_country": "UAE", "call_setup_time": "7.847", "quality": "", "latency": "", "throughput": "", "test_type": "voice_call"}
{"Timestamp": "06/09/2022 10:30", "a_country": "Saudi_Arabia", "b_country": "Saudi_Arabia", "call_setup_time": "10.038", "quality": "", "latency": "", "throughput": "", "test_type": "voice_call"}
...

CentOS:

{"latency": "", "call_setup_time": "7.847", "Timestamp": "06/09/2022 10:33", "test_type": "voice_call", "throughput": "", "b_country": "UAE", "a_country": "UAE", "quality": ""}
{"latency": "", "call_setup_time": "10.038", "Timestamp": "06/09/2022 10:30", "test_type": "voice_call", "throughput": "", "b_country": "Saudi_Arabia", "a_country": "Saudi_Arabia", "quality": ""}
...

Is there a way to guarantee the column order on CentOS?

Answer 1

"On CentOS I'm running: Python 2.7.18. On Windows I'm running: Python 3.9.6."

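That makes the cause clear: insertion order in the built-in dict was added in Python 3.6 as an implementation detail, and it is guaranteed by the language from Python 3.7 onward. On Python 2.7 a plain dict has no defined order, and the rows returned by csv.DictReader are plain dicts there, so the keys can come out in any order.

See "Are dictionaries ordered in Python 3.6+?" if you want to know more.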
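If the script has to keep running on an interpreter where dict order is not guaranteed, one option is to build each row as a collections.OrderedDict instead of relying on csv.DictReader. The following is only a sketch under assumptions: it runs on Python 3 against an in-memory sample (sample_csv, with rows copied from the question's output) rather than the real global_file.csv, and rows_as_json_lines is a hypothetical helper name, not part of the original script.

```python
import csv
import io
import json
from collections import OrderedDict

global_csv_columns = ['Timestamp', 'a_country', 'b_country', 'call_setup_time',
                      'quality', 'latency', 'throughput', 'test_type']

# Assumption: in-memory stand-in for global_file.csv, using rows from the question.
sample_csv = (
    "Timestamp,a_country,b_country,call_setup_time,quality,latency,throughput,test_type\n"
    "06/09/2022 10:33,UAE,UAE,7.847,,,,voice_call\n"
    "06/09/2022 10:30,Saudi_Arabia,Saudi_Arabia,10.038,,,,voice_call\n"
)


def rows_as_json_lines(csv_text, columns):
    """Return one JSON string per data row, with keys in the order of `columns`."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    lines = []
    for values in reader:
        # OrderedDict remembers insertion order on every Python version,
        # and json.dumps writes the keys in that order.
        row = OrderedDict(zip(columns, values))
        lines.append(json.dumps(row))
    return lines


json_lines = rows_as_json_lines(sample_csv, global_csv_columns)
for line in json_lines:
    print(line)
```

The point of the sketch is the OrderedDict(zip(...)) step; how the file is opened would need adjusting for Python 2.7, where the csv module expects byte strings.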

"If you know which command/version/repository I should use to install a similar version on CentOS, please let me know."

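The best solution is to have the same Python version on both machines, down to the minor version: if your Windows machine has 3.9.6, install Python 3.9 on CentOS. If you cannot install that, Python 3.7 or 3.8 should also do. Be aware that if both Python 2 and Python 3 are installed on the same machine, you need to invoke python3 to use the newer version, e.g.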

python3 helloworld.py

where helloworld.py is the file containing the Python code.
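Since it is easy to launch the script with the wrong interpreter when both are installed, a small guard at the top of the script can make the mismatch visible. This is a sketch only; MINIMUM_VERSION and dict_order_is_guaranteed are hypothetical names chosen for illustration.

```python
import sys

# Insertion-ordered dicts are guaranteed by the language from Python 3.7 on.
MINIMUM_VERSION = (3, 7)


def dict_order_is_guaranteed(version_info=sys.version_info):
    """Return True if the given interpreter version guarantees dict order."""
    return tuple(version_info[:2]) >= MINIMUM_VERSION


if not dict_order_is_guaranteed():
    sys.stderr.write(
        "Warning: running on Python %d.%d; dict key order is not guaranteed.\n"
        % (sys.version_info[0], sys.version_info[1])
    )
```

The guard uses only syntax that Python 2.7 also accepts, so it can report the problem instead of failing with a syntax error.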

Answer 2

Try the pd.DataFrame.to_json function, which lets you write the dataframe directly to a JSON file, without reading it back from the CSV file. I suspect this function will let you write the output without changing the column order.
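A minimal sketch of that suggestion, assuming a small dataframe built from the rows shown in the question stands in for big_unified_dataframe. With orient="records" and lines=True, to_json emits one JSON object per row. The parsing step at the end is only there to check the key order and is not needed in the real script.

```python
import json
from collections import OrderedDict

import pandas as pd

global_csv_columns = ['Timestamp', 'a_country', 'b_country', 'call_setup_time',
                      'quality', 'latency', 'throughput', 'test_type']

# Assumption: stand-in for big_unified_dataframe, using rows from the question.
df = pd.DataFrame(
    [
        ["06/09/2022 10:33", "UAE", "UAE", "7.847", "", "", "", "voice_call"],
        ["06/09/2022 10:30", "Saudi_Arabia", "Saudi_Arabia", "10.038", "", "", "", "voice_call"],
    ],
    columns=global_csv_columns,
)

# With no path argument, to_json returns the JSON text instead of writing a file.
json_text = df.to_json(orient="records", lines=True)

# Parse each line back, keeping key order, to inspect what was written.
parsed_rows = [
    json.loads(line, object_pairs_hook=OrderedDict)
    for line in json_text.splitlines()
    if line
]
print(list(parsed_rows[0].keys()))
```

Passing a file name as the first argument, for example df.to_json('global_file_c.txt', orient="records", lines=True), writes the same text to disk. This was only reasoned through for Python 3; whether the pandas release available for Python 2.7 behaves the same way is not verified here.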

Answer 3

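The dictionaries in your JSON output are not sorted, so the order in which the keys appear may be arbitrary. I think in practice the keys usually appear in the order in which they were created in each dictionary, but you can have each dictionary sorted by key: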

out=json.dumps(row, sort_keys=True)

Although you may attach more meaning to some keys than to others, this at least keeps the order consistent.
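To illustrate what sort_keys=True does, here is a short sketch using a shortened version of one row from the CentOS output (the four keys are picked only to keep the example small). Note that the result is sorted by key name, with uppercase letters before lowercase ones, which is consistent across platforms but is not the same as the order in global_csv_columns.

```python
import json

# Assumption: a shortened row, with keys in the scrambled order seen on CentOS.
row = {
    "latency": "",
    "call_setup_time": "7.847",
    "Timestamp": "06/09/2022 10:33",
    "test_type": "voice_call",
}

# sort_keys=True orders the keys by name, regardless of how the dict stores them.
out = json.dumps(row, sort_keys=True)
print(out)
```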


Answer 1: score 1, accepted, 2022-09-12 13:06:57

Answer 2: score 0, 2022-09-12 11:43:54

Answer 3: score 0, 2022-09-12 11:47:04
