Combining multiple dataframes using a WHILE loop

Question

Screenshot of results from the updated code based on the suggestion

'dlist' is a list of provider IDs that appear in a dataframe. I tried to use a while loop over 'dlist', but the result only contains the rows for the last provider ID in the list, in this case 1005. I used the append function, but it had no effect. The additional 74 rows for provider ID 1000 are not showing. How do I combine everything so that the result contains the rows for both IDs in dlist, 684 rows in total?

dlist = ["1000", "1005"]

final_list = pd.DataFrame()

index = 0

while index < len(dlist):
    provider = dlist[index]
    
    # Filter dentist (CHANGEABLE)
    final_list = report_df[(report_df["provider_id"] == provider)]

    # Sort values of the codes
    final_list = final_list.sort_values(['codes','report_month'], ascending=True)

    # Drop 'report_year' column
    final_list = final_list.drop(['report_year'], axis = 1)

    # Change 'report_month' numbers into month name
    final_list = final_list.replace({'report_month': {1: "January",
                                                      2: "February",
                                                      3: "March",
                                                      4: "April",
                                                      5: "May",
                                                      6: "June",
                                                      7: "July",
                                                      8: "August",
                                                      9: "September",
                                                      10: "October",
                                                      11: "November"}})
    final_list.append(final_list)
    index +=1

Missing values

Result of the current code

Answer 1

You could create a list with all the dataframes and then concatenate them. Before the while loop, define list_of_dfs = [], and just before index += 1 add list_of_dfs.append(final_list). You probably don't want final_list.append(final_list). At the end you can do my_df_of_concern = pd.concat(list_of_dfs, axis=0). See https://pandas.pydata.org/docs/reference/api/pandas.concat.html
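As a rough sketch of the list-then-concatenate idea, the example below uses a small made-up report_df (the sample rows and the name provider_df are assumptions for illustration, not the asker's data). It only shows the filter step; the sorting and renaming from the question would go inside the loop before the append.

```python
import pandas as pd

# Hypothetical stand-in for report_df; the real frame has more rows and columns.
report_df = pd.DataFrame({
    "provider_id": ["1000", "1000", "1005", "1005", "1005", "2000"],
    "codes": ["D0120", "D0150", "D0120", "D0150", "D0210", "D0120"],
    "report_month": [1, 2, 1, 2, 3, 1],
})

dlist = ["1000", "1005"]
list_of_dfs = []  # collects one filtered frame per provider

index = 0
while index < len(dlist):
    provider = dlist[index]
    provider_df = report_df[report_df["provider_id"] == provider]
    list_of_dfs.append(provider_df)  # list.append mutates the list in place
    index += 1

# Stack the per-provider frames row-wise; ignore_index renumbers the rows.
my_df_of_concern = pd.concat(list_of_dfs, axis=0, ignore_index=True)
print(len(my_df_of_concern))
```

Collecting frames in a plain Python list and calling pd.concat once at the end also avoids copying the accumulated rows on every iteration.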

Answer 2

Your problem is that you are modifying the same variable again and again. In your code:

Line 1: while index < len(dlist):
Line 2:    provider = dlist[index]
    
Line 3:    # Filter dentist (CHANGEABLE)
Line 4:    final_list = report_df[(report_df["provider_id"] == provider)] # PROBLEM LINE
Line 5:    # MORE CODE
Line 6:    # MORE CODE
Line 7:    final_list.append(final_list)
Line 8:    index +=1

Since your dlist is ["1000", "1005"], during the first iteration, in Line 4, final_list holds all the rows where provider_id == "1000". You then make some modifications to it, and in Line 7 you append it to itself. Note that DataFrame.append does not modify the frame in place: it returns a new DataFrame (here containing 2 copies of every row), and because that return value is never assigned, it is discarded and final_list stays unchanged.

You then increment index, and in the next iteration, where provider is now "1005", Line 4 runs again and overwrites final_list. This means that all the rows previously stored in that variable are gone; only the rows where provider_id == "1005" remain.
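The overwriting can be reproduced on a toy frame. This is only an illustrative sketch: the three sample rows and the rows_seen list are made up to make the effect visible.

```python
import pandas as pd

# Hypothetical miniature of report_df.
report_df = pd.DataFrame({
    "provider_id": ["1000", "1000", "1005"],
    "value": [1, 2, 3],
})

dlist = ["1000", "1005"]
final_list = pd.DataFrame()
rows_seen = []  # number of rows final_list holds in each iteration

index = 0
while index < len(dlist):
    provider = dlist[index]
    # Assignment rebinds the name, so the previous provider's rows are dropped.
    final_list = report_df[report_df["provider_id"] == provider]
    rows_seen.append(len(final_list))
    index += 1

print(rows_seen)
print(final_list["provider_id"].unique())
```

After the loop, final_list contains only the rows of the last provider, which matches the behaviour described in the question.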

Try changing your code like this:

while index < len(dlist):
    provider = dlist[index]
    
    # Filter dentist (CHANGEABLE)
    report_list = report_df[(report_df["provider_id"] == provider)]

    # Sort values of the codes
    report_list = report_list.sort_values(['codes','report_month'], ascending=True)

    # Drop 'report_year' column
    report_list = report_list.drop(['report_year'], axis = 1)

    # Change 'report_month' numbers into month name
    report_list = report_list.replace({'report_month': {1: "January",
                                                      2: "February",
                                                      3: "March",
                                                      4: "April",
                                                      5: "May",
                                                      6: "June",
                                                      7: "July",
                                                      8: "August",
                                                      9: "September",
                                                      10: "October",
                                                      11: "November"}})
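    # Keep the combined result: DataFrame.append returned a new frame instead of
    # modifying final_list, and it was removed in pandas 2.0, so use pd.concat.
    final_list = pd.concat([final_list, report_list])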
    index +=1

report_list acts as a temporary variable that holds the data for one particular provider. After all your modifications (dropping the report_year column, sorting, and so on), you append its rows to final_list. final_list then accumulates the data across iterations.

Also, instead of doing

while index < len(dlist):
    provider = dlist[index]
    index +=1

you can do this:

for provider in dlist:
    # YOUR CODE where provider will be "1000" for 1st run and "1005" in second run
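
Putting the pieces together, a for-loop version might look like the sketch below. The sample report_df, the shortened month_names mapping, and the frames list are assumptions made for illustration; the column names come from the question.

```python
import pandas as pd

# Hypothetical sample; the real report_df is much larger.
report_df = pd.DataFrame({
    "provider_id": ["1005", "1000", "1005", "1000"],
    "codes": ["D0150", "D0120", "D0120", "D0120"],
    "report_month": [2, 3, 1, 1],
    "report_year": [2021, 2021, 2021, 2021],
})

# Shortened mapping for the sketch; extend it to cover every month in the data.
month_names = {1: "January", 2: "February", 3: "March"}

dlist = ["1000", "1005"]
frames = []  # one processed frame per provider

for provider in dlist:
    report_list = report_df[report_df["provider_id"] == provider]
    report_list = report_list.sort_values(["codes", "report_month"], ascending=True)
    report_list = report_list.drop(["report_year"], axis=1)
    report_list = report_list.replace({"report_month": month_names})
    frames.append(report_list)

# Combine all providers once, after the loop.
final_list = pd.concat(frames, ignore_index=True)
print(final_list)
```

Sorting happens before the month numbers are replaced, so the months stay in calendar order within each code rather than alphabetical order.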


Answer 1 (accepted): 2021-12-15 17:09:35
Answer 2: 2021-12-15 17:23:36
