Pandas, Python - Assembling a DataFrame from multiple lists produced in a loop

Question

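I use a loop to collect the target data from a JSON file into lists. Each list is already organized as a column and its values are already in order, so no manipulation or reorganization is needed; the lists only have to be concatenated horizontally.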

#Selecting Data into List
i=1
target = f'{pathway}\calls_{i}.json'
with open(target,'r') as f: #Reading JSON file
    data = json.load(f)

    specsA=('PreviousDraws',['DrawNumber'])
    draw=(glom(data,specsA)) #list type; glom is a package to access nested data in JSON file.
    print(draw)

    for j in range(0,5):
        specsB=('PreviousDraws',['WinningNumbers'],[f'{j}'],['Number'])
        number=(glom(data,specsB)) #list type; glom is a package to access nested data in JSON file.
        print(number)

    #Now assembling lists into a table using pandas

The code above produces the following lists:

#This is from variable draw
[10346, 10345, 10344, 10343, 10342, 10341, 10340, 10339, 10338, 10337, 10336, 10335, 10334, 10333, 10332, 10331, 10330, 10329, 10328, 10327]

#This is from variable number 
['22', '9', '4', '1', '1', '14', '5', '3', '2', '8', '2', '1', '4', '9', '4', '4', '3', '13', '7', '14']
['28', '18', '16', '2', '3', '17', '16', '13', '11', '9', '8', '2', '9', '19', '7', '13', '7', '23', '21', '17']
['33', '24', '21', '4', '9', '20', '27', '19', '23', '19', '19', '7', '19', '30', '19', '27', '19', '32', '26', '21']
['35', '30', '28', '11', '21', '23', '33', '26', '35', '37', '27', '12', '20', '31', '22', '34', '22', '36', '27', '25']
['36', '32', '33', '19', '29', '38', '35', '27', '37', '38', '32', '30', '22', '36', '33', '39', '36', '38', '30', '27']

Expected DataFrame after assembly:

Draw  |  Number[0]  |  Number[1]  |  Number[2] ...
10346 |  22         |   28        |
10345 |  9          |   18        |
10344 |  4          |   16        |
10343 |  1          |   2         |
10342 |  1          |   3         |

My attempt at assembling the table, using Series organized in a dictionary, as shown below:

dct = {'DrawNumbers':pd.Series(draw),
        'Index1':pd.Series(number),
        'Index2':pd.Series(number),
        'Index3':pd.Series(number),
        'Index4':pd.Series(number),
        'Index5':pd.Series(number)
        }

df = pd.DataFrame(dct)
print(df)

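Actual result: incorrect, because the values of the last list are repeated across the rows of the table. So far only the Index5 column is correct; all the other Index columns wrongly show the values that belong to Index5.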

    DrawNumbers Index1 Index2 Index3 Index4 Index5
0         10346     36     36     36     36     36
1         10345     32     32     32     32     32
2         10344     33     33     33     33     33
3         10343     19     19     19     19     19
4         10342     29     29     29     29     29
5         10341     38     38     38     38     38
6         10340     35     35     35     35     35
7         10339     27     27     27     27     27
8         10338     37     37     37     37     37
9         10337     38     38     38     38     38
...        ...      ...    ...    ...    ...    ...

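I also tried changing the data type of the numbers from string to int, but repeatedly ran into errors. Either way, I am stuck and would like to ask for help.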

Answer 1

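The problem is that you are overwriting the number variable on every pass through the loop, so once an iteration ends its list is no longer available. I have added a solution that adds a column for each index inside the loop, on every iteration.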

import json
import os

import pandas as pd
from glom import glom

# create an empty dataframe
df = pd.DataFrame()

# Selecting data into lists
i = 1
# pathway is the asker's data directory, defined elsewhere
target = os.path.join(pathway, f'calls_{i}.json')
with open(target, 'r') as f:  # Reading JSON file
    data = json.load(f)

    specsA = ('PreviousDraws', ['DrawNumber'])
    draw = glom(data, specsA)  # list type; glom is a package to access nested data in JSON file.
    print(draw)

    # insert the draw numbers into the dataframe
    df['DrawNumbers'] = draw

    for j in range(0, 5):
        specsB = ('PreviousDraws', ['WinningNumbers'], [f'{j}'], ['Number'])
        number = glom(data, specsB)  # list type
        print(number)
        # store each list as its own column before number is overwritten
        df[f'Index{j}'] = number
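
The same idea can be tried without the JSON file. The sketch below is only an illustration: the `columns` list is a hypothetical stand-in for the five lists that glom returns in the loop, filled with the first three values of each list printed in the question. The final `astype` call is one possible way to do the string-to-int conversion the question mentions, not something the answer itself prescribes.

```python
import pandas as pd

# First three draw numbers from the question's sample output
draw = [10346, 10345, 10344]

# Hypothetical stand-in for the lists produced on each loop iteration
# (first three values of each list printed in the question)
columns = [
    ['22', '9', '4'],
    ['28', '18', '16'],
    ['33', '24', '21'],
    ['35', '30', '28'],
    ['36', '32', '33'],
]

df = pd.DataFrame({'DrawNumbers': draw})
for j, number in enumerate(columns):
    # Assign inside the loop so each list is stored before number is rebound
    df[f'Index{j}'] = number

# Convert the string columns to integers in one step
index_cols = [f'Index{j}' for j in range(len(columns))]
df = df.astype({col: int for col in index_cols})
print(df)
```

Because every column is written while its list is still bound to `number`, no column ends up holding the values of the last list.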

Answer 2

Assuming number is a nested list:

number = list(map(list, zip(*number))) # this transposes the nested list so that each list within the list now corresponds to one row of the desired df
pd.DataFrame(data=number, index=draw)

This will output a df in the required format. Of course, you can go ahead and label the columns and so on.
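
A runnable sketch of this transposition, assuming the five lists from the loop have first been gathered into one nested list. The sample values are the first three entries of each list in the question; the `rows` name and the `Number[j]` column labels are chosen here for illustration and follow the expected table in the question.

```python
import pandas as pd

# First three draw numbers from the question's sample output
draw = [10346, 10345, 10344]

# Assumed nested list: one inner list per winning-number position
number = [
    ['22', '9', '4'],
    ['28', '18', '16'],
    ['33', '24', '21'],
    ['35', '30', '28'],
    ['36', '32', '33'],
]

# zip(*number) pairs up the k-th element of every inner list,
# so each resulting list is one row of the desired table
rows = list(map(list, zip(*number)))

df = pd.DataFrame(
    data=rows,
    index=draw,
    columns=[f'Number[{j}]' for j in range(len(number))],
)
df.index.name = 'Draw'
print(df)
```

Note that this approach keeps the values as strings and uses the draw numbers as the index rather than as a regular column.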

Answer 1
1 Accepted 2022-03-11 12:01:42

Answer 2
0 2022-03-11 12:01:58
