Pandas, Python - Assembling a DataFrame from multiple lists built in a loop
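Using a loop, I collect the target data from a JSON file into lists. Each list represents one column and its values are already in the right order, so no manipulation or restructuring is needed. The lists only have to be joined side by side (horizontally).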
import json
import os

import pandas as pd
from glom import glom  # glom is a package to access nested data in JSON files

#Selecting Data into List
i = 1
target = os.path.join(pathway, f'calls_{i}.json')  # pathway: the data folder, defined elsewhere
with open(target, 'r') as f:  # Reading JSON file
    data = json.load(f)

specsA = ('PreviousDraws', ['DrawNumber'])
draw = glom(data, specsA)  # list type
print(draw)

for j in range(0, 5):
    specsB = ('PreviousDraws', ['WinningNumbers'], [f'{j}'], ['Number'])
    number = glom(data, specsB)  # list type
    print(number)

#Now assembling lists into a table using pandas
The lists produced by the code above are:
#This is from variable draw
[10346, 10345, 10344, 10343, 10342, 10341, 10340, 10339, 10338, 10337, 10336, 10335, 10334, 10333, 10332, 10331, 10330, 10329, 10328, 10327]
#This is from variable number
['22', '9', '4', '1', '1', '14', '5', '3', '2', '8', '2', '1', '4', '9', '4', '4', '3', '13', '7', '14']
['28', '18', '16', '2', '3', '17', '16', '13', '11', '9', '8', '2', '9', '19', '7', '13', '7', '23', '21', '17']
['33', '24', '21', '4', '9', '20', '27', '19', '23', '19', '19', '7', '19', '30', '19', '27', '19', '32', '26', '21']
['35', '30', '28', '11', '21', '23', '33', '26', '35', '37', '27', '12', '20', '31', '22', '34', '22', '36', '27', '25']
['36', '32', '33', '19', '29', '38', '35', '27', '37', '38', '32', '30', '22', '36', '33', '39', '36', '38', '30', '27']
Expected DataFrame after assembly:
Draw  | Number[0] | Number[1] | Number[2] | Number[3] | Number[4]
10346 | 22        | 28        | 33        | 35        | 36
10345 | 9         | 18        | 24        | 30        | 32
10344 | 4         | 16        | 21        | 28        | 33
10343 | 1         | 2         | 4         | 11        | 19
10342 | 1         | 3         | 9         | 21        | 29
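A minimal sketch of how this expected table could be built from the printed lists, assuming the five number lists are kept together in one list (called numbers here, a hypothetical name) instead of being overwritten on each loop pass. Only the first five draws are used to keep it short.

```python
import pandas as pd

# First five values of draw and of each number list, copied from the output above
draw = [10346, 10345, 10344, 10343, 10342]
numbers = [
    ['22', '9', '4', '1', '1'],
    ['28', '18', '16', '2', '3'],
    ['33', '24', '21', '4', '9'],
    ['35', '30', '28', '11', '21'],
    ['36', '32', '33', '19', '29'],
]

# One dict entry per column: 'Draw' first, then 'Number[0]' ... 'Number[4]'
columns = {'Draw': draw}
for j, values in enumerate(numbers):
    columns[f'Number[{j}]'] = values

df = pd.DataFrame(columns)
print(df)
```

Because a dict preserves insertion order, the columns come out in the order they were added.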
I tried to assemble the table by putting Series objects into a dictionary, as follows:
dct = {'DrawNumbers':pd.Series(draw),
'Index1':pd.Series(number),
'Index2':pd.Series(number),
'Index3':pd.Series(number),
'Index4':pd.Series(number),
'Index5':pd.Series(number)
}
df = pd.DataFrame(dct)
print(df)
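Actual result: this is incorrect, because the values of the last list are repeated across every row. Only the Index5 column is correct; all the other Index columns wrongly show the values of index 5 as well.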
DrawNumbers Index1 Index2 Index3 Index4 Index5
0 10346 36 36 36 36 36
1 10345 32 32 32 32 32
2 10344 33 33 33 33 33
3 10343 19 19 19 19 19
4 10342 29 29 29 29 29
5 10341 38 38 38 38 38
6 10340 35 35 35 35 35
7 10339 27 27 27 27 27
8 10338 37 37 37 37 37
9 10337 38 38 38 38 38
... ... ... ... ... ... ...
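Why every column ends up identical: the dictionary is built after the loop has finished, when number refers only to the last list, so each pd.Series(number) wraps the same data. A small reproduction, with the glom calls replaced by three shortened sample lists (an assumption for illustration):

```python
import pandas as pd

draw = [10346, 10345, 10344]
sample_lists = [
    ['22', '9', '4'],
    ['28', '18', '16'],
    ['33', '24', '21'],
]

for j in range(0, 3):
    number = sample_lists[j]  # rebinds number on every pass; earlier lists are lost

# After the loop, number is only the last list, so both columns get the same data
dct = {
    'DrawNumbers': pd.Series(draw),
    'Index1': pd.Series(number),
    'Index2': pd.Series(number),
}
df = pd.DataFrame(dct)
print(df)
```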
I also tried converting number from strings to int, but kept getting errors. Either way, I'm stuck and would appreciate some help.
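The winning numbers come out of the JSON as strings such as '22'. Since the actual error messages are not shown, this is only a sketch of two common ways to convert them: convert each list before it goes into the DataFrame, or convert a column afterwards with pd.to_numeric. Passing errors='coerce' turns entries that cannot be parsed into NaN instead of raising.

```python
import pandas as pd

number = ['22', '9', '4', '1', '1']

# Option 1: convert the list itself before building the DataFrame
as_ints = [int(n) for n in number]

# Option 2: convert a column that is already in the DataFrame
df = pd.DataFrame({'Index0': number})
df['Index0'] = pd.to_numeric(df['Index0'])

# If some entries may be malformed, coerce them to NaN instead of raising
messy = pd.Series(['22', 'n/a', '4'])
cleaned = pd.to_numeric(messy, errors='coerce')

print(as_ints, df['Index0'].dtype, cleaned.tolist())
```

With a NaN present the coerced column becomes float, so int(...) on a whole list will fail in that case while to_numeric with errors='coerce' will not.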
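The problem is that you overwrite the number variable on every iteration of the loop, so once the loop ends only the last list is still available. In the solution below, each list is added to the DataFrame as its own column inside the loop, on every iteration.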
import json
import os

import pandas as pd
from glom import glom  # glom is a package to access nested data in JSON files

# create an empty dataframe
df = pd.DataFrame()
#Selecting Data into List
i = 1
target = os.path.join(pathway, f'calls_{i}.json')  # pathway: the data folder, defined elsewhere
with open(target, 'r') as f:  # Reading JSON file
    data = json.load(f)
specsA = ('PreviousDraws', ['DrawNumber'])
draw = glom(data, specsA)  # list type
print(draw)
# insert the draw to the dataframe
df['DrawNumbers'] = draw
for j in range(0, 5):
    specsB = ('PreviousDraws', ['WinningNumbers'], [f'{j}'], ['Number'])
    number = glom(data, specsB)  # list type
    print(number)
    # insert each number list into the dataframe as its own column,
    # before the next iteration overwrites number
    df[f'Index{j}'] = number
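To try the column-by-column approach without the JSON file or glom, here is a sketch that replaces the file with an in-memory dict. The layout (PreviousDraws as a list of draws, each with a DrawNumber and a WinningNumbers list of {'Number': ...} entries) is an assumption inferred from the glom specs above, and plain list comprehensions stand in for glom.

```python
import pandas as pd

# Hypothetical stand-in for json.load(f): two draws, five winning numbers each
data = {
    'PreviousDraws': [
        {'DrawNumber': 10346,
         'WinningNumbers': [{'Number': n} for n in ['22', '28', '33', '35', '36']]},
        {'DrawNumber': 10345,
         'WinningNumbers': [{'Number': n} for n in ['9', '18', '24', '30', '32']]},
    ]
}

df = pd.DataFrame()
draws = data['PreviousDraws']

# Same role as specsA: one DrawNumber per draw
df['DrawNumbers'] = [d['DrawNumber'] for d in draws]

for j in range(0, 5):
    # Same role as specsB: the j-th winning number of every draw
    number = [d['WinningNumbers'][j]['Number'] for d in draws]
    df[f'Index{j}'] = number  # stored before the next pass overwrites number

print(df)
```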
Assuming that number is a nested list:
import pandas as pd

number = list(map(list, zip(*number))) # this transposes the nested list so that each list within the list now corresponds to one row of the desired df
pd.DataFrame(data=number, index=draw)
This outputs a df in the desired format. You can of course go ahead and label the columns, and so on.
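What the zip(*number) transpose does, shown on a shortened nested list (an assumption: number holds the five per-position lists from the question's output, truncated to three draws). Each inner list goes from "one winning-number position across all draws" to "all five numbers of one draw", so the result can be passed as rows with draw as the index. The column labels and the index name are illustrative choices, not part of the original answer.

```python
import pandas as pd

draw = [10346, 10345, 10344]
number = [
    ['22', '9', '4'],    # position 0 for each draw
    ['28', '18', '16'],  # position 1
    ['33', '24', '21'],  # position 2
    ['35', '30', '28'],  # position 3
    ['36', '32', '33'],  # position 4
]

rows = list(map(list, zip(*number)))  # 5 lists of 3 values -> 3 lists of 5 values
df = pd.DataFrame(data=rows, index=draw,
                  columns=[f'Number[{j}]' for j in range(len(number))])
df.index.name = 'Draw'
print(df)
```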