Filling an existing DataFrame with rows from a loop
I tried going over "How to build and fill pandas dataframe from for loop?" but can't seem to write my values to my columns.
Ultimately, I am getting data from a webpage and want to put it into a DataFrame.
My headers are predefined as:
d1 = pd.DataFrame(columns=['col1', 'col2', 'col3', 'col4', 'col5', 'col6', 'col7', 'col8', 'col9',
'col10', 'col11', 'col12', 'col13', 'col14', 'col15', 'col16', 'col17'])
Now I have values that I get in a for loop. How can I write them into the columns in order, from column 1 to column 17, and then start again at column 1 on the next row?
row = soup.find_all('td', attrs = {'class': 'Table__TD'})
for data in row:
    print(data.get_text())
Sample output, row 1:
Mon 11/11
SA
100
31
3-5
60.0
1-3
33.3
1-2
50.0
10
4
0
1
1
2
8
Sample output, row 2:
Wed 11/13
@CHA
W119-117
32
1-5
20.0
1-5
20.0
0-0
0.0
3
1
0
1
3
3
3
Expected output: one DataFrame row per group of 17 values, filled across col1 to col17.
Any help would be appreciated.
First, we have a list of column names:
cols = ['col1', 'col2', 'col3', 'col4', 'col5', 'col6', 'col7', 'col8', 'col9',
'col10', 'col11', 'col12', 'col13', 'col14', 'col15', 'col16', 'col17']
Then a list of values:
row = [x.get_text() for x in soup.find_all('td', attrs = {'class': 'Table__TD'})]
print(row)
# ['Mon 11/11', 'SA', '100', '31', '3-5', '60.0', '1-3', '33.3', '1-2', '50.0', '10', '4', '0', '1', '1', '2', '8']
Then we can zip the columns and the values together and add the resulting row to the DataFrame (DataFrame.append was removed in pandas 2.0, so pd.concat is used here):
d1 = pd.concat([d1, pd.DataFrame([dict(zip(cols, row))])], ignore_index=True)
print(d1)
#         col1  col2  col3  col4  col5  col6  col7  col8  col9  col10  col11  col12  \
# 0  Mon 11/11    SA   100    31   3-5  60.0   1-3  33.3   1-2   50.0     10      4
#
#    col13  col14  col15  col16  col17
# 0      0      1      1      2      8
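One thing to watch with the zip approach: zip stops at the shorter of its two inputs, so if the scraped list holds more than one table row, only the first 17 values are used. A minimal sketch of one way around that, assuming the cells always arrive in complete groups of 17 (the hard-coded list here is just a stand-in for the scraped values, taken from the two sample rows in the question):

```python
import pandas as pd

cols = [f"col{i}" for i in range(1, 18)]
d1 = pd.DataFrame(columns=cols)

# Stand-in for the scraped cell texts: the question's two sample rows, flattened
row = ['Mon 11/11', 'SA', '100', '31', '3-5', '60.0', '1-3', '33.3', '1-2', '50.0',
       '10', '4', '0', '1', '1', '2', '8',
       'Wed 11/13', '@CHA', 'W119-117', '32', '1-5', '20.0', '1-5', '20.0', '0-0', '0.0',
       '3', '1', '0', '1', '3', '3', '3']

# One dict per group of 17 values, each zipped against the column names
width = len(cols)
records = [dict(zip(cols, row[i:i + width])) for i in range(0, len(row), width)]

# Build all new rows at once; only concatenate if d1 already holds data
new_rows = pd.DataFrame(records, columns=cols)
d1 = new_rows if d1.empty else pd.concat([d1, new_rows], ignore_index=True)

print(d1.shape)
```

All values stay as strings here; converting the numeric columns would be a separate step.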
You can try this:
import pandas as pd
columns = [
    'col1',
    'col2',
    'col3',
    'col4',
    'col5',
    'col6',
    'col7',
    'col8',
    'col9',
    'col10',
    'col11',
    'col12',
    'col13',
    'col14',
    'col15',
    'col16',
    'col17',
]
# create dataframe
d1 = pd.DataFrame(columns=columns)
full = []
for data in soup.find_all('td', attrs={'class': 'Table__TD'}):
    full.append(data.get_text())
# separate the full list into sub-lists of 17 elements, one per row
rows = [full[i: i+17] for i in range(0, len(full), 17)]
# add the list of lists to the dataframe
# (pd.concat replaces DataFrame.append, which was removed in pandas 2.0)
d1 = pd.concat([d1, pd.DataFrame(rows, columns=d1.columns)], ignore_index=True)
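The slicing step silently produces a short last row if the page returns a cell count that is not a multiple of 17 (for example, when a totals row has fewer cells). As a rough sketch, a small helper could check for that before the DataFrame is built. The name chunk_rows is hypothetical, not part of pandas or BeautifulSoup, and the flat list is made-up sample data:

```python
def chunk_rows(values, width=17):
    """Split a flat list into consecutive rows of `width` items.

    Raises ValueError if the list does not divide evenly, so a
    malformed scrape is noticed instead of yielding a ragged row.
    """
    if len(values) % width != 0:
        raise ValueError(
            f"got {len(values)} values, which is not a multiple of {width}"
        )
    return [values[i:i + width] for i in range(0, len(values), width)]


# Made-up stand-in for the scraped cell texts: 34 values, i.e. two rows
flat = [str(n) for n in range(34)]
rows = chunk_rows(flat)
print(len(rows), len(rows[0]))
```

The result can be passed to pd.DataFrame(rows, columns=columns) exactly like the rows list above.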
Appending data to an existing DataFrame row by row is really slow, because every append copies the whole frame.
You are better off building a list of data from soup, creating a new DataFrame from it, and then concatenating the new DataFrame to your old one.
This is a quick benchmark, using an empty df for each case. In your real code, df should be your existing DataFrame:
# setup some sample data
headers = ['col1', 'col2', 'col3', 'col4', 'col5', 'col6', 'col7',
'col8', 'col9', 'col10', 'col11', 'col12', 'col13', 'col14',
'col15', 'col16', 'col17']
raw_data = 'Mon 11/11,SA,100,31,3-5,60.0,1-3,33.3,1-2,50.0,10,4,0,1,1,2,8'.split(",")
row_dict_data = dict(zip(headers, raw_data))
# append row by row (DataFrame.append was removed in pandas 2.0, so this cell needs pandas < 2.0)
%%time
df = pd.DataFrame(columns=headers)
for i in range(100):
    df = df.append([row_dict_data])
# CPU times: user 258 ms, sys: 4.82 ms, total: 263 ms
# Wall time: 261 ms
# new dataframe
%%time
df = pd.DataFrame(columns=headers)
df2 = pd.DataFrame([raw_data for i in range(100)], columns=headers)
df3 = pd.concat([df, df2], sort=False)
# CPU times: user 7.03 ms, sys: 1.16 ms, total: 8.2 ms
# Wall time: 7.19 ms
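The %%time magic only works in IPython/Jupyter, and the append cell no longer runs on pandas 2.0 or later. For anyone who wants to repeat the comparison as a plain script, here is a rough sketch that swaps in pd.concat inside the loop as a stand-in for the removed append. The function names grow_in_loop and build_once are made up for illustration, and the timings will differ from the numbers above:

```python
import time

import pandas as pd

headers = [f"col{i}" for i in range(1, 18)]
raw_data = 'Mon 11/11,SA,100,31,3-5,60.0,1-3,33.3,1-2,50.0,10,4,0,1,1,2,8'.split(",")


def grow_in_loop(n):
    """Add one row at a time; every concat copies the frame built so far."""
    one_row = pd.DataFrame([raw_data], columns=headers)
    df = one_row.copy()
    for _ in range(n - 1):
        df = pd.concat([df, one_row], ignore_index=True)
    return df


def build_once(n):
    """Collect all rows in a plain list and create the frame in one step."""
    return pd.DataFrame([raw_data] * n, columns=headers)


for func in (grow_in_loop, build_once):
    start = time.perf_counter()
    result = func(100)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{func.__name__}: {elapsed_ms:.2f} ms, shape={result.shape}")
```

Both functions return the same 100 x 17 frame; only the way it is assembled differs.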