Fill an existing dataframe with rows from a loop

Question

I went through "How to build and fill pandas dataframe from for loop?" but can't seem to write my values to my columns.

Ultimately, I am getting data from a webpage and want to put it into a dataframe.

My headers are predefined as:

d1 = pd.DataFrame(columns=['col1', 'col2', 'col3', 'col4', 'col5', 'col6', 'col7', 'col8', 'col9',
        'col10', 'col11', 'col12', 'col13', 'col14', 'col15', 'col16', 'col17'])

Now I have values that I get in a for loop. How can I write them into columns 1 to 17 of one row, then wrap back to column 1 and continue with the next row?

row = soup.find_all('td', attrs = {'class': 'Table__TD'})
for data in row:
    print(data.get_text())

Sample output row 1

Sample output row 2

Wed 11/13
@CHA
W119-117
32
1-5
20.0
1-5
20.0
0-0
0.0
3
1
0
1
3
3
3

Expected output

Any help would be appreciated.

Answer 1

First, we have a list of column names:

cols = ['col1', 'col2', 'col3', 'col4', 'col5', 'col6', 'col7', 'col8', 'col9',
        'col10', 'col11', 'col12', 'col13', 'col14', 'col15', 'col16', 'col17']

Then a list of values:

row = [x.get_text() for x in soup.find_all('td', attrs = {'class': 'Table__TD'})]
print(row)
# ['Mon 11/11', 'SA', '100', '31', '3-5', '60.0', '1-3', '33.3', '1-2', '50.0', '10', '4', '0', '1', '1', '2', '8']

Then we can zip the columns and the values together and append the result to the dataframe:

d1 = d1.append(dict(zip(cols, row)), ignore_index=True)
print(d1)
#         col1 col2 col3 col4 col5  col6 col7  col8 col9 col10 col11 col12  \
# 0  Mon 11/11   SA  100   31  3-5  60.0  1-3  33.3  1-2  50.0    10     4   
# 
#   col13 col14 col15 col16 col17  
# 0     0     1     1     2     8
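A note for newer environments: DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the call above only runs on older versions. A minimal sketch of the same step with pd.concat follows. The hard-coded row list is a stand-in for the scraped cell texts, and the name new_row is introduced here purely for illustration.

```python
import pandas as pd

# Column names col1 .. col17, as in the question
cols = [f"col{i}" for i in range(1, 18)]
d1 = pd.DataFrame(columns=cols)

# Stand-in for the scraped cell texts (values copied from the sample row above)
row = ['Mon 11/11', 'SA', '100', '31', '3-5', '60.0', '1-3', '33.3',
       '1-2', '50.0', '10', '4', '0', '1', '1', '2', '8']

# Pair each column name with its value, wrap the dict in a one-row
# DataFrame, then concatenate it onto the existing frame
new_row = pd.DataFrame([dict(zip(cols, row))], columns=cols)
d1 = pd.concat([d1, new_row], ignore_index=True)
print(d1)
```

As with append, pd.concat returns a new dataframe, so the result has to be assigned back to d1.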

Answer 2

You can try this:

import pandas as pd

columns = [
    'col1',
    'col2',
    'col3',
    'col4',
    'col5',
    'col6',
    'col7',
    'col8',
    'col9',
    'col10',
    'col11',
    'col12',
    'col13',
    'col14',
    'col15',
    'col16',
    'col17',
]

# create dataframe
d1 = pd.DataFrame(columns=columns)

full = []

for data in soup.find_all('td', attrs={'class': 'Table__TD'}):
    full.append(data.get_text())

# separate full list into sub-lists with 17 elements
rows = [full[i: i+17] for i in range(0, len(full), 17)]

# append list of lists structure to dataframe
d1 = d1.append(pd.DataFrame(rows, columns=d1.columns))
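The chunking idea above can be tried without a live page. The sketch below assumes the flat list of cell texts is already available (here it is hard-coded from the two sample rows in this thread) and uses pd.concat, because DataFrame.append no longer exists in pandas 2.0 and later. The constant N_COLS and the filter that drops an incomplete trailing row are additions for illustration, not part of the original answer.

```python
import pandas as pd

N_COLS = 17
columns = [f"col{i}" for i in range(1, N_COLS + 1)]
d1 = pd.DataFrame(columns=columns)

# Stand-in for the texts collected from soup.find_all(...):
# two table rows flattened into one list of 34 strings
full = [
    'Wed 11/13', '@CHA', 'W119-117', '32', '1-5', '20.0', '1-5', '20.0',
    '0-0', '0.0', '3', '1', '0', '1', '3', '3', '3',
    'Mon 11/11', 'SA', '100', '31', '3-5', '60.0', '1-3', '33.3',
    '1-2', '50.0', '10', '4', '0', '1', '1', '2', '8',
]

# Split the flat list into consecutive chunks of 17 cells, one per table row
rows = [full[i:i + N_COLS] for i in range(0, len(full), N_COLS)]

# Assumption: a trailing chunk shorter than 17 cells is not a real row, so drop it
rows = [r for r in rows if len(r) == N_COLS]

# Build one dataframe from all rows and concatenate it onto d1 in a single step
d1 = pd.concat([d1, pd.DataFrame(rows, columns=columns)], ignore_index=True)
print(d1)
```

If the page contains cells with the same class that do not belong to the table body, the chunks will be misaligned, so it is worth checking that len(full) is a multiple of 17 before relying on the result.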

Answer 3

Appending data to an existing DataFrame is really slow.

You are better off building a list of data from soup, creating a new dataframe from it, and then concatenating the new dataframe to your old one.

This is a quick benchmark, using an empty df for each case. In your real code, df should be your existing dataframe:

# setup some sample data
headers = ['col1', 'col2', 'col3', 'col4', 'col5', 'col6', 'col7', 
           'col8', 'col9', 'col10', 'col11', 'col12', 'col13', 'col14',
           'col15', 'col16', 'col17']
raw_data = 'Mon 11/11,SA,100,31,3-5,60.0,1-3,33.3,1-2,50.0,10,4,0,1,1,2,8'.split(",")
row_dict_data = dict(zip(headers, raw_data))

%%time
# append
df = pd.DataFrame(columns=headers)
for i in range(100):
    df = df.append([row_dict_data])

# CPU times: user 258 ms, sys: 4.82 ms, total: 263 ms
# Wall time: 261 ms


%%time
# new dataframe
df = pd.DataFrame(columns=headers)
df2 = pd.DataFrame([raw_data for i in range(100)], columns=headers)
df3 = pd.concat([df, df2], sort=False)

# CPU times: user 7.03 ms, sys: 1.16 ms, total: 8.2 ms
# Wall time: 7.19 ms
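The %%time cells above need IPython or Jupyter, and df.append is unavailable from pandas 2.0 onward. As a rough way to repeat the comparison in a plain Python script, the sketch below uses time.perf_counter and substitutes a row-by-row pd.concat for the append loop. That substitution is an assumption made here, so the timings are not directly comparable to the figures quoted above, and the function names grow_in_loop and build_once are hypothetical.

```python
import time

import pandas as pd

headers = [f"col{i}" for i in range(1, 18)]
raw_data = 'Mon 11/11,SA,100,31,3-5,60.0,1-3,33.3,1-2,50.0,10,4,0,1,1,2,8'.split(',')
N = 100  # number of rows to add, as in the benchmark above


def grow_in_loop():
    """Add one row per iteration; every iteration copies the whole frame."""
    df = pd.DataFrame(columns=headers)
    one_row = pd.DataFrame([raw_data], columns=headers)
    for _ in range(N):
        df = pd.concat([df, one_row], ignore_index=True)
    return df


def build_once():
    """Collect all rows first, build one frame, and concatenate a single time."""
    df = pd.DataFrame(columns=headers)
    new = pd.DataFrame([raw_data] * N, columns=headers)
    return pd.concat([df, new], ignore_index=True)


for fn in (grow_in_loop, build_once):
    start = time.perf_counter()
    result = fn()
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{fn.__name__}: {elapsed_ms:.1f} ms, shape={result.shape}")
```

Both functions produce the same 100 x 17 frame; only the number of intermediate copies differs, which is where the gap in run time comes from.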

3 answers

Answer 1
1 2019-11-15 14:13:30

Answer 2
1 accepted 2019-11-15 23:04:14

Answer 3
1 2019-11-17 10:27:55
