Filling existing dataframe with rows from loop

I tried going over "How to build and fill pandas dataframe from for loop?" but can't seem to write my values to my columns.

Ultimately I am getting data from a webpage and want to put it into a dataframe.

My headers are predefined as:

import pandas as pd

d1 = pd.DataFrame(columns=['col1', 'col2', 'col3', 'col4', 'col5', 'col6', 'col7', 'col8', 'col9',
        'col10', 'col11', 'col12', 'col13', 'col14', 'col15', 'col16', 'col17'])

Now I have values that I get in a for loop. How can I write them across columns 1 to 17, then wrap back to column 1 and start the next row?

row = soup.find_all('td', attrs = {'class': 'Table__TD'})
for data in row:
    print(data.get_text())

Sample output, row 1:

Mon 11/11
SA
100
31
3-5
60.0
1-3
33.3
1-2
50.0
10
4
0
1
1
2
8

Sample output, row 2:

Wed 11/13
@CHA
W119-117
32
1-5
20.0
1-5
20.0
0-0
0.0
3
1
0
1
3
3
3

Expected output:

[Image: expected output]

Any help would be appreciated.

First, we have a list of column names:

cols = ['col1', 'col2', 'col3', 'col4', 'col5', 'col6', 'col7', 'col8', 'col9',
        'col10', 'col11', 'col12', 'col13', 'col14', 'col15', 'col16', 'col17']

Then a list of values:

row = [x.get_text() for x in soup.find_all('td', attrs = {'class': 'Table__TD'})]
print(row)
# ['Mon 11/11', 'SA', '100', '31', '3-5', '60.0', '1-3', '33.3', '1-2', '50.0', '10', '4', '0', '1', '1', '2', '8']

Then we can zip the columns and the values together and append the result to the dataframe:

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
d1 = pd.concat([d1, pd.DataFrame([dict(zip(cols, row))])], ignore_index=True)
print(d1)
#         col1 col2 col3 col4 col5  col6 col7  col8 col9 col10 col11 col12  \
# 0  Mon 11/11   SA  100   31  3-5  60.0  1-3  33.3  1-2  50.0    10     4   
# 
#   col13 col14 col15 col16 col17  
# 0     0     1     1     2     8
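The zip approach above adds a single row, while the question's sample output has two. A minimal sketch of the same idea for any number of rows, assuming the scraped cells arrive as one flat list. The names cells and records are illustrative, and the cell values are copied from the question's sample output as a stand-in for the soup results:

```python
import pandas as pd

cols = [f"col{i}" for i in range(1, 18)]  # col1 .. col17

# Flat list of cell texts, as the get_text() list comprehension would produce;
# these values are the two sample rows from the question.
cells = [
    "Mon 11/11", "SA", "100", "31", "3-5", "60.0", "1-3", "33.3", "1-2",
    "50.0", "10", "4", "0", "1", "1", "2", "8",
    "Wed 11/13", "@CHA", "W119-117", "32", "1-5", "20.0", "1-5", "20.0",
    "0-0", "0.0", "3", "1", "0", "1", "3", "3", "3",
]

n = len(cols)
# Step through the flat list 17 cells at a time; each slice becomes one row dict.
records = [dict(zip(cols, cells[i:i + n])) for i in range(0, len(cells), n)]

# Build the frame in one call instead of appending row by row.
d1 = pd.DataFrame(records, columns=cols)
print(d1[["col1", "col2", "col3"]])
```

If the cell count is not a multiple of 17, the last dict is simply shorter and its missing columns come out as NaN rather than raising an error.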

You can try this:

import pandas as pd

columns = [
    'col1',
    'col2',
    'col3',
    'col4',
    'col5',
    'col6',
    'col7',
    'col8',
    'col9',
    'col10',
    'col11',
    'col12',
    'col13',
    'col14',
    'col15',
    'col16',
    'col17',
]

# create dataframe
d1 = pd.DataFrame(columns=columns)

full = []

for data in soup.find_all('td', attrs={'class': 'Table__TD'}):
    full.append(data.get_text())

# separate full list into sub-lists with 17 elements
rows = [full[i: i+17] for i in range(0, len(full), 17)]

# add the list-of-lists rows to the dataframe
# (DataFrame.append was removed in pandas 2.0, so use pd.concat)
d1 = pd.concat([d1, pd.DataFrame(rows, columns=d1.columns)], ignore_index=True)
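As an alternative to the slicing comprehension, NumPy's reshape (swapped in here, not part of the answer above) can turn the flat list into a 17-column 2-D array. A minimal sketch, where cells_to_frame is a hypothetical helper and the cell strings are made-up stand-ins for the full list built in the loop. The length check matters because reshape cannot split a list whose length is not a multiple of 17:

```python
import numpy as np
import pandas as pd

columns = [f"col{i}" for i in range(1, 18)]


def cells_to_frame(cells, columns):
    """Hypothetical helper: turn a flat list of cell strings into a DataFrame."""
    n = len(columns)
    if len(cells) % n != 0:
        # reshape would raise anyway; fail with a clearer message
        raise ValueError(f"{len(cells)} cells is not a multiple of {n} columns")
    # dtype=object keeps plain Python strings; -1 lets NumPy infer the row count
    grid = np.array(cells, dtype=object).reshape(-1, n)
    return pd.DataFrame(grid, columns=columns)


# Usage with stand-in data: two rows of 17 strings each
cells = [f"r{r}c{c}" for r in range(2) for c in range(1, 18)]
df = cells_to_frame(cells, columns)
print(df.shape)  # (2, 17)
```

Unlike the slicing version, this raises on a trailing partial row instead of silently padding it, which can help catch a scrape that picked up an extra or missing cell.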

Appending rows to an existing DataFrame one at a time is really slow, because each append builds a new copy of the whole frame.

You're better off building a list of data from soup, creating a new dataframe from it, and then concatenating the new dataframe onto your old one.
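A minimal sketch of that advice, assuming existing_df is the dataframe you already have and scraped_rows is a list of 17-item lists (both names are hypothetical; the row values are taken from the question's sample output):

```python
import pandas as pd

headers = [f"col{i}" for i in range(1, 18)]

# Stand-in for the dataframe you already have (row 1 from the question)
existing_df = pd.DataFrame(
    [["Mon 11/11", "SA", "100", "31", "3-5", "60.0", "1-3", "33.3", "1-2",
      "50.0", "10", "4", "0", "1", "1", "2", "8"]],
    columns=headers,
)

# Collect new rows in a plain Python list first; list.append is cheap.
scraped_rows = []
scraped_rows.append(["Wed 11/13", "@CHA", "W119-117", "32", "1-5", "20.0",
                     "1-5", "20.0", "0-0", "0.0", "3", "1", "0", "1", "3",
                     "3", "3"])

# One DataFrame construction and one concat, instead of one per row.
new_df = pd.DataFrame(scraped_rows, columns=headers)
combined = pd.concat([existing_df, new_df], ignore_index=True)
print(combined[["col1", "col2"]])
```

ignore_index=True renumbers the result 0..n-1; without it both frames would keep their own index and the new rows would start again at 0.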

This is a quick benchmark, using an empty df for each case. In your real code, df should be your existing dataframe:

import pandas as pd

# set up some sample data
headers = ['col1', 'col2', 'col3', 'col4', 'col5', 'col6', 'col7', 
           'col8', 'col9', 'col10', 'col11', 'col12', 'col13', 'col14',
           'col15', 'col16', 'col17']
raw_data = 'Mon 11/11,SA,100,31,3-5,60.0,1-3,33.3,1-2,50.0,10,4,0,1,1,2,8'.split(",")
row_dict_data = dict(zip(headers, raw_data))

%%time
# append row by row (DataFrame.append only exists in pandas < 2.0)
df = pd.DataFrame(columns=headers)
for i in range(100):
    df = df.append([row_dict_data])

# CPU times: user 258 ms, sys: 4.82 ms, total: 263 ms
# Wall time: 261 ms


%%time
# build a new dataframe once, then concat
df = pd.DataFrame(columns=headers)
df2 = pd.DataFrame([raw_data for i in range(100)], columns=headers)
df3 = pd.concat([df, df2], sort=False)

# CPU times: user 7.03 ms, sys: 1.16 ms, total: 8.2 ms
# Wall time: 7.19 ms
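The %%time cells above only run in IPython/Jupyter, and the first one needs pandas older than 2.0. A rough sketch of the same comparison as a plain script, timing with time.perf_counter and swapping the removed append for a per-row pd.concat as the slow path. The function names are illustrative, and timings depend on the machine and pandas version, so no numbers are shown:

```python
import time

import pandas as pd

headers = [f"col{i}" for i in range(1, 18)]
raw_data = "Mon 11/11,SA,100,31,3-5,60.0,1-3,33.3,1-2,50.0,10,4,0,1,1,2,8".split(",")
N = 100


def grow_row_by_row(n):
    """Slow path: concat one single-row frame per iteration (copies every time)."""
    df = pd.DataFrame([raw_data], columns=headers)
    for _ in range(n - 1):
        df = pd.concat([df, pd.DataFrame([raw_data], columns=headers)],
                       ignore_index=True)
    return df


def build_once(n):
    """Fast path: build the whole list first, then one DataFrame call."""
    return pd.DataFrame([raw_data for _ in range(n)], columns=headers)


for func in (grow_row_by_row, build_once):
    start = time.perf_counter()
    result = func(N)
    elapsed = time.perf_counter() - start
    print(f"{func.__name__}: {elapsed * 1000:.1f} ms, shape={result.shape}")
```

The slow path starts from one real row rather than an empty frame, because recent pandas versions warn when concatenating empty frames.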
