
Iterating over a loop and adding each list to a DataFrame as a new row or new column

I'm sure this is simple, but I'm quite new to Python. I'm having trouble adding a list to a DataFrame column or row after each iteration of a loop. I want to loop through a list of around a hundred URLs with the outer for loop and extract the data with the inner loop.

With my current code I can create a DataFrame that appends all the lists together into one column or one row. But I want the result of each iteration of the inner loop separately, in a new column or row of the DataFrame.

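from urllib.request import urlopen
from bs4 import BeautifulSoup

list_rows = []
for x in link_href_list:  # link_href_list: the ~100 URLs collected earlier
    html = urlopen(x)
    bs = BeautifulSoup(html, "lxml")
    table = bs.find('tbody')
    rows = table.tr.next_siblings

    for row in rows:
        a = row.find('td').get_text().strip()
        list_rows.append(a)
list_rows.to_frame()  # as posted; a plain list has no to_frame() method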

Unfortunately, the lists from the inner loop will have different lengths! Maybe someone has a simple solution or a hint about what I could change? Thanks!
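To try the inner loop without downloading anything, the sketch below runs the same extraction on a small hypothetical HTML string (HTML and first_cells are made-up names). It uses find_all("tr") instead of table.tr.next_siblings: next_siblings starts after the first <tr>, skipping it (possibly intended, if that row is a header), and when the page has whitespace between rows it also yields the text nodes in between. Those are strings, not tags, so row.find('td') will not return a cell for them. The sketch also assumes the built-in "html.parser" instead of "lxml", so only bs4 is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical static HTML standing in for one downloaded page.
HTML = """
<table>
  <tbody>
    <tr><td>header</td></tr>
    <tr><td> alpha </td><td>1</td></tr>
    <tr><td> beta </td><td>2</td></tr>
  </tbody>
</table>
"""


def first_cells(html):
    """Return the stripped text of the first <td> in every row after the first."""
    bs = BeautifulSoup(html, "html.parser")
    table = bs.find("tbody")
    values = []
    # find_all("tr") yields only Tag objects, unlike next_siblings,
    # which also yields the whitespace text between rows.
    for row in table.find_all("tr")[1:]:  # [1:] mirrors skipping table.tr
        cell = row.find("td")
        if cell is not None:  # guard against rows without a <td>
            values.append(cell.get_text().strip())
    return values


print(first_cells(HTML))  # ['alpha', 'beta']
```

In the real loop, first_cells(urlopen(x).read()) would produce one inner list per URL.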

I assume you mean each iteration of the outer loop in a new "row". This creates a two-dimensional list as a result: for each element in link_href_list you get a new "row". I have no idea what the to_frame() method is, though; I assume it is a printout.

from urllib.request import urlopen
from bs4 import BeautifulSoup

list_columns = []
for x in link_href_list:
    html = urlopen(x)
    bs = BeautifulSoup(html, "lxml")
    table = bs.find('tbody')
    rows = table.tr.next_siblings
    list_rows = []  # fresh list for each URL

    for row in rows:
        a = row.find('td').get_text().strip()
        list_rows.append(a)
    list_columns.append(list_rows)  # one inner list ("row") per URL
list_columns.DataFrame()  # as posted; a list has no DataFrame() method
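Once list_columns holds one list per URL, one way to get one column per URL despite the different lengths is to wrap each inner list in a pandas Series: the DataFrame constructor aligns the Series on their index and fills the gaps in shorter columns with NaN. The data below is hypothetical and stands in for the scraped lists.

```python
import pandas as pd

# Hypothetical scraped results: one inner list per URL, of different lengths.
list_columns = [["alpha", "beta", "gamma"], ["delta"]]

# One column per URL (labelled 0, 1, ...). Each Series is aligned on its
# index, so the shorter column is padded with NaN.
df_cols = pd.DataFrame({i: pd.Series(values) for i, values in enumerate(list_columns)})

print(df_cols.shape)  # (3, 2)
```

Transposing with df_cols.T gives the other orientation, one row per URL.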

Edit: If to_frame is the pandas DataFrame thing, I'm not entirely sure how it will handle lists of different lengths. I'll check shortly, but there is a way around that as well. There doesn't seem to be a very simple answer for importing lists of different lengths, so one option is to find the longest list, make all the lists that length in a new loop, and adjust the pandas import accordingly.
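The padding approach described in the edit can be sketched in plain Python: find the longest inner list and extend every other list with a filler value up to that length. pad_to_longest is a hypothetical helper name, and None is an assumed filler (pandas treats it as a missing value).

```python
def pad_to_longest(lists, fill=None):
    """Return new lists, each padded with `fill` up to the longest length."""
    longest = max((len(lst) for lst in lists), default=0)
    # list + list builds a new list, so the inputs are left unchanged.
    return [lst + [fill] * (longest - len(lst)) for lst in lists]


list_columns = [["alpha", "beta", "gamma"], ["delta"]]
padded = pad_to_longest(list_columns)
print(padded)  # [['alpha', 'beta', 'gamma'], ['delta', None, None]]
```

The sketch assumes each inner item is a list (as list_rows is in the loops above); tuples would need converting with list() first.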

One way to do it is to create an empty list outside the loop and then append to it inside the loop, as you have tried. Your issue seems to be creating the DataFrame. I would have just commented under the answer above for the reference of others, but I can't leave comments with my current reputation.

Define your columns and then create the DataFrame using from_records:

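import pandas as pd

# list_columns is the list of lists built by the loop in the answer above
cols = ['col_1', 'col_2', ..., 'col_n']  # one name per value in each row
df = pd.DataFrame.from_records(list_columns, columns=cols)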
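Because from_records expects one column name per value in a row, a sketch can generate the names from the longest row instead of typing col_1 through col_n by hand. The data and the col_ naming are placeholders, and the rows are assumed to be padded already so that every row has the same width.

```python
import pandas as pd

# Hypothetical padded output of the scraping loop (one list per URL).
list_columns = [["alpha", "beta", "gamma"], ["delta", None, None]]

# Build placeholder names col_1 ... col_n, one per position in a row.
n = max(len(row) for row in list_columns)
cols = [f"col_{i}" for i in range(1, n + 1)]

df = pd.DataFrame.from_records(list_columns, columns=cols)
print(df)
```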

The answer above creates a list (list_columns = []) and then tries to convert it to a DataFrame by calling a method on the list. This should throw the following:

 Traceback (most recent call last):
   File "<ipython-input-396-dc539f26ae12>", line 1, in <module>
    list_columns.Dataframe()

 AttributeError: 'list' object has no attribute 'Dataframe'
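The error arises because DataFrame is a constructor in the pandas module, not a method of Python's list type. A minimal fix, sketched with hypothetical data, passes the list to pd.DataFrame instead. Each inner list becomes one row, and pandas fills the missing cells of shorter rows with missing values.

```python
import pandas as pd

list_columns = [["alpha", "beta", "gamma"], ["delta"]]

# Pass the list to the pandas constructor rather than calling a method on it.
# Row 1 is shorter, so its last two cells are filled with missing values.
df = pd.DataFrame(list_columns)
print(df.shape)  # (2, 3)
```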
