
How to Convert an Online Txt File to a pandas DataFrame

I'm using requests and Beautiful Soup to navigate and download data from the Census webpage. I can get the data into a result object (and a soup object if I want one), but I can't seem to convert it into a DataFrame so that it can be appended with each of the other files. The data is stored online as a .txt file.

from bs4 import BeautifulSoup
import pandas as pd
import csv
import requests 
from json import loads
from bs4.dammit import EncodingDetector 
url = 'https://www2.census.gov/econ/bps/Place/West%20Region/'
parser = 'html.parser'  # or 'lxml' (preferred) or 'html5lib', if installed
resp = requests.get(url)
http_encoding = resp.encoding if 'charset' in resp.headers.get('content-type', '').lower() else None
html_encoding = EncodingDetector.find_declared_encoding(resp.content, is_html=True)
encoding = html_encoding or http_encoding
region_soup = BeautifulSoup(resp.content, parser, from_encoding=encoding)
df = DataFrame()
for link in region_soup.find_all('a', href=True):
    links = str(link['href'])
    print(links)
    if links[-4:] == ".txt":
        result = requests.get(url + links).text
        df.append(pd.read_csv(result), ignore_index = True)

How do I convert the requests object into a DataFrame, and define the column names, etc.?

Off the bat, you import pandas as pd, so you need to use that prefix when calling the DataFrame() constructor. Secondly, pandas is not parsing the text into a CSV table; reading that raw text would take a bit more manipulation. Pandas can actually read the CSV straight from a URL, though, so just do that directly (a quick sketch of both options follows).
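For reference, here is a minimal sketch of both options; the file name below is just a placeholder for one of the .txt links on the listing page:

import io

import pandas as pd
import requests

url = 'https://www2.census.gov/econ/bps/Place/West%20Region/'
file_url = url + 'example_place_file.txt'  # placeholder name; use a real link from the page

# Option 1: let pandas fetch and parse the file straight from the URL.
df_direct = pd.read_csv(file_url)

# Option 2: if you already have the text from requests, wrap it in a
# file-like object before handing it to read_csv.
text = requests.get(file_url).text
df_from_text = pd.read_csv(io.StringIO(text))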

Finally, you need to store the appended DataFrame, so change

df.append(pd.read_csv(result), ignore_index = True)

to

df = df.append(pd.read_csv(result), ignore_index = True)

Code:

from bs4 import BeautifulSoup
import pandas as pd
import csv
import requests 
from json import loads
from bs4.dammit import EncodingDetector 


url = 'https://www2.census.gov/econ/bps/Place/West%20Region/'
parser = 'html.parser'  # or 'lxml' (preferred) or 'html5lib', if installed
resp = requests.get(url)
http_encoding = resp.encoding if 'charset' in resp.headers.get('content-type', '').lower() else None
html_encoding = EncodingDetector.find_declared_encoding(resp.content, is_html=True)
encoding = html_encoding or http_encoding
region_soup = BeautifulSoup(resp.content, parser, from_encoding=encoding)
df = pd.DataFrame()  # use the pd prefix from the import
for link in region_soup.find_all('a', href=True):
    links = str(link['href'])
    print(links)
    if links[-4:] == ".txt":
        result = pd.read_csv(url + links)  # let pandas read the csv straight from the url
        df = df.append(result, ignore_index = True)  # store the appended dataframe

Note:

You will get the warning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.

So I'd rather:

df_list = []
for link in region_soup.find_all('a', href=True):
    links = str(link['href'])
    print(links)
    if links[-4:] == ".txt":
        result = pd.read_csv(url + links)
        df_list.append(result)
        
df = pd.concat(df_list, ignore_index=True)
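As for defining the column names (the second part of the question), pd.read_csv accepts header= and names= parameters. A minimal sketch with purely illustrative names; adjust them to the fields that actually appear in the census files:

columns = ['col_a', 'col_b', 'col_c']  # illustrative names only
# header=0 discards the file's own header row and uses `names` instead;
# use header=None if the file has no header row at all.
result = pd.read_csv(url + links, header=0, names=columns)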


 