Turning a .txt file (an empty DataFrame with all the data as column names) into a DataFrame

Question

When I display the contents of the a.txt file, I get this output:

        Empty DataFrame
        Columns: [[{'city': 'Zurich, Switzerland', 'cost': '135.74'}, {'city': 'Basel, 
        Switzerland', 'cost': '135.36'}, {'city': 'Lausanne, Switzerland', 'cost': '131.24'}, 
        {'city': 'Lugano, Switzerland', 'cost': '130.32'}, {'city': 'Geneva, Switzerland', 
        'cost': '130.14'}, {'city': 'Bern, Switzerland', 'cost': '125.86'}, {'city': 'Tromso, 
        Norway', 'cost': '114.81'}, {'city': 'Stavanger, Norway', 'cost': '108.38'} etc.]

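Does anyone know how to convert this into a DataFrame with "city" and "cost" columns? pandas.DataFrame() does not work; it outputs the same list of dicts as the original file.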

Answer 1

If you already have a list of dicts that share the same keys, you should be able to do this:

pandas.__version__ ->>  '1.1.5'

dctlst = [{"a": 1, "b":1}, {"a":2, "b":2}]
from pandas import DataFrame
df = DataFrame(dctlst)
df
   a  b
0  1  1
1  2  2

Otherwise, you can use json to build a list of dicts from the text.

But first you have to clean up the text a bit (after reading it):

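# Read the whole file into one string.
with open(r"C:\Users\User\Desktop\toDF.txt", "r") as f:
    txt = f.read()
# Drop the "Columns: [" prefix, the trailing "etc." and all line breaks.
txt = txt.replace("Columns: [", "").replace("etc.", "").replace("\n", "")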

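If you don't remove the extra opening bracket and the other leftovers, json will not load the text. json also requires double quotes, so replace the single quotes with double quotes: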

txt = txt.replace("'", '"')
txt
'[{"city": "Zurich, Switzerland", "cost": "135.74"}, {"city": "Basel,         
Switzerland", "cost": "135.36"}, {"city": "Lausanne, Switzerland", "cost": "131.24"},         
{"city": "Lugano, Switzerland", "cost": "130.32"}, {"city": "Geneva, Switzerland",         
"cost": "130.14"}, {"city": "Bern, Switzerland", "cost": "125.86"}, {"city": "Tromso,         
Norway", "cost": "114.81"}, {"city": "Stavanger, Norway", "cost": "108.38"} ]'

Now it looks like a proper list of dicts that json.loads can convert:

from json import loads
from pandas import DataFrame

lst = loads(txt)
df = DataFrame(lst)

df
                         city    cost
0         Zurich, Switzerland  135.74
1  Basel,         Switzerland  135.36
2       Lausanne, Switzerland  131.24
3         Lugano, Switzerland  130.32
4         Geneva, Switzerland  130.14
5           Bern, Switzerland  125.86
6      Tromso,         Norway  114.81
7           Stavanger, Norway  108.38
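
A side note on the quote replacement: it would also rewrite any apostrophe inside a value, and json.loads would then fail. Below is a minimal sketch of an alternative, assuming the file holds Python-style dict reprs with no nested braces. parse_records is a hypothetical helper name, and the L'Aquila row is made up purely to show the apostrophe case.

```python
import ast
import re


def parse_records(text):
    """Return the dict literals found in the printed DataFrame text."""
    # Collapse line-wrap whitespace so a value split across lines is rejoined.
    flat = re.sub(r"\s+", " ", text)
    # Assumption: every record is a flat {...} literal without nested braces.
    return [ast.literal_eval(chunk) for chunk in re.findall(r"\{[^{}]*\}", flat)]


# Sample text shaped like the question's output; the last row is invented.
sample = """Empty DataFrame
Columns: [[{'city': 'Zurich, Switzerland', 'cost': '135.74'}, {'city': 'Basel,
        Switzerland', 'cost': '135.36'}, {'city': "L'Aquila, Italy", 'cost': '0.00'} etc.]"""

records = parse_records(sample)
print(records)
```

The resulting list can be passed to DataFrame(...) in the same way as lst above. Because the line-wrap whitespace is collapsed first, the "Basel,         Switzerland" artifact should not appear.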

If you want the rows with the cities to look nicer, have a look at pandas string operations.

This would work, but it obviously depends on what you want:

df["city"] = df["city"].astype("string").str.replace(" ","")

df
                   city    cost
0    Zurich,Switzerland  135.74
1     Basel,Switzerland  135.36
2  Lausanne,Switzerland  131.24
3    Lugano,Switzerland  130.32
4    Geneva,Switzerland  130.14
5      Bern,Switzerland  125.86
6         Tromso,Norway  114.81
7      Stavanger,Norway  108.38
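
Removing every space would also glue together city names that consist of several words. If that matters, a gentler variant is to collapse each run of whitespace into a single space. This is only a sketch on a two-row frame built from values shown earlier; it assumes a pandas version whose str.replace accepts the regex keyword.

```python
import pandas as pd

# Two rows taken from the output above, one with the line-wrap gap.
df = pd.DataFrame({"city": ["Basel,         Switzerland", "Zurich, Switzerland"],
                   "cost": ["135.36", "135.74"]})

# Replace each run of whitespace with one space, then trim the ends.
df["city"] = df["city"].str.replace(r"\s+", " ", regex=True).str.strip()
print(df)
```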

This makes it nicer still:

df[["city", "country"]] = df["city"].str.split(",", expand= True)

df
        city    cost      country
0     Zurich  135.74  Switzerland
1      Basel  135.36  Switzerland
2   Lausanne  131.24  Switzerland
3     Lugano  130.32  Switzerland
4     Geneva  130.14  Switzerland
5       Bern  125.86  Switzerland
6     Tromso  114.81       Norway
7  Stavanger  108.38       Norway
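
Two things may still be worth handling, depending on your data: a city string with more than one comma would produce extra columns, and cost is still a column of strings. A small sketch under those assumptions, splitting on the first comma only and converting cost with pd.to_numeric (the variable name parts is just illustrative):

```python
import pandas as pd

# Two rows from the data above; cost starts out as strings.
df = pd.DataFrame({"city": ["Zurich, Switzerland", "Tromso,         Norway"],
                   "cost": ["135.74", "114.81"]})

# Split on the first comma only, then strip leftover whitespace.
parts = df["city"].str.split(",", n=1, expand=True)
df["city"] = parts[0].str.strip()
df["country"] = parts[1].str.strip()

# Convert cost to a numeric dtype so it can be sorted or averaged.
df["cost"] = pd.to_numeric(df["cost"])
print(df.dtypes)
```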

Answer 2

import requests
import pandas as pd
from bs4 import BeautifulSoup

url = "https://www.numbeo.com/cost-of-living/region_rankings_current.jsp?region=150"
response = requests.get(url)

soup = BeautifulSoup(response.content, "html.parser")
# Navigate to the ranking table and its body directly; re-parsing is not needed.
table = soup.find("table", id="t2")
table_body = table.find("tbody")

# Each table row holds one city and its cost index.
findings = table_body.find_all("tr")

living_costs = []

for finding in findings:
    city = finding.find("a", class_="discreet_link").string
    cost = finding.find("td", style="text-align: right").string
    living_costs.append({"city": city, "cost": cost})

# Clean the values. str.replace returns a new string, so the result
# has to be assigned back into the dict (the original loop discarded it).
for entry in living_costs:
    for key, value in entry.items():
        if value is not None:
            entry[key] = value.replace("Columns: [", "").replace("\n", "")

df = pd.DataFrame(living_costs)
print(df)

Solution 1
0 2021-02-16 14:51:30

Solution 2
0 2021-03-29 07:20:36
