Turning .txt file (empty dataframe with all data as column names) into dataframe
When I display the contents of the file a.txt, I get this output:
Empty DataFrame
Columns: [[{'city': 'Zurich, Switzerland', 'cost': '135.74'}, {'city': 'Basel,
Switzerland', 'cost': '135.36'}, {'city': 'Lausanne, Switzerland', 'cost': '131.24'},
{'city': 'Lugano, Switzerland', 'cost': '130.32'}, {'city': 'Geneva, Switzerland',
'cost': '130.14'}, {'city': 'Bern, Switzerland', 'cost': '125.86'}, {'city': 'Tromso,
Norway', 'cost': '114.81'}, {'city': 'Stavanger, Norway', 'cost': '108.38'} etc.]
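Does anyone know how to convert this into a dataframe with "city" and "cost" columns? pandas.DataFrame() doesn't work; it outputs the same list of dictionaries as the original file.
If you already have a list of dicts that share the same keys, you should be able to do this: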
# pandas.__version__ -> '1.1.5'
from pandas import DataFrame
dctlst = [{"a": 1, "b": 1}, {"a": 2, "b": 2}]
df = DataFrame(dctlst)
df
   a  b
0  1  1
1  2  2
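The same constructor applies to records shaped like the ones in the question. A minimal sketch, assuming the data is already a Python list of dicts in memory; the `records` name and the `pd.to_numeric` step are illustrative additions, not part of the original answer:

```python
import pandas as pd

# Two rows copied from the question's output; `records` is a hypothetical name
records = [
    {"city": "Zurich, Switzerland", "cost": "135.74"},
    {"city": "Basel, Switzerland", "cost": "135.36"},
]

df = pd.DataFrame(records)              # dict keys become the column names
df["cost"] = pd.to_numeric(df["cost"])  # convert the cost strings to floats
```

Converting `cost` to a numeric dtype is optional, but it makes sorting and arithmetic on the column behave as expected.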
Otherwise, you can use json to build a list of dicts from the text.
But first you have to clean up the text a bit (after reading it):
with open(r"C:\Users\User\Desktop\toDF.txt", "r") as f:
    txt = f.read()
# Strip everything that is not part of the list itself
txt = txt.replace("Empty DataFrame", "").replace("Columns: [", "").replace("etc.", "").replace("\n", "")
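If you don't remove the extra opening bracket and the other leftovers, json won't load the text. json also requires double quotes, so replace the single quotes with double quotes: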
txt = txt.replace("'", '"')
txt
'[{"city": "Zurich, Switzerland", "cost": "135.74"}, {"city": "Basel,
Switzerland", "cost": "135.36"}, {"city": "Lausanne, Switzerland", "cost": "131.24"},
{"city": "Lugano, Switzerland", "cost": "130.32"}, {"city": "Geneva, Switzerland",
"cost": "130.14"}, {"city": "Bern, Switzerland", "cost": "125.86"}, {"city": "Tromso,
Norway", "cost": "114.81"}, {"city": "Stavanger, Norway", "cost": "108.38"} ]'
Now it looks like a proper list of dicts that json.loads can convert:
from json import loads
from pandas import DataFrame
lst = loads(txt)
df = DataFrame(lst)
df
                    city    cost
0    Zurich, Switzerland  135.74
1     Basel, Switzerland  135.36
2  Lausanne, Switzerland  131.24
3    Lugano, Switzerland  130.32
4    Geneva, Switzerland  130.14
5      Bern, Switzerland  125.86
6         Tromso, Norway  114.81
7      Stavanger, Norway  108.38
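Swapping single quotes for double quotes breaks as soon as a value itself contains an apostrophe. Because the text is a Python literal rather than JSON, `ast.literal_eval` from the standard library may be a more robust alternative. A sketch, assuming the text has already been trimmed down to the bracketed list; `raw` is a hypothetical sample and its second row is made up purely to show the apostrophe case:

```python
import ast

import pandas as pd

# Hypothetical cleaned text; the second entry is invented for illustration
raw = (
    "[{'city': 'Zurich, Switzerland', 'cost': '135.74'}, "
    "{'city': \"L'Aquila, Italy\", 'cost': '0.00'}]"
)

records = ast.literal_eval(raw)  # safely evaluates Python literals only
df = pd.DataFrame(records)
```

`ast.literal_eval` accepts only literals (strings, numbers, lists, dicts and so on), so it does not execute arbitrary code the way `eval` would.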
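If you want the city rows to look nicer, you can look into string operations: pandas string operations
This will work, but it obviously depends on what you want: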
df["city"] = df["city"].astype("string").str.replace(" ","")
df
                   city    cost
0    Zurich,Switzerland  135.74
1     Basel,Switzerland  135.36
2  Lausanne,Switzerland  131.24
3    Lugano,Switzerland  130.32
4    Geneva,Switzerland  130.14
5      Bern,Switzerland  125.86
6         Tromso,Norway  114.81
7      Stavanger,Norway  108.38
This makes it nicer still:
df[["city", "country"]] = df["city"].str.split(",", expand=True)
df
        city    cost      country
0     Zurich  135.74  Switzerland
1      Basel  135.36  Switzerland
2   Lausanne  131.24  Switzerland
3     Lugano  130.32  Switzerland
4     Geneva  130.14  Switzerland
5       Bern  125.86  Switzerland
6     Tromso  114.81       Norway
7  Stavanger  108.38       Norway
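Removing every space before splitting would also glue together any multi-word city or country name. A sketch of one alternative that splits once on the comma and strips the surrounding whitespace instead; the two rows are taken from the output above, and the `parts` name is an illustrative assumption:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "city": ["Zurich, Switzerland", "Tromso, Norway"],
        "cost": ["135.74", "114.81"],
    }
)

# Split on the first comma only, then trim whitespace from both halves
parts = df["city"].str.split(",", n=1, expand=True)
df["city"] = parts[0].str.strip()
df["country"] = parts[1].str.strip()
```

Limiting the split with `n=1` keeps the result at exactly two columns even if a country name were to contain a comma of its own.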
import pandas as pd
import requests
from bs4 import BeautifulSoup

url = "https://www.numbeo.com/cost-of-living/region_rankings_current.jsp?region=150"
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")

# Locate the ranking table and collect the rows of its body
table = soup.find_all("table", id="t2")[0]
table_body = table.find_all("tbody")[0]
findings = table_body.find_all("tr")

# Build one dict per table row
living_costs = []
for finding in findings:
    city = finding.find("a", class_="discreet_link").string
    cost = finding.find("td", style="text-align: right").string
    living_costs.append({"city": city, "cost": cost})

# "Columns: [" only appears in the printed form of an empty DataFrame, not in
# these values, so no string clean-up is needed before building the frame.
df = pd.DataFrame(living_costs)
print(df)
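If the scraped records need to be stored on disk and reloaded later, writing them as JSON avoids the text clean-up entirely. A minimal sketch under that assumption; the file name, the temporary directory, and the two sample rows standing in for the scraped `living_costs` list are all illustrative:

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

# Stand-in for the scraped list of dicts
living_costs = [
    {"city": "Zurich, Switzerland", "cost": "135.74"},
    {"city": "Basel, Switzerland", "cost": "135.36"},
]

# Hypothetical output location inside a fresh temporary directory
path = Path(tempfile.mkdtemp()) / "living_costs.json"

# Write the records as real JSON rather than as the printed form of a DataFrame
with open(path, "w", encoding="utf-8") as fh:
    json.dump(living_costs, fh)

# Reading the file back yields the same list of dicts
with open(path, encoding="utf-8") as fh:
    df = pd.DataFrame(json.load(fh))
```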