I have a log file with data from 100 pages collected by a web-scraping script. The .log file reads like this:
Title: Canon EF 100mm f/2.8L Macro IS USM
Price: 6�900 kr
Link: https://www.finn.no/bap/forsale/ad.html?finnkode=161065896
21-Oct-19 10:21:14 - Found:
Title: Canon EF 100mm f/2.8L Macro IS USM
Price: 7�500 kr
Link: https://www.finn.no/bap/forsale/ad.html?finnkode=155541389
21-Oct-19 10:21:14 - Found:
Title: Panasonic Lumix G 25mm F1.4 ASPH
Price: 3�200 kr
Link: https://www.finn.no/bap/forsale/ad.html?finnkode=161066674
I would like to import this data and export it to Excel like this:
title price link
canon 100mm 6900kr https
This approach needs to change if the log file is not in the order you have shown, because the following function simply looks for the Title, Price and Link lines and appends each value to its own list. To convert the lists to a DataFrame, all of them need to be the same length. Let me know if it works.
import pandas as pd

def log_to_frame(location="./datalake/file.log"):
    title_list = []
    price_list = []
    link_list = []
    with open(location, mode='r', encoding='UTF-8') as f:
        for line in f:
            # split only on the first ": " so the value is kept whole
            if "Title" in line:
                title = line.split(": ", 1)[1].rstrip()
                title_list.append(title)
            elif "Price" in line:
                # drop the garbled thousands separator inside the price
                price = line.split(": ", 1)[1].replace("�", "").rstrip()
                price_list.append(price)
            elif "Link" in line:
                link = line.split(": ", 1)[1].rstrip()
                link_list.append(link)
    # all three lists must have the same length here
    main_df = pd.DataFrame({"title": title_list, "price": price_list, "link": link_list})
    return main_df

log_df = log_to_frame()
log_df.to_excel("log.xlsx", index=False)
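If some records are missing a Price or Link line, the three separate lists drift out of alignment. One way around that is to build one dict per record instead. This is only a minimal sketch: it assumes every record starts with a Title line, the helper name parse_records is made up for illustration, and reducing the price to digits only is my own choice rather than something the question asks for.

```python
import re

FIELDS = ("Title", "Price", "Link")

def parse_records(lines):
    """Group Title/Price/Link lines into one dict per record."""
    records = []
    current = None
    for raw in lines:
        # split on the first ": " only, so URLs stay intact
        key, sep, value = raw.strip().partition(": ")
        if not sep or key not in FIELDS:
            continue  # skip timestamp lines such as "... - Found:"
        if key == "Title":
            # assumption: a Title line opens a new record
            current = dict.fromkeys(FIELDS)
            records.append(current)
        if current is None:
            continue  # field seen before any Title; ignore it
        if key == "Price":
            value = re.sub(r"\D", "", value)  # keep digits only
        current[key] = value
    return records
```

The returned list can be passed straight to pd.DataFrame(...) and written out with to_excel; fields that were missing from a record show up as empty cells instead of shifting the rows below them.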
You can load the data into a DataFrame as a normal table and then combine the columns using the DataFrame's loc and reset_index functions. This assumes that there is only one ":" symbol on each line, separating the "key" column from the "value" column, and that every "record" has a line for every key. Note that the Link and timestamp lines in the sample log contain additional colons, so they would need separate handling.
import pandas as pd

p = pd.read_table("table.log", sep=':', header=None)
df = pd.DataFrame()
keys = set(p[0])  # set of all unique keys
for key in keys:
    # get all values with the current key and re-index them from 0...n
    col_data = p.loc[p[0] == key][1].reset_index(drop=True)
    # put this in a new column named after the key
    df[key] = col_data
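Because the Link lines contain "https://", splitting on every ":" breaks them apart. A possible variant of the same loc/reset_index idea is to pull out the key and value with a regular expression first. This is a rough sketch, not a tested solution for the real file: the function name log_lines_to_frame is hypothetical, and it still assumes every record has a line for every key.

```python
import pandas as pd

KEYS = ("Title", "Price", "Link")

def log_lines_to_frame(lines):
    """Build one column per key from 'Key: value' log lines."""
    s = pd.Series([line.strip() for line in lines], dtype="object")
    # capture the key and everything after the first ": ";
    # lines that do not match (e.g. timestamps) become NaN and are dropped
    kv = s.str.extract(r"^(?P<key>Title|Price|Link): (?P<value>.*)$").dropna()
    columns = {
        # all values for this key, re-indexed from 0...n so the columns line up
        key: kv.loc[kv["key"] == key, "value"].reset_index(drop=True)
        for key in KEYS
    }
    return pd.DataFrame(columns)
```

With a file it could be called as log_lines_to_frame(open("table.log", encoding="UTF-8")), followed by to_excel as in the other answer. The values keep whatever separator characters the log contains, so the price still needs cleaning afterwards.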