I am reading a log file line by line, parsing data out of each line, and would like to append the extracted values to a DataFrame.
df = pd.DataFrame([], columns=['item','price','qty','sold'])
with open("mylogfile") as fh:
    for line in fh:
        data = extract_data(line)
        df.append(data)  ## ?

def extract_data(line):
    # parse and get values as a list
    return list_values
Update: I get the following error: ValueError: Shape of passed values is (0, 0), indices imply (4, 0)
Also, my log file has data in the following format:
item,2,price,4.5,qty,17,sold,11
item,12,price,14.5,qty,7,sold,4
item,2,price,4.5,qty,13,sold,2
Edit 2: The actual file looks like this, and I am only interested in the 'item' lines:
item,2,price,4.5,qty,17,sold,11
a,12,b,14,c,18,d,15,e16
item,12,price,14.5,qty,7,sold,4
x,4,y,1,z,81
a,12,b,14,c,18,d,15,e16
a,14,b,11,c,8,d,51,e26
item,2,price,4.5,qty,13,sold,2
x,14,y,11,z,8
Here is a multi-step approach:
In [210]:
# read in as csv, set header to None
df = pd.read_csv(io.StringIO(t), header=None)
df
Out[210]:
      0   1      2     3    4   5     6   7
0  item   2  price   4.5  qty  17  sold  11
1  item  12  price  14.5  qty   7  sold   4
2  item   2  price   4.5  qty  13  sold   2
In [213]:
# extract the header names from the first row
col_names = df.iloc[0][0::2]
print(col_names)
# extract the data columns we will use later to filter the df
col_list = df.columns[1::2]
col_list
0 item
2 price
4 qty
6 sold
Name: 0, dtype: object
Out[213]:
Int64Index([1, 3, 5, 7], dtype='int64')
In [214]:
# now filter the df to the columns that actually have your data
df = df[col_list]
# assign the column names
df.columns = col_names
df
Out[214]:
0  item  price  qty  sold
0     2    4.5   17    11
1    12   14.5    7     4
2     2    4.5   13     2
So I would read the file as a CSV using read_csv. Don't copy my code verbatim: substitute io.StringIO(t) with the path to your text file.
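For anyone who wants to run the steps above outside the interactive session, here is a minimal self-contained sketch. It assumes the variable t simply holds the three sample lines from the question, and the names raw and value_cols are only illustrative.

```python
import io
import pandas as pd

# Sample text standing in for the log file (the three lines from the question).
t = """item,2,price,4.5,qty,17,sold,11
item,12,price,14.5,qty,7,sold,4
item,2,price,4.5,qty,13,sold,2"""

# Read every comma-separated field as its own column; there is no header row.
raw = pd.read_csv(io.StringIO(t), header=None)

# Even positions of the first row hold the names, odd positions hold the values.
col_names = raw.iloc[0, 0::2].tolist()
value_cols = raw.columns[1::2]

# Keep only the value columns and label them with the extracted names.
df = raw[value_cols].copy()
df.columns = col_names
print(df)
```

With a real file, the io.StringIO(t) argument would be replaced by the file path.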
UPDATE
A better approach is to read just the first line, extract the header names and the columns of interest from it, and then read the whole file again, selecting only those columns and passing in the column names:
In [217]:
df = pd.read_csv(io.StringIO(t), header=None, nrows=1)
df
Out[217]:
      0  1      2    3    4   5     6   7
0  item  2  price  4.5  qty  17  sold  11
In [218]:
col_names = df.iloc[0][0::2]
print(col_names)
col_list = df.columns[1::2]
col_list
0 item
2 price
4 qty
6 sold
Name: 0, dtype: object
Out[218]:
Int64Index([1, 3, 5, 7], dtype='int64')
In [219]:
df = pd.read_csv(io.StringIO(t), usecols=col_list, names=col_names)
df
Out[219]:
   item  price  qty  sold
0     2    4.5   17    11
1    12   14.5    7     4
2     2    4.5   13     2
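The mixed file shown in Edit 2 of the question has lines with differing field counts, so reading it directly with read_csv may not work as-is. One possible alternative, sketched here under assumptions, is to keep the line-by-line loop from the question, skip anything that is not an 'item' line, collect the parsed rows in a list, and build the DataFrame once at the end. Note that DataFrame.append returned a new frame rather than modifying df in place, and it was removed in pandas 2.0, so calling it inside the loop is not the way to go. The body of extract_data and the name log_text are illustrative, not the asker's actual code.

```python
import io
import pandas as pd

# Sample text standing in for the mixed log file from Edit 2 of the question.
log_text = """item,2,price,4.5,qty,17,sold,11
a,12,b,14,c,18,d,15,e16
item,12,price,14.5,qty,7,sold,4
x,4,y,1,z,81
a,12,b,14,c,18,d,15,e16
a,14,b,11,c,8,d,51,e26
item,2,price,4.5,qty,13,sold,2
x,14,y,11,z,8
"""

def extract_data(line):
    """Return a {name: value} dict for an 'item' line, or None for other lines."""
    fields = line.strip().split(",")
    if fields[0] != "item" or len(fields) % 2 != 0:
        return None
    # Pair each name (even position) with the value that follows it.
    return dict(zip(fields[0::2], fields[1::2]))

rows = []
with io.StringIO(log_text) as fh:  # for a real file: open("mylogfile")
    for line in fh:
        data = extract_data(line)
        if data is not None:
            rows.append(data)

# Build the frame once from the collected rows, then convert strings to numbers.
df = pd.DataFrame(rows, columns=["item", "price", "qty", "sold"]).apply(pd.to_numeric)
print(df)
```

Building the frame once from a list avoids copying the whole DataFrame on every iteration, which is what repeated appends would do.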