How to append a list as a row in pandas.DataFrame()?

I am reading a log file line by line, parsing/extracting data from each line, and would like to append each result as a row to a DataFrame.

df = pd.DataFrame([], columns=['item','price','qty','sold'])
with open("mylogfile") as fh:
    for line in fh:
        data = extract_data(line)
        df.append(data) ## ?


def extract_data(line):
   # parse and get values as a list
   return list_values

Update: I get the following error: ValueError: Shape of passed values is (0, 0), indices imply (4, 0)

Also, my log file has data in the following format:

item,2,price,4.5,qty,17,sold,11
item,12,price,14.5,qty,7,sold,4
item,2,price,4.5,qty,13,sold,2

Edit 2: The actual file looks like this, and I am only interested in the 'item' lines:

item,2,price,4.5,qty,17,sold,11
a,12,b,14,c,18,d,15,e16
item,12,price,14.5,qty,7,sold,4
x,4,y,1,z,81
a,12,b,14,c,18,d,15,e16
a,14,b,11,c,8,d,51,e26
item,2,price,4.5,qty,13,sold,2
x,14,y,11,z,8

Here is a multi-step approach:

In [210]:
# read in as csv, set header to None
df = pd.read_csv(io.StringIO(t), header=None)
df

Out[210]:
      0   1      2     3    4   5     6   7
0  item   2  price   4.5  qty  17  sold  11
1  item  12  price  14.5  qty   7  sold   4
2  item   2  price   4.5  qty  13  sold   2

In [213]:
# extract the header names from the first row
col_names = df.iloc[0][0::2]
print(col_names)
# extract the data columns we will use later to filter the df
col_list = df.columns[1::2]
col_list
0     item
2    price
4      qty
6     sold
Name: 0, dtype: object

Out[213]:
Int64Index([1, 3, 5, 7], dtype='int64')

In [214]:
# now filter the df to the columns that actually have your data
df = df[col_list]
# assign the column names
df.columns = col_names
df

Out[214]:
0  item  price  qty  sold
0     2    4.5   17    11
1    12   14.5    7     4
2     2    4.5   13     2

So I would read the file as a CSV using read_csv. Don't copy my code verbatim: substitute io.StringIO(t) with the path to your text file.
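For anyone who wants to run the steps above end to end, here is a minimal self-contained sketch. The name `t` comes from the cells above, but its contents are an assumption here: the three sample lines from the question held in one string. The name `raw` is also just illustrative.

```python
import io

import pandas as pd

# Assumed contents of `t`: the three sample lines from the question.
t = """item,2,price,4.5,qty,17,sold,11
item,12,price,14.5,qty,7,sold,4
item,2,price,4.5,qty,13,sold,2"""

# header=None stops pandas from treating the first line as column names.
raw = pd.read_csv(io.StringIO(t), header=None)

# Even positions (0, 2, 4, 6) hold the labels, odd positions hold the values.
col_names = raw.iloc[0, 0::2].tolist()
df = raw[raw.columns[1::2]].copy()
df.columns = col_names
print(df)
```

With a real file, `io.StringIO(t)` would be replaced by the file path, as noted above.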

UPDATE

A better approach is to read only the first line, extract the header names and the positions of the value columns from it, and then read the whole file again, selecting just those columns and passing in the column names:

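In [217]:
# read only the first line to discover the layout
df = pd.read_csv(io.StringIO(t), header=None, nrows=1)
df

Out[217]:
      0  1      2    3    4   5     6   7
0  item  2  price  4.5  qty  17  sold  11

In [218]:
# header names sit at the even positions of the first row
col_names = df.iloc[0][0::2]
print(col_names)
# the values sit in the odd-numbered columns
col_list = df.columns[1::2]
col_list
0     item
2    price
4      qty
6     sold
Name: 0, dtype: object

Out[218]:
Int64Index([1, 3, 5, 7], dtype='int64')

In [219]:
# read the whole file, keeping only the value columns and naming them
df = pd.read_csv(io.StringIO(t), usecols=col_list, names=col_names)
df

Out[219]: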
   item  price  qty  sold
0     2    4.5   17    11
1    12   14.5    7     4
2     2    4.5   13     2
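
Neither approach above deals with the mixed file from Edit 2, where only the 'item' lines matter and the other lines have a different number of fields. One possible way to handle that, closer to the loop in the question, is to parse each line by hand, collect the rows in a plain list, and build the DataFrame once at the end. This is only a rough sketch: `parse_item_line`, `COLUMNS`, and `sample` are hypothetical names for illustration, and converting every value to float is an assumption about what you want.

```python
import io

import pandas as pd

COLUMNS = ["item", "price", "qty", "sold"]


def parse_item_line(line):
    """Return the values of an 'item' line as a list, or None for other lines."""
    fields = line.strip().split(",")
    # Labels sit at even positions, values at odd positions.
    if fields[0::2] != COLUMNS:
        return None
    # Assumption: every value is numeric, so convert them all to float.
    return [float(value) for value in fields[1::2]]


# Hypothetical sample mirroring part of the file shown in Edit 2.
sample = """item,2,price,4.5,qty,17,sold,11
a,12,b,14,c,18,d,15,e16
item,12,price,14.5,qty,7,sold,4
x,4,y,1,z,81
item,2,price,4.5,qty,13,sold,2
"""

rows = []
with io.StringIO(sample) as fh:  # stand-in for open("mylogfile")
    for line in fh:
        values = parse_item_line(line)
        if values is not None:
            rows.append(values)

# Build the DataFrame once, after all rows have been collected.
df = pd.DataFrame(rows, columns=COLUMNS)
print(df)
```

Collecting rows in a list avoids growing the frame one row at a time. Note that `DataFrame.append` returned a new frame rather than modifying `df` in place, and it was removed in pandas 2.0.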
