简体   繁体   中英

Reading csv containing a list in Pandas

I'm trying to read this csv into pandas

HK,"[u'5328.1', u'5329.3', '2013-12-27 13:58:57.973614']"
HK,"[u'5328.1', u'5329.3', '2013-12-27 13:58:59.237387']"
HK,"[u'5328.1', u'5329.3', '2013-12-27 13:59:00.346325']"

As you can see there are only 2 columns and the second one is a list, is there a way to interpret it correctly ( meaning reading the values in the list as columns) when using pd.read_csv() with arguments ?

thank you

One option is to use ast.literal_eval as converter:

>>> import ast
>>> df = pd.read_clipboard(header=None, quotechar='"', sep=',', 
...                   converters={1:ast.literal_eval})
>>> df
    0                                             1
0  HK  [5328.1, 5329.3, 2013-12-27 13:58:57.973614]
1  HK  [5328.1, 5329.3, 2013-12-27 13:58:59.237387]
2  HK  [5328.1, 5329.3, 2013-12-27 13:59:00.346325]

And convert those lists to a DataFrame if needed, for example with:

>>> df = pd.DataFrame.from_records(df[1].tolist(), index=df[0],
...                           columns=list('ABC')).reset_index()
>>> df['C'] = pd.to_datetime(df['C'])
>>> df
    0       A       B                          C
0  HK  5328.1  5329.3 2013-12-27 13:58:57.973614
1  HK  5328.1  5329.3 2013-12-27 13:58:59.237387
2  HK  5328.1  5329.3 2013-12-27 13:59:00.346325
df['new_column'] = df['column'].apply(lambda x: ast.literal_eval(x))

只需在包含列表作为字符串的列上运行上面的代码。

Based alko's answer, you can use the df.apply() function for the first part to read the actual data in the list string:

 >>> df = pd.read_clipboard(header=None,sep=',')
 >>> df
     0                                                  1
  0  HK  [u'5328.1', u'5329.3', '2013-12-27 13:58:57.97...
  1  HK  [u'5328.1', u'5329.3', '2013-12-27 13:58:59.23...
  2  HK  [u'5328.1', u'5329.3', '2013-12-27 13:59:00.34...
 >>> df[1] = df[1].apply(eval)
 >>> df
     0                                             1
  0  HK  [5328.1, 5329.3, 2013-12-27 13:58:57.973614]
  1  HK  [5328.1, 5329.3, 2013-12-27 13:58:59.237387]
  2  HK  [5328.1, 5329.3, 2013-12-27 13:59:00.346325]

use .strip() in python.

with open(csvfile, 'r')as infile:
    reader = csv.reader(infile)
    for row in reader:
        col1 = row[0]
        col2 = row[1:].strip("[]")

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM