Python: converting a comma-separated list to a pandas DataFrame

Question

I am struggling to convert a list of comma-separated strings into a multi-column (7) data frame.

print(type(mylist))

<type 'list'>

print(mylist)

['AN,2__AAS000,26,20150826113000,-283.000,20150826120000,-283.000', 'AN,2__AE000,26,20150826113000,0.000,20150826120000,0.000',.........

The following creates a frame with a single column:

df = pd.DataFrame(mylist)

I have reviewed the built-in CSV functionality in pandas, but my CSV data is held in a list. How can I simply convert the list into a 7-column data frame?

Thanks in advance.

Answer 1

You need to split each string in your list:

import pandas as pd

# Split every comma-separated string into its own list of fields.
df = pd.DataFrame([sub.split(",") for sub in mylist])
print(df)

Output:

   0         1   2               3         4               5         6
0  AN  2__AS000  26  20150826113000  -283.000  20150826120000  -283.000
1  AN   2__A000  26  20150826113000     0.000  20150826120000     0.000
2  AN  2__AE000  26  20150826113000  -269.000  20150826120000  -269.000
3  AN  2__AE000  26  20150826113000  -255.000  20150826120000  -255.000
4  AN   2__AE00  26  20150826113000  -254.000  20150826120000  -254.000
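
Since the strings are already CSV rows, another option is to pass them to read_csv through an in-memory buffer. Unlike the split approach, which leaves every column as a string, this lets pandas infer numeric dtypes. A minimal sketch, assuming mylist holds the two rows shown in the question; the column names are hypothetical and chosen only for illustration:

```python
import io

import pandas as pd

# Two sample rows copied from the question.
mylist = [
    "AN,2__AAS000,26,20150826113000,-283.000,20150826120000,-283.000",
    "AN,2__AE000,26,20150826113000,0.000,20150826120000,0.000",
]

# Hypothetical column names; the question does not say what the fields mean.
names = ["code", "series", "n", "start", "start_value", "end", "end_value"]

# Join the rows into one CSV text and let pandas parse it.
df = pd.read_csv(io.StringIO("\n".join(mylist)), header=None, names=names)
print(df.dtypes)
```

Whether the timestamp-like columns should stay as integers or be parsed as dates depends on what the data means, which the question does not state.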

If you know how many lines to skip in your CSV file, you can do it all with read_csv using skiprows=lines_of_metadata:

import pandas as pd

df = pd.read_csv("in.csv", skiprows=3, header=None)
print(df)

Or, if each line of the metadata starts with a certain character, you can use the comment parameter:

df = pd.read_csv("in.csv",header=None,comment="#")
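
The two options can be compared on an in-memory sample. This is only a sketch: the text below stands in for in.csv, and the "# ..." metadata lines are an assumption made for the example. Note that comment accepts a single character only.

```python
import io

import pandas as pd

# Assumed sample: three metadata lines, then two data rows from the answer.
text = (
    "# various\n"
    "# metadata\n"
    "# lines\n"
    "AN,2__AS000,26,20150826113000,-283.000,20150826120000,-283.000\n"
    "AN,2__A000,26,20150826113000,0.000,20150826120000,0.000\n"
)

# Skip a known number of leading lines.
by_count = pd.read_csv(io.StringIO(text), skiprows=3, header=None)

# Skip every line that starts with the comment character.
by_marker = pd.read_csv(io.StringIO(text), comment="#", header=None)

print(by_count.equals(by_marker))
```

skiprows needs the exact line count, while comment keeps working if the number of metadata lines changes.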

If you need to match more than one character, you can use itertools.dropwhile, which drops the leading lines that start with a given prefix:

import csv
from itertools import dropwhile

import pandas as pd

with open("in.csv") as f:
    # Skip the leading lines that start with the metadata prefix.
    rows = dropwhile(lambda line: line.startswith("#!!"), f)
    # Parse the remaining lines as CSV and build the frame.
    df = pd.DataFrame.from_records(list(csv.reader(rows)))

Using your input data with some added lines starting with #!!:

#!! various
#!! metadata
#!! lines
AN,2__AS000,26,20150826113000,-283.000,20150826120000,-283.000
AN,2__A000,26,20150826113000,0.000,20150826120000,0.000
AN,2__AE000,26,20150826113000,-269.000,20150826120000,-269.000
AN,2__AE000,26,20150826113000,-255.000,20150826120000,-255.000
AN,2__AE00,26,20150826113000,-254.000,20150826120000,-254.000

Output:

    0         1   2               3         4               5         6
0  AN  2__AS000  26  20150826113000  -283.000  20150826120000  -283.000
1  AN   2__A000  26  20150826113000     0.000  20150826120000     0.000
2  AN  2__AE000  26  20150826113000  -269.000  20150826120000  -269.000
3  AN  2__AE000  26  20150826113000  -255.000  20150826120000  -255.000
4  AN   2__AE00  26  20150826113000  -254.000  20150826120000  -254.000
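
Keep in mind that dropwhile stops dropping at the first line that does not match, so it only removes leading metadata. If prefixed lines can also appear between data rows, a generator expression can filter them all. A minimal sketch, assuming an in-memory stand-in for in.csv with one marker line placed in the middle:

```python
import csv
import io

import pandas as pd

PREFIX = "#!!"  # multi-character marker used in the answer

# Assumed sample: a marker line at the top and another between data rows.
text = (
    "#!! various\n"
    "AN,2__AS000,26,20150826113000,-283.000,20150826120000,-283.000\n"
    "#!! metadata\n"
    "AN,2__A000,26,20150826113000,0.000,20150826120000,0.000\n"
)

with io.StringIO(text) as f:
    # Keep only the lines that do not start with the marker.
    data_lines = (line for line in f if not line.startswith(PREFIX))
    df = pd.DataFrame(list(csv.reader(data_lines)))

print(df)
```

As with the split approach, every column here is a string, because csv.reader does no type conversion.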

Answer 2

If the data is stored in a CSV file, you can read it into a 7-column data frame in the following way:

import pandas as pd

df = pd.read_csv(filename, sep=',', header=None)  # header=None: the data has no header row

Answer 3

I encountered a similar problem and solved it this way:

import pandas as pd

def lrsplit(line):
    # First field, last field, and everything in between rejoined as one field.
    left, *middle, right = line.split('-')
    return left, '-'.join(middle), right.strip()

with open("example.csv") as f:
    example = pd.DataFrame([lrsplit(line) for line in f])
example.columns = ['location', 'position', 'company']

Result:

   location            position    company
0     india             manager      intel
1     india       sales-manager     amazon
2  banglore  ccm- head - county  jp morgan
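
The same left/middle/right split can also be written with str.partition and str.rpartition, which cut only at the first and last separator. This is a sketch rather than the answerer's code: the contents of example.csv are not shown, so the input lines below are hypothetical, reconstructed from the result table.

```python
import pandas as pd

def lrsplit(line, sep="-"):
    # Cut at the first separator, then at the last one in the remainder.
    left, _, rest = line.partition(sep)
    mid, _, right = rest.rpartition(sep)
    return left, mid, right.strip()

# Hypothetical input lines, reconstructed from the result table.
lines = [
    "india-manager-intel\n",
    "india-sales-manager-amazon\n",
    "banglore-ccm- head - county-jp morgan\n",
]

example = pd.DataFrame(
    [lrsplit(line) for line in lines],
    columns=["location", "position", "company"],
)
print(example)
```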
