Python: convert a CSV file to a DataFrame

Question

I have a large CSV file containing data like:

2018-09, 100, A, 2018-10, 50, M, 2018-11, 69, H,....

and so on (a continuous stream without separate rows).

I want to convert it into a dataframe that looks something like this:

Col1     Col2  Col3
2018-09  100   A
2018-10  50    M
2018-11  69    H

This is a simplified version of the actual data. Please advise on the best way to approach it.

Edit: To clarify, my CSV file doesn't have a separate line for each row. All the data is on one line.

Answer 1

One solution is to read the single row with the csv module, split it into chunks of three with a chunking generator, then feed the result to the pd.DataFrame constructor. Note that every column in the resulting dataframe will be of dtype object, so you'll have to cast numeric columns explicitly afterwards.

from io import StringIO
import pandas as pd
import csv

x = StringIO("""2018-09, 100, A, 2018-10, 50, M, 2018-11, 69, H""")

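# define chunking algorithm: split a list into consecutive n-sized pieces
def chunks(L, n):
    """Yield successive n-sized chunks from L."""
    for i in range(0, len(L), n):
        yield L[i:i + n]

# replace x with open('file.csv', 'r', newline='')
with x as fin:
    # skipinitialspace=True drops the space that follows each comma
    reader = csv.reader(fin, skipinitialspace=True)
    # the file holds a single row, so read it once and split it into groups of 3
    data = list(chunks(next(reader), 3))

# build the dataframe from the list of 3-item rows
df = pd.DataFrame(data)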

print(df)

         0    1  2
0  2018-09  100  A
1  2018-10   50  M
2  2018-11   69  H
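The explicit cast mentioned above could look like the following. This is a minimal sketch, not part of the original answer: the column names are borrowed from the question, and treating the first field as a year-month date is an assumption about the data.

```python
import pandas as pd

# Same rows as the chunking example produces; every value is still a string
df = pd.DataFrame(
    [["2018-09", "100", "A"], ["2018-10", "50", "M"], ["2018-11", "69", "H"]],
    columns=["Col1", "Col2", "Col3"],  # names taken from the question
)

# Cast the numeric column explicitly; invalid values raise by default
df["Col2"] = pd.to_numeric(df["Col2"])

# Assumption: the first field is a year-month string such as "2018-09"
df["Col1"] = pd.to_datetime(df["Col1"], format="%Y-%m")

print(df.dtypes)
```

If some values might not be numeric, pd.to_numeric(..., errors="coerce") turns them into NaN instead of raising.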

Answer 2

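import pandas as pd

# a regex separator (comma plus optional whitespace) needs the python engine
data = pd.read_csv('tmp.txt', sep=r',\s*', header=None, engine='python').values
pd.DataFrame(data.reshape(-1, 3), columns=['Col1', 'Col2', 'Col3'])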

returns

      Col1 Col2 Col3
0  2018-09  100    A
1  2018-10   50    M
2  2018-11   69    H
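reshape(-1, 3) raises a ValueError when the number of fields is not a multiple of 3, for example if the line ends with a trailing comma or an incomplete record. A hedged sketch of checking this first is shown here; the StringIO stands in for tmp.txt, and n_cols is a name introduced only for illustration.

```python
from io import StringIO
import pandas as pd

# Stand-in for tmp.txt so the example is self-contained
raw = StringIO("2018-09, 100, A, 2018-10, 50, M, 2018-11, 69, H")
n_cols = 3  # hypothetical name: number of fields per logical row

# Read the single line and flatten it to a 1-D array of fields
values = pd.read_csv(raw, sep=r",\s*", header=None, engine="python").to_numpy().ravel()

# Fail with a clearer message than the one reshape would give
if values.size % n_cols != 0:
    raise ValueError(f"{values.size} fields cannot be split into rows of {n_cols}")

df = pd.DataFrame(values.reshape(-1, n_cols), columns=["Col1", "Col2", "Col3"])
print(df)
```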
