Python converting csv files to dataframes
I have a large CSV file containing data like:
2018-09, 100, A, 2018-10, 50, M, 2018-11, 69, H,....
and so on (a continuous stream without separate rows).
I would like to convert it into a dataframe that looks something like:
Col1 Col2 Col3
2018-09 100 A
2018-10 50 M
2018-11 69 H
This is a simplified version of the actual data. Please advise on the best way to approach it.
Edit: To clarify, my CSV file doesn't have a separate line for each row. All the data is on one row.
One solution is to split your single row into chunks via the csv module and a chunking function, then feed the result to the pd.DataFrame constructor. Note that every column of your dataframe will have dtype object, so you'll have to cast numeric series explicitly afterwards.
from io import StringIO
import pandas as pd
import csv

x = StringIO("""2018-09, 100, A, 2018-10, 50, M, 2018-11, 69, H""")

# define chunking algorithm
def chunks(L, n):
    """Yield successive n-sized chunks from L."""
    for i in range(0, len(L), n):
        yield L[i:i + n]

# replace x with open('file.csv', 'r')
with x as fin:
    # skipinitialspace drops the blank that follows each comma
    reader = csv.reader(fin, skipinitialspace=True)
    # the file has a single row, so read it once and cut it into groups of 3
    data = list(chunks(next(reader), 3))

# read dataframe
df = pd.DataFrame(data)
print(df)
0 1 2
0 2018-09 100 A
1 2018-10 50 M
2 2018-11 69 H
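The explicit cast mentioned above could look roughly like the following. This is only a sketch: the column names Col1 to Col3 are borrowed from the question, and parsing Col1 as a date assumes every value really is a year-month string.

```python
import pandas as pd

# Same string-only frame that the chunking approach produces.
df = pd.DataFrame(
    [["2018-09", "100", "A"], ["2018-10", "50", "M"], ["2018-11", "69", "H"]],
    columns=["Col1", "Col2", "Col3"],  # names taken from the question
)

# Cast the numeric column explicitly; by default to_numeric raises on bad values.
df["Col2"] = pd.to_numeric(df["Col2"])

# Assumption: Col1 always holds "YYYY-MM" strings, so it can be parsed as a date.
df["Col1"] = pd.to_datetime(df["Col1"], format="%Y-%m")

print(df.dtypes)
```

If some values might not be numeric, pd.to_numeric(..., errors="coerce") turns them into NaN instead of raising.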
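# a regex separator (comma plus optional whitespace) needs the python engine
data = pd.read_csv('tmp.txt', sep=r',\s*', header=None, engine='python').values
# the single row becomes a flat array; reshape it into rows of 3 fields
pd.DataFrame(data.reshape(-1, 3), columns=['Col1', 'Col2', 'Col3'])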
returns
Col1 Col2 Col3
0 2018-09 100 A
1 2018-10 50 M
2 2018-11 69 H
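The reshaped frame is also built from a mixed object array, so its columns may come out as object dtype as well. A minimal sketch of one way to recover the numeric dtype, using StringIO as a stand-in for tmp.txt (the name df2 is only for illustration):

```python
from io import StringIO
import pandas as pd

# Stand-in for tmp.txt: one long row of comma-separated values.
raw = StringIO("2018-09, 100, A, 2018-10, 50, M, 2018-11, 69, H")

values = pd.read_csv(raw, sep=r",\s*", header=None, engine="python").values
df2 = pd.DataFrame(values.reshape(-1, 3), columns=["Col1", "Col2", "Col3"])

# infer_objects converts object columns that hold only numbers to a numeric dtype.
df2 = df2.infer_objects()

print(df2.dtypes)
```

Unlike the csv module approach, read_csv has already parsed the numbers here, so no string-to-number conversion is needed, only dtype inference.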