How do I read a CSV file with pandas that has multiple values in one column?

I have a CSV file that looks like this:

timestamp (int), array(value1,value2,...), identifier (string)

The arrays with the values inside are written exactly like this:

List(value1, value2, value3)

where the values can be written in formats such as

1.23E4, -123456.78910

So what I eventually want is a DataFrame with a timestamp and an identifier, but with multiple values at each point.

I have no idea how to read this CSV file with pandas in Python. If I just try

pd.read_csv("myFilePath")

it gives me

pandas.errors.ParserError: Error tokenizing data. C error: Expected 7 fields in line 3, saw 76

Obviously I didn't tell pandas how to read that file properly, and honestly I don't quite know how to. Can somebody help me? Thanks a lot.

You could read in all the data with each row represented as a single column, then extract what you need from there.

I think your data may look like this:

import io
import pandas as pd

# Simulated file contents; the real file may differ
sim_csv = io.StringIO(
'''2022-07-20,List(1.23E4, -123456.78910),ID001
2022-07-21,List(2.23E4, -223456.78910),ID002
2022-07-22,List(3.23E4, -323456.78910),ID003
2022-07-23,List(4.23E4, -423456.78910, 55),ID004''')

Read it all in as a single column

df = pd.read_fwf(sim_csv, widths=[999999], header=None)
print(df)

                                                  0
0      2022-07-20,List(1.23E4, -123456.78910),ID001
1      2022-07-21,List(2.23E4, -223456.78910),ID002
2      2022-07-22,List(3.23E4, -323456.78910),ID003
3  2022-07-23,List(4.23E4, -423456.78910, 55),ID004
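
As a hedged alternative to the very wide `read_fwf` column, the same single-column frame can probably be built by splitting the text into lines with plain Python. This is only a sketch on the sample data above; `raw_text` and `df_lines` are names made up for illustration, and the sample layout is an assumption about the real file.

```python
import io
import pandas as pd

# Same sample data as above (an assumption about the real file's layout).
raw_text = io.StringIO(
    "2022-07-20,List(1.23E4, -123456.78910),ID001\n"
    "2022-07-21,List(2.23E4, -223456.78910),ID002\n"
    "2022-07-22,List(3.23E4, -323456.78910),ID003\n"
    "2022-07-23,List(4.23E4, -423456.78910, 55),ID004"
)

# One row per line of text, all in a single column labelled 0.
df_lines = pd.DataFrame({0: raw_text.read().splitlines()})
print(df_lines.shape)
```

With a real file, an opened file object would take the place of the `StringIO` object.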

Extract what you need

dfs = df[0].str.extract(r'(?P<timestamp>.+),.*List\((?P<raw_values>.+)\),(?P<id>.+)')
print(dfs)

    timestamp                 raw_values     id
0  2022-07-20      1.23E4, -123456.78910  ID001
1  2022-07-21      2.23E4, -223456.78910  ID002
2  2022-07-22      3.23E4, -323456.78910  ID003
3  2022-07-23  4.23E4, -423456.78910, 55  ID004
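
To see why the commas inside the parentheses do not split the row, the same pattern can be tried on a single line with the standard `re` module. The literal `List(` and `),` act as anchors, so everything between them lands in one group. This is a small sketch; `pattern`, `line` and `fields` are illustrative names.

```python
import re

# Same pattern as in the str.extract call above.
pattern = re.compile(
    r'(?P<timestamp>.+),.*List\((?P<raw_values>.+)\),(?P<id>.+)'
)

line = "2022-07-23,List(4.23E4, -423456.78910, 55),ID004"
match = pattern.match(line)

# groupdict() returns the named groups as a dictionary.
fields = match.groupdict()
print(fields)
```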

That leaves you with a list that is really still a string and not a Python list object. I'm not sure what you want to do with the values from there. Maybe make them into a real list.

Values to a real list

dfs['list_values'] = dfs['raw_values'].str.split(',')
print(dfs)

    timestamp                 raw_values     id                    list_values
0  2022-07-20      1.23E4, -123456.78910  ID001       [1.23E4,  -123456.78910]
1  2022-07-21      2.23E4, -223456.78910  ID002       [2.23E4,  -223456.78910]
2  2022-07-22      3.23E4, -323456.78910  ID003       [3.23E4,  -323456.78910]
3  2022-07-23  4.23E4, -423456.78910, 55  ID004  [4.23E4,  -423456.78910,  55]

At this point the values in the list are actually still strings, and some of them keep the leading space that followed the comma. There are a lot of things you could do from here, depending on what you are trying to accomplish.
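
One possible next step, assuming numeric values are what you want, is to convert each string to a float and optionally reshape to one row per value. The sketch below rebuilds a small stand-in for `dfs` so it runs on its own; the column name `float_values` and the frame `long_df` are made up for illustration.

```python
import pandas as pd

# Small stand-in for the dfs frame above (hypothetical sample rows).
dfs = pd.DataFrame({
    "timestamp": ["2022-07-20", "2022-07-23"],
    "raw_values": ["1.23E4, -123456.78910", "4.23E4, -423456.78910, 55"],
    "id": ["ID001", "ID004"],
})

# Split on commas, then convert each piece to float.
# float() ignores surrounding whitespace and accepts the 1.23E4 notation.
dfs["float_values"] = dfs["raw_values"].str.split(",").apply(
    lambda parts: [float(p) for p in parts]
)

# Optional long format: one row per value, with a numeric dtype.
long_df = dfs.explode("float_values").astype({"float_values": float})
print(long_df[["timestamp", "id", "float_values"]])
```

The list column keeps one row per timestamp, while the exploded frame is usually easier to filter, group, or plot.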
