
How to go about separating data in a .csv file?

I have a .csv file that contains one long line of data. The data looks something like this:

Name,Gender,Age John Smith,M,23 Ashley Jones,F,18 James Smith Johns,M,20

My end goal is to separate all of the data so I can put it into rows. My intended result would be:

['Name','Gender','Age','John Smith','M','23','Ashley Jones','F','18','James Smith Johns','M','20']

However, using something like:

line = line.split(",")
line = line.split(" ")

will not work, because it splits at either the comma or the space, which leaves values like:

'Age John Smith' or 'Age','John','Smith'
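To make the failure concrete, here is a minimal sketch using the sample line from above (the variable names are only illustrative). Splitting at commas leaves neighbouring records glued together, splitting at spaces breaks the names apart, and chaining the two calls cannot run at all, because split returns a list and a list has no split method.

```python
line = "Name,Gender,Age John Smith,M,23 Ashley Jones,F,18 James Smith Johns,M,20"

# Splitting only at commas leaves neighbouring records glued together.
by_comma = line.split(",")
print(by_comma[2])  # Age John Smith

# Splitting only at spaces breaks the names apart instead.
by_space = line.split(" ")
print(by_space[:3])

# Chaining the two calls fails: the first split already returned a list.
try:
    by_comma.split(" ")
except AttributeError as exc:
    print("AttributeError:", exc)
```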

Is there any way to work around this?

Split at the comma first, then iterate over the resulting list and split each item once at whitespace. If that split returns more than one part, yield the first part and the rest separately; otherwise just yield the single item.

import csv

def solve(row):
    for item in row:
        # Split each field once, at the first run of whitespace.
        spl = item.split(None, 1)
        if len(spl) > 1:
            yield spl[0]
            yield spl[1]
        else:
            yield spl[0]

with open('abc1') as f:
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        print(list(solve(row)))

['Name', 'Gender', 'Age', 'John Smith', 'M', '23', 'Ashley Jones', 'F', '18', 'James Smith Johns', 'M', '20']
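Since the stated goal is to put the values into rows, the flat list can then be cut into groups of three. This is only a sketch: the helper name chunk and the fixed width of three columns are assumptions based on the sample data, not part of the answer above.

```python
def chunk(values, size=3):
    """Yield consecutive slices of `values`, each `size` items long."""
    for start in range(0, len(values), size):
        yield values[start:start + size]

# The flat list produced for the sample line.
flat = ['Name', 'Gender', 'Age', 'John Smith', 'M', '23',
        'Ashley Jones', 'F', '18', 'James Smith Johns', 'M', '20']

rows = list(chunk(flat))
for row in rows:
    print(row)
```

If the number of values is not a multiple of three, the last row simply comes out shorter, so it may be worth checking its length.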

Here's a solution using a regular expression:

import re
re.compile(r"([^,]+),([^,]+),(\d+|Age)\s+").findall("Name,Gender,Age John Smith,M,23 Ashley Jones,F,18 James Smith Johns,M,20")

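The result for this will be as follows. Note that the last record is missing, because the pattern requires whitespace after the age and the line ends right after 20: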

[('Name', 'Gender', 'Age'), ('John Smith', 'M', '23'), ('Ashley Jones', 'F', '18')]
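A possible tweak, assuming the final record should be captured as well, is to let a record end either at whitespace or at the end of the string. The sketch below keeps the same three groups and only changes the tail of the pattern; the names pattern and records are illustrative.

```python
import re

line = "Name,Gender,Age John Smith,M,23 Ashley Jones,F,18 James Smith Johns,M,20"

# Same three groups as before, but a record may end at whitespace OR at the
# end of the string, so the final record is no longer skipped.
pattern = re.compile(r"([^,]+),([^,]+),(\d+|Age)(?:\s+|$)")
records = pattern.findall(line)
print(records)
```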

There are nice re solutions, but I just wanted to add this non-regex solution:

>>> s = "Name,Gender,Age John Smith,M,23 Ashley Jones,F,18 James Smith Johns,M,20"
>>> sum((item.split(None, 1) for item in s.split(',')), [])
['Name', 'Gender', 'Age', 'John Smith', 'M', '23', 'Ashley Jones', 'F', '18', 'James Smith Johns', 'M', '20']

Instead of sum, you can also use itertools.chain. But in the end, it does not seem to be any shorter.

>>> import itertools
>>> list(itertools.chain(*[item.split(None, 1) for item in s.split(',')]))

or better:

>>> list(itertools.chain.from_iterable(item.split(None, 1) for item in s.split(',')))

A regular expression way. :-)

>>> import re
>>> s = "John Smith,M,23 Ashley Jones,F,18 James Smith Johns,M,20"  # Note: no title here.
>>> [x.groups() for x in re.finditer(r"([^,\s]+(?: [^,\s]+)*),(\S),(\d+)", s)]
[('John Smith', 'M', '23'), ('Ashley Jones', 'F', '18'), ('James Smith Johns', 'M', '20')]

Note that I have removed the title (the header fields at the start of the line). You'll need to modify the regexp accordingly, or modify the input string.
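One way to modify the input string is to peel the header off before matching. The sketch below assumes the header is the first space-delimited token of the line, which holds for the sample data but may not hold in general; the pattern used here allows names with any number of words, and all variable names are illustrative.

```python
import re

line = "Name,Gender,Age John Smith,M,23 Ashley Jones,F,18 James Smith Johns,M,20"

# Assumption: the header is everything before the first space.
header_text, body = line.split(" ", 1)
header = header_text.split(",")

# Name (words separated by single spaces), one-character gender, numeric age.
record = re.compile(r"([^,\s]+(?: [^,\s]+)*),(\S),(\d+)")
rows = [m.groups() for m in record.finditer(body)]

print(header)
print(rows)
```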

As far as I can see from the example, line = line.split(",") will be enough. Maybe I didn't get something?
