
Split Python string by space starting at certain character

I am trying to run some basic analyses on a .txt file filled with car data. I have read the file into Python and am trying to split it into the appropriate columns, but what will be the "first column," the car name, sometimes has multiple words. For example, below are the header row and two lines with some of the information that my file has:

  1. Car Date Color Quantity (header row)
  2. Chevy Nova 7/1/2000 Blue 28,000
  3. Cadillac 7/1/2001 Silver 30,000

Therefore, when I split each line by spaces alone, I end up with lists of different sizes: in the example above, "Chevy" and "Nova" would be separated from one another.

I have figured out a way to identify the portion of each line that represents the car name:

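for i, line in enumerate(cardata):
    if i == 0:  # skip the header line
        continue
    line = line.rstrip()
    # index where the date starts (assumes every date contains '7/')
    carnamebreakpoint = line.find('7/')
    print(carnamebreakpoint)
    # everything before the date is the car name
    carname = line[:carnamebreakpoint]
    print(carname)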

What I'd like to do now is tell Python to split by space after the car name (with the end goal of a list that looks like [carname, date, color, number sold]), but I've tried playing around with the .split() function to do this with no luck thus far. I'd love some guidance on how to proceed, as I'm fairly new to programming.

Thanks in advance for any help!

s = "Chevy Nova 7/1/2000 Blue 28,000"  
s.rsplit(None,3)

It will only split 3 times, starting from the end of the string:

In [4]: s = "Chevy Nova 7/1/2000 Blue 28,000"    
In [5]: s.rsplit(None,3)
Out[5]: ['Chevy Nova', '7/1/2000', 'Blue', '28,000']
In [8]: s ="Car Date Color Quantity "
In [9]: s.rsplit(None,3)
Out[9]: ['Car', 'Date', 'Color', 'Quantity']

This presumes that the last three items will always be single-word strings, as in your example. That should hold, or else your indexing approach would also fail.
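As a rough illustration of that assumption, the sketch below compares a plain split() with rsplit(None, 3), and then shows how the fields shift when a trailing field has two words. The two-word color line is a made-up example, not taken from the file in the question.

```python
s = "Chevy Nova 7/1/2000 Blue 28,000"

# A plain split breaks the car name into two items.
plain = s.split()
print(plain)    # ['Chevy', 'Nova', '7/1/2000', 'Blue', '28,000']

# rsplit with maxsplit=3 peels off only the last three fields.
fields = s.rsplit(None, 3)
print(fields)   # ['Chevy Nova', '7/1/2000', 'Blue', '28,000']

# Hypothetical line with a two-word color: the columns shift.
t = "Chevy Nova 10/6/2002 Light Blue 28,000"
shifted = t.rsplit(None, 3)
print(shifted)  # ['Chevy Nova 10/6/2002', 'Light', 'Blue', '28,000']
```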

Also, to ignore the header you can call next() on the file object:

with open("your_file.txt") as f:
    header = next(f)
    for line in f:
        car_name,date,col,mile = line.rstrip().rsplit(None,3)
        # print a tuple so the output looks the same in Python 2 and 3
        print((car_name, date, col, mile))
# Output:
# ('Chevy Nova', '7/1/2000', 'Blue', '28,000')
# ('Cadillac', '7/1/2001', 'Silver', '30,000')

First slice the string at the breakpoint, then call split() on the result:

date, color, quantity = line[breakpoint:].split()
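Put together with the breakpoint from the question, this could look roughly like the sketch below. It assumes the date really does contain '7/' and that the three trailing fields are single words; the sample line is the one from the question.

```python
line = "Chevy Nova 7/1/2000 Blue 28,000"

# Same breakpoint the question computes: index where the date starts.
carnamebreakpoint = line.find('7/')

# Everything before the breakpoint is the car name (minus the trailing space).
carname = line[:carnamebreakpoint].rstrip()

# Everything from the breakpoint on splits cleanly on whitespace.
date, color, quantity = line[carnamebreakpoint:].split()

row = [carname, date, color, quantity]
print(row)  # ['Chevy Nova', '7/1/2000', 'Blue', '28,000']
```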

Depending on how confident you are in the format of your data, your solution might not be the best one.

What would happen if you get a car whose date does not start with "7/"? And what about a color like "Light Blue"?
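To make those two questions concrete, here is a small sketch of what the find('7/') approach does with such lines. Both sample lines are invented for illustration.

```python
# A date without "7/": find() returns -1, so the slice silently
# drops the last character instead of isolating the car name.
line = "Chevy Nova 10/6/2002 Blue 28,000"
idx = line.find('7/')
print(idx)          # -1
print(line[:idx])   # Chevy Nova 10/6/2002 Blue 28,00

# A two-word color: the remainder splits into four pieces,
# so unpacking into three names raises ValueError.
line2 = "Chevy Nova 7/1/2000 Light Blue 28,000"
rest = line2[line2.find('7/'):].split()
print(rest)         # ['7/1/2000', 'Light', 'Blue', '28,000']
try:
    date, color, quantity = rest
except ValueError as exc:
    print("unpacking failed:", exc)
```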

This kind of task is a good fit for a regex.

For instance, a regex of this kind would let you easily isolate the 4 components:

^(.*) (\d{1,2}/\d{1,2}/\d{4}) (.*) ([\d,]+)$

In Python you can use it like this:

import re
s = "Chevy Nova 7/1/2000 Blue 28,000"
m = re.match(r"^(.*) (\d{1,2}/\d{1,2}/\d{4}) (.*) ([\d,]+)$", s)
m.group(1) # => Chevy Nova
m.group(2) # => 7/1/2000
m.group(3) # => Blue
m.group(4) # => 28,000

And if you have a string with multiple lines, you could batch process them like this:

s = """Chevy Nova 7/1/2000 Blue 28,000
Chevy Nova 10/6/2002 Light Blue 28,000
Cadillac 7/1/2001 Silver 30,000"""

re.findall(r"^(.*) (\d{1,2}/\d{1,2}/\d{4}) (.*) ([\d,]+)$", s, flags=re.MULTILINE)
# => [('Chevy Nova', '7/1/2000', 'Blue', '28,000'),
# =>  ('Chevy Nova', '10/6/2002', 'Light Blue', '28,000'),
# =>  ('Cadillac', '7/1/2001', 'Silver', '30,000')]
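If you also want typed values rather than strings, one possible extension is sketched below. It reuses the same pattern with named groups; the group names, the parse_row helper, and the month/day/year reading of the date are assumptions for illustration, not something the question states.

```python
import re
from datetime import datetime

# Same pattern as above, with named groups (the names are illustrative).
ROW = re.compile(
    r"^(?P<car>.*) (?P<date>\d{1,2}/\d{1,2}/\d{4}) "
    r"(?P<color>.*) (?P<quantity>[\d,]+)$"
)

def parse_row(line):
    """Return a dict for one data line, or None if the line does not match."""
    m = ROW.match(line.rstrip())
    if m is None:
        return None
    row = m.groupdict()
    # Assumption: dates are month/day/year; change the format string if not.
    row["date"] = datetime.strptime(row["date"], "%m/%d/%Y").date()
    # Drop the thousands separators before converting to int.
    row["quantity"] = int(row["quantity"].replace(",", ""))
    return row

parsed = parse_row("Chevy Nova 10/6/2002 Light Blue 28,000")
print(parsed)

# The header row has no date, so it does not match and is skipped naturally.
header = parse_row("Car Date Color Quantity")
print(header)  # None
```

Returning None for non-matching lines means the header, and any malformed line, can be filtered out in the reading loop without a separate check.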

