Python - Difficulty joining multiple lines without depending on "\n"

I'm using BeautifulSoup to find all locations on a webpage, and that part works.

get_location = second_soup.find_all('span', attrs={"class": "location"})
for local in get_location :
  if local:
    s = local.text
    s = s.replace("\n", "")
    s = s.replace("-", "") #removes the -
    s = s.split("|", 1)[0] #removes | and everything after it
    s = ''.join([i for i in s if not i.isdigit()]) #removes numbers from zip
    s = s.lstrip() #removes spaces
    s = s.rstrip() #removes spaces
    print(s)

I get the following result:

New York, NY
Brooklyn, NY
Johnville, KY

However, I need it like this:

New York, NY, Brooklyn, NY, Johnville, KY

Things I've tried:

1) Using s.replace("\n", ", ") instead of s.replace("\n", "")

The results are the same, except that every \n is replaced with ", ", so I get:

, New York, NY, 
, Brooklyn, NY, 
, Johnville, KY, 

2) Removing the replace call and using s = '\n'.join([line.strip() for line in s])

The results are strange: I get one character per line, such as:

N
E
W


Y
O
R
K

Edit

The reason I need it on one line is that I'm inserting the result into an array. I'm unable to insert more than one line into a single array element, so I get New York, NY and that's it.

This is how I want my array to look:

['New York, NY, Brooklyn, NY, Johnville, KY', 'Boston, MA, Miami, FL'] etc.

I can't test this since we don't have your data, but I think you want something like:

get_location = second_soup.find_all('span', attrs={"class": "location"})
rebuilt = []
for local in get_location:
    if local:
        s = local.text
        s = s.replace("\n", "")
        s = s.replace("-", "")  # removes the -
        s = s.split("|", 1)[0]  # removes | and everything after it
        s = ''.join([i for i in s if not i.isdigit()])  # removes numbers from zip
        s = s.strip()  # removes leading and trailing spaces
        rebuilt.append(s)  # append adds s as one item; extend would add it character by character
print(', '.join(rebuilt))  # join all locations into a single line
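
As a minimal sketch of the same idea that runs without the page, the block below uses made-up strings in place of local.text. The raw_texts sample data and the clean_location helper are assumptions for illustration, not part of the original code. It cleans each value, collects the results in a list, and joins them once at the end. It also shows that list.extend given a string adds one element per character, which is the same reason iterating over s in attempt 2 produced one character per line.

```python
# Hypothetical sample data standing in for local.text of each location span.
raw_texts = [
    "\nNew York, NY 10001 - | Map\n",
    "\nBrooklyn, NY 11201 - | Map\n",
    "\nJohnville, KY 40000 - | Map\n",
]


def clean_location(text):
    """Apply the same cleanup steps as the question's loop (hypothetical helper)."""
    s = text.replace("\n", "")
    s = s.replace("-", "")                        # drop the dash
    s = s.split("|", 1)[0]                        # drop "|" and everything after it
    s = "".join(c for c in s if not c.isdigit())  # drop the ZIP digits
    return s.strip()                              # trim surrounding whitespace


cleaned = []
for text in raw_texts:
    cleaned.append(clean_location(text))  # each location stays one list item

one_line = ", ".join(cleaned)  # join once, after the loop
print(one_line)

# extend() iterates over its argument, so a string is split into characters.
chars = []
chars.extend("NY")
print(chars)
```

Joining after the loop also yields a single string that can be stored as one array element, which is what the edit to the question asks for.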

You could do the following to replace newlines with commas:

s = ', '.join(s.split('\n'))

However, it would help if you could provide a sample of the data you're working with.
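
One caveat, assuming the text begins and ends with a newline (which the ", New York, NY, " output from attempt 1 suggests): splitting on "\n" leaves empty pieces at both ends, so the join adds stray separators. A small sketch with a made-up string that drops the empty pieces before joining:

```python
# Hypothetical text with leading and trailing newlines.
s = "\nNew York, NY\nBrooklyn, NY\n"

# Plain split/join keeps the empty first and last pieces.
naive = ", ".join(s.split("\n"))
print(repr(naive))

# Strip each piece and keep only the non-empty ones before joining.
parts = [line.strip() for line in s.splitlines() if line.strip()]
joined = ", ".join(parts)
print(joined)
```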

The last line inside your if block is print(s). The end parameter of print defaults to '\n', so each value is printed on a new line on every pass through the loop. If you set end to a comma (or any separator you prefer), the output is printed on the same line. Try this:

print(s, end=',')
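
A short sketch of how this behaves, using a made-up list in place of the values produced by the loop (the locations list is an assumption). Changing end leaves a trailing separator after the last item, and print only writes text to the screen, so if the line has to be stored in a list, building the string with join is probably the better fit:

```python
# Hypothetical cleaned values, standing in for s on each pass of the loop.
locations = ["New York, NY", "Brooklyn, NY", "Johnville, KY"]

# Changing end keeps everything on one line but leaves a trailing ", ".
for s in locations:
    print(s, end=", ")
print()  # finish the line

# Unpacking with sep places the separator only between items.
print(*locations, sep=", ")

# To keep the line instead of only displaying it, build the string.
line = ", ".join(locations)
results = [line]  # one array element holding the whole line
print(results)
```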
