
How to get all text from an xpath using lxml

I currently have the code below:

import time

from lxml import html
from selenium import webdriver

u = 'https://www.cruiseplum.com/search#{%22numPax%22:2,%22geo%22:%22US%22,%22portsMatchAll%22:true,%22numOptionsShown%22:100,%22ppdIncludesTaxTips%22:true,%22uiVersion%22:%22split%22,%22sortTableByField%22:%22dd%22,%22sortTableOrderDesc%22:false,%22filter%22:null}'
driver = webdriver.Chrome()
driver.get(u)
driver.maximize_window()

time.sleep(.3)

driver.find_element_by_id('restoreSettingsYesEncl').click() # select 'yes' on the webpage to restore settings
time.sleep(7) # wait until the website downloads data so we get a return value

elem = driver.find_element_by_xpath("//*")
source_code = elem.get_attribute("innerHTML")

t = html.fromstring(source_code)    

for i in t.xpath('//td[@class="dc-table-column _2"]/text()'):
    print(i.strip())

The goal is to get the text from the webpage listed in the code. The problem I am running into occurs when two ports are listed in the "Route" column: the code I currently have prints them on two separate lines.

Here is an example of the HTML that I am having problems with:

<td class="dc-table-column _2">Fort Lauderdale <i class="fa fa-long-arrow-right"></i> Venice</td>

For this example, it will print "Fort Lauderdale" on line 1, then "Venice" on line 2. I would like to be able to print them both on one line.

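This is a consequence of how you print the results, as AMC pointed out in a comment. The <i> element splits the cell's text into two text nodes, so the /text() step returns "Fort Lauderdale " and " Venice" as separate strings, and the loop prints each one with its own print() call.

By default, print() ends every call with a newline, so each string lands on its own line.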
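To illustrate the newline behaviour in isolation, here is a minimal sketch that does not touch the website. The `parts` list is a hypothetical stand-in for the two strings the XPath returns for the example cell, and `printed` is a helper made up for this illustration that captures what print() writes.

```python
import io

# Hypothetical stand-in for the two text nodes of the example cell.
parts = ["Fort Lauderdale ", " Venice"]


def printed(strings, **print_kwargs):
    """Return, as one string, what print() writes for each item."""
    buffer = io.StringIO()
    for s in strings:
        print(s.strip(), file=buffer, **print_kwargs)
    return buffer.getvalue()


default_output = printed(parts)             # print() uses end="\n" by default
same_line_output = printed(parts, end=" ")  # end=" " keeps everything on one line

print(repr(default_output))
print(repr(same_line_output))
```

Passing end=" " only changes the separator between calls; it still does not know where one table cell stops and the next begins.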

Alternative printing method

results = t.xpath('//td[@class="dc-table-column _2"]/text()')

print(" ".join([r.strip() for r in results]))

Output

Barcelona Martinique Martinique Doha Doha Fort Lauderdale Venice Miami Miami Miami Miami [...]
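Note that the join above puts every route on a single line. If the aim is one line per table cell, a possible variation is to select the td elements themselves rather than their text nodes and use lxml's text_content() on each. The sketch below runs on a small hypothetical sample modelled on the HTML in the question, not on the live page; `SAMPLE` and `route_texts` are names invented for this example.

```python
from lxml import html

# Hypothetical sample markup modelled on the cell shown in the question.
SAMPLE = """
<table>
  <tr><td class="dc-table-column _2">Fort Lauderdale <i class="fa fa-long-arrow-right"></i> Venice</td></tr>
  <tr><td class="dc-table-column _2">Miami</td></tr>
</table>
"""


def route_texts(source):
    """Return one whitespace-normalised string per matching cell."""
    tree = html.fromstring(source)
    # Select the cells themselves, not their individual text nodes.
    cells = tree.xpath('//td[@class="dc-table-column _2"]')
    # text_content() concatenates all text inside the element;
    # split() and join() collapse the gap left around the <i> tag.
    return [" ".join(cell.text_content().split()) for cell in cells]


routes = route_texts(SAMPLE)
for route in routes:
    print(route)
```

With the sample above this prints "Fort Lauderdale Venice" and "Miami" on separate lines. In the original script, the same function could be called with source_code in place of SAMPLE, assuming the page markup matches the example.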


 