Removing Unwanted Characters From List

I have a list of items that are structured similarly to this:

[{'Condition': '2013 Yamaha FJR 1300',
 'Date': '2018-02-28 11:30',
 'Description': ['\n        ',
  '\n2013 Yamaha FJR 1300 Sport Touring, 4 cylinder, 12.120 miles, silver, cruise control, traction control, ABS brakes, heated hand grips, Two Brothers exhaust, handle bar risers, 6.5 gal. gas tank, adjustable windshield, saddlebags, excellent condition, very clean.',
  '\n$ 7.500 (828) 250-0373 WWW.GREENVALLEYCARS.COM',
  '\n',
  '\n    '],
 'Images': [],
 'Latitude': '35.599694',
 'Location': ' (Asheville)',
 'Longitude': '-82.628866',
 'Price': '$7500',
 'Title': '2013 Yamaha FJR 1300',
 'Url': 'https://asheville.craigslist.org/mcd/d/2013-yamaha-fjr-1300/6513320993.html',
 '_id': {'$oid': '5a96dbee6f9ca5410cc9ed98'}},

{'Condition': '2014 Honda Accord Sedan',
 'Date': '2018-02-28 11:24',
 'Description': ['\n        ',
  '\n2014 Honda Accord  Automatic, White , On Tan, It has Only 41,980 Miles It Has Spoiler, Power Windows, and Mirrors, Tan Cloth Seats, Power Seats, 4 Cylinder, 4 Door, Radio, 6 CD Changer, FM,AM,CD, XM Radio, Bluetooth, Back up Camera, Side and Curtain Air Bag, 16 Inch Factory Wheels with Firestone  Great Tires, Tinted Glass, And Much More, Clean On inside, Runs and Drives Like New, Call Me for more info, 864-266-6936 Willing to Negotiate if offer is fair.....',
  '\n',
  '\n',
  '\n',
  '\n',
  '\n',
  '\n',
  '\nhonda, bmw, crv, mercedes, ford, mazda, lx, rx, ls, is, gs, 470 honda, lexus, toyota, ford, accord, civic, coupe, Mercedes,Honda Pilot, Lexus gx470 & 460, Chevrolet Tahoe, suburban, Tahoe, land rover, Nissan armada, GMC Yukon, Terrian, CX7, BMW x5, GMC Terrian, B 2011, 2010, 2009, 2008, 2007, 2012, 2013, 2014, 2016, 2006, 2005, 2017, 2018, ',
  '\n',
  '\n',
  '\n',
  '\n',
  '\n',
  '\n',
  '\n',
  '\n',
  '\n',
  '\n',
  '\n',
  '\n',
  '\n',
  '\n',
  '\n    '],
 'Images': ['https://images.craigslist.org/00b0b_gNOi9VtqAy3_600x450.jpg',
  'https://images.craigslist.org/00a0a_gs2eKxUlQho_600x450.jpg',
  'https://images.craigslist.org/00l0l_lPmE8ML0zcb_600x450.jpg',
  'https://images.craigslist.org/00x0x_bS9gCuxM7ID_600x450.jpg',
  'https://images.craigslist.org/01010_dTS4DnHjVWW_600x450.jpg',
  'https://images.craigslist.org/00w0w_70D0xeDKa7d_600x450.jpg',
  'https://images.craigslist.org/00606_4SUFT4ZCbmO_600x450.jpg',
  'https://images.craigslist.org/00k0k_1AQ7kVbviPN_600x450.jpg',
  'https://images.craigslist.org/00d0d_3STBecGHaXD_600x450.jpg',
  'https://images.craigslist.org/01717_guG6n90XfQt_600x450.jpg',
  'https://images.craigslist.org/00h0h_8be8866trLr_600x450.jpg',
  'https://images.craigslist.org/00B0B_gaQQvQHlARl_600x450.jpg',
  'https://images.craigslist.org/00b0b_ih84Nskx5xj_600x450.jpg',
  'https://images.craigslist.org/01616_aveWbY1HQvr_600x450.jpg',
  'https://images.craigslist.org/00x0x_Fflsg0wwsK_600x450.jpg',
  'https://images.craigslist.org/00b0b_6FBg7KV8HYv_600x450.jpg',
  'https://images.craigslist.org/00J0J_3vd5Ip3mQ5S_600x450.jpg',
  'https://images.craigslist.org/00L0L_loNV2CrnnLn_600x450.jpg',
  'https://images.craigslist.org/00K0K_fh8oSEa9fKn_600x450.jpg',
  'https://images.craigslist.org/00r0r_8P0SjsOgNd5_600x450.jpg',
  'https://images.craigslist.org/00k0k_ZY0ywNmKkr_600x450.jpg',
  'https://images.craigslist.org/00y0y_7Gie7XD8uuH_600x450.jpg',
  'https://images.craigslist.org/00c0c_2nVDzLJhnYi_600x450.jpg',
  'https://images.craigslist.org/00202_7k10eK3bxMn_600x450.jpg'],
 'Latitude': '35.039000',
 'Location': ' (Cowpens)',
 'Longitude': '-81.822000',
 'Price': '$10995',
 'Title': '2014 Honda Accord  White  41k',
 'Url': 'https://asheville.craigslist.org/ctd/d/2014-honda-accord-white-41k/6513312696.html',
 '_id': {'$oid': '5a96dbf16f9ca5410cc9ed99'}}]

When I run the following code:

wanted_keys = ['Title', 'Location', 'Price', 'Description', 'Url', 'Latitude', 'Longitude'] 
for item in cl_used_items_raw[:2]:
    for k in wanted_keys:
        lines = str(item[k]).split()
        split_lines = [line.replace('\n', '').strip() for line in lines]
        print("{}".format(' '.join(split_lines) + '\t'))
    print('\n')

I get the following output:

2013 Yamaha FJR 1300    
(Asheville) 
$7500   
['\n ', '\n2013 Yamaha FJR 1300 Sport Touring, 4 cylinder, 12.120 miles, silver, cruise control, traction control, ABS brakes, heated hand grips, Two Brothers exhaust, handle bar risers, 6.5 gal. gas tank, adjustable windshield, saddlebags, excellent condition, very clean.', '\n$ 7.500 (828) 250-0373 WWW.GREENVALLEYCARS.COM', '\n', '\n ']    
https://asheville.craigslist.org/mcd/d/2013-yamaha-fjr-1300/6513320993.html 
35.599694   
-82.628866  


2014 Honda Accord White 41k 
(Cowpens)   
$10995  
['\n ', '\n2014 Honda Accord Automatic, White , On Tan, It has Only 41,980 Miles It Has Spoiler, Power Windows, and Mirrors, Tan Cloth Seats, Power Seats, 4 Cylinder, 4 Door, Radio, 6 CD Changer, FM,AM,CD, XM Radio, Bluetooth, Back up Camera, Side and Curtain Air Bag, 16 Inch Factory Wheels with Firestone Great Tires, Tinted Glass, And Much More, Clean On inside, Runs and Drives Like New, Call Me for more info, 864-266-6936 Willing to Negotiate if offer is fair.....', '\n', '\n', '\n', '\n', '\n', '\n', '\nhonda, bmw, crv, mercedes, ford, mazda, lx, rx, ls, is, gs, 470 honda, lexus, toyota, ford, accord, civic, coupe, Mercedes,Honda Pilot, Lexus gx470 & 460, Chevrolet Tahoe, suburban, Tahoe, land rover, Nissan armada, GMC Yukon, Terrian, CX7, BMW x5, GMC Terrian, B 2011, 2010, 2009, 2008, 2007, 2012, 2013, 2014, 2016, 2006, 2005, 2017, 2018, ', '\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n '] 
https://asheville.craigslist.org/ctd/d/2014-honda-accord-white-41k/6513312696.html  
35.039000   
-81.822000

I know I'm close, but I'm struggling to work out how to write my for-loop so that it removes the extra whitespace characters in the Description values while keeping the structure of the output I already have.

Strings are immutable, so line.strip() doesn't modify line in place. It returns a new, stripped string, and if you don't use that return value, line is left unchanged.

You probably mean:

split_lines = [line.strip() for line in lines]

>>> desc = ['\n        ',
...   '\n2013 Yamaha FJR 1300 Sport Touring, 4 cylinder, 12.120 miles, silver, cruise control, traction control, ABS brakes, heated hand grips, Two Brothers exhaust, handle bar risers, 6.5 gal. gas tank, adjustable windshield, saddlebags, excellent condition, very clean.',
...   '\n$ 7.500 (828) 250-0373 WWW.GREENVALLEYCARS.COM',
...   '\n',
...   '\n    ']

Before:

>>> desc
['\n        ', '\n2013 Yamaha FJR 1300 Sport Touring, 4 cylinder, 12.120 miles, silver, cruise control, traction control, ABS brakes, heated hand grips, Two Brothers exhaust, handle bar risers, 6.5 gal. gas tank, adjustable windshield, saddlebags, excellent condition, very clean.', '\n$ 7.500 (828) 250-0373 WWW.GREENVALLEYCARS.COM', '\n', '\n    ']

Apply replace() and strip():

[x.replace('\n', '').strip() for x in desc ]

After:

['', '2013 Yamaha FJR 1300 Sport Touring, 4 cylinder, 12.120 miles, silver, cruise control, traction control, ABS brakes, heated hand grips, Two Brothers exhaust, handle bar risers, 6.5 gal. gas tank, adjustable windshield, saddlebags, excellent condition, very clean.', '$ 7.500 (828) 250-0373 WWW.GREENVALLEYCARS.COM', '', '']

If I understand you correctly, you can replace each newline character with an empty string and then strip the surrounding whitespace:

 [x.replace('\n', '').strip() for x in desc ]
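
Note that str.strip() called with no arguments already removes newlines along with other leading and trailing whitespace, so the replace() call is optional here. The result above also still contains empty strings. A minimal sketch, assuming the goal is a single clean line per listing: the helper name clean_description is hypothetical, and the sample description is shortened from the one in the question.

```python
def clean_description(fragments):
    """Strip each fragment and drop the ones that end up empty."""
    stripped = (fragment.strip() for fragment in fragments)
    return ' '.join(fragment for fragment in stripped if fragment)


# Shortened version of the first Description list from the question.
desc = ['\n        ',
        '\n2013 Yamaha FJR 1300 Sport Touring, 4 cylinder',
        '\n$ 7.500 (828) 250-0373 WWW.GREENVALLEYCARS.COM',
        '\n',
        '\n    ']

cleaned = clean_description(desc)
print(cleaned)
```

Filtering on the stripped value keeps real text and discards the fragments that held nothing but whitespace.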

This gave me the correct output:

for item in cl_used_items_raw[:2]:
    for k in wanted_keys:
        if k == 'Description':
            # Join the list of fragments into one string, then split on
            # whitespace; split() already discards '\n' and extra spaces.
            print(' '.join(''.join(item[k]).split()))
        else:
            print(' '.join(str(item[k]).split()) + '\t')
    print('\n')
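
The reason the original loop left the newlines in place appears to be that str() on a list returns the list's repr, in which each newline is written as a literal backslash followed by "n", so replace('\n', '') finds nothing to remove. A rough sketch of one way to avoid the special case for 'Description', assuming list values should be joined before whitespace is collapsed: format_value is a hypothetical helper, and the sample item is abbreviated from the question.

```python
def format_value(value):
    """Return value as a single line with whitespace runs collapsed."""
    if isinstance(value, list):
        # Join the fragments first; str(value) would give the list's repr,
        # where each newline appears as a backslash followed by 'n'.
        value = ' '.join(value)
    return ' '.join(str(value).split())


# Abbreviated version of the second item from the question.
item = {
    'Title': '2014 Honda Accord  White  41k',
    'Location': ' (Cowpens)',
    'Description': ['\n        ',
                    '\n2014 Honda Accord  Automatic, White',
                    '\n',
                    '\n    '],
}

formatted = [format_value(item[k]) for k in ('Title', 'Location', 'Description')]
for line in formatted:
    print(line + '\t')
```

Checking the type rather than the key name means any other list-valued field would be flattened the same way.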
