
How to clean an image column in a dataframe using Python 3.6?

I have fetched product data from a site and, after normalization, stored the result in a dataframe. For a quick view of this df, here is the output of

print(df.head().to_dict())

{'Available': {0: 33, 1: 22, 2: 12, 3: 12, 4: 11}, 'Images': {0: ['https://example.com/e1e619ab5f11ffe311db03eefad5a2f4.jpg', 'https://example.com/7edc2e3cda8b63591bfacda9e254ad08.jpg', 'https://example.com/7ed2b44335f73cabe0411819820e4d0b.jpg', 'https://example.com/82fed0e56c531cde2fcf5b98f7418a6a.jpg', 'https://example.com/f536c423a97d0c9ab8c488a453818780.jpg', '', '', ''], 1: ['https://example.com/7d63597ae7a75b8481d9d4318951d6c1.jpg', '', '', '', '', '', '', ''], 2: ['https://example.com/7476c30281056d6810787c617fb4f30e.jpg', 'https://example.com/d59266704fa3f9750c02ea79956acf1e.jpg', '', '', '', '', '', ''], 3: ['https://example.com/7476c30281056d6810787c617fb4f30e.jpg', 'https://example.com/af285804c936cd3278cb2982b6f7a089.jpg', '', '', '', '', '', ''], 4: ['https://example.com/e4b6927a6bf8ad48394534c657ea0994.jpg', 'https://example.com/e630996c631e35013be0fbe0c0113fc5.jpg', '', '', '', '', '', '']}}

I need to clean the Images column. I want each cell stored in the dataframe column without the '' entries and the '[', ']' brackets, as a single comma-separated string like this: https://example.com/image1.jpg,https://e...image2.jpg

I tried the following function:

import os
from ast import literal_eval

def formatter(x):
    return ','.join(list(map(os.path.basename, x)))

df['Images'].apply(literal_eval).apply(formatter)

But it raises ValueError: malformed node or string.

How can I fix this?
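A minimal reproduction of what goes wrong, assuming the cells hold real Python lists as the to_dict() output above shows (the single-cell value below is a made-up stand-in). ast.literal_eval parses a string (or an AST node), so handing it a list raises the ValueError, while the string form of the same list parses fine. The last lines show a second issue with the attempted formatter: os.path.basename keeps only the file name of each URL, and the '' entries are not filtered out.

```python
import os
from ast import literal_eval

# One cell of the Images column: already a Python list, not a string.
cell = ['https://example.com/a.jpg', '', '']

# literal_eval parses strings (or AST nodes); a list object is rejected.
try:
    literal_eval(cell)
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)  # message starts with "malformed node or string"

# The string form of the same list, e.g. after a CSV round trip,
# is what literal_eval is designed for.
parsed = literal_eval(str(cell))
print(parsed == cell)  # True

# os.path.basename drops the scheme and host, and '' entries survive the join.
basenames = ",".join(map(os.path.basename, cell))
print(basenames)  # a.jpg,,
```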

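The ValueError comes from literal_eval: it parses a string (or AST node), but the cells in Images are already Python lists, so there is nothing to parse. Unless I misunderstand the question, you can skip literal_eval and simply join the non-empty entries. I applied the following to your dataframe above: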

def formatter(li):
    # Keep only the non-empty URLs and join them with commas
    return ",".join([x for x in li if x != ""])

df['Images'] = df['Images'].apply(formatter)
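If some cells might instead hold the string form of a list (for example, if the dataframe was saved to CSV and read back, which is an assumption and not something the question states), a slightly more defensive variant only calls literal_eval on string cells. join_images and the two-row frame below are hypothetical, for illustration only; a cell holding an already-joined URL string, an empty string, or NaN is not handled.

```python
from ast import literal_eval

import pandas as pd


def join_images(cell):
    """Join the non-empty URLs of one Images cell with commas.

    Hypothetical helper: accepts a real list (as in the question) or its
    string form, e.g. "['https://...', '']", assumed to come from a CSV file.
    """
    if isinstance(cell, str):
        cell = literal_eval(cell)  # only string-encoded lists need parsing
    return ",".join(url for url in cell if url)  # drop the '' placeholders


# Made-up sample: one list cell and one string-encoded list cell.
df = pd.DataFrame({
    "Available": [22, 12],
    "Images": [
        ["https://example.com/a.jpg", "", ""],
        "['https://example.com/b.jpg', 'https://example.com/c.jpg', '']",
    ],
})
df["Images"] = df["Images"].apply(join_images)
print(df["Images"].tolist())
```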



print(df)
  Available                                             Images
0         33  https://example.com/e1e619ab5f11ffe311db03eefa...
1         22  https://example.com/7d63597ae7a75b8481d9d43189...
2         12  https://example.com/7476c30281056d6810787c617f...
3         12  https://example.com/7476c30281056d6810787c617f...
4         11  https://example.com/e4b6927a6bf8ad48394534c657...

To view a single cell in full:

print(df.Images[0])

https://example.com/e1e619ab5f11ffe311db03eefad5a2f4.jpg,https://example.com/7edc2e3cda8b63591bfacda9e254ad08.jpg,https://example.com/7ed2b44335f73cabe0411819820e4d0b.jpg,https://example.com/82fed0e56c531cde2fcf5b98f7418a6a.jpg,https://example.com/f536c423a97d0c9ab8c488a453818780.jpg
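Because the cleaned cells are plain comma-joined strings, they can be turned back into lists with Series.str.split if that is ever needed. A small sketch on a made-up two-row frame, assuming no URL itself contains a comma:

```python
import pandas as pd

# Hypothetical frame already in the cleaned, comma-joined format.
df = pd.DataFrame({"Images": [
    "https://example.com/a.jpg,https://example.com/b.jpg",
    "https://example.com/c.jpg",
]})

# Split each comma-joined string back into a list of URLs.
# Assumes no URL contains a comma.
as_lists = df["Images"].str.split(",")
print(as_lists[0])  # ['https://example.com/a.jpg', 'https://example.com/b.jpg']
```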

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 