Remove whitespace from a list of strings using pandas/python

Question

I have a dataframe in which one column's values are lists of strings. Here is the structure of the file to read:

[
    {
        "key1":"value1 ",
        "key2":"2",
        "key3":["a","b  2 "," exp  white   space 210"],
    },
    {
        "key1":"value1 ",
        "key2":"2",
        "key3":[],
    },

]

I need to remove the extra whitespace in each item wherever there is more than one whitespace character. Expected output:

[
    {
        "key1":"value1",
        "key2":"2",
        "key3":["a","b2","exp white space 210"],
    },
    {
        "key1":"value1",
        "key2":"2",
        "key3":[],
    }
]

Note: some values are empty in some rows, e.g. "key3":[]

Answer 1

If I understand correctly, some of your dataframe cells hold list-type values.

The content of file_name.json is below:

[
    {
        "key1": "value1 ",
        "key2": "2",
        "key3": ["a", "b  2 ", " exp  white   space 210"]
    }, 
    {
        "key1": "value1 ",
        "key2": "2",
        "key3": []
    }
]

A possible solution in this case is the following:

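import re

import pandas as pd

df = pd.read_json("file_name.json")


def cleanup_data(value):
    # Strip the ends and collapse whitespace runs, in a string or in each item of a list
    if value and isinstance(value, list):
        return [re.sub(r'\s+', ' ', x.strip()) for x in value]
    elif value and isinstance(value, str):
        return re.sub(r'\s+', ' ', value.strip())
    else:
        # Empty lists, numbers and other values pass through unchanged
        return value

# apply cleanup function to all cells in dataframe
# (pandas 2.1+ deprecates applymap in favor of DataFrame.map)
df = df.applymap(cleanup_data)

df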

Returns

     key1  key2                           key3
0  value1     2  [a, b 2, exp white space 210]
1  value1     2                             []
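
Since the expected output in the question is itself JSON, the same per-cell cleanup can also be sketched without pandas or `re`. This is only an illustration under assumptions: the helper name `squeeze_whitespace` is hypothetical, the sample data is inlined instead of being read from file_name.json, and `" ".join(text.split())` is swapped in for the regex (it trims the ends and collapses each whitespace run to one space).

```python
import json


def squeeze_whitespace(value):
    """Recursively normalize whitespace in strings nested in lists and dicts."""
    if isinstance(value, str):
        # split() with no arguments splits on whitespace runs and drops
        # leading/trailing whitespace, so re-joining leaves single spaces.
        return " ".join(value.split())
    if isinstance(value, list):
        return [squeeze_whitespace(item) for item in value]
    if isinstance(value, dict):
        return {key: squeeze_whitespace(item) for key, item in value.items()}
    # Numbers, booleans and None are returned unchanged.
    return value


# Sample data inlined for the sketch (assumption: same shape as file_name.json).
raw = '''[
    {"key1": "value1 ", "key2": "2", "key3": ["a", "b  2 ", " exp  white   space 210"]},
    {"key1": "value1 ", "key2": "2", "key3": []}
]'''

cleaned = squeeze_whitespace(json.loads(raw))
print(json.dumps(cleaned, indent=4))
```

Empty lists such as "key3":[] pass through as empty lists, so no special case is needed for them.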

Answer 2

If I understand correctly:

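from io import StringIO

import pandas as pd

# Wrap the literal JSON in StringIO; recent pandas versions deprecate passing a raw JSON string
df = pd.read_json(StringIO('''{
    "key1":"value1 ",
    "key2":" value2",
    "key3":["a","b   "," exp  white   space "]
}'''))

# Strip each string column and collapse internal whitespace runs to one space
df = df.apply(lambda col: col.str.strip().str.replace(r'\s+', ' ', regex=True))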

Output:

>>> df
     key1    key2             key3
0  value1  value2                a
1  value1  value2                b
2  value1  value2  exp white space

>>> df.to_numpy()
array([['value1', 'value2', 'a'],
       ['value1', 'value2', 'b'],
       ['value1', 'value2', 'exp white space']], dtype=object)
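
The `.str` methods used above are designed for string values, so a column whose cells are lists (like key3 in the question) needs separate handling. A minimal sketch of one way to combine both, assuming the frame is built inline and that the column groupings `string_cols` and `list_cols` are known in advance (both names are hypothetical):

```python
import json

import pandas as pd

# Sample frame shaped like the question's data (built inline for the sketch).
df = pd.DataFrame({
    "key1": ["value1 ", "value1 "],
    "key2": ["2", "2"],
    "key3": [["a", "b  2 ", " exp  white   space 210"], []],
})

string_cols = ["key1", "key2"]  # assumption: columns holding plain strings
list_cols = ["key3"]            # assumption: columns holding lists of strings

# Vectorized string methods for the plain string columns.
for col in string_cols:
    df[col] = df[col].str.strip().str.replace(r"\s+", " ", regex=True)

# Per-cell list comprehension for the list columns; empty lists stay empty.
for col in list_cols:
    df[col] = df[col].map(lambda items: [" ".join(item.split()) for item in items])

# Round-trip through JSON to get records in the question's output structure.
result = json.loads(df.to_json(orient="records"))
print(result)
```

Keeping the two column kinds separate lets the string columns use pandas' vectorized path, while the list column falls back to plain Python per cell.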

Answer 1: score 1, accepted, 2022-03-18 16:59:14

Answer 2: score 0
