Remove whitespace from list of strings with pandas/python
I have a dataframe in which one column holds lists of strings. Here is the structure of the file to be read:
[
{
"key1":"value1 ",
"key2":"2",
"key3":["a","b 2 "," exp white space 210"],
},
{
"key1":"value1 ",
"key2":"2",
"key3":[],
},
]
If an item contains more than one whitespace character, I need to remove all the whitespace. Expected output:
[
{
"key1":"value1",
"key2":"2",
"key3":["a","b2","exp white space 210"],
},
{
"key1":"value1",
"key2":"2",
"key3":[],
}
]
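Note: some values are empty in some rows, for example "key3":[]
If I understand correctly, some of your dataframe cells hold values of list type.
Suppose file_name.json has the following content: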
[
{
"key1": "value1 ",
"key2": "2",
"key3": ["a", "b 2 ", " exp white space 210"]
},
{
"key1": "value1 ",
"key2": "2",
"key3": []
}
]
A possible solution in this case is as follows:
import pandas as pd
import re

df = pd.read_json("file_name.json")

def cleanup_data(value):
    # strip the ends, then collapse each run of whitespace into one space
    if value and isinstance(value, list):
        return [re.sub(r'\s+', ' ', x.strip()) for x in value]
    elif value and isinstance(value, str):
        return re.sub(r'\s+', ' ', value.strip())
    else:
        return value

# apply cleanup function to all cells in dataframe
# (pandas 2.1+ deprecates applymap in favour of DataFrame.map)
df = df.applymap(cleanup_data)
df
Returns:
     key1  key2                           key3
0  value1     2  [a, b 2, exp white space 210]
1  value1     2                             []
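To check the cleanup logic without pandas, the same idea can be applied to the parsed JSON directly. This is only a minimal sketch: the names normalize_whitespace, clean_value and clean_records are hypothetical and do not come from the answer above, and the doubled spaces in the sample are an assumption about what the original data looked like. Like the answer, it collapses runs of whitespace to a single space rather than removing every space.

```python
import json
import re


def normalize_whitespace(text):
    # Trim both ends, then collapse each run of whitespace to one space.
    return re.sub(r"\s+", " ", text.strip())


def clean_value(value):
    # Strings are normalized, lists are cleaned element by element,
    # anything else (numbers, None, ...) is returned unchanged.
    if isinstance(value, str):
        return normalize_whitespace(value)
    if isinstance(value, list):
        return [clean_value(item) for item in value]
    return value


def clean_records(records):
    # records is a list of dicts, as in the question's file.
    return [{key: clean_value(val) for key, val in rec.items()} for rec in records]


# Sample data modelled on the question (extra spaces added as an assumption).
raw = """[
  {"key1": "value1 ", "key2": "2", "key3": ["a", "b 2 ", " exp  white   space 210"]},
  {"key1": "value1 ", "key2": "2", "key3": []}
]"""

cleaned = clean_records(json.loads(raw))
print(json.dumps(cleaned, indent=2))
```

Empty lists pass through untouched, so rows such as "key3":[] need no special case.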
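If I understand correctly:
from io import StringIO
import pandas as pd

# recent pandas versions expect a file-like object rather than a literal JSON string
df = pd.read_json(StringIO('''{
    "key1":"value1 ",
    "key2":" value2",
    "key3":["a","b "," exp white space "]
}'''))
df = df.apply(lambda col: col.str.strip().str.replace(r'\s+', ' ', regex=True))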
Output:
>>> df
     key1    key2             key3
0  value1  value2                a
1  value1  value2                b
2  value1  value2  exp white space
>>> df.to_numpy()
array([['value1', 'value2', 'a'],
       ['value1', 'value2', 'b'],
       ['value1', 'value2', 'exp white space']], dtype=object)
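The .str accessor is designed for columns of plain strings, so a column whose cells are lists (like key3 in the question) needs per-element handling instead. The following is a minimal sketch under a few assumptions: the frame is built in memory instead of being read from a file, the doubled space in the sample is invented for illustration, and " ".join(s.split()) is swapped in for the regex (for ordinary spaces, tabs and newlines it strips the ends and collapses runs in the same way).

```python
import pandas as pd

# Sample frame modelled on the question; key3 holds lists, one of them empty.
df = pd.DataFrame({
    "key1": ["value1 ", "value1 "],
    "key2": ["2", "2"],
    "key3": [["a", "b 2 ", " exp  white space 210"], []],
})

# Plain string columns: vectorised .str methods are enough.
for col in ["key1", "key2"]:
    df[col] = df[col].str.strip().str.replace(r"\s+", " ", regex=True)

# List column: clean every element of every list; empty lists stay empty.
df["key3"] = df["key3"].map(lambda items: [" ".join(s.split()) for s in items])

print(df)
```

Treating the two kinds of column separately keeps the fast vectorised path for ordinary strings and confines the Python-level loop to the list column.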