Removing Strings from a Pandas DataFrame Column
I have a pandas DataFrame that looks like this.
DF1 =
sid path
1 '["rome","is","in","province","lazio"]'
1 "['rome', 'is', 'in', 'province', 'naples']"
1 ['N']
1 "['rome', 'is', 'in', 'province', 'in', 'campania']"
....
I want to remove all unnecessary characters from the path column, so the result should look like this:
DF2 =
sid path
1 rome is in province lazio
1 rome is in province naples
1 N
1 rome is in province in campania
....
I tried replacing all the unnecessary characters, like this:
DF1["path"].replace("[","").replace("]","").replace('"',"").replace(","," ").replace("'","")
But this did not work. I guess it is because of the entry ["N"].
How can I do this? Any help is appreciated!
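You can use ast.literal_eval to safely read the lists that are stored as strings. One way to handle the entries that are real lists is to catch ValueError.
Note that, if possible, you should try to sort out these issues upstream, before they reach your DataFrame.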
import pandas as pd
from ast import literal_eval

df = pd.DataFrame({'sid': [1, 1, 1, 1],
                   'path': ['["rome","is","in","province","lazio"]',
                            "['rome', 'is', 'in', 'province', 'naples']",
                            ['N'],
                            "['rome', 'is', 'in', 'province', 'in', 'campania']"]})

def converter(x):
    try:
        # String entries: parse the list literal, then join its items
        return ' '.join(literal_eval(x))
    except ValueError:
        # Real lists are not strings, so literal_eval raises ValueError
        return ' '.join(x)

df['path'] = df['path'].apply(converter)
print(df)
path sid
0 rome is in province lazio 1
1 rome is in province naples 1
2 N 1
3 rome is in province in campania 1
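As a variation on the converter above, the type can be checked explicitly instead of relying on the exception. This is only a sketch: the name to_sentence is hypothetical, and it assumes every entry is either a real list of strings or a string holding a list literal.

```python
from ast import literal_eval

import pandas as pd


def to_sentence(value):
    """Join list items with spaces, parsing string-encoded lists first."""
    # Hypothetical helper for illustration; not part of pandas.
    if isinstance(value, str):
        value = literal_eval(value)  # safely parses '["a", "b"]' into a list
    return ' '.join(value)


df = pd.DataFrame({'sid': [1, 1],
                   'path': ['["rome","is","in","province","lazio"]', ['N']]})
df['path'] = df['path'].apply(to_sentence)
print(df['path'].tolist())
```

Checking the type up front keeps the intent visible and avoids hiding an unrelated ValueError raised by a malformed string.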
Using ast.literal_eval and str.join.
Demo:
import pandas as pd
import ast
df = pd.DataFrame({"path": ['["rome","is","in","province","lazio"]', "['rome', 'is', 'in', 'province', 'naples']", ['N']]})
df['path'] = df['path'].astype(str).apply(ast.literal_eval).apply(lambda x: " ".join(x))
print(df)
Output:
path
0 rome is in province lazio
1 rome is in province naples
2 N
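A likely reason the chained replace in the question had no effect: Series.replace compares whole cell values by default, not substrings, so replace("[", "") only changes cells that are exactly "[". Substring replacement goes through the .str accessor instead. The sketch below shows the difference on string-only sample data made up for illustration.

```python
import pandas as pd

s = pd.Series(['["rome","is"]', "['N']"])

# Series.replace matches whole values, so no cell equals "[" and nothing changes.
unchanged = s.replace("[", "")

# .str.replace works on substrings; regex=False treats each pattern literally.
cleaned = s
for old, new in [("[", ""), ("]", ""), ('"', ""), ("'", ""), (",", " ")]:
    cleaned = cleaned.str.replace(old, new, regex=False)

print(unchanged.tolist())
print(cleaned.tolist())
```

Entries that are real lists are not strings, so the .str methods will not clean them, which is why the literal_eval approaches above suit the mixed data in the question better.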