Convert a column of JSON strings into columns of data
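I have a large dataframe of about 30,000 rows with a single column containing JSON strings. Each JSON string holds several variables and their values, and I want to break it out into separate columns of data.
Two rows look like this: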
0 {"a":"1","b":"2","c":"3"}
1 {"a" :"4","b":"5","c":"6"}
I want to convert it into something like this:
a b c
1 2 3
4 5 6
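Please help.
Your column values seem to have an extra number before the actual JSON string, so you may want to strip that off first (if that is not the case, skip to Method).
One way is to apply a function to the column: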
# constructing the df
import pandas as pd
df = pd.DataFrame([['0 {"a":"1","b":"2","c":"3"}'],['1 {"a" :"4","b":"5","c":"6"}']], columns=['json'])
# print(df)
#    json
# 0 0 {"a":"1","b":"2","c":"3"}
# 1 1 {"a" :"4","b":"5","c":"6"}
# function to remove the number
import re
def split_num(val):
    # capture everything from the first "{" to the end of the string
    p = re.compile(r"(\{.*)")
    return p.search(val).group(1)
# applying the function
df['json'] = df['json'].map(split_num)
print(df)
# json
# 0 {"a":"1","b":"2","c":"3"}
# 1 {"a" :"4","b":"5","c":"6"}
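The same prefix stripping can also be sketched with pandas' vectorized string methods instead of a Python-level function. This is an alternative to the function above, not the answer's own method, and it assumes every cell contains exactly one `{...}` object; the name `cleaned` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"json": ['0 {"a":"1","b":"2","c":"3"}', '1 {"a" :"4","b":"5","c":"6"}']}
)

# Keep everything from the first "{" to the last "}" in each cell.
# expand=False returns a Series because the pattern has a single capture group.
cleaned = df["json"].str.extract(r"(\{.*\})", expand=False)
print(cleaned.tolist())
```

Cells that do not match the pattern come back as NaN rather than raising, so it may be worth checking `cleaned.isna().any()` before parsing.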
Method:
Once df is in the format above, the code below converts each row entry into a dictionary:
# note: eval executes arbitrary code, so only use it on trusted data
df['json'] = df['json'].map(lambda x: dict(eval(x)))
Then applying pd.Series to the column does the rest:
d = df['json'].apply(pd.Series)
print(d)
# a b c
# 0 1 2 3
# 1 4 5 6
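Since the cells are JSON, the parsing step can also be sketched with `json.loads`, which parses the text without executing it. The following is a minimal sketch under the assumption that each cell is already a clean JSON object string; `records` is an illustrative name, and building the frame from a list of dicts is a swapped-in alternative to `apply(pd.Series)`.

```python
import json
import pandas as pd

df = pd.DataFrame(
    {"json": ['{"a":"1","b":"2","c":"3"}', '{"a" :"4","b":"5","c":"6"}']}
)

# Parse each JSON string into a dict, then build the frame in one step.
records = df["json"].map(json.loads).tolist()
d = pd.DataFrame(records, index=df.index)
print(d)
```

Constructing the frame once from a list of dicts avoids creating one Series per row, which tends to matter on frames with tens of thousands of rows.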
import json
import pandas as pd
with open(json_file) as f:  # json_file: path to a file with one JSON object per line
    df = pd.DataFrame(json.loads(line) for line in f)
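For input with one JSON object per line, pandas can also parse it directly with `pd.read_json(..., lines=True)`. A minimal sketch, using an in-memory `io.StringIO` buffer as a stand-in for the file; the sample rows are taken from the question and the buffer name `buf` is an assumption for illustration.

```python
import io
import pandas as pd

# Stand-in for a file that holds one JSON object per line.
buf = io.StringIO('{"a":"1","b":"2","c":"3"}\n{"a":"4","b":"5","c":"6"}\n')

# lines=True treats each line as a separate JSON record.
df = pd.read_json(buf, lines=True)
print(df)
```

A real file path can be passed in place of the buffer. Note that `read_json` applies its own dtype inference, so the resulting column types may differ from those of the `json.loads` approach.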