
How to extract values from json-like text

I want to extract values from JSON-like text such as the following:

df.head()
    budget  genres  homepage    id  keywords    original_language   original_title  overview    popularity  production_companies    ... runtime spoken_languages    status  tagline title   vote_average    vote_count  movie   cast    crew
0   237000000   [{"id": 28, "name": "Action"}, {"id": 12, "nam...   http://www.avatarmovie.com/ 19995   [{"id": 1463, "name": "culture clash"}, {"id":...   en  Avatar  In the 22nd century, a paraplegic Marine is di...   150.437577  [{"name": "Ingenious Film Partners", "id": 289...   ... 162.0   [{"iso_639_1": "en", "name": "English"}, {"iso...   Released    Enter the World of Pandora. Avatar  7.2 11800   Avatar  [{"cast_id": 242, "character": "Jake Sully", "...   [{"credit_id": "52fe48009251416c750aca23", "de...
1   300000000   [{"id": 12, "name": "Adventure"}, {"id": 14, "...   http://disney.go.com/disneypictures/pirates/    285 [{"id": 270, "name": "ocean"}, {"id": 726, "na...   en  Pirates of the Caribbean: At World's End    Captain Barbossa, long believed to be dead, ha...   139.082615  [{"name": "Walt Disney Pictures", "id": 2}, {"...   ... 169.0   [{"iso_639_1": "en", "name": "English"}]    Released    At the end of the world, the adventure begins.  Pirates of the Caribbean: At World's End    6.9 4500    Pirates of the Caribbean: At World's End    [{"cast_id": 4, "character": "Captain Jack Spa...   [{"credit_id": "52fe4232c3a36847f800b579", "de...
2   245000000   [{"id": 28, "name": "Action"}, {"id": 12, "nam...   http://www.sonypictures.com/movies/spectre/ 206647  [{"id": 470, "name": "spy"}, {"id": 818, "name...   en  Spectre A cryptic message from Bond’s past sends him o...

I tried:

# Parse the stringified features into their corresponding python objects
from ast import literal_eval

features = ['cast', 'crew', 'keywords', 'genres', 'original_language']
for feature in features:
    df[feature] = df[feature].apply(literal_eval)

...which raises:

ValueError: malformed node or string: <_ast.Name object at 0x7f5c5a523358>

Any help would be appreciated.
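A note on the traceback: `_ast.Name` means `literal_eval` parsed a bare identifier rather than a literal. The `features` list includes `original_language`, and `df.head()` shows that column holding plain text such as `en`, so that column is a likely (not confirmed) source of the error. The sketch below reproduces the behaviour on two sample strings; `try_literal_eval` is a hypothetical helper name used only for illustration.

```python
from ast import literal_eval

def try_literal_eval(text):
    """Return (value, None) on success or (None, exception name) on failure."""
    try:
        return literal_eval(text), None
    except (ValueError, SyntaxError) as exc:
        return None, type(exc).__name__

# A stringified list of dicts is a valid Python literal and parses fine.
list_result = try_literal_eval('[{"id": 28, "name": "Action"}]')

# A bare word such as en is parsed as a name, not a literal, so it is rejected.
bare_result = try_literal_eval('en')

print(list_result)
print(bare_result)
```

If this is indeed the cause, dropping `original_language` from `features` may be enough, since that column does not appear to need parsing.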

I think the problem is malformed values. One possible solution is a custom function that wraps the call in a try-except statement:

import pandas as pd

df = pd.DataFrame({'genres':['[{"id": 28, "name": "Action"}]',
                             '[{"id": 28, "name": "Action"}, {"id": 12, "n]']})
print(df)
                                          genres
0                 [{"id": 28, "name": "Action"}]
1  [{"id": 28, "name": "Action"}, {"id": 12, "n]

from ast import literal_eval

def literal_eval_cust(x):
    # Parse the string; fall back to an empty dict if the value is malformed
    try:
        return literal_eval(x)
    except Exception:
        return {}

features = ['genres']
for feature in features:
    df[feature] = df[feature].apply(literal_eval_cust)

print(df)
                           genres
0  [{'id': 28, 'name': 'Action'}]
1                              {}
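Once the strings parse, the values themselves still have to be pulled out of the list of dicts. The following is a minimal sketch of one way to do that, not part of the original answer: it swaps in `json.loads` for `literal_eval` on the assumption that the columns hold valid JSON, and returns an empty list for anything that fails to parse. The function name `extract_names` and its `key` parameter are hypothetical.

```python
import json

def extract_names(text, key="name"):
    """Return the `key` value of each dict in a JSON array string, or [] on failure."""
    try:
        items = json.loads(text)
    except (TypeError, ValueError):
        # TypeError: not a string (e.g. NaN); ValueError: malformed JSON
        return []
    if not isinstance(items, list):
        return []
    return [item[key] for item in items if isinstance(item, dict) and key in item]

# A well-formed value yields the list of names.
good = extract_names('[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]')

# The truncated value from the example above yields an empty list.
bad = extract_names('[{"id": 28, "name": "Action"}, {"id": 12, "n]')

print(good)
print(bad)
```

Applied to the DataFrame, this would look like `df['genres'] = df['genres'].apply(extract_names)`, assuming the column still contains the raw strings.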
