
Pandas DataFrame: Replace based on filter and regex extract

Here's a section of my dataframe:

   Type      Date        Diff   Data
0  Section   20171204    1.0    ~
1  Korean    20171204    1.0    저는 유양이에요.
2  English   20171204    1.0    Im Yooyang.
3  Theme     20171204    1.0    {"zh":"介绍","vi":"giới thiệu","ko":"소개","en":"I...

There are over 10,000 rows, ~500 of which are Type 'Theme'.

I'm trying to replace the Theme Data with only the Korean text, i.e. {"zh":"介绍","vi":"giới thiệu","ko":"소개","en":"I... should become 소개.

I can extract the Korean-only text with the regex ([가-힣]+).

I tried building a new df containing just the new Theme Data with df[df['Type'] == 'Theme'][['Data']].T.squeeze().str.extract('([가-힣]+)'), but I can't figure out how to merge it back into the original df (assigning with df[df['Type'] == 'Theme'][['Data']] = ... doesn't work).
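For reference, here is a minimal sketch of writing the regex result back in place. It assumes a small illustrative frame shaped like the sample above (the JSON string is shortened and hypothetical). The key points are to select and assign through a single df.loc[mask, 'Data'] call, because chained indexing like df[mask][['Data']] = ... assigns to a temporary copy rather than to df, and to pass expand=False so str.extract returns a Series instead of a one-column DataFrame:

```python
import pandas as pd

# Illustrative toy frame modeled on the question's sample (values are hypothetical)
df = pd.DataFrame({
    "Type": ["Section", "Korean", "English", "Theme"],
    "Date": [20171204] * 4,
    "Diff": [1.0] * 4,
    "Data": ["~", "저는 유양이에요.", "Im Yooyang.",
             '{"zh":"介绍","vi":"giới thiệu","ko":"소개"}'],
})

# Boolean mask for the Theme rows; reusing it keeps selection and assignment aligned
mask = df["Type"] == "Theme"

# expand=False: with a single capture group, str.extract returns a Series
df.loc[mask, "Data"] = df.loc[mask, "Data"].str.extract("([가-힣]+)", expand=False)

print(df)
```

Only the Theme row changes; the Korean row keeps its full sentence because it is outside the mask. The first Hangul run in each Theme string is taken, which matches the "ko" value as long as no earlier field contains Hangul.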

I also tried replace, but I can't figure out how to apply it only to the Theme rows.
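If a replace-style approach is preferred, the same .loc mask restricts it to the Theme rows. This is a sketch that assumes each Theme string contains a flat "ko":"..." pair with no escaped quotes inside the value; the pattern below is an illustrative assumption, not something from the original post:

```python
import pandas as pd

# Hypothetical sample rows; only the Theme row holds a JSON-like string
df = pd.DataFrame({
    "Type": ["Korean", "Theme"],
    "Data": ["저는 유양이에요.", '{"zh":"介绍","vi":"giới thiệu","ko":"소개"}'],
})

mask = df["Type"] == "Theme"

# Replace the whole string with the captured "ko" value (\1);
# rows outside the mask are never touched
df.loc[mask, "Data"] = df.loc[mask, "Data"].str.replace(
    r'^.*"ko"\s*:\s*"([^"]*)".*$', r"\1", regex=True
)

print(df)
```

A regex over JSON is brittle (escaped quotes or reordered nesting will break it), so parsing with json.loads is usually the safer choice when the strings are valid JSON.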

And apparently I shouldn't iterate over the rows: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iterrows.html

You can use the map method with a lambda that parses each string into a dict with json.loads and picks out the "ko" key, selecting and assigning only the Theme rows with loc:

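import json

# Boolean mask selecting only the Theme rows
mask = df['Type'] == 'Theme'
# Parse each JSON string and keep only the Korean ("ko") value
df.loc[mask, 'Data'] = df.loc[mask, 'Data'].map(lambda x: json.loads(x)["ko"])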
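One caveat with the json.loads version: a single malformed or truncated Theme string raises an exception and aborts the whole assignment. Below is a sketch of a more forgiving variant, using a hypothetical helper korean_or_none that returns a missing value instead of raising (the toy data is again illustrative):

```python
import json
import pandas as pd

def korean_or_none(raw):
    """Hypothetical helper: return the "ko" value, or None if parsing fails or the key is absent."""
    try:
        return json.loads(raw).get("ko")
    except (json.JSONDecodeError, TypeError, AttributeError):
        # JSONDecodeError: malformed text; TypeError: non-string input;
        # AttributeError: valid JSON that is not an object (e.g. a list)
        return None

# Illustrative rows; the last Theme string is deliberately truncated
df = pd.DataFrame({
    "Type": ["Korean", "Theme", "Theme"],
    "Data": ["저는 유양이에요.", '{"zh":"介绍","ko":"소개"}', '{"zh":"介绍","ko"'],
})

mask = df["Type"] == "Theme"
df.loc[mask, "Data"] = df.loc[mask, "Data"].map(korean_or_none)

print(df)
```

Rows that failed to parse can then be listed with df.loc[mask & df['Data'].isna()] and fixed by hand.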
