Counting the characters of dictionary values in a dataframe column

Question

After you download your Facebook data, it includes JSON files with your post information. I read the JSON into a dataframe with pandas. Now I want to count the characters of every post I made. The posts are in df['data'], in the form [{'post': 'Happy bday Raul'}].

I want the output to be the number of characters in the post text: 15 for "Happy bday Raul", or 7 for "Morning" from [{'post': 'Morning'}].

import pandas as pd
df = pd.read_json('posts_1.json')

The columns are Date and Data, in this format:

Date        Data
01-01-2020    *[{'post': 'Morning'}]*
10-03-2020    *[{'post': 'Happy bday Raul'}]*
17-03-2020    *[{'post': 'This lockdown is sad'}]*

I tried to count the characters of [{'post': 'Morning'}] like this:

df['count']=df['data'].str.len()

But this does not work: the result is 1 for every row.
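The 1 comes from the shape of each cell: it is a list holding one dictionary, and .str.len() on a column of lists returns the length of each list rather than of the text inside it. A minimal sketch with made-up rows modelled on the question's data (the column names here are illustrative):

```python
import pandas as pd

# Hypothetical rows shaped like the Facebook export: each cell is a one-element list
df = pd.DataFrame({
    "date": ["01-01-2020", "10-03-2020"],
    "data": [[{"post": "Morning"}], [{"post": "Happy bday Raul"}]],
})

# .str.len() measures the list in each cell, not the post text
list_lengths = df["data"].str.len()
print(list_lengths.tolist())  # [1, 1]
```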

I need to extract the value from the dictionary and apply len to it to count the characters. The output should be:

Date        Data                                   COUNT 
01-01-2020    *[{'post': 'Morning'}]*               7
10-03-2020    *[{'post': 'Happy bday Raul'}]*       15
17-03-2020    *[{'post': 'This lockdown is sad'}]*  20

EDITED:

Used to_dict():

df11=df_post['data'].to_dict()

Output:

{0: [{'post': 'Feliz cumpleaÃ±os Raul'}],
 1: [{'post': 'Muchas felicidades Tere!!! Espero que todo vaya genial y siga aÃºn mejor! Un beso desde la Escandinavia profunda'}],
 2: [{'post': 'Hola!\nUna investigadora vendrÃ¡ a finales de mayo, Â¿Alguien tiene una habitaciÃ³n libre en su piso para ella? Many Thanks!'}],
 3: [{'post': 'Â¿CÃ³mo va todo? Se que muchos estÃ¡is o estÃ¡bais por Galicia :D\n\nOs recuerdo, el proceso de MatriculaciÃ³n tiene unos plazos concretos: desde el lunes 13 febrero hasta el viernes 24 de febrero.'}]
}

Answer 1

You can access the value of the post key for each row with a list comprehension and count the length with str.len():

In one line of code, it would look like this:

df[1] = pd.Series([x['post'] for x in df[0]]).str.len()

This would also work, but I think it would be slower to execute:

df[1] = df[0].apply(lambda x: x['post']).str.len()
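Both one-liners above assume each cell is a plain dictionary. In the question's data each cell is a list wrapping one dictionary, so the element has to be taken out of the list first. A sketch under that assumption (exactly one post per cell; the rows and column names are copied from the question's example table):

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["01-01-2020", "10-03-2020", "17-03-2020"],
    "Data": [
        [{"post": "Morning"}],
        [{"post": "Happy bday Raul"}],
        [{"post": "This lockdown is sad"}],
    ],
})

# cell[0] unwraps the list, ["post"] reads the text, len() counts its characters
df["COUNT"] = [len(cell[0]["post"]) for cell in df["Data"]]
print(df["COUNT"].tolist())  # [7, 15, 20]
```

If some cells can be empty or lack a post key, the comprehension would need a guard for those rows.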

Full reproducible code below:

import pandas as pd

df = pd.DataFrame({0: [{'post': 'Feliz cumpleaÃ±os Raul'}],
 1: [{'post': 'Muchas felicidades Tere!!! Espero que todo vaya genial y siga aÃºn mejor! Un beso desde la Escandinavia profunda'}],
 2: [{'post': 'Hola!\nUna investigadora vendrÃ¡ a finales de mayo, Â¿Alguien tiene una habitaciÃ³n libre en su piso para ella? Many Thanks!'}],
 3: [{'post': 'Â¿CÃ³mo va todo? Se que muchos estÃ¡is o estÃ¡bais por Galicia :D\n\nOs recuerdo, el proceso de MatriculaciÃ³n tiene unos plazos concretos: desde el lunes 13 febrero hasta el viernes 24 de febrero.'}]
})
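# Transpose so each post becomes a row; column 0 then holds the dictionaries
df = df.T
# Extract the text of each post
df[1] = [x['post'] for x in df[0]]
# Count the characters of each post
df[2] = df[1].str.len()
df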
Out[1]: 
                                                   0  \
0                 {'post': 'Feliz cumpleaÃ±os Raul'}   
1  {'post': 'Muchas felicidades Tere!!! Espero qu...   
2  {'post': 'Hola!\nUna investigadora vendrÃ¡ a fi...   
3  {'post': 'Â¿CÃ³mo va todo? Se que muchos estÃ¡...   

                                                   1    2  
0                             Feliz cumpleaÃ±os Raul   22  
1  Muchas felicidades Tere!!! Espero que todo vay...  112  
2  Hola!\nUna investigadora vendrÃ¡ a finales de ...  123  
3  Â¿CÃ³mo va todo? Se que muchos estÃ¡is o estÃ¡...  195
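Note that these counts include mis-decoded characters: a sequence such as Ã± is what UTF-8 text looks like when its bytes were read as Latin-1, so one accented letter is counted as two characters. If the goal is to count the characters as originally typed, one possible workaround, assuming that is indeed how the text was garbled, is to reverse that step before measuring. The helper name fix_mojibake is made up for this sketch:

```python
def fix_mojibake(text: str) -> str:
    """Re-interpret Latin-1-decoded text as UTF-8; return it unchanged if that fails."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# Escapes are used so the sample survives copy and paste; it displays as 'Feliz cumpleaÃ±os Raul'
garbled = "Feliz cumplea\u00c3\u00b1os Raul"
repaired = fix_mojibake(garbled)
print(len(garbled), len(repaired))  # 22 21
```

Applied to the dataframe above, this could be run on the extracted text column before calling str.len().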

Answer 1: accepted 2020-11-12 01:03:19
