
Extracting str from pandas dataframe

I read a CSV file into a DataFrame named df.

Each row contains a string like the one below.

{"name":"Daniel Gimness","id":10551043...}

I would like to extract "name" and "id" from each row and store them in a new DataFrame.

I tried several approaches, but all of them failed. Below is the outcome of one attempt, which split each string into single characters. Any suggestions on how to solve this would be appreciated. Thanks.

pd.DataFrame.from_records(df.creator.tolist())
0   1   2   3   4   5   6   7   8   9   ... 934 935 936 937 938 939 940 941 942 943
0   {   "   u   r   l   s   "   :   {   "   ... None    None    None    None    None    None    None    None    None    None
1   {   "   u   r   l   s   "   :   {   "   ... None    None    None    None    None    None    None    None    None    None
2   {   "   u   r   l   s   "   :   {   "   ... None    None    None    None    None    None    None    None    None    None
3   {   "   u   r   l   s   "   :   {   "   ... None    None    None    None    None    None    None    None    None    None
4   {   "   u   r   l   s   "   :   {   "   ... None    None    None    None    None    None    None    None    None    None
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
195609  {   "   u   r   l   s   "   :   {   "   ... None    None    None    None    None    None    None    None    None    None
195610  {   "   u   r   l   s   "   :   {   "   ... None    None    None    None    None    None    None    None    None    None
195611  {   "   u   r   l   s   "   :   {   "   ... None    None    None    None    None    None    None    None    None    None
195612  {   "   u   r   l   s   "   :   {   "   ... None    None    None    None    None    None    None    None    None    None
195613  {   "   u   r   l   s   "   :   {   "   ... None    None    None    None    None    None    None    None    None    None

Use a regular expression with pandas.Series.str.extract().

Something like:

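# The sample row shows an unquoted number after "id":, so the quotes are optional here.
df["id"] = df["creator"].str.extract(r'"id":\s*"?([0-9]+)', expand=False)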
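To make the regex approach concrete for both fields, here is a minimal, self-contained sketch. The sample rows are made up to resemble the string in the question, and the patterns assume that the first "name" and "id" keys in each string are the ones you want. If the JSON contains nested objects with their own "id" keys, a regex may pick up the wrong one, and parsing the JSON is likely safer.

```python
import pandas as pd

# Hypothetical sample data shaped like the string in the question.
df = pd.DataFrame({
    "creator": [
        '{"name":"Daniel Gimness","id":10551043,"slug":"x"}',
        '{"name":"Redmond Entwistle","id":123}',
    ]
})

# expand=False returns a Series when the pattern has a single capture group.
names = df["creator"].str.extract(r'"name":\s*"([^"]*)"', expand=False)
ids = df["creator"].str.extract(r'"id":\s*"?(\d+)', expand=False)

# Build the new DataFrame; note that the extracted ids are strings, not integers.
result = pd.DataFrame({"name": names, "id": ids})
print(result)
```

Rows where a pattern does not match get NaN in that column rather than raising an error.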

It looks like the "creator" column contains JSON data. You can try:

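import json

import pandas as pd

# Parse each JSON string once and keep only the "name" and "id" keys.
# The walrus operator (:=) requires Python 3.8 or later.
x = df["creator"].apply(
    lambda s: {"name": (m := json.loads(s))["name"], "id": m["id"]}
)
print(pd.DataFrame(x.to_list()))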

Prints:

                name        id
0     Daniel Gimness  10551043
1  Redmond Entwistle  10551043

...
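With roughly 195,000 rows, some strings may be missing or malformed, in which case json.loads raises and the whole apply fails. The sketch below is one possible way to tolerate that. The helper name parse_creator and the sample rows are hypothetical, not part of the original answer.

```python
import json

import pandas as pd


def parse_creator(raw):
    """Return the name and id from one JSON string, or None values if parsing fails."""
    try:
        record = json.loads(raw)
    except (TypeError, ValueError):
        # TypeError covers None/NaN input; json.JSONDecodeError is a ValueError.
        return {"name": None, "id": None}
    if not isinstance(record, dict):
        return {"name": None, "id": None}
    # .get() returns None when a key is absent instead of raising KeyError.
    return {"name": record.get("name"), "id": record.get("id")}


# Hypothetical sample data: one valid row, one malformed row, one missing row.
df = pd.DataFrame(
    {"creator": ['{"name":"Daniel Gimness","id":10551043}', "not json", None]}
)

result = pd.DataFrame(df["creator"].map(parse_creator).tolist(), index=df.index)
print(result)
```

If some rows fail to parse, pandas may store the id column as float because of the missing values; casting with result["id"].astype("Int64") is one option for keeping integer ids.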
