
Data cleaning: dictionary inside a dictionary inside lists in a CSV

I'm a newbie learning data science and I've been trying to clean a dataset, but I've hit some hurdles along the way. The first issue was exploding a dictionary inside a table into individual columns (link below). Thanks to user Parfait I could do it using literal_eval. Then I had a problem applying the same solution, until I found that literal_eval has issues with null values, so I got rid of the nulls and of some bad uses of quotes.
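The null-value problem has a simple cause: `ast.literal_eval` only parses strings (or AST nodes), and pandas typically stores an empty CSV cell as `NaN`, which is a float, so `literal_eval` raises a `ValueError` on it. Instead of deleting those rows, a minimal sketch, assuming the nulls arrive as `NaN`/`None` rather than as text, parses only real strings and maps everything else to an empty dict. The helper name `safe_literal_eval` and the two sample values are hypothetical.

```python
from ast import literal_eval

import pandas as pd


def safe_literal_eval(value):
    """Parse a string literal; return an empty dict for missing values."""
    # Only strings can be parsed; NaN/None (empty CSV cells) become {}
    if isinstance(value, str):
        return literal_eval(value)
    return {}


# Hypothetical two-row column: one valid string, one empty cell (NaN)
s = pd.Series(['{"id":1,"name":"A","is_superbacker":None}', float("nan")])
parsed = s.apply(safe_literal_eval)
print(parsed.tolist())  # [{'id': 1, 'name': 'A', 'is_superbacker': None}, {}]
```

Keeping an empty dict rather than dropping the row keeps the index aligned with the rest of the dataframe.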

Now I've got this: one column, which is a dictionary, holds not one but two values that are dictionaries themselves. I've tried to pop and del those values, but it seems the data is not considered a dictionary, so I couldn't do it.
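`pop` and `del` fail because, after reading a CSV, each cell is a plain string that only looks like a dictionary. Once the strings are parsed into real dicts, the nested keys can be removed normally. A minimal sketch on one hypothetical, trimmed-down row shaped like the question's `creator` values:

```python
from ast import literal_eval

import pandas as pd

# Hypothetical trimmed row shaped like the question's "creator" values
df = pd.DataFrame({"creator": [
    '{"id":1379875462,"name":"Batton Lash","avatar":{"thumb":"t.jpg"},'
    '"urls":{"api":{"user":"u"}}}'
]})

# Read from a CSV, each cell is a str, so dict methods like pop do not exist
print(type(df.loc[0, "creator"]))  # <class 'str'>

# Parse first, then drop the nested keys that are not needed
df["creator"] = df["creator"].apply(literal_eval)
for d in df["creator"]:
    d.pop("avatar", None)  # default None: no KeyError if the key is absent
    d.pop("urls", None)    # "urls" contains the "api" sub-dictionary
print(df.loc[0, "creator"])  # {'id': 1379875462, 'name': 'Batton Lash'}
```

The loop mutates the dict objects stored in the column in place, so no reassignment is needed after it.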

When running df['creator'].map(eval) I get the message below. Look at the "avatar" and "api" fields: I don't need either of them for what I want, so I could drop them, but I haven't found a way to do it.

To be clear, I just want to extract the id and name fields as "cre_id" and "cre_name", add them to the main df with that prefix, and delete the rest of the column. Thank you for your help.

df['creator'].map(eval)
  File "<string>", line 1
    {"id":347819977,"name":Raul CJ Montes,"is_registered":None,"is_email_verified":None,"chosen_currency":None,"is_superbacker":None,"avatar":{"thumb":"https://ksr-ugc.imgix.net/assets/019/996/402/9de6ab427db7becb81711ce9b25e3645_original.jpg?ixlib=rb-4.0.2&w=40&h=40&fit=crop&v=1517101311&auto=format&frame=1&q=92&s=c41776ee80edfa63ba4dc916b24f6f00","small":"https://ksr-ugc.imgix.net/assets/019/996/402/9de6ab427db7becb81711ce9b25e3645_original.jpg?ixlib=rb-4.0.2&w=80&h=80&fit=crop&v=1517101311&auto=format&frame=1&q=92&s=6983b13a3c4e7a7a5f0b2d42f78f50dc","medium":"https://ksr-ugc.imgix.net/assets/019/996/402/9de6ab427db7becb81711ce9b25e3645_original.jpg?ixlib=rb-4.0.2&w=160&h=160&fit=crop&v=1517101311&auto=format&frame=1&q=92&s=bb04642f7264234e6c01c5b1b77d8c63"},"urls":{"web":{"user":"https://www.kickstarter.com/profile/347819977"},"api":{"user":"https://api.kickstarter.com/v1/users/347819977?signature=1631849457.e135d96dc2a9edbddb71deef896c78155ed13e8b"}}}
                                 ^
SyntaxError: invalid syntax
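The traceback points at the cause: in that row the value of "name" (Raul CJ Montes) has no quotes around it, so the string is not a valid Python literal, and both `eval` and `literal_eval` stop with a `SyntaxError`. (That the quotes were lost while removing "bad uses of quotes" is an assumption.) `literal_eval` is also the safer choice over `eval`, since it refuses to run arbitrary code. A minimal sketch that lists the rows which still fail to parse, so they can be inspected and fixed; the two sample strings and the helper `fails_to_parse` are hypothetical:

```python
from ast import literal_eval

import pandas as pd

# Hypothetical sample: the second string reproduces the unquoted name
# from the traceback ("name":Raul CJ Montes)
s = pd.Series([
    '{"id":1,"name":"Batton Lash"}',
    '{"id":347819977,"name":Raul CJ Montes}',
])


def fails_to_parse(text):
    """Return True if literal_eval rejects the string."""
    try:
        literal_eval(text)
        return False
    except (ValueError, SyntaxError):
        return True


bad_rows = s[s.apply(fails_to_parse)]
print(bad_rows.index.tolist())  # [1]
```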

Edit: added the first ten rows of the dataset:

{0: '{"id":1379875462,"name":"Batton Lash","is_registered":None,"is_email_verified":None,"chosen_currency":None,"is_superbacker":None,"avatar":{"thumb":"https://ksr-ugc.imgix.net/assets/006/347/706/b3908a1a23f6b9e472edcf7c934e5b0e_original.jpg?ixlib=rb-4.0.2&w=40&h=40&fit=crop&v=1461382354&auto=format&frame=1&q=92&s=4d88bd2ed1e7098fcaf046321cc4be15","small":"https://ksr-ugc.imgix.net/assets/006/347/706/b3908a1a23f6b9e472edcf7c934e5b0e_original.jpg?ixlib=rb-4.0.2&w=80&h=80&fit=crop&v=1461382354&auto=format&frame=1&q=92&s=664f586cef17d83dc408a6a10b0f3c4a","medium":"https://ksr-ugc.imgix.net/assets/006/347/706/b3908a1a23f6b9e472edcf7c934e5b0e_original.jpg?ixlib=rb-4.0.2&w=160&h=160&fit=crop&v=1461382354&auto=format&frame=1&q=92&s=fe307263e32a2385e764e3923a13179e"},"urls":{"web":{"user":"https://www.kickstarter.com/profile/1379875462"},"api":{"user":"https://api.kickstarter.com/v1/users/1379875462?signature=1631849432.d50b79030e15111575554ecae171babad1f2925d"}}}',
 1: '{"id":408247096,"name":"Scott(skoddii)","is_registered":None,"is_email_verified":None,"chosen_currency":None,"is_superbacker":None,"avatar":{"thumb":"https://ksr-ugc.imgix.net/assets/020/330/517/383423c1c19dfbd99534c6185eb09a6f_original.png?ixlib=rb-4.0.2&w=40&h=40&fit=crop&v=1519354368&auto=format&frame=1&q=92&s=74f83e0070b20db01d5180ba214d1b5e","small":"https://ksr-ugc.imgix.net/assets/020/330/517/383423c1c19dfbd99534c6185eb09a6f_original.png?ixlib=rb-4.0.2&w=80&h=80&fit=crop&v=1519354368&auto=format&frame=1&q=92&s=671b9100176dbfa63752a7a8e9cc63d0","medium":"https://ksr-ugc.imgix.net/assets/020/330/517/383423c1c19dfbd99534c6185eb09a6f_original.png?ixlib=rb-4.0.2&w=160&h=160&fit=crop&v=1519354368&auto=format&frame=1&q=92&s=956c6f85ffbc3fb179c260611254a2be"},"urls":{"web":{"user":"https://www.kickstarter.com/profile/408247096"},"api":{"user":"https://api.kickstarter.com/v1/users/408247096?signature=1631849432.6cc0456d4795aea0b32f861b050212afef4387ce"}}}',
 2: '{"id":361953386,"name":"Luis G. Batista, CPM, C.P.S.M","is_registered":None,"is_email_verified":None,"chosen_currency":None,"is_superbacker":None,"avatar":{"thumb":"https://ksr-ugc.imgix.net/assets/015/751/771/b9a11e982831d2190d68e2ea0d3a4ff0_original.jpg?ixlib=rb-4.0.2&w=40&h=40&fit=crop&v=1488754184&auto=format&frame=1&q=92&s=f4dc0bbe5e7edbb35fb15c07bdb2c843","small":"https://ksr-ugc.imgix.net/assets/015/751/771/b9a11e982831d2190d68e2ea0d3a4ff0_original.jpg?ixlib=rb-4.0.2&w=80&h=80&fit=crop&v=1488754184&auto=format&frame=1&q=92&s=9c7e202bb6491516468ec69dff66bcdd","medium":"https://ksr-ugc.imgix.net/assets/015/751/771/b9a11e982831d2190d68e2ea0d3a4ff0_original.jpg?ixlib=rb-4.0.2&w=160&h=160&fit=crop&v=1488754184&auto=format&frame=1&q=92&s=ac05f1a9827cc321ea3e8f754f19be94"},"urls":{"web":{"user":"https://www.kickstarter.com/profile/361953386"},"api":{"user":"https://api.kickstarter.com/v1/users/361953386?signature=1631849432.7262fa85aec828a6b01ea70685ef22b0ada784ad"}}}',
 3: '{"id":202579323,"name":"Brian Carmichael","is_registered":None,"is_email_verified":None,"chosen_currency":None,"is_superbacker":None,"avatar":{"thumb":"https://ksr-ugc.imgix.net/assets/010/482/911/12f9ff13c9a415e4e869b8036662f02c_original.jpg?ixlib=rb-4.0.2&w=40&h=40&fit=crop&v=1488680236&auto=format&frame=1&q=92&s=9433c133b6bf02a45dd8ba78a0b44a46","small":"https://ksr-ugc.imgix.net/assets/010/482/911/12f9ff13c9a415e4e869b8036662f02c_original.jpg?ixlib=rb-4.0.2&w=80&h=80&fit=crop&v=1488680236&auto=format&frame=1&q=92&s=900c300f2d425243c108ed4419c78793","medium":"https://ksr-ugc.imgix.net/assets/010/482/911/12f9ff13c9a415e4e869b8036662f02c_original.jpg?ixlib=rb-4.0.2&w=160&h=160&fit=crop&v=1488680236&auto=format&frame=1&q=92&s=55e58d426c7f41b92081ce735abac404"},"urls":{"web":{"user":"https://www.kickstarter.com/profile/202579323"},"api":{"user":"https://api.kickstarter.com/v1/users/202579323?signature=1631849432.fb88647e78bbe87ca2646330b0d84a0237c7cc46"}}}',
 4: '{"id":1996450690,"name":"Dan Schmeidler","is_registered":None,"is_email_verified":None,"chosen_currency":None,"is_superbacker":None,"avatar":{"thumb":"https://ksr-ugc.imgix.net/assets/015/757/606/4f4d33cc942cdfe4b95af09e43a49255_original.JPG?ixlib=rb-4.0.2&w=40&h=40&fit=crop&v=1488802482&auto=format&frame=1&q=92&s=97f88d105a1bc21a72f008859b13055c","small":"https://ksr-ugc.imgix.net/assets/015/757/606/4f4d33cc942cdfe4b95af09e43a49255_original.JPG?ixlib=rb-4.0.2&w=80&h=80&fit=crop&v=1488802482&auto=format&frame=1&q=92&s=a423f3fbf75bdb32f1c895a1f0d76bca","medium":"https://ksr-ugc.imgix.net/assets/015/757/606/4f4d33cc942cdfe4b95af09e43a49255_original.JPG?ixlib=rb-4.0.2&w=160&h=160&fit=crop&v=1488802482&auto=format&frame=1&q=92&s=49f4a2d61132d1068d3f604b03a1f8e5"},"urls":{"web":{"user":"https://www.kickstarter.com/profile/1996450690"},"api":{"user":"https://api.kickstarter.com/v1/users/1996450690?signature=1631849432.3b51c0d212170f4228293d3133045d040c6a6285"}}}',
 5: '{"id":903880044,"name":"Doug McQuilken","is_registered":None,"is_email_verified":None,"chosen_currency":None,"is_superbacker":None,"avatar":{"thumb":"https://ksr-ugc.imgix.net/assets/014/523/998/230d7cd9d27128f28366a7a1c4977273_original.jpg?ixlib=rb-4.0.2&w=40&h=40&fit=crop&v=1479214827&auto=format&frame=1&q=92&s=84c65c201bdb46e72afeef51ad261913","small":"https://ksr-ugc.imgix.net/assets/014/523/998/230d7cd9d27128f28366a7a1c4977273_original.jpg?ixlib=rb-4.0.2&w=80&h=80&fit=crop&v=1479214827&auto=format&frame=1&q=92&s=52beef6574a551f81be17acc750d4e2e","medium":"https://ksr-ugc.imgix.net/assets/014/523/998/230d7cd9d27128f28366a7a1c4977273_original.jpg?ixlib=rb-4.0.2&w=160&h=160&fit=crop&v=1479214827&auto=format&frame=1&q=92&s=b4bb14d2759e21e6c40d3ef9c86c1ed3"},"urls":{"web":{"user":"https://www.kickstarter.com/profile/903880044"},"api":{"user":"https://api.kickstarter.com/v1/users/903880044?signature=1631849432.6a7dcb45d0ca2a4c5922d51a0b3f36f7972b6ac0"}}}',
 6: '{"id":1391487766,"name":"Karen Scott","is_registered":None,"is_email_verified":None,"chosen_currency":None,"is_superbacker":None,"avatar":{"thumb":"https://ksr-ugc.imgix.net/assets/015/612/365/b1ce5bfa90d24a767547b168e3efdbef_original.JPG?ixlib=rb-4.0.2&w=40&h=40&fit=crop&v=1487847709&auto=format&frame=1&q=92&s=e18d2c915b50e20cf27bb1255ad82ba9","small":"https://ksr-ugc.imgix.net/assets/015/612/365/b1ce5bfa90d24a767547b168e3efdbef_original.JPG?ixlib=rb-4.0.2&w=80&h=80&fit=crop&v=1487847709&auto=format&frame=1&q=92&s=bd7c22cafcec49e73bea6a106976043c","medium":"https://ksr-ugc.imgix.net/assets/015/612/365/b1ce5bfa90d24a767547b168e3efdbef_original.JPG?ixlib=rb-4.0.2&w=160&h=160&fit=crop&v=1487847709&auto=format&frame=1&q=92&s=d1d5327de95dac76d4cbed7a95007de1"},"urls":{"web":{"user":"https://www.kickstarter.com/profile/1391487766"},"api":{"user":"https://api.kickstarter.com/v1/users/1391487766?signature=1631849432.2720fa0d8a70ccfc33034287985b98c0c791a23d"}}}',
 7: '{"id":1344116211,"name":"Sanjiv(Sam) Mall","is_registered":None,"is_email_verified":None,"chosen_currency":None,"is_superbacker":None,"avatar":{"thumb":"https://ksr-ugc.imgix.net/assets/015/648/502/206b8686072b528ea6fd1fe78adfcc25_original.JPG?ixlib=rb-4.0.2&w=40&h=40&fit=crop&v=1488128800&auto=format&frame=1&q=92&s=fd4520798d39b777e5814219c8fe4ad2","small":"https://ksr-ugc.imgix.net/assets/015/648/502/206b8686072b528ea6fd1fe78adfcc25_original.JPG?ixlib=rb-4.0.2&w=80&h=80&fit=crop&v=1488128800&auto=format&frame=1&q=92&s=67553420e14378664ae3555275a25d51","medium":"https://ksr-ugc.imgix.net/assets/015/648/502/206b8686072b528ea6fd1fe78adfcc25_original.JPG?ixlib=rb-4.0.2&w=160&h=160&fit=crop&v=1488128800&auto=format&frame=1&q=92&s=f08f3b4420e3ab37c4e07b4f98100dde"},"urls":{"web":{"user":"https://www.kickstarter.com/profile/1344116211"},"api":{"user":"https://api.kickstarter.com/v1/users/1344116211?signature=1631849432.6e307780f53a56c7a6dd5493ae59f26575d9fbcb"}}}',
 8: '{"id":2071365832,"name":"Christoph Vogelbusch","is_registered":None,"is_email_verified":None,"chosen_currency":None,"is_superbacker":None,"avatar":{"thumb":"https://ksr-ugc.imgix.net/assets/012/912/270/d2f18c4ec6fcb2357ab073d0e6e0aa9e_original.png?ixlib=rb-4.0.2&w=40&h=40&fit=crop&v=1467291732&auto=format&frame=1&q=92&s=3b321faecc138d42f7aa249620fc342d","small":"https://ksr-ugc.imgix.net/assets/012/912/270/d2f18c4ec6fcb2357ab073d0e6e0aa9e_original.png?ixlib=rb-4.0.2&w=80&h=80&fit=crop&v=1467291732&auto=format&frame=1&q=92&s=967c607450ac03547632f0865270822f","medium":"https://ksr-ugc.imgix.net/assets/012/912/270/d2f18c4ec6fcb2357ab073d0e6e0aa9e_original.png?ixlib=rb-4.0.2&w=160&h=160&fit=crop&v=1467291732&auto=format&frame=1&q=92&s=507442b8d2a97678675ec7c19b049e4b"},"urls":{"web":{"user":"https://www.kickstarter.com/profile/2071365832"},"api":{"user":"https://api.kickstarter.com/v1/users/2071365832?signature=1631849432.0d05bc7a066a3748232100864f2d3a441186b289"}}}',
 9: '{"id":850790011,"name":"Harun Sarac","is_registered":None,"is_email_verified":None,"chosen_currency":None,"is_superbacker":None,"avatar":{"thumb":"https://ksr-ugc.imgix.net/assets/015/673/759/79ee3faff36e0fb683f834c1f419a0fc_original.jpg?ixlib=rb-4.0.2&w=40&h=40&fit=crop&v=1488440832&auto=format&frame=1&q=92&s=ab34266c1a0ce2ec4ac5e4931a606b64","small":"https://ksr-ugc.imgix.net/assets/015/673/759/79ee3faff36e0fb683f834c1f419a0fc_original.jpg?ixlib=rb-4.0.2&w=80&h=80&fit=crop&v=1488440832&auto=format&frame=1&q=92&s=e1d62a787470490c4189bb9a72cfbacc","medium":"https://ksr-ugc.imgix.net/assets/015/673/759/79ee3faff36e0fb683f834c1f419a0fc_original.jpg?ixlib=rb-4.0.2&w=160&h=160&fit=crop&v=1488440832&auto=format&frame=1&q=92&s=28e1a25444c13592e5ccf2967ac8b8e3"},"urls":{"web":{"user":"https://www.kickstarter.com/profile/850790011"},"api":{"user":"https://api.kickstarter.com/v1/users/850790011?signature=1631849432.3ac62ea0ee180b660968be6227e29684c54286d6"}}}'}

Answer: you start from the following dataframe:

0  {"id":1379875462,"name":"Batton Lash","is_regi...
1  {"id":408247096,"name":"Scott(skoddii)","is_re...
2  {"id":361953386,"name":"Luis G. Batista, CPM, ...
3  {"id":202579323,"name":"Brian Carmichael","is_...
4  {"id":1996450690,"name":"Dan Schmeidler","is_r...
5  {"id":903880044,"name":"Doug McQuilken","is_re...
6  {"id":1391487766,"name":"Karen Scott","is_regi...
7  {"id":1344116211,"name":"Sanjiv(Sam) Mall","is...
8  {"id":2071365832,"name":"Christoph Vogelbusch"...
9  {"id":850790011,"name":"Harun Sarac","is_regis...

What you can do is the following:

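import pandas as pd
from ast import literal_eval

df = pd.DataFrame(pd.Series(data))  # data: the dict posted in the question
df[0] = df[0].apply(literal_eval)   # parse each string into a real dict
df = df.join(pd.json_normalize(df[0].tolist()))  # flatten nested keys into columns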
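If the nested "avatar" and "urls" blocks are not wanted at all, `pd.json_normalize` accepts a `max_level` argument: with `max_level=0` only the top level is flattened, so the nested dictionaries stay as single dict-valued columns that can be dropped in one call. A sketch on two hypothetical, trimmed-down rows shaped like the question's data:

```python
from ast import literal_eval

import pandas as pd

# Hypothetical trimmed rows shaped like the question's data
data = {
    0: '{"id":1379875462,"name":"Batton Lash","is_registered":None,'
       '"avatar":{"thumb":"t0.jpg"},"urls":{"api":{"user":"u0"}}}',
    1: '{"id":408247096,"name":"Scott(skoddii)","is_registered":None,'
       '"avatar":{"thumb":"t1.jpg"},"urls":{"api":{"user":"u1"}}}',
}
parsed = pd.Series(data).apply(literal_eval)

# max_level=0 flattens only the top level: "avatar" and "urls" stay as
# dict-valued columns instead of becoming avatar.thumb, urls.api.user, ...
flat = pd.json_normalize(parsed.tolist(), max_level=0)
flat = flat.drop(columns=["avatar", "urls"])
print(flat.columns.tolist())  # ['id', 'name', 'is_registered']
```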

which gives you

                                                    0          id  \
0  {'id': 1379875462, 'name': 'Batton Lash', 'is_...  1379875462   
1  {'id': 408247096, 'name': 'Scott(skoddii)', 'i...   408247096   
2  {'id': 361953386, 'name': 'Luis G. Batista, CP...   361953386   
3  {'id': 202579323, 'name': 'Brian Carmichael', ...   202579323   
4  {'id': 1996450690, 'name': 'Dan Schmeidler', '...  1996450690   
5  {'id': 903880044, 'name': 'Doug McQuilken', 'i...   903880044   
6  {'id': 1391487766, 'name': 'Karen Scott', 'is_...  1391487766   
7  {'id': 1344116211, 'name': 'Sanjiv(Sam) Mall',...  1344116211   
8  {'id': 2071365832, 'name': 'Christoph Vogelbus...  2071365832   
9  {'id': 850790011, 'name': 'Harun Sarac', 'is_r...   850790011   

                            name is_registered is_email_verified  \
0                    Batton Lash          None              None   
1                 Scott(skoddii)          None              None   
2  Luis G. Batista, CPM, C.P.S.M          None              None   
3               Brian Carmichael          None              None   
4                 Dan Schmeidler          None              None   
5                 Doug McQuilken          None              None   
6                    Karen Scott          None              None   
7               Sanjiv(Sam) Mall          None              None   
8           Christoph Vogelbusch          None              None   
9                    Harun Sarac          None              None   

  chosen_currency is_superbacker  \
0            None           None   
1            None           None   
2            None           None   
3            None           None   
4            None           None   
5            None           None   
6            None           None   
7            None           None   
8            None           None   
9            None           None   

                                        avatar.thumb  \
0  https://ksr-ugc.imgix.net/assets/006/347/706/b...   
1  https://ksr-ugc.imgix.net/assets/020/330/517/3...   
2  https://ksr-ugc.imgix.net/assets/015/751/771/b...   
3  https://ksr-ugc.imgix.net/assets/010/482/911/1...   
4  https://ksr-ugc.imgix.net/assets/015/757/606/4...   
5  https://ksr-ugc.imgix.net/assets/014/523/998/2...   
6  https://ksr-ugc.imgix.net/assets/015/612/365/b...   
7  https://ksr-ugc.imgix.net/assets/015/648/502/2...   
8  https://ksr-ugc.imgix.net/assets/012/912/270/d...   
9  https://ksr-ugc.imgix.net/assets/015/673/759/7...   

                                        avatar.small  \
0  https://ksr-ugc.imgix.net/assets/006/347/706/b...   
1  https://ksr-ugc.imgix.net/assets/020/330/517/3...   
2  https://ksr-ugc.imgix.net/assets/015/751/771/b...   
3  https://ksr-ugc.imgix.net/assets/010/482/911/1...   
4  https://ksr-ugc.imgix.net/assets/015/757/606/4...   
5  https://ksr-ugc.imgix.net/assets/014/523/998/2...   
6  https://ksr-ugc.imgix.net/assets/015/612/365/b...   
7  https://ksr-ugc.imgix.net/assets/015/648/502/2...   
8  https://ksr-ugc.imgix.net/assets/012/912/270/d...   
9  https://ksr-ugc.imgix.net/assets/015/673/759/7...   

                                       avatar.medium  \
0  https://ksr-ugc.imgix.net/assets/006/347/706/b...   
1  https://ksr-ugc.imgix.net/assets/020/330/517/3...   
2  https://ksr-ugc.imgix.net/assets/015/751/771/b...   
3  https://ksr-ugc.imgix.net/assets/010/482/911/1...   
4  https://ksr-ugc.imgix.net/assets/015/757/606/4...   
5  https://ksr-ugc.imgix.net/assets/014/523/998/2...   
6  https://ksr-ugc.imgix.net/assets/015/612/365/b...   
7  https://ksr-ugc.imgix.net/assets/015/648/502/2...   
8  https://ksr-ugc.imgix.net/assets/012/912/270/d...   
9  https://ksr-ugc.imgix.net/assets/015/673/759/7...   

                                    urls.web.user  \
0  https://www.kickstarter.com/profile/1379875462   
1   https://www.kickstarter.com/profile/408247096   
2   https://www.kickstarter.com/profile/361953386   
3   https://www.kickstarter.com/profile/202579323   
4  https://www.kickstarter.com/profile/1996450690   
5   https://www.kickstarter.com/profile/903880044   
6  https://www.kickstarter.com/profile/1391487766   
7  https://www.kickstarter.com/profile/1344116211   
8  https://www.kickstarter.com/profile/2071365832   
9   https://www.kickstarter.com/profile/850790011   

                                       urls.api.user  
0  https://api.kickstarter.com/v1/users/137987546...  
1  https://api.kickstarter.com/v1/users/408247096...  
2  https://api.kickstarter.com/v1/users/361953386...  
3  https://api.kickstarter.com/v1/users/202579323...  
4  https://api.kickstarter.com/v1/users/199645069...  
5  https://api.kickstarter.com/v1/users/903880044...  
6  https://api.kickstarter.com/v1/users/139148776...  
7  https://api.kickstarter.com/v1/users/134411621...  
8  https://api.kickstarter.com/v1/users/207136583...  
9  https://api.kickstarter.com/v1/users/850790011...  
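To finish what the question asks (keep only id and name as "cre_id" and "cre_name", attach them to the main dataframe, and drop the original column), one possible sketch pulls just those two keys, adds the prefix with `add_prefix`, and joins on the index. The main dataframe here is hypothetical: the "goal" column is invented to stand in for the rest of the dataset.

```python
from ast import literal_eval

import pandas as pd

# Hypothetical main dataframe: "creator" holds the raw strings; "goal" is an
# invented column standing in for the rest of the dataset
df = pd.DataFrame({
    "goal": [1000, 2500],
    "creator": [
        '{"id":1379875462,"name":"Batton Lash","avatar":{"thumb":"t0"}}',
        '{"id":408247096,"name":"Scott(skoddii)","avatar":{"thumb":"t1"}}',
    ],
})

creator = df["creator"].apply(literal_eval)

# Keep only id and name; map() preserves df's index, so the join lines up
cre = pd.DataFrame({
    "id": creator.map(lambda d: d.get("id")),
    "name": creator.map(lambda d: d.get("name")),
}).add_prefix("cre_")

# Drop the raw column and attach the two new ones
df = df.drop(columns="creator").join(cre)
print(df.columns.tolist())  # ['goal', 'cre_id', 'cre_name']
```

Using `d.get` instead of `d["id"]` returns `None` for a row whose dict lacks the key, rather than raising a `KeyError`.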
