简体   繁体   中英

Best practices on importing data posted in Stackoverflow into working environment in Python

Suppose a question is asked in Stackoverflow and example data is given, what is the best way to import the example data

In order to explain this question which I am asking, let me take to a real StackOverflow post.

This post has given the following data to solve the query.


    Access Type     host/IP address Count
0   Authorized      206.196.21.129  23
1   Authorized      207.30.238.8    46
2   Authorized      208.62.55.75    23
3   Authorized      216.12.111.241  23
4   Authorized      63.197.98.106   23
5   Authorized      67.95.49.172    23
6   Unauthorized    207.243.167.114 23
7   Unauthorized    209.152.168.249 10
8   Unauthorized    65.166.159.14   10
9   Unauthorized    68.143.156.89   10

Now what will be the best way to import the data into Python

The methodology should be in such a way that the import is done into the python environment in a single step.

I am aware that, I can copy-paste the data into a spreadsheet -> then save it as CSV file and then import it into Python. But I need a more elegant way.

Other solution as suggested by one developer,

import pandas as pd ;

df = pd.read_clipboard()

pd.read_clipboard(), is not a universal solution across all operating system. My system (Ubuntu 19.10), I did a clipboard copying and then the result came like this. The column names were wrongly imported.

在此处输入图像描述

This post deals with R solution, but I am interested in a solution for Python

Edited answer: this is slightly more work, but it's another viable option in my opinion.

from io import StringIO
import pandas as pd

s = StringIO("""    Access Type     host/IP address Count
0   Authorized      206.196.21.129  23
1   Authorized      207.30.238.8    46
2   Authorized      208.62.55.75    23
3   Authorized      216.12.111.241  23
4   Authorized      63.197.98.106   23
5   Authorized      67.95.49.172    23
6   Unauthorized    207.243.167.114 23
7   Unauthorized    209.152.168.249 10
8   Unauthorized    65.166.159.14   10
9   Unauthorized    68.143.156.89   10""")

df = pd.read_fwf(s,index_col=0,infer_nrows=10)

Result:

In [175]: df
Out[175]: 
    Access Type  host/IP address  Count
0    Authorized   206.196.21.129     23
1    Authorized     207.30.238.8     46
2    Authorized     208.62.55.75     23
3    Authorized   216.12.111.241     23
4    Authorized    63.197.98.106     23
5    Authorized     67.95.49.172     23
6  Unauthorized  207.243.167.114     23
7  Unauthorized  209.152.168.249     10
8  Unauthorized    65.166.159.14     10
9  Unauthorized    68.143.156.89     10

As others have pointed out, The best solution seems to be pd.read_clipboard() In order not to get messed up columns, just replace the spaces in the column names with _ and then to get the actual columns simply do:

df.columns = [name.replace('_',' ') for name in df.columns.values]

So copy the input when you replaced spaces with _ : input:

    Access_Type     host/IP_address Count
0   Authorized      206.196.21.129  23
1   Authorized      207.30.238.8    46
2   Authorized      208.62.55.75    23
3   Authorized      216.12.111.241  23
4   Authorized      63.197.98.106   23
5   Authorized      67.95.49.172    23
6   Unauthorized    207.243.167.114 23
7   Unauthorized    209.152.168.249 10
8   Unauthorized    65.166.159.14   10
9   Unauthorized    68.143.156.89   10

run the code:

import pandas as pd 
df = pd.read_clipboard()
df.columns = [name.replace('_',' ') for name in df.columns.values]

and you'll get:

Output:

    Access Type  host/IP address  Count
0    Authorized   206.196.21.129     23
1    Authorized     207.30.238.8     46
2    Authorized     208.62.55.75     23
3    Authorized   216.12.111.241     23
4    Authorized    63.197.98.106     23
5    Authorized     67.95.49.172     23
6  Unauthorized  207.243.167.114     23
7  Unauthorized  209.152.168.249     10
8  Unauthorized    65.166.159.14     10
9  Unauthorized    68.143.156.89     10

Try to use pandas function read_clipboard()

This method is as simple as you want: it reads copy-pasted tabular data and parses it into a Data Frame.

By the documentation this method read text from clipboard and pass to read_csv.

You can find more information in this blog post

EDIT: I think that your problem didn't cause by your OS, it could be because of the white spaces in the first line of the data.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM